linux/net/core
Talal Ahmad 9b65b17db7 net: avoid double accounting for pure zerocopy skbs
Track skbs containing only zerocopy data and avoid charging them to
kernel memory to correctly account the memory utilization for
msg_zerocopy. All of the data in such skbs is held in user pages which
are already accounted to user. Before this change, they are charged
again in kernel in __zerocopy_sg_from_iter. The charging in kernel is
excessive because data is not being copied into skb frags. This
excessive charging can lead to kernel going into memory pressure
state which impacts all sockets in the system adversely. Mark pure
zerocopy skbs with a SKBFL_PURE_ZEROCOPY flag and remove
charge/uncharge for data in such skbs.

Initially, an skb is marked pure zerocopy when it is empty and in
zerocopy path. skb can then change from a pure zerocopy skb to mixed
data skb (zerocopy and copy data) if it is at tail of write queue and
there is room available in it and non-zerocopy data is being sent in
the next sendmsg call. At this time sk_mem_charge is done for the pure
zerocopied data and the pure zerocopy flag is unmarked. We found that
this happens very rarely on workloads that pass MSG_ZEROCOPY.

A pure zerocopy skb can later be coalesced into normal skb if they are
next to each other in queue but this patch prevents coalescing from
happening. This avoids complexity of charging when skb downgrades from
pure zerocopy to mixed. This is also rare.

In sk_wmem_free_skb, if it is a pure zerocopy skb, an sk_mem_uncharge
for SKB_TRUESIZE(skb_end_offset(skb)) is done for sk_mem_charge in
tcp_skb_entail for an skb without data.

Testing with the msg_zerocopy.c benchmark between two hosts(100G nics)
with zerocopy showed that before this patch the 'sock' variable in
memory.stat for cgroup2 that tracks sum of sk_forward_alloc,
sk_rmem_alloc and sk_wmem_queued is around 1822720 and with this
change it is 0. This is due to no charge to sk_forward_alloc for
zerocopy data and shows memory utilization for kernel is lowered.

With this commit we don't see the warning we saw in previous commit
which resulted in commit 84882cf72c.

Signed-off-by: Talal Ahmad <talalahmad@google.com>
Acked-by: Arjun Roy <arjunroy@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-03 11:19:49 +00:00
..
bpf_sk_storage.c net: in_irq() cleanup 2021-08-13 14:09:19 -07:00
datagram.c net: avoid double accounting for pure zerocopy skbs 2021-11-03 11:19:49 +00:00
datagram.h
dev_addr_lists.c net: dev_addr_list: handle first address in __hw_addr_add_ex 2021-09-30 13:29:09 +01:00
dev_ioctl.c ethtool: push the rtnl_lock into dev_ethtool() 2021-11-01 13:26:07 +00:00
dev.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2021-10-28 10:43:58 -07:00
devlink.c ethtool: don't drop the rtnl_lock half way thru the ioctl 2021-11-01 13:26:07 +00:00
drop_monitor.c net: Remove redundant if statements 2021-08-05 13:27:50 +01:00
dst_cache.c
dst.c net: Remove redundant if statements 2021-08-05 13:27:50 +01:00
failover.c
fib_notifier.c
fib_rules.c memcg: enable accounting for IP address and routing-related objects 2021-07-20 06:00:38 -07:00
filter.c bpf: Add bpf_skc_to_unix_sock() helper 2021-10-21 15:11:06 -07:00
flow_dissector.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2021-11-01 20:05:14 -07:00
flow_offload.c net: Fix offloading indirect devices dependency on qdisc order creation 2021-08-19 13:19:30 +01:00
gen_estimator.c net: sched: Remove Qdisc::running sequence counter 2021-10-18 12:54:41 +01:00
gen_stats.c net: stats: Read the statistics in ___gnet_stats_copy_basic() instead of adding. 2021-10-21 12:47:56 +01:00
gro_cells.c
hwbm.c
link_watch.c net: linkwatch: fix failure to restore device state across suspend/resume 2021-08-11 14:43:16 -07:00
lwt_bpf.c lwt_bpf: Replace preempt_disable() with migrate_disable() 2020-12-07 11:53:40 -08:00
lwtunnel.c netfilter: add netfilter hooks to SRv6 data plane 2021-08-30 01:51:36 +02:00
Makefile of: net: move of_net under net/ 2021-10-07 13:39:51 +01:00
neighbour.c net, neigh: Reject creating NUD_PERMANENT with NTF_MANAGED entries 2021-10-14 19:16:21 -07:00
net_namespace.c net: net_namespace: Fix undefined member in key_remove_domain() 2021-09-19 12:43:04 +01:00
net-procfs.c Revert "net: procfs: add seq_puts() statement for dev_mcast" 2021-10-13 17:24:38 -07:00
net-sysfs.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2021-10-28 10:43:58 -07:00
net-sysfs.h
net-traces.c tcp: add tracepoint for checksum errors 2021-05-14 15:26:03 -07:00
netclassid_cgroup.c bpf, cgroups: Fix cgroup v2 fallback on v1/v2 mixed mode 2021-09-13 16:35:58 -07:00
netevent.c net: core: Correct function name netevent_unregister_notifier() in the kerneldoc 2021-03-28 17:56:56 -07:00
netpoll.c asm-generic/unaligned: Unify asm/unaligned.h around struct helper 2021-07-02 12:43:40 -07:00
netprio_cgroup.c bpf, cgroups: Fix cgroup v2 fallback on v1/v2 mixed mode 2021-09-13 16:35:58 -07:00
of_net.c of: net: add a helper for loading netdev->dev_addr 2021-10-07 13:39:51 +01:00
page_pool.c page_pool: disable dma mapping support for 32-bit arch with 64-bit DMA 2021-10-15 10:54:20 +01:00
pktgen.c pktgen: remove unused variable 2021-09-03 11:48:28 +01:00
ptp_classifier.c bpf: Refactor BPF_PROG_RUN into a function 2021-08-17 00:45:07 +02:00
request_sock.c
rtnetlink.c net: rtnetlink: use __dev_addr_set() 2021-10-24 13:59:44 +01:00
scm.c memcg: enable accounting for scm_fp_list objects 2021-07-20 06:00:38 -07:00
secure_seq.c
selftests.c net: core: constify mac addrs in selftests 2021-10-24 13:59:44 +01:00
skbuff.c net: avoid double accounting for pure zerocopy skbs 2021-11-03 11:19:49 +00:00
skmsg.c Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next 2021-11-01 19:59:46 -07:00
sock_destructor.h skb_expand_head() adjust skb->truesize incorrectly 2021-10-22 12:35:51 -07:00
sock_diag.c
sock_map.c af_unix: Add unix_stream_proto for sockmap 2021-08-16 18:43:39 -07:00
sock_reuseport.c tcp: Add stats for socket migration. 2021-06-23 12:56:08 -07:00
sock.c vsock: Enable y2038 safe timeval for timeout 2021-10-08 16:21:53 +01:00
stream.c net: stream: don't purge sk_error_queue in sk_stream_kill_queues() 2021-10-16 09:06:09 +01:00
sysctl_net_core.c bpf: Prevent increasing bpf_jit_limit above max 2021-10-22 17:23:53 -07:00
timestamping.c
tso.c
utils.c
xdp.c xdp: Remove redundant warning 2021-10-27 18:13:57 -07:00