linux/net
Eric Dumazet 669c0b5782 net: avoid 32 x truesize under-estimation for tiny skbs
[ Upstream commit 3226b158e6 ]

Both virtio net and napi_get_frags() allocate skbs
with a very small skb->head

While using page fragments instead of a kmalloc backed skb->head might give
a small performance improvement in some cases, there is a huge risk of
under estimating memory usage.

For both GOOD_COPY_LEN and GRO_MAX_HEAD, we can fit at least 32 allocations
per page (order-3 page in x86), or even 64 on PowerPC

We have been tracking OOM issues on GKE hosts hitting tcp_mem limits
but consuming far more memory for TCP buffers than instructed in tcp_mem[2]

Even if we force napi_alloc_skb() to only use order-0 pages, the issue
would still be there on arches with PAGE_SIZE >= 32768

This patch makes sure that small skb head are kmalloc backed, so that
other objects in the slab page can be reused instead of being held as long
as skbs are sitting in socket queues.

Note that we might in the future use the sk_buff napi cache,
instead of going through a more expensive __alloc_skb()

Another idea would be to use separate page sizes depending
on the allocated length (to never have more than 4 frags per page)

I would like to thank Greg Thelen for his precious help on this matter,
analysing crash dumps is always a time consuming task.

Fixes: fd11a83dd3 ("net: Pull out core bits of __netdev_alloc_skb and add __napi_alloc_skb")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Greg Thelen <gthelen@google.com>
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Link: https://lore.kernel.org/r/20210113161819.1155526-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-23 15:49:56 +01:00
..
6lowpan 6lowpan: Off by one handling ->nexthdr 2020-01-27 14:50:41 +01:00
9p net: 9p: initialize sun_server.sun_path to have addr's value only when addr is valid 2020-11-05 11:08:44 +01:00
802
8021q net: vlan: avoid leaks on register_vlan_dev() failures 2021-01-17 14:04:19 +01:00
appletalk appletalk: Set error code if register_snap_client failed 2019-12-13 08:52:59 +01:00
atm atm: fix a memory leak of vcc->user_back 2020-10-01 13:14:43 +02:00
ax25 AX.25: Prevent integer overflows in connect and sendmsg 2020-07-31 18:37:48 +02:00
batman-adv batman-adv: set .owner to THIS_MODULE 2020-12-02 08:48:10 +01:00
bluetooth Bluetooth: Fix null pointer dereference in hci_event_packet() 2020-12-30 11:25:52 +01:00
bpf
bpfilter signal/bpfilter: Fix bpfilter_kernl to use send_sig not force_sig 2020-01-27 14:50:51 +01:00
bridge net: bridge: vlan: fix error return code in __vlan_add() 2020-12-30 11:25:41 +01:00
caif net: use skb_queue_empty_lockless() in poll() handlers 2019-11-10 11:27:48 +01:00
can can: af_can: prevent potential access of uninitialized member in canfd_rcv() 2020-11-24 13:27:22 +01:00
ceph libceph: clear con->out_msg on Policy::stateful_server faults 2020-11-05 11:08:53 +01:00
core net: avoid 32 x truesize under-estimation for tiny skbs 2021-01-23 15:49:56 +01:00
dcb net: dcb: Accept RTM_GETDCB messages carrying set-like DCB commands 2021-01-23 15:49:56 +01:00
dccp net: ipv6: add net argument to ip6_dst_lookup_flow 2020-04-29 16:31:16 +02:00
decnet net: add bool confirm_neigh parameter for dst_ops.update_pmtu 2020-01-04 19:13:37 +01:00
dns_resolver KEYS: Don't write out to userspace while holding key semaphore 2020-04-23 10:30:24 +02:00
dsa net: dsa: read mac address from DT for slave device 2020-11-10 12:36:02 +01:00
ethernet net: add annotations on hh->hh_len lockless accesses 2020-01-09 10:19:09 +01:00
hsr hsr: check protocol version in hsr_newlink() 2020-04-21 09:03:03 +02:00
ieee802154 nl802154: add missing attribute validation for dev_type 2020-03-18 07:14:15 +01:00
ife
ipv4 esp: avoid unneeded kmap_atomic call 2021-01-23 15:49:55 +01:00
ipv6 net: sit: unregister_netdevice on newlink's error path 2021-01-23 15:49:56 +01:00
iucv net/af_iucv: set correct sk_protocol for child sockets 2020-12-08 10:18:52 +01:00
kcm kcm: switch order of device registration to fix a crash 2019-04-17 08:38:40 +02:00
key af_key: pfkey_dump needs parameter validation 2020-09-26 18:01:28 +02:00
l2tp l2tp: remove skb_dst_set() from l2tp_xmit_skb() 2020-07-22 09:31:59 +02:00
l3mdev
lapb lapb: fixed leak of control-blocks. 2019-06-22 08:15:13 +02:00
llc net: silence data-races on sk_backlog.tail 2020-10-01 13:14:26 +02:00
mac80211 mac80211: don't set set TDLS STA bandwidth wider than possible 2020-12-30 11:26:03 +01:00
mac802154 mac802154: tx: fix use-after-free 2020-10-01 13:14:51 +02:00
mpls net: ipv6_stub: use ip6_dst_lookup_flow instead of ip6_dst_lookup 2020-04-29 16:31:17 +02:00
ncsi net/ncsi: Use real net-device for response handler 2021-01-12 20:10:18 +01:00
netfilter netfilter: nf_nat: Fix memleak in nf_nat_init 2021-01-19 18:22:38 +01:00
netlabel netlabel: fix an uninitialized warning in netlbl_unlabel_staticlist() 2020-11-24 13:27:17 +01:00
netlink genetlink: remove genl_bind 2020-07-22 09:31:58 +02:00
netrom net: netrom: Fix potential nr_neigh refcnt leak in nr_add_node 2020-04-29 16:31:21 +02:00
nfc nfc: Ensure presence of NFC_ATTR_FIRMWARE_NAME attribute in nfc_genl_fw_download() 2020-10-29 09:54:58 +01:00
nsh
openvswitch openvswitch: handle DNAT tuple collision 2020-10-14 10:31:24 +02:00
packet net/packet: fix overflow in tpacket_rcv 2020-10-07 08:00:08 +02:00
phonet net: use skb_queue_empty_lockless() in poll() handlers 2019-11-10 11:27:48 +01:00
psample net: psample: fix skb_over_panic 2019-12-05 09:21:30 +01:00
qrtr net: qrtr: check skb_put_padto() return value 2020-09-26 18:01:30 +02:00
rds rds: Prevent kernel-infoleak in rds_notify_queue_get() 2020-08-05 10:06:01 +02:00
rfkill rfkill: Fix incorrect check to avoid NULL pointer dereference 2020-01-12 12:17:17 +01:00
rose rose: Fix Null pointer dereference in rose_send_frame() 2020-12-08 10:18:52 +01:00
rxrpc rxrpc: Call state should be read with READ_ONCE() under some circumstances 2021-01-23 15:49:56 +01:00
sched net: sched: prevent invalid Scell_log shift count 2021-01-12 20:10:20 +01:00
sctp sctp: change to hold/put transport for proto_unreach_timer 2020-11-24 13:27:18 +01:00
smc net/smc: fix valid DMBE buffer sizes 2020-10-29 09:54:55 +01:00
strparser net: strparser: partially revert "strparser: Call skb_unclone conditionally" 2019-05-16 19:41:27 +02:00
sunrpc net: sunrpc: interpret the return value of kstrtou32 correctly 2021-01-19 18:22:38 +01:00
switchdev
tipc tipc: fix memory leak in tipc_topsrv_start() 2020-11-18 19:18:51 +01:00
tls net/tls: Protect from calling tls_dev_del for TLS RX twice 2020-12-08 10:18:52 +01:00
unix skbuff: fix a data race in skb_queue_len() 2020-10-01 13:14:32 +02:00
vmw_vsock vsock: use ns_capable_noaudit() on socket create 2020-11-10 12:35:59 +01:00
wimax
wireless cfg80211: initialize rekey_data 2020-12-30 11:26:06 +01:00
x25 net/x25: prevent a couple of overflows 2020-12-08 10:18:54 +01:00
xdp xsk: Fix xsk_poll()'s return type 2020-12-30 11:25:44 +01:00
xfrm net: xfrm: fix a race condition during allocing spi 2020-11-18 19:18:41 +01:00
compat.c net/compat: Add missing sock updates for SCM_RIGHTS 2020-08-21 11:05:32 +02:00
Kconfig
Makefile
socket.c net: Set fput_needed iff FDPUT_FPUT is set 2020-08-19 08:15:03 +02:00
sysctl_net.c