linux/include/net
Eric Dumazet 93ab6cc691 tcp: implement mmap() for zero copy receive
With a well-chosen MSS/MTU and a suitable architecture, some networks
can ensure that TCP payloads exactly fit 4KB pages.

Implement mmap() system call so that applications can avoid
copying data without complex splice() games.

Note that a successful mmap() of X bytes on a TCP socket consumes
those bytes, as if recvmsg() had been called. (tp->copied += X)

Only PROT_READ mappings are accepted, as skb page frags
are fundamentally shared and read only.

If tcp_mmap() finds data that is not a full page, or a patch of
urgent data, -EINVAL is returned and no bytes are consumed.

The application must fall back to recvmsg() to read the problematic sequence.

mmap() won't block, regardless of whether the socket is in blocking or
non-blocking mode. If not enough bytes are in the receive queue,
mmap() returns -EAGAIN, or -EIO if the socket is in a state
where no more bytes can be added to the receive queue.
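
The error cases above suggest a small dispatch on the receive path. The
following is a minimal sketch, assuming the mmap() interface exactly as
this commit describes it; the enum, the helper names and the MAP_SHARED
flag are illustrative assumptions, not taken from the patch.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

/* What the caller should do after one mmap() attempt (illustrative names). */
enum rx_action {
    RX_MAPPED,        /* len bytes were mapped and consumed */
    RX_USE_RECVMSG,   /* EINVAL: partial page or urgent data */
    RX_WAIT,          /* EAGAIN: not enough bytes queued yet */
    RX_NO_MORE_DATA,  /* EIO: the receive queue can no longer grow */
    RX_ERROR          /* any other failure */
};

/* Translate the errno of a failed mmap() into the next step. */
static enum rx_action rx_action_for_errno(int err)
{
    switch (err) {
    case EINVAL: return RX_USE_RECVMSG;
    case EAGAIN: return RX_WAIT;
    case EIO:    return RX_NO_MORE_DATA;
    default:     return RX_ERROR;
    }
}

/* Try to map len bytes of payload from a connected TCP socket.
 * Only PROT_READ is requested; MAP_SHARED is an assumption here. */
static enum rx_action try_tcp_mmap(int fd, size_t len, void **out)
{
    void *p = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);

    if (p == MAP_FAILED)
        return rx_action_for_errno(errno);
    *out = p;
    return RX_MAPPED;
}
```

On RX_USE_RECVMSG the caller would read the problematic sequence with
recvmsg() and then retry; on RX_WAIT it would go back to poll().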

An application might use SO_RCVLOWAT, poll() and/or ioctl(FIONREAD)
to use mmap() efficiently.
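
One possible way to combine these calls is sketched below, assuming a
4KB page size and hypothetical helper names: SO_RCVLOWAT keeps poll()
quiet until a full chunk is queued, and FIONREAD tells how many whole
pages could be requested. A whole-page byte count does not guarantee
that the queued data is page aligned, so the -EINVAL fallback still
applies.

```c
#include <stddef.h>
#include <sys/ioctl.h>
#include <sys/socket.h>

/* Round a byte count down to a whole number of pages. */
static size_t whole_pages(size_t bytes, size_t page_size)
{
    return bytes - bytes % page_size;
}

/* Report the socket as readable only once `chunk` bytes are queued. */
static int set_rcvlowat(int fd, int chunk)
{
    return setsockopt(fd, SOL_SOCKET, SO_RCVLOWAT, &chunk, sizeof(chunk));
}

/* Number of queued bytes that form whole pages, or 0 if the query fails. */
static size_t mmapable_bytes(int fd, size_t page_size)
{
    int queued = 0;

    if (ioctl(fd, FIONREAD, &queued) < 0 || queued <= 0)
        return 0;
    return whole_pages((size_t)queued, page_size);
}
```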

On the sender side, MSG_EOR might help to clearly separate unaligned
headers and 4K-aligned chunks if necessary.

Tested:

mlx4 (cx-3) 40Gbit NIC, with tcp_mmap program provided in following patch.
MTU set to 4168 (4096 bytes of TCP payload, 40 bytes IPv6 header, 32 bytes TCP header)
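
The MTU arithmetic can be written out as a small helper. The constants
are the header sizes quoted above; the function and constant names are
illustrative only.

```c
/* Sizes taken from the test setup described in this commit message. */
enum {
    IPV6_HEADER_BYTES = 40,
    TCP_HEADER_BYTES  = 32,   /* as stated above, including TCP options */
    PAGE_BYTES        = 4096
};

/* MTU needed so that each segment carries exactly payload_bytes of data. */
static int mtu_for_payload(int payload_bytes)
{
    return payload_bytes + IPV6_HEADER_BYTES + TCP_HEADER_BYTES;
}
```

With these values, mtu_for_payload(PAGE_BYTES) gives 4168.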

Without mmap() (tcp_mmap -s)

received 32768 MB (0 % mmap'ed) in 8.13342 s, 33.7961 Gbit,
  cpu usage user:0.034 sys:3.778, 116.333 usec per MB, 63062 c-switches
received 32768 MB (0 % mmap'ed) in 8.14501 s, 33.748 Gbit,
  cpu usage user:0.029 sys:3.997, 122.864 usec per MB, 61903 c-switches
received 32768 MB (0 % mmap'ed) in 8.11723 s, 33.8635 Gbit,
  cpu usage user:0.048 sys:3.964, 122.437 usec per MB, 62983 c-switches
received 32768 MB (0 % mmap'ed) in 8.39189 s, 32.7552 Gbit,
  cpu usage user:0.038 sys:4.181, 128.754 usec per MB, 55834 c-switches

With mmap() on receiver (tcp_mmap -s -z)

received 32768 MB (100 % mmap'ed) in 8.03083 s, 34.2278 Gbit,
  cpu usage user:0.024 sys:1.466, 45.4712 usec per MB, 65479 c-switches
received 32768 MB (100 % mmap'ed) in 7.98805 s, 34.4111 Gbit,
  cpu usage user:0.026 sys:1.401, 43.5486 usec per MB, 65447 c-switches
received 32768 MB (100 % mmap'ed) in 7.98377 s, 34.4296 Gbit,
  cpu usage user:0.028 sys:1.452, 45.166 usec per MB, 65496 c-switches
received 32768 MB (99.9969 % mmap'ed) in 8.01838 s, 34.281 Gbit,
  cpu usage user:0.02 sys:1.446, 44.7388 usec per MB, 65505 c-switches
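
The derived figures in these reports appear to follow from the raw
ones: throughput is MB * 2^20 * 8 bits over elapsed time, and "usec per
MB" matches (user + sys) CPU time divided by MB. The sketch below
reproduces them under that assumption; the formulas are inferred from
the numbers, not taken from the tcp_mmap program.

```c
/* Throughput in Gbit/s for `mbytes` MB (2^20 bytes each) in `seconds`. */
static double gbit_per_sec(double mbytes, double seconds)
{
    return mbytes * 1048576.0 * 8.0 / seconds / 1e9;
}

/* CPU cost in microseconds per MB, counting user and system time. */
static double usec_per_mb(double user_s, double sys_s, double mbytes)
{
    return (user_s + sys_s) * 1e6 / mbytes;
}
```

For the first run of each mode this gives about 116.333 usec per MB
without mmap() and about 45.4712 usec per MB with it, roughly 2.5 times
less CPU per MB at a similar throughput.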

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-16 18:26:37 -04:00
9p
bluetooth Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth 2018-04-08 17:19:15 -04:00
caif caif: reduce stack size with KASAN 2018-01-19 14:02:12 -05:00
iucv
netfilter Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2018-04-01 19:49:34 -04:00
netns ip6mr: Support fib notifications 2018-03-26 13:14:43 -04:00
nfc
phonet
sctp selinux/stable-4.17 PR 20180403 2018-04-06 15:39:26 -07:00
tc_act net/sched: act_csum: don't use spinlock in the fast path 2018-01-23 19:51:46 -05:00
6lowpan.h
act_api.h net/sched: remove tcf_idr_cleanup() 2018-03-23 21:52:19 -04:00
addrconf.h bpf: Hooks for sys_connect 2018-03-31 02:15:54 +02:00
af_ieee802154.h
af_rxrpc.h rxrpc, afs: Use debug_ids rather than pointers in traces 2018-03-27 23:03:00 +01:00
af_unix.h
af_vsock.h
ah.h
arp.h ipv4: Make neigh lookup keys for loopback/point-to-point devices be INADDR_ANY 2018-01-15 14:53:43 -05:00
atmclip.h
ax25.h net: Make ax25_ptr depend on CONFIG_AX25 2018-02-14 11:55:33 -05:00
ax88796.h
bond_3ad.h
bond_alb.h
bond_options.h
bonding.h
busy_poll.h
calipso.h
cfg80211-wext.h
cfg80211.h nl80211: Add control_port_over_nl80211 to mesh_setup 2018-03-29 14:01:27 +02:00
cfg802154.h
checksum.h
cipso_ipv4.h
cls_cgroup.h
codel_impl.h
codel_qdisc.h
codel.h
compat.h net: remove compat_sys_*() prototypes from net/compat.h 2018-04-02 20:16:17 +02:00
datalink.h
dcbevent.h
dcbnl.h
devlink.h devlink: convert occ_get op to separate registration 2018-04-08 12:45:57 -04:00
dn_dev.h
dn_fib.h
dn_neigh.h
dn_nsp.h
dn_route.h
dn.h
dsa.h dsa: Pass the port to get_sset_count() 2018-03-04 13:34:18 -05:00
dsfield.h
dst_cache.h net: core: dst_cache_set_ip6: Rename 'addr' parameter to 'saddr' for consistency 2018-03-05 12:52:45 -05:00
dst_metadata.h
dst_ops.h
dst.h net: core: dst: Add kernel-doc for 'net' parameter 2018-03-05 12:52:45 -05:00
erspan.h net: erspan: fix metadata extraction 2018-02-06 11:32:48 -05:00
esp.h
ethoc.h inet: whitespace cleanup 2018-02-28 11:43:28 -05:00
fib_notifier.h
fib_rules.h net/ipv6: Pass skb to route lookup 2018-03-04 13:04:22 -05:00
firewire.h
flow_dissector.h
flow.h net: Remove unused get_hash_from_flow functions 2018-03-04 13:04:23 -05:00
fou.h
fq_impl.h
fq.h
garp.h
gen_stats.h
genetlink.h
geneve.h
gre.h net: GRE: Add is_gretap_dev, is_ip6gretap_dev 2018-02-27 14:46:26 -05:00
gro_cells.h
gtp.h
gue.h
hwbm.h
icmp.h
ieee80211_radiotap.h mac80211: support reporting A-MPDU EOF bit value/known 2018-02-22 21:13:02 +01:00
ieee802154_netdev.h
if_inet6.h
ife.h
ila.h
inet_common.h net: Introduce __inet_bind() and __inet6_bind 2018-03-31 02:15:43 +02:00
inet_connection_sock.h inet: whitespace cleanup 2018-02-28 11:43:28 -05:00
inet_ecn.h
inet_frag.h inet: frags: reorganize struct netns_frags 2018-03-31 23:25:39 -04:00
inet_hashtables.h
inet_sock.h
inet_timewait_sock.h soreuseport: initialise timewait reuseport field 2018-04-07 22:32:32 -04:00
inet6_connection_sock.h
inet6_hashtables.h
inetpeer.h
ip_fib.h Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2018-03-23 11:31:58 -04:00
ip_tunnels.h net: do not create fallback tunnels for non-default namespaces 2018-03-09 11:23:11 -05:00
ip_vs.h netfilter: ipvs: Remove useless ipvsh param of frag_safe_skb_hp 2018-01-08 18:01:02 +01:00
ip.h inet: frags: remove some helpers 2018-03-31 23:25:39 -04:00
ip6_checksum.h
ip6_fib.h net/ipv6: Pass skb to route lookup 2018-03-04 13:04:22 -05:00
ip6_route.h ipv6: add a wrapper for ip6_dst_store() with flowi6 checks 2018-04-04 11:31:57 -04:00
ip6_tunnel.h
ipcomp.h
ipconfig.h
ipv6.h ipv6: allow to cache dst for a connected sk in ip6_sk_dst_lookup_flow() 2018-04-04 11:31:57 -04:00
ipx.h
iw_handler.h net: Spelling s/stucture/structure/ 2018-03-27 09:51:23 +02:00
kcm.h
l3mdev.h
lapb.h
lib80211.h
llc_c_ac.h
llc_c_ev.h
llc_c_st.h
llc_conn.h llc: properly handle dev_queue_xmit() return value 2018-03-27 11:56:00 -04:00
llc_if.h
llc_pdu.h
llc_s_ac.h
llc_s_ev.h
llc_s_st.h
llc_sap.h
llc.h
lwtunnel.h net: Move ipv4 set_lwt_redirect helper to lwtunnel 2018-02-14 14:43:32 -05:00
mac80211.h We have a fair number of patches, but many of them are from the 2018-03-29 16:23:26 -04:00
mac802154.h
mip6.h
mld.h
mpls_iptunnel.h
mpls.h
mrp.h
ncsi.h
ndisc.h
neighbour.h
net_namespace.h net: Introduce net_rwsem to protect net_namespace_list 2018-03-29 13:47:53 -04:00
net_ratelimit.h
netevent.h net/ipv6: Add support for path selection using hash of 5-tuple 2018-03-04 13:04:23 -05:00
netlabel.h
netlink.h
netprio_cgroup.h
netrom.h
nexthop.h net: fix rtnh_ok() 2018-04-07 22:32:31 -04:00
nl802154.h
nsh.h
p8022.h
ping.h
pkt_cls.h net: sch: prio: Add offload ability for grafting a child 2018-02-28 12:06:01 -05:00
pkt_sched.h net: remove prototype of qdisc_lookup_class() 2018-01-16 14:56:54 -05:00
pptp.h
protocol.h
psample.h
psnap.h
raw.h
rawv6.h
red.h
regulatory.h cfg80211: read wmm rules from regulatory database 2018-03-29 11:11:40 +02:00
request_sock.h
rose.h
route.h Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2018-03-23 11:31:58 -04:00
rsi_91x.h Bluetooth: btrsi: add new rsi bluetooth driver 2018-03-13 18:37:02 +02:00
rtnetlink.h
sch_generic.h Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2018-04-01 19:49:34 -04:00
scm.h
secure_seq.h
seg6_hmac.h
seg6.h
slhc_vj.h slip: Check if rstate is initialized before uncompressing 2018-04-11 10:33:46 -04:00
smc.h
snmp.h
sock_reuseport.h
sock.h slab: make usercopy region 32-bit 2018-04-05 21:36:24 -07:00
Space.h net/mac89x0: Convert to platform_driver 2018-03-01 21:21:36 -05:00
stp.h
strparser.h
switchdev.h
tcp_states.h tcp: remove the hardcode in the definition of TCPF Macro 2018-02-21 15:06:05 -05:00
tcp.h tcp: implement mmap() for zero copy receive 2018-04-16 18:26:37 -04:00
timewait_sock.h
tipc.h
tls.h tls: support for Inline tls record 2018-03-31 23:37:32 -04:00
transp_v6.h
tso.h
tun_proto.h
udp_tunnel.h
udp.h bpf: Hooks for sys_connect 2018-03-31 02:15:54 +02:00
udplite.h udplite: fix partial checksum initialization 2018-02-16 15:57:42 -05:00
vsock_addr.h
vxlan.h vxlan: Fix trailing semicolon 2018-01-17 16:07:24 -05:00
wext.h lift handling of SIOCIW... out of dev_ioctl() 2018-01-24 19:13:45 -05:00
wimax.h
x25.h
x25device.h
xdp.h
xfrm.h xfrm: Register xfrm_dev_notifier in appropriate place 2018-03-30 10:59:23 -04:00