linux/net
Jean-Philippe Brucker 1effe8ca4e skbuff: fix coalescing for page_pool fragment recycling
Fix a use-after-free when using page_pool with page fragments. We
encountered this problem during normal RX in the hns3 driver:

(1) Initially we have three descriptors in the RX queue. The first one
    allocates PAGE1 through page_pool, and the other two allocate one
    half of PAGE2 each. Page references look like this:

                RX_BD1 _______ PAGE1
                RX_BD2 _______ PAGE2
                RX_BD3 _________/

(2) Handle RX on the first descriptor. Allocate SKB1, eventually added
    to the receive queue by tcp_queue_rcv().

(3) Handle RX on the second descriptor. Allocate SKB2 and pass it to
    netif_receive_skb():

    netif_receive_skb(SKB2)
      ip_rcv(SKB2)
        SKB3 = skb_clone(SKB2)

    SKB2 and SKB3 share a reference to PAGE2 through
    skb_shinfo()->dataref. The other ref to PAGE2 is still held by
    RX_BD3:

                      SKB2 ---+- PAGE2
                      SKB3 __/   /
                RX_BD3 _________/

 (3b) Now while handling TCP, coalesce SKB3 with SKB1:

      tcp_v4_rcv(SKB3)
        tcp_try_coalesce(to=SKB1, from=SKB3)    // succeeds
        kfree_skb_partial(SKB3)
          skb_release_data(SKB3)                // drops one dataref

                      SKB1 _____ PAGE1
                           \____
                      SKB2 _____ PAGE2
                                 /
                RX_BD3 _________/

    In skb_try_coalesce(), __skb_frag_ref() takes a page reference to
    PAGE2, where it should instead have increased the page_pool frag
    reference, pp_frag_count. Without coalescing, when releasing both
    SKB2 and SKB3, a single reference to PAGE2 would be dropped. Now
    when releasing SKB1 and SKB2, two references to PAGE2 will be
    dropped, resulting in underflow.

 (3c) Drop SKB2:

      af_packet_rcv(SKB2)
        consume_skb(SKB2)
          skb_release_data(SKB2)                // drops second dataref
            page_pool_return_skb_page(PAGE2)    // drops one pp_frag_count

                      SKB1 _____ PAGE1
                           \____
                                 PAGE2
                                 /
                RX_BD3 _________/

(4) Userspace calls recvmsg()
    Copies SKB1 and releases it. Since SKB3 was coalesced with SKB1, we
    release the SKB3 page as well:

    tcp_eat_recv_skb(SKB1)
      skb_release_data(SKB1)
        page_pool_return_skb_page(PAGE1)
        page_pool_return_skb_page(PAGE2)        // drops second pp_frag_count

(5) PAGE2 is freed, but the third RX descriptor was still using it!
    In our case this causes IOMMU faults, but it would silently corrupt
    memory if the IOMMU was disabled.

Change the logic that checks whether pp_recycle SKBs can be coalesced.
We still reject differing pp_recycle between 'from' and 'to' SKBs, but
in order to avoid the situation described above, we also reject
coalescing when both 'from' and 'to' are pp_recycled and 'from' is
cloned.

The new logic allows coalescing a cloned pp_recycle SKB into a page
refcounted one, because in this case the release (4) will drop the right
reference, the one taken by skb_try_coalesce().

Fixes: 53e0961da1 ("page_pool: add frag page recycling support in page pool")
Suggested-by: Alexander Duyck <alexanderduyck@fb.com>
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Reviewed-by: Yunsheng Lin <linyunsheng@huawei.com>
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-01 11:57:58 +01:00
..
6lowpan net: don't include ndisc.h from ipv6.h 2022-02-04 14:15:11 -08:00
9p xen/grant-table: remove readonly parameter from functions 2022-03-15 20:34:40 -05:00
802 net: 802: Use memset_startat() to clear struct fields 2021-11-19 11:23:23 +00:00
8021q vlan: use correct format characters 2022-03-17 16:34:49 -07:00
appletalk
atm proc: remove PDE_DATA() completely 2022-01-22 08:33:37 +02:00
ax25 ax25: Fix UAF bugs in ax25 timers 2022-03-29 10:24:34 +02:00
batman-adv batman-adv: Use netif_rx(). 2022-03-07 11:40:41 +00:00
bluetooth Bluetooth: call hci_le_conn_failed with hdev lock in hci_le_conn_failed 2022-03-18 17:12:08 +01:00
bpf bpf, test_run: Fix packet size check for live packet mode 2022-03-11 22:01:26 +01:00
bpfilter uaccess: remove CONFIG_SET_FS 2022-02-25 09:36:06 +01:00
bridge net: bridge: mst: Restrict info size queries to bridge ports 2022-03-23 10:32:44 -07:00
caif net: caif: Use netif_rx(). 2022-03-04 12:02:19 +00:00
can can: isotp: restore accidentally removed MSG_PEEK feature 2022-03-31 09:46:58 +02:00
ceph libceph: drop else branches in prepare_read_data{,_cont} 2022-03-01 18:26:36 +01:00
core skbuff: fix coalescing for page_pool fragment recycling 2022-04-01 11:57:58 +01:00
dcb net: dcb: disable softirqs in dcbnl_flush_dev() 2022-03-03 08:01:55 -08:00
dccp dccp: remove max48() 2022-01-27 13:53:27 +00:00
decnet net: decnet: use time_is_before_jiffies() instead of open coding it 2022-02-28 13:21:32 +00:00
dns_resolver
dsa Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2022-03-23 10:53:49 -07:00
ethernet gro: remove rcu_read_lock/rcu_read_unlock from gro_complete handlers 2021-11-24 17:21:42 -08:00
ethtool ethtool: add support to set/get completion queue event size 2022-02-23 20:33:05 -08:00
hsr net: add per-cpu storage and net->core_stats 2022-03-11 23:17:24 -08:00
ieee802154 net: ipv6: Handle delivery_time in ipv6 defrag 2022-03-03 14:38:48 +00:00
ife
ipv4 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2022-03-23 10:53:49 -07:00
ipv6 netfilter: nft_fib: add reduce support 2022-03-20 00:29:47 +01:00
iucv s390/iucv: sort out physical vs virtual pointers usage 2022-02-22 16:09:13 -08:00
kcm net: Don't include filter.h from net/sock.h 2021-12-29 08:48:14 -08:00
key af_key: add __GFP_ZERO flag for compose_sadb_supported in function pfkey_register 2022-03-10 07:39:47 +01:00
l2tp l2tp: add netns refcount tracker to l2tp_dfs_seq_data 2021-12-10 06:38:27 -08:00
l3mdev net: Add l3mdev index to flow struct and avoid oif reset for port devices 2022-03-15 20:20:02 -07:00
lapb
llc llc: only change llc->dev when bind() succeeds 2022-03-25 16:55:41 -07:00
mac80211 mac80211: update bssid_indicator in ieee80211_assign_beacon 2022-03-15 11:50:33 +01:00
mac802154 mac802154: use dev_addr_set() - manual 2021-10-20 14:27:40 +01:00
mctp mctp: Avoid warning if unregister notifies twice 2022-02-25 22:23:23 -08:00
mpls net: mpls: Fix GCC 12 warning 2022-02-10 15:29:39 +00:00
mptcp Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2022-03-23 10:53:49 -07:00
ncsi all: replace find_next{,_zero}_bit with find_first{,_zero}_bit where appropriate 2022-01-15 08:47:31 -08:00
netfilter Merge git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf 2022-03-28 08:57:10 -07:00
netlabel netlabel: fix out-of-bounds memory accesses 2022-03-21 10:59:11 +00:00
netlink af_netlink: Fix shift out of bounds in group mask calculation 2022-03-18 21:47:14 -07:00
netrom netrom: fix api breakage in nr_setsockopt() 2022-01-07 14:11:05 +00:00
nfc nfc: llcp: Revert "NFC: Keep socket alive until the DISC PDU is actually sent" 2022-03-03 10:43:37 +00:00
nsh
openvswitch openvswitch: Add recirc_id to recirc warning 2022-03-31 08:52:48 -07:00
packet Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2022-03-17 13:56:58 -07:00
phonet phonet: Use netif_rx(). 2022-03-07 11:40:41 +00:00
psample
qrtr bus: mhi: core: Add an API for auto queueing buffers for DL channel 2021-12-17 17:17:14 +01:00
rds Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2021-12-16 16:13:19 -08:00
rfkill rfkill: make new event layout opt-in 2022-03-18 13:09:17 +02:00
rose net: Don't include filter.h from net/sock.h 2021-12-29 08:48:14 -08:00
rxrpc rxrpc: fix some null-ptr-deref bugs in server_key.c 2022-03-31 15:21:31 +02:00
sched net/sched: act_ct: fix ref leak when switching zones 2022-03-26 17:00:51 -07:00
sctp selinux/stable-5.18 PR 20220321 2022-03-21 20:47:54 -07:00
smc net/smc: Send out the remaining data in sndbuf before close 2022-03-28 16:06:27 -07:00
strparser bpf: sockmap, strparser, and tls are reusing qdisc_skb_cb and colliding 2021-11-09 01:05:28 +01:00
sunrpc NFS client updates for Linux 5.18 2022-03-29 18:55:37 -07:00
switchdev net: switchdev: remove lag_mod_cb from switchdev_handle_fdb_event_to_device 2022-02-24 21:31:43 -08:00
tipc Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2022-03-23 10:53:49 -07:00
tls net/tls: fix slab-out-of-bounds bug in decrypt_internal 2022-04-01 11:53:35 +01:00
unix Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2022-03-23 10:53:49 -07:00
vmw_vsock vsock/virtio: enable VQs early on probe 2022-03-24 18:36:36 -07:00
wireless brcmfmac 2022-03-11 13:00:17 -08:00
x25 net/x25: Fix null-ptr-deref caused by x25_disconnect 2022-03-26 11:48:16 -07:00
xdp xsk: Do not write NULL in SW ring at allocation failure 2022-03-28 19:56:28 -07:00
xfrm Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ 2022-03-19 14:49:08 +00:00
compat.c
devres.c
Kconfig page_pool: Add allocation stats 2022-03-03 09:55:28 +00:00
Kconfig.debug net: add networking namespace refcount tracker 2021-12-10 06:38:26 -08:00
Makefile
socket.c fs: allocate inode by using alloc_inode_sb() 2022-03-22 15:57:03 -07:00
sysctl_net.c sections: move and rename core_kernel_data() to is_kernel_core_data() 2021-11-09 10:02:50 -08:00