linux/net
Liping Zhang b72f1119c6 netfilter: nf_ct_ext: fix possible panic after nf_ct_extend_unregister
commit 9c3f379492 upstream.

If one cpu is doing nf_ct_extend_unregister while another cpu is doing
__nf_ct_ext_add_length, then we may hit BUG_ON(t == NULL). Moreover,
there's no synchronize_rcu invocation after set nf_ct_ext_types[id] to
NULL, so it's possible that we may access invalid pointer.

But actually, most of the ct extends are built-in, so the problem listed
above will not happen. However, there are two exceptions: NF_CT_EXT_NAT
and NF_CT_EXT_SYNPROXY.

For _EXT_NAT, the panic will not happen, since adding the nat extend and
unregistering the nat extend are located in the same file(nf_nat_core.c),
this means that after the nat module is removed, we cannot add the nat
extend too.

For _EXT_SYNPROXY, synproxy extend may be added by init_conntrack, while
synproxy extend unregister will be done by synproxy_core_exit. So after
nf_synproxy_core.ko is removed, we may still try to add the synproxy
extend, then kernel panic may happen.

I know it's very hard to reproduce this issue, but I can play a tricky
game to make it happen very easily :)

Step 1. Enable SYNPROXY for tcp dport 1234 at FORWARD hook:
  # iptables -I FORWARD -p tcp --dport 1234 -j SYNPROXY
Step 2. Queue the syn packet to the userspace at raw table OUTPUT hook.
        Also note, in the userspace we only add a 20s' delay, then
        reinject the syn packet to the kernel:
  # iptables -t raw -I OUTPUT -p tcp --syn -j NFQUEUE --queue-num 1
Step 3. Using "nc 2.2.2.2 1234" to connect the server.
Step 4. Now remove the nf_synproxy_core.ko quickly:
  # iptables -F FORWARD
  # rmmod ipt_SYNPROXY
  # rmmod nf_synproxy_core
Step 5. After 20s' delay, the syn packet is reinjected to the kernel.

Now you will see the panic like this:
  kernel BUG at net/netfilter/nf_conntrack_extend.c:91!
  Call Trace:
   ? __nf_ct_ext_add_length+0x53/0x3c0 [nf_conntrack]
   init_conntrack+0x12b/0x600 [nf_conntrack]
   nf_conntrack_in+0x4cc/0x580 [nf_conntrack]
   ipv4_conntrack_local+0x48/0x50 [nf_conntrack_ipv4]
   nf_reinject+0x104/0x270
   nfqnl_recv_verdict+0x3e1/0x5f9 [nfnetlink_queue]
   ? nfqnl_recv_verdict+0x5/0x5f9 [nfnetlink_queue]
   ? nla_parse+0xa0/0x100
   nfnetlink_rcv_msg+0x175/0x6a9 [nfnetlink]
   [...]

One possible solution is to make NF_CT_EXT_SYNPROXY extend built-in, i.e.
introduce nf_conntrack_synproxy.c and only do ct extend register and
unregister in it, similar to nf_conntrack_timeout.c.

But having such a obscure restriction of nf_ct_extend_unregister is not a
good idea, so we should invoke synchronize_rcu after set nf_ct_ext_types
to NULL, and check the NULL pointer when do __nf_ct_ext_add_length. Then
it will be easier if we add new ct extend in the future.

Last, we use kfree_rcu to free nf_ct_ext, so rcu_barrier() is unnecessary
anymore, remove it too.

Signed-off-by: Liping Zhang <zlpnobody@gmail.com>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-08-24 17:02:34 -07:00
..
6lowpan 6lowpan: put mcast compression in an own function 2015-10-21 00:49:25 +02:00
9p p9_client_readdir() fix 2017-05-02 21:19:55 -07:00
802
8021q vlan: Propagate MAC address to VLANs 2017-08-06 19:19:43 -07:00
appletalk
atm atm: deal with setting entry before mkip was called 2015-09-17 22:13:32 -07:00
ax25 ax25: Fix segfault after sock connection timeout 2017-02-04 09:45:09 +01:00
batman-adv batman-adv: Check for alloc errors when preparing TT local data 2016-12-15 08:49:23 -08:00
bluetooth Bluetooth: use constant time memory comparison for secret values 2017-07-27 15:06:04 -07:00
bridge net: bridge: start hello timer only if device is up 2017-06-14 13:16:19 +02:00
caif net: caif: Fix a sleep-in-atomic bug in cfpkt_create_pfx 2017-07-05 14:37:14 +02:00
can can: Fix kernel panic at security_sock_rcv_skb 2017-02-18 16:39:26 +01:00
ceph libceph: force GFP_NOIO for socket allocations 2017-04-08 09:53:30 +02:00
core net: avoid skb_warn_bad_offload false positives on UFO 2017-08-12 19:29:08 -07:00
dcb net/dcb: make dcbnl.c explicitly non-modular 2015-10-09 07:52:27 -07:00
dccp dccp: fix a memleak for dccp_feat_init err process 2017-08-11 09:08:54 -07:00
decnet decnet: always not take dst->__refcnt when inserting dst into hash table 2017-07-05 14:37:14 +02:00
dns_resolver net: dns_resolver: convert time_t to time64_t 2015-11-18 16:27:46 -05:00
dsa net: dsa: Check return value of phy_connect_direct() 2017-07-05 14:37:19 +02:00
ethernet net: introduce device min_header_len 2017-02-18 16:39:27 +01:00
hsr net/hsr: fix a warning message 2015-11-23 14:56:15 -05:00
ieee802154 net: fix percpu memory leaks 2015-11-02 22:47:14 -05:00
ipv4 net: account for current skb length when deciding about UFO 2017-08-12 19:29:09 -07:00
ipv6 net: account for current skb length when deciding about UFO 2017-08-12 19:29:09 -07:00
ipx ipx: call ipxitf_put() in ioctl error path 2017-05-25 14:30:13 +02:00
irda irda: Fix lockdep annotations in hashbin_delete(). 2017-02-26 11:07:50 +01:00
iucv af_iucv: Validate socket address length in iucv_sock_bind() 2016-03-03 15:07:03 -08:00
key af_key: Add lock to key dump 2017-08-06 19:19:38 -07:00
l2tp l2tp: fix PPP pseudo-wire auto-loading 2017-05-02 21:19:52 -07:00
l3mdev net: Add netif_is_l3_slave 2015-10-07 04:27:43 -07:00
lapb
llc net/llc: avoid BUG_ON() in skb_orphan() 2017-02-26 11:07:49 +01:00
mac80211 mac80211: initialize SMPS field in HT capabilities 2017-07-05 14:37:20 +02:00
mac802154 mac802154: llsec: use kzfree 2015-10-21 00:49:24 +02:00
mpls mpls: Send route delete notifications when router module is unloaded 2017-03-22 12:04:16 +01:00
netfilter netfilter: nf_ct_ext: fix possible panic after nf_ct_extend_unregister 2017-08-24 17:02:34 -07:00
netlabel netlabel: add address family checks to netlbl_{sock,req}_delattr() 2016-08-20 18:09:22 +02:00
netlink netlink: Allow direct reclaim for fallback allocation 2017-05-08 07:46:02 +02:00
netrom
nfc NFC: Add sockaddr length checks before accessing sa_family in bind handlers 2017-07-27 15:06:03 -07:00
openvswitch openvswitch: fix potential out of bound access in parse_ct 2017-08-11 09:08:53 -07:00
packet packet: fix tp_reserve race in packet_set_ring 2017-08-12 19:29:08 -07:00
phonet phonet: properly unshare skbs in phonet_rcv() 2016-01-31 11:29:00 -08:00
rds rds: tcp: use sock_create_lite() to create the accept socket 2017-07-21 07:44:55 +02:00
rfkill rfkill: fix rfkill_fop_read wait_event usage 2016-03-03 15:07:26 -08:00
rose
rxrpc rxrpc: Fix several cases where a padded len isn't checked in ticket decode 2017-06-29 12:48:52 +02:00
sched net: sched: set xt_tgchk_param par.nft_compat as 0 in ipt_init_target 2017-08-12 19:29:08 -07:00
sctp sctp: check af before verify address in sctp_addr_id2transport 2017-07-05 14:37:21 +02:00
sunrpc SUNRPC: fix refcounting problems with auth_gss messages. 2017-04-21 09:30:08 +02:00
switchdev switchdev: pass pointer to fib_info instead of copy 2016-06-24 10:18:16 -07:00
tipc tipc: ignore requests when the connection state is not CONNECTED 2017-06-17 06:39:38 +02:00
unix af_unix: Add sockaddr length checks before accessing sa_family in bind and connect handlers 2017-07-05 14:37:13 +02:00
vmw_vsock VSOCK: Detach QP check should filter out non matching QPs. 2017-04-27 09:09:32 +02:00
wimax net:wimax: Fix doucble word "the the" in networking.xml 2015-08-09 22:43:52 -07:00
wireless cfg80211: Check if PMKID attribute is of expected size 2017-07-21 07:44:56 +02:00
x25 net: fix a kernel infoleak in x25 module 2016-05-18 17:06:43 -07:00
xfrm xfrm: Don't use sk_family for socket policy lookups 2017-08-06 19:19:46 -07:00
compat.c
Kconfig net: Introduce L3 Master device abstraction 2015-09-29 20:40:32 -07:00
Makefile net: Introduce L3 Master device abstraction 2015-09-29 20:40:32 -07:00
socket.c net: socket: fix recvmmsg not returning error from sock_error 2017-02-26 11:07:50 +01:00
sysctl_net.c net: Use ns_capable_noaudit() when determining net sysctl permissions 2016-09-15 08:27:50 +02:00