linux/net
Sven Wegener fb97898fa8 ipvs: Add missing locking during connection table hashing and unhashing
commit aea9d711f3 upstream.

The code that hashes and unhashes connections from the connection table
is missing locking of the connection being modified, which opens up a
race condition and results in memory corruption when this race condition
is hit.

Here is what happens in pretty verbose form:

CPU 0					CPU 1
------------				------------
An active connection is terminated and
we schedule ip_vs_conn_expire() on this
CPU to expire this connection.

					IRQ assignment is changed to this CPU,
					but the expire timer stays scheduled on
					the other CPU.

					New connection from same ip:port comes
					in right before the timer expires, we
					find the inactive connection in our
					connection table and get a reference to
					it. We proper lock the connection in
					tcp_state_transition() and read the
					connection flags in set_tcp_state().

ip_vs_conn_expire() gets called, we
unhash the connection from our
connection table and remove the hashed
flag in ip_vs_conn_unhash(), without
proper locking!

					While still holding proper locks we
					write the connection flags in
					set_tcp_state() and this sets the hashed
					flag again.

ip_vs_conn_expire() fails to expire the
connection, because the other CPU has
incremented the reference count. We try
to re-insert the connection into our
connection table, but this fails in
ip_vs_conn_hash(), because the hashed
flag has been set by the other CPU. We
re-schedule execution of
ip_vs_conn_expire(). Now this connection
has the hashed flag set, but isn't
actually hashed in our connection table
and has a dangling list_head.

					We drop the reference we held on the
					connection and schedule the expire timer
					for timeouting the connection on this
					CPU. Further packets won't be able to
					find this connection in our connection
					table.

					ip_vs_conn_expire() gets called again,
					we think it's already hashed, but the
					list_head is dangling and while removing
					the connection from our connection table
					we write to the memory location where
					this list_head points to.

The result will probably be a kernel oops at some other point in time.

This race condition is pretty subtle, but it can be triggered remotely.
It needs the IRQ assignment change or another circumstance where packets
coming from the same ip:port for the same service are being processed on
different CPUs. And it involves hitting the exact time at which
ip_vs_conn_expire() gets called. It can be avoided by making sure that
all packets from one connection are always processed on the same CPU and
can be made harder to exploit by changing the connection timeouts to
some custom values.

Signed-off-by: Sven Wegener <sven.wegener@stealer.net>
Acked-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-02 10:20:50 -07:00
..
9p 9p: fix readdir corner cases 2009-11-02 08:43:45 -06:00
802
8021q vlan: Fix register_vlan_dev() error path 2009-11-17 06:45:04 -08:00
appletalk Have atalk_route_packet() return NET_RX_SUCCESS not NET_XMIT_SUCCESS 2009-09-14 17:02:47 -07:00
atm net: Make setsockopt() optlen be unsigned. 2009-09-30 16:12:20 -07:00
ax25 ax25: netrom: rose: Fix timer oopses 2010-02-09 04:50:56 -08:00
bluetooth Bluetooth: Fix kernel crash on L2CAP stress tests 2010-04-01 15:58:55 -07:00
bridge netfilter: ebtables: enforce CAP_NET_ADMIN 2010-01-18 10:19:40 -08:00
can can: should not use __dev_get_by_index() without locks 2009-11-08 00:33:43 -08:00
core scm: Only support SCM_RIGHTS on unix domain sockets. 2010-03-15 08:49:58 -07:00
dcb net: fix double skb free in dcbnl 2009-09-26 20:16:15 -07:00
dccp dccp_probe: Fix module load dependencies between dccp and dccp_probe 2010-05-12 14:57:11 -07:00
decnet decnet: netdevice refcount leak 2009-11-06 00:50:39 -08:00
dsa netdev: convert pseudo-devices to netdev_tx_t 2009-09-01 01:13:07 -07:00
econet Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 2009-08-12 17:44:53 -07:00
ethernet
ieee802154 net: Make setsockopt() optlen be unsigned. 2009-09-30 16:12:20 -07:00
ipv4 ipv4: udp: fix short packet and bad checksum logging 2010-05-26 14:29:13 -07:00
ipv6 ipv6: conntrack: Add member of user to nf_ct_frag6_queue structure 2010-03-15 08:49:39 -07:00
ipx net: Make setsockopt() optlen be unsigned. 2009-09-30 16:12:20 -07:00
irda headers: remove sched.h from interrupt.h 2009-10-11 11:20:58 -07:00
iucv net: Make setsockopt() optlen be unsigned. 2009-09-30 16:12:20 -07:00
key net: file_operations should be const 2009-09-02 01:03:53 -07:00
lapb net: remove NET_RX_BAD and NET_RX_CN* defines 2009-07-05 19:15:35 -07:00
llc net: Make setsockopt() optlen be unsigned. 2009-09-30 16:12:20 -07:00
mac80211 mac80211: Handle mesh action frames in ieee80211_rx_h_action 2010-08-02 10:20:47 -07:00
netfilter ipvs: Add missing locking during connection table hashing and unhashing 2010-08-02 10:20:50 -07:00
netlabel Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 2009-07-30 19:22:43 -07:00
netlink net: Make setsockopt() optlen be unsigned. 2009-09-30 16:12:20 -07:00
netrom ax25: netrom: rose: Fix timer oopses 2010-02-09 04:50:56 -08:00
packet af_packet: Don't use skb after dev_queue_xmit() 2010-02-09 04:50:56 -08:00
phonet Phonet: fix mutex imbalance 2009-09-30 16:41:34 -07:00
rds net: Make setsockopt() optlen be unsigned. 2009-09-30 16:12:20 -07:00
rfkill Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6 2009-11-23 14:01:47 -08:00
rose ax25: netrom: rose: Fix timer oopses 2010-02-09 04:50:56 -08:00
rxrpc net: Make setsockopt() optlen be unsigned. 2009-09-30 16:12:20 -07:00
sched pkt_sched: pedit use proper struct 2009-10-11 23:03:47 -07:00
sctp sctp: fix append error cause to ERROR chunk correctly 2010-07-05 11:11:21 -07:00
sunrpc SUNRPC: Fix a re-entrancy bug in xs_tcp_read_calldir() 2010-08-02 10:20:45 -07:00
tipc tipc: Fix oops on send prior to entering networked mode (v3) 2010-07-05 11:11:16 -07:00
unix AF_UNIX: Fix deadlock on connecting to shutdown socket 2009-10-18 23:17:37 -07:00
wanrouter headers: smp_lock.h redux 2009-07-12 12:22:34 -07:00
wimax wimax: fix warning caused by not checking retval of rfkill_set_hw_state() 2009-06-11 11:12:48 -07:00
wireless wireless: report reasonable bitrate for MCS rates through wext 2010-07-05 11:11:11 -07:00
x25 net: Make setsockopt() optlen be unsigned. 2009-09-30 16:12:20 -07:00
xfrm net: file_operations should be const 2009-09-02 01:03:53 -07:00
compat.c net: Make setsockopt() optlen be unsigned. 2009-09-30 16:12:20 -07:00
Kconfig net/compat/wext: send different messages to compat tasks 2009-07-15 08:53:39 -07:00
Makefile net: remove redundant sched/ in net/Makefile 2009-07-12 20:11:14 -07:00
nonet.c
socket.c net: Make setsockopt() optlen be unsigned. 2009-09-30 16:12:20 -07:00
sysctl_net.c
TUNABLE