[ Upstream commit 348a400272 ]
In fib6_add(), it is possible that fib6_add_1() picks an intermediate
node and sets the node's fn->leaf to NULL in order to add this new
route. However, if fib6_add_rt2node() fails to add the new
route for some reason, fn->leaf will be left as NULL and could
potentially cause crash when fn->leaf is accessed in fib6_locate().
This patch makes sure fib6_repair_tree() is called to properly repair
fn->leaf in the above failure case.
Here is the syzkaller reported general protection fault in fib6_locate:
kasan: CONFIG_KASAN_INLINE enabled
kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault: 0000 [#1] SMP KASAN
Modules linked in:
CPU: 0 PID: 40937 Comm: syz-executor3 Not tainted
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
task: ffff8801d7d64100 ti: ffff8801d01a0000 task.ti: ffff8801d01a0000
RIP: 0010:[<ffffffff82a3e0e1>] [<ffffffff82a3e0e1>] __ipv6_prefix_equal64_half include/net/ipv6.h:475 [inline]
RIP: 0010:[<ffffffff82a3e0e1>] [<ffffffff82a3e0e1>] ipv6_prefix_equal include/net/ipv6.h:492 [inline]
RIP: 0010:[<ffffffff82a3e0e1>] [<ffffffff82a3e0e1>] fib6_locate_1 net/ipv6/ip6_fib.c:1210 [inline]
RIP: 0010:[<ffffffff82a3e0e1>] [<ffffffff82a3e0e1>] fib6_locate+0x281/0x3c0 net/ipv6/ip6_fib.c:1233
RSP: 0018:ffff8801d01a36a8 EFLAGS: 00010202
RAX: 0000000000000020 RBX: ffff8801bc790e00 RCX: ffffc90002983000
RDX: 0000000000001219 RSI: ffff8801d01a37a0 RDI: 0000000000000100
RBP: ffff8801d01a36f0 R08: 00000000000000ff R09: 0000000000000000
R10: 0000000000000003 R11: 0000000000000000 R12: 0000000000000001
R13: dffffc0000000000 R14: ffff8801d01a37a0 R15: 0000000000000000
FS: 00007f6afd68c700(0000) GS:ffff8801db400000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000004c6340 CR3: 00000000ba41f000 CR4: 00000000001426f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Stack:
ffff8801d01a37a8 ffff8801d01a3780 ffffed003a0346f5 0000000c82a23ea0
ffff8800b7bd7700 ffff8801d01a3780 ffff8800b6a1c940 ffffffff82a23ea0
ffff8801d01a3920 ffff8801d01a3748 ffffffff82a223d6 ffff8801d7d64988
Call Trace:
[<ffffffff82a223d6>] ip6_route_del+0x106/0x570 net/ipv6/route.c:2109
[<ffffffff82a23f9d>] inet6_rtm_delroute+0xfd/0x100 net/ipv6/route.c:3075
[<ffffffff82621359>] rtnetlink_rcv_msg+0x549/0x7a0 net/core/rtnetlink.c:3450
[<ffffffff8274c1d1>] netlink_rcv_skb+0x141/0x370 net/netlink/af_netlink.c:2281
[<ffffffff82613ddf>] rtnetlink_rcv+0x2f/0x40 net/core/rtnetlink.c:3456
[<ffffffff8274ad38>] netlink_unicast_kernel net/netlink/af_netlink.c:1206 [inline]
[<ffffffff8274ad38>] netlink_unicast+0x518/0x750 net/netlink/af_netlink.c:1232
[<ffffffff8274b83e>] netlink_sendmsg+0x8ce/0xc30 net/netlink/af_netlink.c:1778
[<ffffffff82564aff>] sock_sendmsg_nosec net/socket.c:609 [inline]
[<ffffffff82564aff>] sock_sendmsg+0xcf/0x110 net/socket.c:619
[<ffffffff82564d62>] sock_write_iter+0x222/0x3a0 net/socket.c:834
[<ffffffff8178523d>] new_sync_write+0x1dd/0x2b0 fs/read_write.c:478
[<ffffffff817853f4>] __vfs_write+0xe4/0x110 fs/read_write.c:491
[<ffffffff81786c38>] vfs_write+0x178/0x4b0 fs/read_write.c:538
[<ffffffff817892a9>] SYSC_write fs/read_write.c:585 [inline]
[<ffffffff817892a9>] SyS_write+0xd9/0x1b0 fs/read_write.c:577
[<ffffffff82c71e32>] entry_SYSCALL_64_fastpath+0x12/0x17
Note: there is no "Fixes" tag as this seems to be a bug introduced
very early.
Signed-off-by: Wei Wang <weiwan@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 383143f31d ]
syzcaller reported the following use-after-free issue in rt6_select():
BUG: KASAN: use-after-free in rt6_select net/ipv6/route.c:755 [inline] at addr ffff8800bc6994e8
BUG: KASAN: use-after-free in ip6_pol_route.isra.46+0x1429/0x1470 net/ipv6/route.c:1084 at addr ffff8800bc6994e8
Read of size 4 by task syz-executor1/439628
CPU: 0 PID: 439628 Comm: syz-executor1 Not tainted 4.3.5+ #8
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
0000000000000000 ffff88018fe435b0 ffffffff81ca384d ffff8801d3588c00
ffff8800bc699380 ffff8800bc699500 dffffc0000000000 ffff8801d40a47c0
ffff88018fe435d8 ffffffff81735751 ffff88018fe43660 ffff8800bc699380
Call Trace:
[<ffffffff81ca384d>] __dump_stack lib/dump_stack.c:15 [inline]
[<ffffffff81ca384d>] dump_stack+0xc1/0x124 lib/dump_stack.c:51
sctp: [Deprecated]: syz-executor0 (pid 439615) Use of struct sctp_assoc_value in delayed_ack socket option.
Use struct sctp_sack_info instead
[<ffffffff81735751>] kasan_object_err+0x21/0x70 mm/kasan/report.c:158
[<ffffffff817359c4>] print_address_description mm/kasan/report.c:196 [inline]
[<ffffffff817359c4>] kasan_report_error+0x1b4/0x4a0 mm/kasan/report.c:285
[<ffffffff81735d93>] kasan_report mm/kasan/report.c:305 [inline]
[<ffffffff81735d93>] __asan_report_load4_noabort+0x43/0x50 mm/kasan/report.c:325
[<ffffffff82a28e39>] rt6_select net/ipv6/route.c:755 [inline]
[<ffffffff82a28e39>] ip6_pol_route.isra.46+0x1429/0x1470 net/ipv6/route.c:1084
[<ffffffff82a28fb1>] ip6_pol_route_output+0x81/0xb0 net/ipv6/route.c:1203
[<ffffffff82ab0a50>] fib6_rule_action+0x1f0/0x680 net/ipv6/fib6_rules.c:95
[<ffffffff8265cbb6>] fib_rules_lookup+0x2a6/0x7a0 net/core/fib_rules.c:223
[<ffffffff82ab1430>] fib6_rule_lookup+0xd0/0x250 net/ipv6/fib6_rules.c:41
[<ffffffff82a22006>] ip6_route_output+0x1d6/0x2c0 net/ipv6/route.c:1224
[<ffffffff829e83d2>] ip6_dst_lookup_tail+0x4d2/0x890 net/ipv6/ip6_output.c:943
[<ffffffff829e889a>] ip6_dst_lookup_flow+0x9a/0x250 net/ipv6/ip6_output.c:1079
[<ffffffff82a9f7d8>] ip6_datagram_dst_update+0x538/0xd40 net/ipv6/datagram.c:91
[<ffffffff82aa0978>] __ip6_datagram_connect net/ipv6/datagram.c:251 [inline]
[<ffffffff82aa0978>] ip6_datagram_connect+0x518/0xe50 net/ipv6/datagram.c:272
[<ffffffff82aa1313>] ip6_datagram_connect_v6_only+0x63/0x90 net/ipv6/datagram.c:284
[<ffffffff8292f790>] inet_dgram_connect+0x170/0x1f0 net/ipv4/af_inet.c:564
[<ffffffff82565547>] SYSC_connect+0x1a7/0x2f0 net/socket.c:1582
[<ffffffff8256a649>] SyS_connect+0x29/0x30 net/socket.c:1563
[<ffffffff82c72032>] entry_SYSCALL_64_fastpath+0x12/0x17
Object at ffff8800bc699380, in cache ip6_dst_cache size: 384
The root cause of it is that in fib6_add_rt2node(), when it replaces an
existing route with the new one, it does not update fn->rr_ptr.
This commit resets fn->rr_ptr to NULL when it points to a route which is
replaced in fib6_add_rt2node().
Fixes: 2759647247 ("ipv6: fix ECMP route replacement")
Signed-off-by: Wei Wang <weiwan@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 5bfd37b4de ]
syszkaller reported use-after-free in tipc [1]
When msg->rep skb is freed, set the pointer to NULL,
so that caller does not free it again.
[1]
==================================================================
BUG: KASAN: use-after-free in skb_push+0xd4/0xe0 net/core/skbuff.c:1466
Read of size 8 at addr ffff8801c6e71e90 by task syz-executor5/4115
CPU: 1 PID: 4115 Comm: syz-executor5 Not tainted 4.13.0-rc4+ #32
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:16 [inline]
dump_stack+0x194/0x257 lib/dump_stack.c:52
print_address_description+0x73/0x250 mm/kasan/report.c:252
kasan_report_error mm/kasan/report.c:351 [inline]
kasan_report+0x24e/0x340 mm/kasan/report.c:409
__asan_report_load8_noabort+0x14/0x20 mm/kasan/report.c:430
skb_push+0xd4/0xe0 net/core/skbuff.c:1466
tipc_nl_compat_recv+0x833/0x18f0 net/tipc/netlink_compat.c:1209
genl_family_rcv_msg+0x7b7/0xfb0 net/netlink/genetlink.c:598
genl_rcv_msg+0xb2/0x140 net/netlink/genetlink.c:623
netlink_rcv_skb+0x216/0x440 net/netlink/af_netlink.c:2397
genl_rcv+0x28/0x40 net/netlink/genetlink.c:634
netlink_unicast_kernel net/netlink/af_netlink.c:1265 [inline]
netlink_unicast+0x4e8/0x6f0 net/netlink/af_netlink.c:1291
netlink_sendmsg+0xa4a/0xe60 net/netlink/af_netlink.c:1854
sock_sendmsg_nosec net/socket.c:633 [inline]
sock_sendmsg+0xca/0x110 net/socket.c:643
sock_write_iter+0x31a/0x5d0 net/socket.c:898
call_write_iter include/linux/fs.h:1743 [inline]
new_sync_write fs/read_write.c:457 [inline]
__vfs_write+0x684/0x970 fs/read_write.c:470
vfs_write+0x189/0x510 fs/read_write.c:518
SYSC_write fs/read_write.c:565 [inline]
SyS_write+0xef/0x220 fs/read_write.c:557
entry_SYSCALL_64_fastpath+0x1f/0xbe
RIP: 0033:0x4512e9
RSP: 002b:00007f3bc8184c08 EFLAGS: 00000216 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 0000000000718000 RCX: 00000000004512e9
RDX: 0000000000000020 RSI: 0000000020fdb000 RDI: 0000000000000006
RBP: 0000000000000086 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000216 R12: 00000000004b5e76
R13: 00007f3bc8184b48 R14: 00000000004b5e86 R15: 0000000000000000
Allocated by task 4115:
save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59
save_stack+0x43/0xd0 mm/kasan/kasan.c:447
set_track mm/kasan/kasan.c:459 [inline]
kasan_kmalloc+0xad/0xe0 mm/kasan/kasan.c:551
kasan_slab_alloc+0x12/0x20 mm/kasan/kasan.c:489
kmem_cache_alloc_node+0x13d/0x750 mm/slab.c:3651
__alloc_skb+0xf1/0x740 net/core/skbuff.c:219
alloc_skb include/linux/skbuff.h:903 [inline]
tipc_tlv_alloc+0x26/0xb0 net/tipc/netlink_compat.c:148
tipc_nl_compat_dumpit+0xf2/0x3c0 net/tipc/netlink_compat.c:248
tipc_nl_compat_handle net/tipc/netlink_compat.c:1130 [inline]
tipc_nl_compat_recv+0x756/0x18f0 net/tipc/netlink_compat.c:1199
genl_family_rcv_msg+0x7b7/0xfb0 net/netlink/genetlink.c:598
genl_rcv_msg+0xb2/0x140 net/netlink/genetlink.c:623
netlink_rcv_skb+0x216/0x440 net/netlink/af_netlink.c:2397
genl_rcv+0x28/0x40 net/netlink/genetlink.c:634
netlink_unicast_kernel net/netlink/af_netlink.c:1265 [inline]
netlink_unicast+0x4e8/0x6f0 net/netlink/af_netlink.c:1291
netlink_sendmsg+0xa4a/0xe60 net/netlink/af_netlink.c:1854
sock_sendmsg_nosec net/socket.c:633 [inline]
sock_sendmsg+0xca/0x110 net/socket.c:643
sock_write_iter+0x31a/0x5d0 net/socket.c:898
call_write_iter include/linux/fs.h:1743 [inline]
new_sync_write fs/read_write.c:457 [inline]
__vfs_write+0x684/0x970 fs/read_write.c:470
vfs_write+0x189/0x510 fs/read_write.c:518
SYSC_write fs/read_write.c:565 [inline]
SyS_write+0xef/0x220 fs/read_write.c:557
entry_SYSCALL_64_fastpath+0x1f/0xbe
Freed by task 4115:
save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59
save_stack+0x43/0xd0 mm/kasan/kasan.c:447
set_track mm/kasan/kasan.c:459 [inline]
kasan_slab_free+0x71/0xc0 mm/kasan/kasan.c:524
__cache_free mm/slab.c:3503 [inline]
kmem_cache_free+0x77/0x280 mm/slab.c:3763
kfree_skbmem+0x1a1/0x1d0 net/core/skbuff.c:622
__kfree_skb net/core/skbuff.c:682 [inline]
kfree_skb+0x165/0x4c0 net/core/skbuff.c:699
tipc_nl_compat_dumpit+0x36a/0x3c0 net/tipc/netlink_compat.c:260
tipc_nl_compat_handle net/tipc/netlink_compat.c:1130 [inline]
tipc_nl_compat_recv+0x756/0x18f0 net/tipc/netlink_compat.c:1199
genl_family_rcv_msg+0x7b7/0xfb0 net/netlink/genetlink.c:598
genl_rcv_msg+0xb2/0x140 net/netlink/genetlink.c:623
netlink_rcv_skb+0x216/0x440 net/netlink/af_netlink.c:2397
genl_rcv+0x28/0x40 net/netlink/genetlink.c:634
netlink_unicast_kernel net/netlink/af_netlink.c:1265 [inline]
netlink_unicast+0x4e8/0x6f0 net/netlink/af_netlink.c:1291
netlink_sendmsg+0xa4a/0xe60 net/netlink/af_netlink.c:1854
sock_sendmsg_nosec net/socket.c:633 [inline]
sock_sendmsg+0xca/0x110 net/socket.c:643
sock_write_iter+0x31a/0x5d0 net/socket.c:898
call_write_iter include/linux/fs.h:1743 [inline]
new_sync_write fs/read_write.c:457 [inline]
__vfs_write+0x684/0x970 fs/read_write.c:470
vfs_write+0x189/0x510 fs/read_write.c:518
SYSC_write fs/read_write.c:565 [inline]
SyS_write+0xef/0x220 fs/read_write.c:557
entry_SYSCALL_64_fastpath+0x1f/0xbe
The buggy address belongs to the object at ffff8801c6e71dc0
which belongs to the cache skbuff_head_cache of size 224
The buggy address is located 208 bytes inside of
224-byte region [ffff8801c6e71dc0, ffff8801c6e71ea0)
The buggy address belongs to the page:
page:ffffea00071b9c40 count:1 mapcount:0 mapping:ffff8801c6e71000 index:0x0
flags: 0x200000000000100(slab)
raw: 0200000000000100 ffff8801c6e71000 0000000000000000 000000010000000c
raw: ffffea0007224a20 ffff8801d98caf48 ffff8801d9e79040 0000000000000000
page dumped because: kasan: bad access detected
Memory state around the buggy address:
ffff8801c6e71d80: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb
ffff8801c6e71e00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff8801c6e71e80: fb fb fb fb fc fc fc fc fc fc fc fc fc fc fc fc
^
ffff8801c6e71f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff8801c6e71f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
==================================================================
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Jon Maloy <jon.maloy@ericsson.com>
Cc: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit c780a049f9 ]
While working on yet another syzkaller report, I found
that our IP_MAX_MTU enforcements were not properly done.
gcc seems to reload dev->mtu for min(dev->mtu, IP_MAX_MTU), and
final result can be bigger than IP_MAX_MTU :/
This is a problem because device mtu can be changed on other cpus or
threads.
While this patch does not fix the issue I am working on, it is
probably worth addressing it.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 325d5dc3f7 ]
When sfq_enqueue() drops head packet or packet from another queue it
have to update backlog at upper qdiscs too.
Fixes: 2ccccf5fb4 ("net_sched: update hierarchical backlog too")
Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 187e5b3ac8 ]
If fi->fib_metrics could not be allocated in fib_create_info()
we attempt to dereference a NULL pointer in free_fib_info_rcu() :
m = fi->fib_metrics;
if (m != &dst_default_metrics && atomic_dec_and_test(&m->refcnt))
kfree(m);
Before my recent patch, we used to call kfree(NULL) and nothing wrong
happened.
Instead of using RCU to defer freeing while we are under memory stress,
it seems better to take immediate action.
This was reported by syzkaller team.
Fixes: 3fb07daff8 ("ipv4: add reference counting to metrics")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9c3f379492 upstream.
If one cpu is doing nf_ct_extend_unregister while another cpu is doing
__nf_ct_ext_add_length, then we may hit BUG_ON(t == NULL). Moreover,
there's no synchronize_rcu invocation after set nf_ct_ext_types[id] to
NULL, so it's possible that we may access invalid pointer.
But actually, most of the ct extends are built-in, so the problem listed
above will not happen. However, there are two exceptions: NF_CT_EXT_NAT
and NF_CT_EXT_SYNPROXY.
For _EXT_NAT, the panic will not happen, since adding the nat extend and
unregistering the nat extend are located in the same file(nf_nat_core.c),
this means that after the nat module is removed, we cannot add the nat
extend too.
For _EXT_SYNPROXY, synproxy extend may be added by init_conntrack, while
synproxy extend unregister will be done by synproxy_core_exit. So after
nf_synproxy_core.ko is removed, we may still try to add the synproxy
extend, then kernel panic may happen.
I know it's very hard to reproduce this issue, but I can play a tricky
game to make it happen very easily :)
Step 1. Enable SYNPROXY for tcp dport 1234 at FORWARD hook:
# iptables -I FORWARD -p tcp --dport 1234 -j SYNPROXY
Step 2. Queue the syn packet to the userspace at raw table OUTPUT hook.
Also note, in the userspace we only add a 20s' delay, then
reinject the syn packet to the kernel:
# iptables -t raw -I OUTPUT -p tcp --syn -j NFQUEUE --queue-num 1
Step 3. Using "nc 2.2.2.2 1234" to connect the server.
Step 4. Now remove the nf_synproxy_core.ko quickly:
# iptables -F FORWARD
# rmmod ipt_SYNPROXY
# rmmod nf_synproxy_core
Step 5. After 20s' delay, the syn packet is reinjected to the kernel.
Now you will see the panic like this:
kernel BUG at net/netfilter/nf_conntrack_extend.c:91!
Call Trace:
? __nf_ct_ext_add_length+0x53/0x3c0 [nf_conntrack]
init_conntrack+0x12b/0x600 [nf_conntrack]
nf_conntrack_in+0x4cc/0x580 [nf_conntrack]
ipv4_conntrack_local+0x48/0x50 [nf_conntrack_ipv4]
nf_reinject+0x104/0x270
nfqnl_recv_verdict+0x3e1/0x5f9 [nfnetlink_queue]
? nfqnl_recv_verdict+0x5/0x5f9 [nfnetlink_queue]
? nla_parse+0xa0/0x100
nfnetlink_rcv_msg+0x175/0x6a9 [nfnetlink]
[...]
One possible solution is to make NF_CT_EXT_SYNPROXY extend built-in, i.e.
introduce nf_conntrack_synproxy.c and only do ct extend register and
unregister in it, similar to nf_conntrack_timeout.c.
But having such a obscure restriction of nf_ct_extend_unregister is not a
good idea, so we should invoke synchronize_rcu after set nf_ct_ext_types
to NULL, and check the NULL pointer when do __nf_ct_ext_add_length. Then
it will be easier if we add new ct extend in the future.
Last, we use kfree_rcu to free nf_ct_ext, so rcu_barrier() is unnecessary
anymore, remove it too.
Signed-off-by: Liping Zhang <zlpnobody@gmail.com>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a5cb659bbc upstream.
Our customer encountered stuck NFS writes for blocks starting at specific
offsets w.r.t. page boundary caused by networking stack sending packets via
UFO enabled device with wrong checksum. The problem can be reproduced by
composing a long UDP datagram from multiple parts using MSG_MORE flag:
sendto(sd, buff, 1000, MSG_MORE, ...);
sendto(sd, buff, 1000, MSG_MORE, ...);
sendto(sd, buff, 3000, 0, ...);
Assume this packet is to be routed via a device with MTU 1500 and
NETIF_F_UFO enabled. When second sendto() gets into __ip_append_data(),
this condition is tested (among others) to decide whether to call
ip_ufo_append_data():
((length + fragheaderlen) > mtu) || (skb && skb_is_gso(skb))
At the moment, we already have skb with 1028 bytes of data which is not
marked for GSO so that the test is false (fragheaderlen is usually 20).
Thus we append second 1000 bytes to this skb without invoking UFO. Third
sendto(), however, has sufficient length to trigger the UFO path so that we
end up with non-UFO skb followed by a UFO one. Later on, udp_send_skb()
uses udp_csum() to calculate the checksum but that assumes all fragments
have correct checksum in skb->csum which is not true for UFO fragments.
When checking against MTU, we need to add skb->len to length of new segment
if we already have a partially filled skb and fragheaderlen only if there
isn't one.
In the IPv6 case, skb can only be null if this is the first segment so that
we have to use headersize (length of the first IPv6 header) rather than
fragheaderlen (length of IPv6 header of further fragments) for skb == NULL.
Fixes: e89e9cf539 ("[IPv4/IPv6]: UFO Scatter-gather approach")
Fixes: e4c5e13aa4 ("ipv6: Should use consistent conditional judgement for
ip6 fragment between __ip6_append_data and ip6_finish_output")
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Acked-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0a28cfd51e upstream.
There is an inconsistent conditional judgement in __ip_append_data and
ip_finish_output functions, the variable length in __ip_append_data just
include the length of application's payload and udp header, don't include
the length of ip header, but in ip_finish_output use
(skb->len > ip_skb_dst_mtu(skb)) as judgement, and skb->len include the
length of ip header.
That causes some particular application's udp payload whose length is
between (MTU - IP Header) and MTU were fragmented by ip_fragment even
though the rst->dev support UFO feature.
Add the length of ip header to length in __ip_append_data to keep
consistent conditional judgement as ip_finish_output for ip fragment.
Signed-off-by: Zheng Li <james.z.li@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 85f1bd9a7b ]
When iteratively building a UDP datagram with MSG_MORE and that
datagram exceeds MTU, consistently choose UFO or fragmentation.
Once skb_is_gso, always apply ufo. Conversely, once a datagram is
split across multiple skbs, do not consider ufo.
Sendpage already maintains the first invariant, only add the second.
IPv6 does not have a sendpage implementation to modify.
A gso skb must have a partial checksum, do not follow sk_no_check_tx
in udp_send_skb.
Found by syzkaller.
Fixes: e89e9cf539 ("[IPv4/IPv6]: UFO Scatter-gather approach")
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This reverts commit f102bb7164 which is
commit 0a28cfd51e upstream as there is
another patch that needs to be applied instead of this one.
Cc: Zheng Li <james.z.li@ericsson.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This reverts commit ef09c9ff34 which is
commit a5cb659bbc upstream as it causes
merge issues with later patches that are much more important...
Cc: Michal Kubecek <mkubecek@suse.cz>
Cc: Vlad Yasevich <vyasevic@redhat.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit c27927e372 ]
Updates to tp_reserve can race with reads of the field in
packet_set_ring. Avoid this by holding the socket lock during
updates in setsockopt PACKET_RESERVE.
This bug was discovered by syzkaller.
Fixes: 8913336a7e ("packet: add PACKET_RESERVE sockopt")
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 8d63bee643 ]
skb_warn_bad_offload triggers a warning when an skb enters the GSO
stack at __skb_gso_segment that does not have CHECKSUM_PARTIAL
checksum offload set.
Commit b2504a5dbe ("net: reduce skb_warn_bad_offload() noise")
observed that SKB_GSO_DODGY producers can trigger the check and
that passing those packets through the GSO handlers will fix it
up. But, the software UFO handler will set ip_summed to
CHECKSUM_NONE.
When __skb_gso_segment is called from the receive path, this
triggers the warning again.
Make UFO set CHECKSUM_UNNECESSARY instead of CHECKSUM_NONE. On
Tx these two are equivalent. On Rx, this better matches the
skb state (checksum computed), as CHECKSUM_NONE here means no
checksum computed.
See also this thread for context:
http://patchwork.ozlabs.org/patch/799015/
Fixes: b2504a5dbe ("net: reduce skb_warn_bad_offload() noise")
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 8ba6092471 ]
With new TCP_FASTOPEN_CONNECT socket option, there is a possibility
to call tcp_connect() while socket sk_dst_cache is either NULL
or invalid.
+0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 4
+0 fcntl(4, F_SETFL, O_RDWR|O_NONBLOCK) = 0
+0 setsockopt(4, SOL_TCP, TCP_FASTOPEN_CONNECT, [1], 4) = 0
+0 connect(4, ..., ...) = 0
<< sk->sk_dst_cache becomes obsolete, or even set to NULL >>
+1 sendto(4, ..., 1000, MSG_FASTOPEN, ..., ...) = 1000
We need to refresh the route otherwise bad things can happen,
especially when syzkaller is running on the host :/
Fixes: 19f6d3f3c8 ("net/tcp-fastopen: Add new API support")
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Wei Wang <weiwan@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Acked-by: Wei Wang <weiwan@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 96d9703050 ]
Commit 55917a21d0 ("netfilter: x_tables: add context to know if
extension runs from nft_compat") introduced a member nft_compat to
xt_tgchk_param structure.
But it didn't set it's value for ipt_init_target. With unexpected
value in par.nft_compat, it may return unexpected result in some
target's checkentry.
This patch is to set all it's fields as 0 and only initialize the
non-zero fields in ipt_init_target.
v1->v2:
As Wang Cong's suggestion, fix it by setting all it's fields as
0 and only initializing the non-zero fields.
Fixes: 55917a21d0 ("netfilter: x_tables: add context to know if extension runs from nft_compat")
Suggested-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit ed254971ed ]
If the sender switches the congestion control during ECN-triggered
cwnd-reduction state (CA_CWR), upon exiting recovery cwnd is set to
the ssthresh value calculated by the previous congestion control. If
the previous congestion control is BBR that always keep ssthresh
to TCP_INIFINITE_SSTHRESH, cwnd ends up being infinite. The safe
step is to avoid assigning invalid ssthresh value when recovery ends.
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit a5cb659bbc ]
Our customer encountered stuck NFS writes for blocks starting at specific
offsets w.r.t. page boundary caused by networking stack sending packets via
UFO enabled device with wrong checksum. The problem can be reproduced by
composing a long UDP datagram from multiple parts using MSG_MORE flag:
sendto(sd, buff, 1000, MSG_MORE, ...);
sendto(sd, buff, 1000, MSG_MORE, ...);
sendto(sd, buff, 3000, 0, ...);
Assume this packet is to be routed via a device with MTU 1500 and
NETIF_F_UFO enabled. When second sendto() gets into __ip_append_data(),
this condition is tested (among others) to decide whether to call
ip_ufo_append_data():
((length + fragheaderlen) > mtu) || (skb && skb_is_gso(skb))
At the moment, we already have skb with 1028 bytes of data which is not
marked for GSO so that the test is false (fragheaderlen is usually 20).
Thus we append second 1000 bytes to this skb without invoking UFO. Third
sendto(), however, has sufficient length to trigger the UFO path so that we
end up with non-UFO skb followed by a UFO one. Later on, udp_send_skb()
uses udp_csum() to calculate the checksum but that assumes all fragments
have correct checksum in skb->csum which is not true for UFO fragments.
When checking against MTU, we need to add skb->len to length of new segment
if we already have a partially filled skb and fragheaderlen only if there
isn't one.
In the IPv6 case, skb can only be null if this is the first segment so that
we have to use headersize (length of the first IPv6 header) rather than
fragheaderlen (length of IPv6 header of further fragments) for skb == NULL.
Fixes: e89e9cf539 ("[IPv4/IPv6]: UFO Scatter-gather approach")
Fixes: e4c5e13aa4 ("ipv6: Should use consistent conditional judgement for
ip6 fragment between __ip6_append_data and ip6_finish_output")
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Acked-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 0a28cfd51e ]
There is an inconsistent conditional judgement in __ip_append_data and
ip_finish_output functions, the variable length in __ip_append_data just
include the length of application's payload and udp header, don't include
the length of ip header, but in ip_finish_output use
(skb->len > ip_skb_dst_mtu(skb)) as judgement, and skb->len include the
length of ip header.
That causes some particular application's udp payload whose length is
between (MTU - IP Header) and MTU were fragmented by ip_fragment even
though the rst->dev support UFO feature.
Add the length of ip header to length in __ip_append_data to keep
consistent conditional judgement as ip_finish_output for ip fragment.
Signed-off-by: Zheng Li <james.z.li@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit e90ce2fc27 ]
In dccp_feat_init, when ccid_get_builtin_ccids failsto alloc
memory for rx.val, it should free tx.val before returning an
error.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit b7953d3c0e ]
The patch "dccp: fix a memleak that dccp_ipv6 doesn't put reqsk
properly" fixed reqsk refcnt leak for dccp_ipv6. The same issue
exists on dccp_ipv4.
This patch is to fix it for dccp_ipv4.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 0c2232b0a7 ]
In dccp_v6_conn_request, after reqsk gets alloced and hashed into
ehash table, reqsk's refcnt is set 3. one is for req->rsk_timer,
one is for hlist, and the other one is for current using.
The problem is when dccp_v6_conn_request returns and finishes using
reqsk, it doesn't put reqsk. This will cause reqsk refcnt leaks and
reqsk obj never gets freed.
Jianlin found this issue when running dccp_memleak.c in a loop, the
system memory would run out.
dccp_memleak.c:
int s1 = socket(PF_INET6, 6, IPPROTO_IP);
bind(s1, &sa1, 0x20);
listen(s1, 0x9);
int s2 = socket(PF_INET6, 6, IPPROTO_IP);
connect(s2, &sa1, 0x20);
close(s1);
close(s2);
This patch is to put the reqsk before dccp_v6_conn_request returns,
just as what tcp_conn_request does.
Reported-by: Jianlin Shi <jishi@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit afce615aaa ]
RFC 2465 defines ipv6IfStatsOutFragFails as:
"The number of IPv6 datagrams that have been discarded
because they needed to be fragmented at this output
interface but could not be."
The existing implementation, instead, would increase the counter
twice in case we fail to allocate room for single fragments:
once for the fragment, once for the datagram.
This didn't look intentional though. In one of the two affected
affected failure paths, the double increase was simply a result
of a new 'goto fail' statement, introduced to avoid a skb leak.
The other path appears to be affected since at least 2.6.12-rc2.
Reported-by: Sabrina Dubroca <sdubroca@redhat.com>
Fixes: 1d325d217c ("ipv6: ip6_fragment: fix headroom tests and skb leak")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit c800aaf8d8 ]
There are multiple reports showing we have a use-after-free in
the timer prb_retire_rx_blk_timer_expired(), where we use struct
tpacket_kbdq_core::pkbdq, a pg_vec, after it gets freed by
free_pg_vec().
The interesting part is it is not freed via packet_release() but
via packet_setsockopt(), which means we are not closing the socket.
Looking into the big and fat function packet_set_ring(), this could
happen if we satisfy the following conditions:
1. closing == 0, not on packet_release() path
2. req->tp_block_nr == 0, we don't allocate a new pg_vec
3. rx_ring->pg_vec is already set as V3, which means we already called
packet_set_ring() wtih req->tp_block_nr > 0 previously
4. req->tp_frame_nr == 0, pass sanity check
5. po->mapped == 0, never called mmap()
In this scenario we are clearing the old rx_ring->pg_vec, so we need
to free this pg_vec, but we don't stop the timer on this path because
of closing==0.
The timer has to be stopped as long as we need to free pg_vec, therefore
the check on closing!=0 is wrong, we should check pg_vec!=NULL instead.
Thanks to liujian for testing different fixes.
Reported-by: alexander.levin@verizon.com
Reported-by: Dave Jones <davej@codemonkey.org.uk>
Reported-by: liujian (CE) <liujian56@huawei.com>
Tested-by: liujian (CE) <liujian56@huawei.com>
Cc: Ding Tianhong <dingtianhong@huawei.com>
Cc: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 69ec932e36 ]
Before the 'type' is validated, we shouldn't use it to fetch the
ovs_ct_attr_lens's minlen and maxlen, else, out of bound access
may happen.
Fixes: 7f8a436eaa ("openvswitch: Add conntrack action")
Signed-off-by: Liping Zhang <zlpnobody@gmail.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 153711f942 ]
virtnet_set_mac_address() interprets mac address as struct
sockaddr, but upper layer only allocates dev->addr_len
which is ETH_ALEN + sizeof(sa_family_t) in this case.
We lack a unified definition for mac address, so just fix
the upper layer, this also allows drivers to interpret it
to struct sockaddr freely.
Reported-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 8799a221f5 ]
Net stack initialization currently initializes fib-trie after the
first call to netdevice_notifier() call. In fact fib_trie initialization
needs to happen before first rtnl_register(). It does not cause any problem
since there are no devices UP at this moment, but trying to bring 'lo'
UP at initialization would make this assumption wrong and exposes the issue.
Fixes following crash
Call Trace:
? alternate_node_alloc+0x76/0xa0
fib_table_insert+0x1b7/0x4b0
fib_magic.isra.17+0xea/0x120
fib_add_ifaddr+0x7b/0x190
fib_netdev_event+0xc0/0x130
register_netdevice_notifier+0x1c1/0x1d0
ip_fib_init+0x72/0x85
ip_rt_init+0x187/0x1e9
ip_init+0xe/0x1a
inet_init+0x171/0x26c
? ipv4_offload_init+0x66/0x66
do_one_initcall+0x43/0x160
kernel_init_freeable+0x191/0x219
? rest_init+0x80/0x80
kernel_init+0xe/0x150
ret_from_fork+0x22/0x30
Code: f6 46 23 04 74 86 4c 89 f7 e8 ae 45 01 00 49 89 c7 4d 85 ff 0f 85 7b ff ff ff 31 db eb 08 4c 89 ff e8 16 47 01 00 48 8b 44 24 38 <45> 8b 6e 14 4d 63 76 74 48 89 04 24 0f 1f 44 00 00 48 83 c4 08
RIP: kmem_cache_alloc+0xcf/0x1c0 RSP: ffff9b1500017c28
CR2: 0000000000000014
Fixes: 7b1a74fdbb ("[NETNS]: Refactor fib initialization so it can handle multiple namespaces.")
Fixes: 7f9b80529b ("[IPV4]: fib hash|trie initialization")
Signed-off-by: Mahesh Bandewar <maheshb@google.com>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 6399f1fae4 ]
In some cases, offset can overflow and can cause an infinite loop in
ip6_find_1stfragopt(). Make it unsigned int to prevent the overflow, and
cap it at IPV6_MAXPLEN, since packets larger than that should be invalid.
This problem has been here since before the beginning of git history.
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 63679112c5 ]
The ifr.ifr_name is passed around and assumed to be NULL terminated.
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4c86d77743 upstream.
On IPv4-mapped IPv6 addresses sk_family is AF_INET6,
but the flow informations are created based on AF_INET.
So the routing set up 'struct flowi4' but we try to
access 'struct flowi6' what leads to an out of bounds
access. Fix this by using the family we get with the
dst_entry, like we do it for the standard policy lookup.
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Tested-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit e4c5e13aa4 ]
There is an inconsistent conditional judgement between __ip6_append_data
and ip6_finish_output functions, the variable length in __ip6_append_data
just include the length of application's payload and udp6 header, don't
include the length of ipv6 header, but in ip6_finish_output use
(skb->len > ip6_skb_dst_mtu(skb)) as judgement, and skb->len include the
length of ipv6 header.
That causes some particular application's udp6 payloads whose length are
between (MTU - IPv6 Header) and MTU were fragmented by ip6_fragment even
though the rst->dev support UFO feature.
Add the length of ipv6 header to length in __ip6_append_data to keep
consistent conditional judgement as ip6_finish_output for ip6 fragment.
Signed-off-by: Zheng Li <james.z.li@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6e7bc478c9 upstream.
My recent change missed fact that UFO would perform a complete
UDP checksum before segmenting in frags.
In this case skb->ip_summed is set to CHECKSUM_NONE.
We need to add this valid case to skb_needs_check()
Fixes: b2504a5dbe ("net: reduce skb_warn_bad_offload() noise")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 308453aa91 upstream.
The MAC address of the physical interface is only copied to the VLAN
when it is first created, resulting in an inconsistency after MAC
address changes of only newly created VLANs having an up-to-date MAC.
The VLANs should continue inheriting the MAC address of the physical
interface until the VLAN MAC address is explicitly set to any value.
This allows IPv6 EUI64 addresses for the VLAN to reflect any changes
to the MAC of the physical interface and thus for DAD to behave as
expected.
Signed-off-by: Mike Manning <mmanning@brocade.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sumit Semwal <sumit.semwal@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b2504a5dbe upstream.
Dmitry reported warnings occurring in __skb_gso_segment() [1]
All SKB_GSO_DODGY producers can allow user space to feed
packets that trigger the current check.
We could prevent them from doing so, rejecting packets, but
this might add regressions to existing programs.
It turns out our SKB_GSO_DODGY handlers properly set up checksum
information that is needed anyway when packets needs to be segmented.
By checking again skb_needs_check() after skb_mac_gso_segment(),
we should remove these pesky warnings, at a very minor cost.
With help from Willem de Bruijn
[1]
WARNING: CPU: 1 PID: 6768 at net/core/dev.c:2439 skb_warn_bad_offload+0x2af/0x390 net/core/dev.c:2434
lo: caps=(0x000000a2803b7c69, 0x0000000000000000) len=138 data_len=0 gso_size=15883 gso_type=4 ip_summed=0
Kernel panic - not syncing: panic_on_warn set ...
CPU: 1 PID: 6768 Comm: syz-executor1 Not tainted 4.9.0 #5
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
ffff8801c063ecd8 ffffffff82346bdf ffffffff00000001 1ffff100380c7d2e
ffffed00380c7d26 0000000041b58ab3 ffffffff84b37e38 ffffffff823468f1
ffffffff84820740 ffffffff84f289c0 dffffc0000000000 ffff8801c063ee20
Call Trace:
[<ffffffff82346bdf>] __dump_stack lib/dump_stack.c:15 [inline]
[<ffffffff82346bdf>] dump_stack+0x2ee/0x3ef lib/dump_stack.c:51
[<ffffffff81827e34>] panic+0x1fb/0x412 kernel/panic.c:179
[<ffffffff8141f704>] __warn+0x1c4/0x1e0 kernel/panic.c:542
[<ffffffff8141f7e5>] warn_slowpath_fmt+0xc5/0x100 kernel/panic.c:565
[<ffffffff8356cbaf>] skb_warn_bad_offload+0x2af/0x390 net/core/dev.c:2434
[<ffffffff83585cd2>] __skb_gso_segment+0x482/0x780 net/core/dev.c:2706
[<ffffffff83586f19>] skb_gso_segment include/linux/netdevice.h:3985 [inline]
[<ffffffff83586f19>] validate_xmit_skb+0x5c9/0xc20 net/core/dev.c:2969
[<ffffffff835892bb>] __dev_queue_xmit+0xe6b/0x1e70 net/core/dev.c:3383
[<ffffffff8358a2d7>] dev_queue_xmit+0x17/0x20 net/core/dev.c:3424
[<ffffffff83ad161d>] packet_snd net/packet/af_packet.c:2930 [inline]
[<ffffffff83ad161d>] packet_sendmsg+0x32ed/0x4d30 net/packet/af_packet.c:2955
[<ffffffff834f0aaa>] sock_sendmsg_nosec net/socket.c:621 [inline]
[<ffffffff834f0aaa>] sock_sendmsg+0xca/0x110 net/socket.c:631
[<ffffffff834f329a>] ___sys_sendmsg+0x8fa/0x9f0 net/socket.c:1954
[<ffffffff834f5e58>] __sys_sendmsg+0x138/0x300 net/socket.c:1988
[<ffffffff834f604d>] SYSC_sendmsg net/socket.c:1999 [inline]
[<ffffffff834f604d>] SyS_sendmsg+0x2d/0x50 net/socket.c:1995
[<ffffffff84371941>] entry_SYSCALL_64_fastpath+0x1f/0xc2
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cc: Mark Salyzyn <salyzyn@android.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 89e357d83c upstream.
A dump may come in the middle of another dump, modifying its dump
structure members. This race condition will result in NULL pointer
dereference in kernel. So add a lock to prevent that race.
Fixes: 83321d6b98 ("[AF_KEY]: Dump SA/SP entries non-atomically")
Signed-off-by: Yuejie Shi <syjcnss@gmail.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Mark Salyzyn <salyzyn@android.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3c5ab3f395 upstream.
We do not check if packet from real server is for NAT
connection before performing SNAT. This causes problems
for setups that use DR/TUN and allow local clients to
access the real server directly, for example:
- local client in director creates IPVS-DR/TUN connection
CIP->VIP and the request packets are routed to RIP.
Talks are finished but IPVS connection is not expired yet.
- second local client creates non-IPVS connection CIP->RIP
with same reply tuple RIP->CIP and when replies are received
on LOCAL_IN we wrongly assign them for the first client
connection because RIP->CIP matches the reply direction.
As result, IPVS SNATs replies for non-IPVS connections.
The problem is more visible to local UDP clients but in rare
cases it can happen also for TCP or remote clients when the
real server sends the reply traffic via the director.
So, better to be more precise for the reply traffic.
As replies are not expected for DR/TUN connections, better
to not touch them.
Reported-by: Nick Moriarty <nick.moriarty@york.ac.uk>
Tested-by: Nick Moriarty <nick.moriarty@york.ac.uk>
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 096f41d3a8 upstream.
The parsing of sadb_x_ipsecrequest is broken in a number of ways.
First of all we're not verifying sadb_x_ipsecrequest_len. This
is needed when the structure carries addresses at the end. Worse
we don't even look at the length when we parse those optional
addresses.
The migration code had similar parsing code that's better but
it also has some deficiencies. The length is overcounted first
of all as it includes the header itself. It also fails to check
the length before dereferencing the sa_family field.
This patch fixes those problems in parse_sockaddr_pair and then
uses it in parse_ipsecrequest.
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 329d823098 upstream.
This file is filled with complex cryptography. Thus, the comparisons of
MACs and secret keys and curve points and so forth should not add timing
attacks, which could either result in a direct forgery, or, given the
complexity, some other type of attack.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f6a5885fc4 upstream.
Verify that the caller-provided sockaddr structure is large enough to
contain the sa_family field, before accessing it in bind() handlers of the
AF_NFC socket. Since the syscall doesn't enforce a minimum size of the
corresponding memory region, very short sockaddrs (zero or one byte long)
result in operating on uninitialized memory while referencing .sa_family.
Signed-off-by: Mateusz Jurczyk <mjurczyk@google.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 608c4adfca upstream.
Fix the sockaddr length verification in the connect() handler of NFC/LLCP
sockets, to compare against the size of the actual structure expected on
input (sockaddr_nfc_llcp) instead of its shorter version (sockaddr_nfc).
Both structures are defined in include/uapi/linux/nfc.h. The fields
specific to the _llcp extended struct are as follows:
276 __u8 dsap; /* Destination SAP, if known */
277 __u8 ssap; /* Source SAP to be bound to */
278 char service_name[NFC_LLCP_MAX_SERVICE_NAME]; /* Service name URI */;
279 size_t service_name_len;
If the caller doesn't provide a sufficiently long sockaddr buffer, these
fields remain uninitialized (and they currently originate from the stack
frame of the top-level sys_connect handler). They are then copied by
llcp_sock_connect() into internal storage (nfc_llcp_sock structure), and
could be subsequently read back through the user-mode getsockname()
function (handled by llcp_sock_getname()). This would result in the
disclosure of up to ~70 uninitialized bytes from the kernel stack to
user-mode clients capable of creating AFC_NFC sockets.
Signed-off-by: Mateusz Jurczyk <mjurczyk@google.com>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a0323b979f upstream.
Check that the NFC_ATTR_TARGET_INDEX and NFC_ATTR_PROTOCOLS attributes (in
addition to NFC_ATTR_DEVICE_INDEX) are provided by the netlink client
prior to accessing them. This prevents potential unhandled NULL pointer
dereference exceptions which can be triggered by malicious user-mode
programs, if they omit one or both of these attributes.
Signed-off-by: Mateusz Jurczyk <mjurczyk@google.com>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 20777bc57c upstream.
Commit 7eda8b8e96 ("NFC: Use IDR library to assing NFC devices IDs")
moved device-id allocation and struct-device initialisation from
nfc_allocate_device() to nfc_register_device().
This broke just about every nfc-device-registration error path, which
continue to call nfc_free_device() that tries to put the device
reference of the now uninitialised (but zeroed) struct device:
kobject: '(null)' (ce316420): is not initialized, yet kobject_put() is being called.
The late struct-device initialisation also meant that various work
queues whose names are derived from the nfc device name were also
misnamed:
421 root 0 SW< [(null)_nci_cmd_]
422 root 0 SW< [(null)_nci_rx_w]
423 root 0 SW< [(null)_nci_tx_w]
Move the id-allocation and struct-device initialisation back to
nfc_allocate_device() and fix up the single call site which did not use
nfc_free_device() in its error path.
Fixes: 7eda8b8e96 ("NFC: Use IDR library to assing NFC devices IDs")
Cc: Samuel Ortiz <sameo@linux.intel.com>
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9361df14d1 upstream.
nla policy checks for only maximum length of the attribute data
when the attribute type is NLA_BINARY. If userspace sends less
data than specified, the wireless drivers may access illegal
memory. When type is NLA_UNSPEC, nla policy check ensures that
userspace sends minimum specified length number of bytes.
Remove type assignment to NLA_BINARY from nla_policy of
NL80211_ATTR_PMKID to make this NLA_UNSPEC and to make sure minimum
WLAN_PMKID_LEN bytes are received from userspace with
NL80211_ATTR_PMKID.
Fixes: 67fbb16be6 ("nl80211: PMKSA caching support")
Signed-off-by: Srinivas Dasari <dasaris@qti.qualcomm.com>
Signed-off-by: Jouni Malinen <jouni@qca.qualcomm.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d7f13f7450 upstream.
validate_scan_freqs() retrieves frequencies from attributes
nested in the attribute NL80211_ATTR_SCAN_FREQUENCIES with
nla_get_u32(), which reads 4 bytes from each attribute
without validating the size of data received. Attributes
nested in NL80211_ATTR_SCAN_FREQUENCIES don't have an nla policy.
Validate size of each attribute before parsing to avoid potential buffer
overread.
Fixes: 2a51931192 ("cfg80211/nl80211: scanning (and mac80211 update to use it)")
Signed-off-by: Srinivas Dasari <dasaris@qti.qualcomm.com>
Signed-off-by: Jouni Malinen <jouni@qca.qualcomm.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>