Commit Graph

40398 Commits

Author SHA1 Message Date
Linus Torvalds
941f699544 Revert "af_unix: Fix splice-bind deadlock"
commit 38f7bd94a9 upstream.

This reverts commit c845acb324.

It turns out that it just replaces one deadlock with another one: we can
still get the wrong lock ordering with the readlock due to overlayfs
calling back into the filesystem layer and still taking the vfs locks
after the readlock.

The proper solution ends up being to just split the readlock into two
pieces: the bind lock (taken *outside* the vfs locks) and the IO lock
(taken *inside* the filesystem locks).  The two locks are independent
anyway.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Reviewed-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-09-30 10:18:36 +02:00
Mahesh Bandewar
f357a79839 bonding: Fix bonding crash
[ Upstream commit 24b27fc4cd ]

Following few steps will crash kernel -

  (a) Create bonding master
      > modprobe bonding miimon=50
  (b) Create macvlan bridge on eth2
      > ip link add link eth2 dev mvl0 address aa:0:0:0:0:01 \
	   type macvlan
  (c) Now try adding eth2 into the bond
      > echo +eth2 > /sys/class/net/bond0/bonding/slaves
      <crash>

Bonding does lots of things before checking if the device enslaved is
busy or not.

In this case when the notifier call-chain sends notifications, the
bond_netdev_event() assumes that the rx_handler /rx_handler_data is
registered while the bond_enslave() hasn't progressed far enough to
register rx_handler for the new slave.

This patch adds a rx_handler check that can be performed right at the
beginning of the enslave code to avoid getting into this situation.

Signed-off-by: Mahesh Bandewar <maheshb@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-09-30 10:18:36 +02:00
Arnd Bergmann
bc43ac8bd6 net: caif: fix misleading indentation
commit 8e0cc8c326 upstream.

gcc points out code that is not indented the way it is
interpreted:

net/caif/cfpkt_skbuff.c: In function 'cfpkt_setlen':
net/caif/cfpkt_skbuff.c:289:4: error: statement is indented as if it were guarded by... [-Werror=misleading-indentation]
    return cfpkt_getlen(pkt);
    ^~~~~~
net/caif/cfpkt_skbuff.c:286:3: note: ...this 'else' clause, but it is not
   else
   ^~~~

It is clear from the context that not returning here would be
a bug, as we'd end up passing a negative length into a function
that takes a u16 length, so it is not missing curly braces
here, and I'm assuming that the indentation is the only part
that's wrong about it.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-09-30 10:18:35 +02:00
Eric Dumazet
c867a11289 tcp: properly scale window in tcp_v[46]_reqsk_send_ack()
[ Upstream commit 20a2b49fc5 ]

When sending an ack in SYN_RECV state, we must scale the offered
window if wscale option was negotiated and accepted.

Tested:
 Following packetdrill test demonstrates the issue :

0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
+0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0

+0 bind(3, ..., ...) = 0
+0 listen(3, 1) = 0

// Establish a connection.
+0 < S 0:0(0) win 20000 <mss 1000,sackOK,wscale 7, nop, TS val 100 ecr 0>
+0 > S. 0:0(0) ack 1 win 28960 <mss 1460,sackOK, TS val 100 ecr 100, nop, wscale 7>

+0 < . 1:11(10) ack 1 win 156 <nop,nop,TS val 99 ecr 100>
// check that window is properly scaled !
+0 > . 1:1(0) ack 1 win 226 <nop,nop,TS val 200 ecr 100>

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com>
2016-09-30 10:18:34 +02:00
Artem Germanov
98418550e1 tcp: cwnd does not increase in TCP YeAH
[ Upstream commit db7196a0d0 ]

Commit 76174004a0
(tcp: do not slow start when cwnd equals ssthresh )
introduced regression in TCP YeAH. Using 100ms delay 1% loss virtual
ethernet link kernel 4.2 shows bandwidth ~500KB/s for single TCP
connection and kernel 4.3 and above (including 4.8-rc4) shows bandwidth
~100KB/s.
   That is caused by stalled cwnd when cwnd equals ssthresh. This patch
fixes it by proper increasing cwnd in this case.

Signed-off-by: Artem Germanov <agermanov@anchorfree.com>
Acked-by: Dmitry Adamushko <d.adamushko@anchorfree.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Holger Hoffstätte <holger@applied-asynchrony.com>
2016-09-30 10:18:34 +02:00
Dave Jones
ea7dd213c1 ipv6: release dst in ping_v6_sendmsg
[ Upstream commit 03c2778a93 ]

Neither the failure or success paths of ping_v6_sendmsg release
the dst it acquires.  This leads to a flood of warnings from
"net/core/dst.c:288 dst_release" on older kernels that
don't have 8bf4ada2e2 backported.

That patch optimistically hoped this had been fixed post 3.10, but
it seems at least one case wasn't, where I've seen this triggered
a lot from machines doing unprivileged icmp sockets.

Cc: Martin Lau <kafai@fb.com>
Signed-off-by: Dave Jones <davej@codemonkey.org.uk>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com>
2016-09-30 10:18:34 +02:00
David Forster
6b8076b8a7 ipv4: panic in leaf_walk_rcu due to stale node pointer
[ Upstream commit 94d9f1c590 ]

Panic occurs when issuing "cat /proc/net/route" whilst
populating FIB with > 1M routes.

Use of cached node pointer in fib_route_get_idx is unsafe.

 BUG: unable to handle kernel paging request at ffffc90001630024
 IP: [<ffffffff814cf6a0>] leaf_walk_rcu+0x10/0xe0
 PGD 11b08d067 PUD 11b08e067 PMD dac4b067 PTE 0
 Oops: 0000 [#1] SMP
 Modules linked in: nfsd auth_rpcgss oid_registry nfs_acl nfs lockd grace fscac
 snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep virti
 acpi_cpufreq button parport_pc ppdev lp parport autofs4 ext4 crc16 mbcache jbd
tio_ring virtio floppy uhci_hcd ehci_hcd usbcore usb_common libata scsi_mod
 CPU: 1 PID: 785 Comm: cat Not tainted 4.2.0-rc8+ #4
 Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
 task: ffff8800da1c0bc0 ti: ffff88011a05c000 task.ti: ffff88011a05c000
 RIP: 0010:[<ffffffff814cf6a0>]  [<ffffffff814cf6a0>] leaf_walk_rcu+0x10/0xe0
 RSP: 0018:ffff88011a05fda0  EFLAGS: 00010202
 RAX: ffff8800d8a40c00 RBX: ffff8800da4af940 RCX: ffff88011a05ff20
 RDX: ffffc90001630020 RSI: 0000000001013531 RDI: ffff8800da4af950
 RBP: 0000000000000000 R08: ffff8800da1f9a00 R09: 0000000000000000
 R10: ffff8800db45b7e4 R11: 0000000000000246 R12: ffff8800da4af950
 R13: ffff8800d97a74c0 R14: 0000000000000000 R15: ffff8800d97a7480
 FS:  00007fd3970e0700(0000) GS:ffff88011fd00000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
 CR2: ffffc90001630024 CR3: 000000011a7e4000 CR4: 00000000000006e0
 Stack:
  ffffffff814d00d3 0000000000000000 ffff88011a05ff20 ffff8800da1f9a00
  ffffffff811dd8b9 0000000000000800 0000000000020000 00007fd396f35000
  ffffffff811f8714 0000000000003431 ffffffff8138dce0 0000000000000f80
 Call Trace:
  [<ffffffff814d00d3>] ? fib_route_seq_start+0x93/0xc0
  [<ffffffff811dd8b9>] ? seq_read+0x149/0x380
  [<ffffffff811f8714>] ? fsnotify+0x3b4/0x500
  [<ffffffff8138dce0>] ? process_echoes+0x70/0x70
  [<ffffffff8121cfa7>] ? proc_reg_read+0x47/0x70
  [<ffffffff811bb823>] ? __vfs_read+0x23/0xd0
  [<ffffffff811bbd42>] ? rw_verify_area+0x52/0xf0
  [<ffffffff811bbe61>] ? vfs_read+0x81/0x120
  [<ffffffff811bcbc2>] ? SyS_read+0x42/0xa0
  [<ffffffff81549ab2>] ? entry_SYSCALL_64_fastpath+0x16/0x75
 Code: 48 85 c0 75 d8 f3 c3 31 c0 c3 f3 c3 66 66 66 66 66 66 2e 0f 1f 84 00 00
a 04 89 f0 33 02 44 89 c9 48 d3 e8 0f b6 4a 05 49 89
 RIP  [<ffffffff814cf6a0>] leaf_walk_rcu+0x10/0xe0
  RSP <ffff88011a05fda0>
 CR2: ffffc90001630024

Signed-off-by: Dave Forster <dforster@brocade.com>
Acked-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com>
2016-09-30 10:18:34 +02:00
Wei Yongjun
8b18e0e498 ipv6: addrconf: fix dev refcont leak when DAD failed
commit 751eb6b604 upstream.

In general, when DAD detected IPv6 duplicate address, ifp->state
will be set to INET6_IFADDR_STATE_ERRDAD and DAD is stopped by a
delayed work, the call tree should be like this:

ndisc_recv_ns
  -> addrconf_dad_failure        <- missing ifp put
     -> addrconf_mod_dad_work
       -> schedule addrconf_dad_work()
         -> addrconf_dad_stop()  <- missing ifp hold before call it

addrconf_dad_failure() called with ifp refcont holding but not put.
addrconf_dad_work() call addrconf_dad_stop() without extra holding
refcount. This will not cause any issue normally.

But the race between addrconf_dad_failure() and addrconf_dad_work()
may cause ifp refcount leak and netdevice can not be unregister,
dmesg show the following messages:

IPv6: eth0: IPv6 duplicate address fe80::XX:XXXX:XXXX:XX detected!
...
unregister_netdevice: waiting for eth0 to become free. Usage count = 1

Fixes: c15b1ccadb ("ipv6: move DAD and addrconf_verify processing
to workqueue")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-09-24 10:07:42 +02:00
Johannes Berg
7f71eec92e Revert "wext: Fix 32 bit iwpriv compatibility issue with 64 bit Kernel"
commit 4d0bd46a4d upstream.

This reverts commit 3d5fdff46c.

Ben Hutchings pointed out that the commit isn't safe since it assumes
that the structure used by the driver is iw_point, when in fact there's
no way to know about that.

Fortunately, the only driver in the tree that ever runs this code path
is the wilc1000 staging driver, so it doesn't really matter.

Clearly I should have investigated this better before applying, sorry.

Reported-by: Ben Hutchings <ben@decadent.org.uk>
Fixes: 3d5fdff46c ("wext: Fix 32 bit iwpriv compatibility issue with 64 bit Kernel")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-09-24 10:07:41 +02:00
Jon Paul Maloy
e70bb22b8f tipc: move linearization of buffers to generic code
commit c7cad0d6f7 upstream.

In commit 5cbb28a4bf ("tipc: linearize arriving NAME_DISTR
and LINK_PROTO buffers") we added linearization of NAME_DISTRIBUTOR,
LINK_PROTOCOL/RESET and LINK_PROTOCOL/ACTIVATE to the function
tipc_udp_recv(). The location of the change was selected in order
to make the commit easily appliable to 'net' and 'stable'.

We now move this linearization to where it should be done, in the
functions tipc_named_rcv() and tipc_link_proto_rcv() respectively.

Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Juerg Haefliger <juerg.haefliger@hpe.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-09-24 10:07:35 +02:00
Kangjie Lu
ffd5ce2ad5 rds: fix an infoleak in rds_inc_info_copy
commit 4116def233 upstream.

The last field "flags" of object "minfo" is not initialized.
Copying this object out may leak kernel stack data.
Assign 0 to it to avoid leak.

Signed-off-by: Kangjie Lu <kjlu@gatech.edu>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Juerg Haefliger <juerg.haefliger@hpe.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-09-15 08:27:51 +02:00
Tyler Hicks
6af4737361 net: Use ns_capable_noaudit() when determining net sysctl permissions
commit d6e0d30644 upstream.

The capability check should not be audited since it is only being used
to determine the inode permissions. A failed check does not indicate a
violation of security policy but, when an LSM is enabled, a denial audit
message was being generated.

The denial audit message caused confusion for some application authors
because root-running Go applications always triggered the denial. To
prevent this confusion, the capability check in net_ctl_permissions() is
switched to the noaudit variant.

BugLink: https://launchpad.net/bugs/1465724

Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-09-15 08:27:50 +02:00
Florian Westphal
61610c63fa netfilter: x_tables: check for size overflow
[ Upstream commit d157bd7615 ]

Ben Hawkes says:
 integer overflow in xt_alloc_table_info, which on 32-bit systems can
 lead to small structure allocation and a copy_from_user based heap
 corruption.

Reported-by: Ben Hawkes <hawkes@google.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-09-15 08:27:50 +02:00
Eric Dumazet
dfe2042d96 udp: properly support MSG_PEEK with truncated buffers
[ Upstream commit 197c949e77 ]

Backport of this upstream commit into stable kernels :
89c22d8c3b ("net: Fix skb csum races when peeking")
exposed a bug in udp stack vs MSG_PEEK support, when user provides
a buffer smaller than skb payload.

In this case,
skb_copy_and_csum_datagram_iovec(skb, sizeof(struct udphdr),
                                 msg->msg_iov);
returns -EFAULT.

This bug does not happen in upstream kernels since Al Viro did a great
job to replace this into :
skb_copy_and_csum_datagram_msg(skb, sizeof(struct udphdr), msg);
This variant is safe vs short buffers.

For the time being, instead reverting Herbert Xu patch and add back
skb->ip_summed invalid changes, simply store the result of
udp_lib_checksum_complete() so that we avoid computing the checksum a
second time, and avoid the problematic
skb_copy_and_csum_datagram_iovec() call.

This patch can be applied on recent kernels as it avoids a double
checksumming, then backported to stable kernels as a bug fix.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-09-15 08:27:49 +02:00
Richard Alpe
8b2e345ed5 tipc: fix nl compat regression for link statistics
[ Upstream commit 55e77a3e82 ]

Fix incorrect use of nla_strlcpy() where the first NLA_HDRLEN bytes
of the link name where left out.

Making the output of tipc-config -ls look something like:
Link statistics:
dcast-link
1:data0-1.1.2:data0
1:data0-1.1.3:data0

Also, for the record, the patch that introduce this regression
claims "Sending the whole object out can cause a leak". Which isn't
very likely as this is a compat layer, where the data we are parsing
is generated by us and we know the string to be NULL terminated. But
you can of course never be to secure.

Fixes: 5d2be1422e (tipc: fix an infoleak in tipc_nl_compat_link_dump)
Signed-off-by: Richard Alpe <richard.alpe@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-09-15 08:27:49 +02:00
Kangjie Lu
5fb7161192 tipc: fix an infoleak in tipc_nl_compat_link_dump
[ Upstream commit 5d2be1422e ]

link_info.str is a char array of size 60. Memory after the NULL
byte is not initialized. Sending the whole object out can cause
a leak.

Signed-off-by: Kangjie Lu <kjlu@gatech.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-09-15 08:27:49 +02:00
Florian Westphal
cfd6e7fe43 netfilter: x_tables: check for size overflow
[ Upstream commit d157bd7615 ]

Ben Hawkes says:
 integer overflow in xt_alloc_table_info, which on 32-bit systems can
 lead to small structure allocation and a copy_from_user based heap
 corruption.

Reported-by: Ben Hawkes <hawkes@google.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-09-15 08:27:49 +02:00
Parthasarathy Bhuvaragan
1b82da380d tipc: fix nullptr crash during subscription cancel
[ Upstream commit 4de13d7ed6 ]

commit 4d5cfcba2f ('tipc: fix connection abort during subscription
cancel'), removes the check for a valid subscription before calling
tipc_nametbl_subscribe().

This will lead to a nullptr exception when we process a
subscription cancel request. For a cancel request, a null
subscription is passed to tipc_nametbl_subscribe() resulting
in exception.

In this commit, we call tipc_nametbl_subscribe() only for
a valid subscription.

Fixes: 4d5cfcba2f ('tipc: fix connection abort during subscription cancel')
Reported-by: Anders Widell <anders.widell@ericsson.com>
Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-09-15 08:27:44 +02:00
Felix Fietkau
ed6625cfdb mac80211: fix purging multicast PS buffer queue
commit 6b07d9ca9b upstream.

The code currently assumes that buffered multicast PS frames don't have
a pending ACK frame for tx status reporting.
However, hostapd sends a broadcast deauth frame on teardown for which tx
status is requested. This can lead to the "Have pending ack frames"
warning on module reload.
Fix this by using ieee80211_free_txskb/ieee80211_purge_tx_queue.

Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-09-07 08:32:41 +02:00
Olga Kornievskaia
94e88c12e0 SUNRPC: allow for upcalls for same uid but different gss service
commit 9130b8dbc6 upstream.

It's possible to have simultaneous upcalls for the same UIDs but
different GSS service. In that case, we need to allow for the
upcall to gssd to proceed so that not the same context is used
by two different GSS services. Some servers lock the use of context
to the GSS service.

Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-09-07 08:32:36 +02:00
Trond Myklebust
e3f2840dba SUNRPC: Handle EADDRNOTAVAIL on connection failures
commit 1f4c17a03b upstream.

If the connect attempt immediately fails with an EADDRNOTAVAIL error, then
that means our choice of source port number was bad.
This error is expected when we set the SO_REUSEPORT socket option and we
have 2 sockets sharing the same source and destination address and port
combinations.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Fixes: 402e23b4ed ("SUNRPC: Fix stupid typo in xs_sock_set_reuseport")
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-09-07 08:32:36 +02:00
Paul Moore
4a3e88a02e netlabel: add address family checks to netlbl_{sock,req}_delattr()
commit 0e0e367740 upstream.

It seems risky to always rely on the caller to ensure the socket's
address family is correct before passing it to the NetLabel kAPI,
especially since we see at least one LSM which didn't. Add address
family checks to the *_delattr() functions to help prevent future
problems.

Reported-by: Maninder Singh <maninder1.s@samsung.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-08-20 18:09:22 +02:00
Amadeusz Sławiński
5c93b99d27 Bluetooth: Fix l2cap_sock_setsockopt() with optname BT_RCVMTU
commit 23bc6ab0a0 upstream.

When we retrieve imtu value from userspace we should use 16 bit pointer
cast instead of 32 as it's defined that way in headers. Fixes setsockopt
calls on big-endian platforms.

Signed-off-by: Amadeusz Sławiński <amadeusz.slawinski@tieto.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-08-20 18:09:19 +02:00
Soheil Hassas Yeganeh
9c946c931b tcp: consider recv buf for the initial window scale
[ Upstream commit f626300a3e ]

tcp_select_initial_window() intends to advertise a window
scaling for the maximum possible window size. To do so,
it considers the maximum of net.ipv4.tcp_rmem[2] and
net.core.rmem_max as the only possible upper-bounds.
However, users with CAP_NET_ADMIN can use SO_RCVBUFFORCE
to set the socket's receive buffer size to values
larger than net.ipv4.tcp_rmem[2] and net.core.rmem_max.
Thus, SO_RCVBUFFORCE is effectively ignored by
tcp_select_initial_window().

To fix this, consider the maximum of net.ipv4.tcp_rmem[2],
net.core.rmem_max and socket's initial buffer space.

Fixes: b0573dea1f ("[NET]: Introduce SO_{SND,RCV}BUFFORCE socket options")
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Suggested-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-08-16 09:30:48 +02:00
Vegard Nossum
fc9b7c086b net/irda: fix NULL pointer dereference on memory allocation failure
[ Upstream commit d3e6952cfb ]

I ran into this:

    kasan: CONFIG_KASAN_INLINE enabled
    kasan: GPF could be caused by NULL-ptr deref or user memory access
    general protection fault: 0000 [#1] PREEMPT SMP KASAN
    CPU: 2 PID: 2012 Comm: trinity-c3 Not tainted 4.7.0-rc7+ #19
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
    task: ffff8800b745f2c0 ti: ffff880111740000 task.ti: ffff880111740000
    RIP: 0010:[<ffffffff82bbf066>]  [<ffffffff82bbf066>] irttp_connect_request+0x36/0x710
    RSP: 0018:ffff880111747bb8  EFLAGS: 00010286
    RAX: dffffc0000000000 RBX: 0000000000000000 RCX: 0000000069dd8358
    RDX: 0000000000000009 RSI: 0000000000000027 RDI: 0000000000000048
    RBP: ffff880111747c00 R08: 0000000000000000 R09: 0000000000000000
    R10: 0000000069dd8358 R11: 1ffffffff0759723 R12: 0000000000000000
    R13: ffff88011a7e4780 R14: 0000000000000027 R15: 0000000000000000
    FS:  00007fc738404700(0000) GS:ffff88011af00000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 00007fc737fdfb10 CR3: 0000000118087000 CR4: 00000000000006e0
    Stack:
     0000000000000200 ffff880111747bd8 ffffffff810ee611 ffff880119f1f220
     ffff880119f1f4f8 ffff880119f1f4f0 ffff88011a7e4780 ffff880119f1f232
     ffff880119f1f220 ffff880111747d58 ffffffff82bca542 0000000000000000
    Call Trace:
     [<ffffffff82bca542>] irda_connect+0x562/0x1190
     [<ffffffff825ae582>] SYSC_connect+0x202/0x2a0
     [<ffffffff825b4489>] SyS_connect+0x9/0x10
     [<ffffffff8100334c>] do_syscall_64+0x19c/0x410
     [<ffffffff83295ca5>] entry_SYSCALL64_slow_path+0x25/0x25
    Code: 41 89 ca 48 89 e5 41 57 41 56 41 55 41 54 41 89 d7 53 48 89 fb 48 83 c7 48 48 89 fa 41 89 f6 48 c1 ea 03 48 83 ec 20 4c 8b 65 10 <0f> b6 04 02 84 c0 74 08 84 c0 0f 8e 4c 04 00 00 80 7b 48 00 74
    RIP  [<ffffffff82bbf066>] irttp_connect_request+0x36/0x710
     RSP <ffff880111747bb8>
    ---[ end trace 4cda2588bc055b30 ]---

The problem is that irda_open_tsap() can fail and leave self->tsap = NULL,
and then irttp_connect_request() almost immediately dereferences it.

Cc: stable@vger.kernel.org
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-08-16 09:30:48 +02:00
Julian Anastasov
a9c2218596 ipv4: reject RTNH_F_DEAD and RTNH_F_LINKDOWN from user space
[ Upstream commit 80610229ef ]

Vegard Nossum is reporting for a crash in fib_dump_info
when nh_dev = NULL and fib_nhs == 1:

Pid: 50, comm: netlink.exe Not tainted 4.7.0-rc5+
RIP: 0033:[<00000000602b3d18>]
RSP: 0000000062623890  EFLAGS: 00010202
RAX: 0000000000000000 RBX: 000000006261b800 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000024 RDI: 000000006245ba00
RBP: 00000000626238f0 R08: 000000000000029c R09: 0000000000000000
R10: 0000000062468038 R11: 000000006245ba00 R12: 000000006245ba00
R13: 00000000625f96c0 R14: 00000000601e16f0 R15: 0000000000000000
Kernel panic - not syncing: Kernel mode fault at addr 0x2e0, ip 0x602b3d18
CPU: 0 PID: 50 Comm: netlink.exe Not tainted 4.7.0-rc5+ #581
Stack:
 626238f0 960226a02 00000400 000000fe
 62623910 600afca7 62623970 62623a48
 62468038 00000018 00000000 00000000
Call Trace:
 [<602b3e93>] rtmsg_fib+0xd3/0x190
 [<602b6680>] fib_table_insert+0x260/0x500
 [<602b0e5d>] inet_rtm_newroute+0x4d/0x60
 [<60250def>] rtnetlink_rcv_msg+0x8f/0x270
 [<60267079>] netlink_rcv_skb+0xc9/0xe0
 [<60250d4b>] rtnetlink_rcv+0x3b/0x50
 [<60265400>] netlink_unicast+0x1a0/0x2c0
 [<60265e47>] netlink_sendmsg+0x3f7/0x470
 [<6021dc9a>] sock_sendmsg+0x3a/0x90
 [<6021e0d0>] ___sys_sendmsg+0x300/0x360
 [<6021fa64>] __sys_sendmsg+0x54/0xa0
 [<6021fac0>] SyS_sendmsg+0x10/0x20
 [<6001ea68>] handle_syscall+0x88/0x90
 [<600295fd>] userspace+0x3fd/0x500
 [<6001ac55>] fork_handler+0x85/0x90

$ addr2line -e vmlinux -i 0x602b3d18
include/linux/inetdevice.h:222
net/ipv4/fib_semantics.c:1264

Problem happens when RTNH_F_LINKDOWN is provided from user space
when creating routes that do not use the flag, catched with
netlink fuzzer.

Currently, the kernel allows user space to set both flags
to nh_flags and fib_flags but this is not intentional, the
assumption was that they are not set. Fix this by rejecting
both flags with EINVAL.

Reported-by: Vegard Nossum <vegard.nossum@oracle.com>
Fixes: 0eeb075fad ("net: ipv4 sysctl option to ignore routes when nexthop link is down")
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Cc: Andy Gospodarek <gospo@cumulusnetworks.com>
Cc: Dinesh Dutt <ddutt@cumulusnetworks.com>
Cc: Scott Feldman <sfeldma@gmail.com>
Reviewed-by: Andy Gospodarek <gospo@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-08-16 09:30:47 +02:00
Jason Baron
5413f1a526 tcp: enable per-socket rate limiting of all 'challenge acks'
[ Upstream commit 083ae30828 ]

The per-socket rate limit for 'challenge acks' was introduced in the
context of limiting ack loops:

commit f2b2c582e8 ("tcp: mitigate ACK loops for connections as tcp_sock")

And I think it can be extended to rate limit all 'challenge acks' on a
per-socket basis.

Since we have the global tcp_challenge_ack_limit, this patch allows for
tcp_challenge_ack_limit to be set to a large value and effectively rely on
the per-socket limit, or set tcp_challenge_ack_limit to a lower value and
still prevents a single connections from consuming the entire challenge ack
quota.

It further moves in the direction of eliminating the global limit at some
point, as Eric Dumazet has suggested. This a follow-up to:
Subject: tcp: make challenge acks less predictable

Cc: Eric Dumazet <edumazet@google.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Yue Cao <ycao009@ucr.edu>
Signed-off-by: Jason Baron <jbaron@akamai.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-08-16 09:30:47 +02:00
Eric Dumazet
72c2d3bcca tcp: make challenge acks less predictable
[ Upstream commit 75ff39ccc1 ]

Yue Cao claims that current host rate limiting of challenge ACKS
(RFC 5961) could leak enough information to allow a patient attacker
to hijack TCP sessions. He will soon provide details in an academic
paper.

This patch increases the default limit from 100 to 1000, and adds
some randomization so that the attacker can no longer hijack
sessions without spending a considerable amount of probes.

Based on initial analysis and patch from Linus.

Note that we also have per socket rate limiting, so it is tempting
to remove the host limit in the future.

v2: randomize the count of challenge acks per second, not the period.

Fixes: 282f23c6ee ("tcp: implement RFC 5961 3.2")
Reported-by: Yue Cao <ycao009@ucr.edu>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-08-16 09:30:47 +02:00
Ilya Dryomov
032951d32c libceph: apply new_state before new_up_client on incrementals
commit 930c532869 upstream.

Currently, osd_weight and osd_state fields are updated in the encoding
order.  This is wrong, because an incremental map may look like e.g.

    new_up_client: { osd=6, addr=... } # set osd_state and addr
    new_state: { osd=6, xorstate=EXISTS } # clear osd_state

Suppose osd6's current osd_state is EXISTS (i.e. osd6 is down).  After
applying new_up_client, osd_state is changed to EXISTS | UP.  Carrying
on with the new_state update, we flip EXISTS and leave osd6 in a weird
"!EXISTS but UP" state.  A non-existent OSD is considered down by the
mapping code

2087    for (i = 0; i < pg->pg_temp.len; i++) {
2088            if (ceph_osd_is_down(osdmap, pg->pg_temp.osds[i])) {
2089                    if (ceph_can_shift_osds(pi))
2090                            continue;
2091
2092                    temp->osds[temp->size++] = CRUSH_ITEM_NONE;

and so requests get directed to the second OSD in the set instead of
the first, resulting in OSD-side errors like:

[WRN] : client.4239 192.168.122.21:0/2444980242 misdirected client.4239.1:2827 pg 2.5df899f2 to osd.4 not [1,4,6] in e680/680

and hung rbds on the client:

[  493.566367] rbd: rbd0: write 400000 at 11cc00000 (0)
[  493.566805] rbd: rbd0:   result -6 xferred 400000
[  493.567011] blk_update_request: I/O error, dev rbd0, sector 9330688

The fix is to decouple application from the decoding and:
- apply new_weight first
- apply new_state before new_up_client
- twiddle osd_state flags if marking in
- clear out some of the state if osd is destroyed

Fixes: http://tracker.ceph.com/issues/14901

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Josh Durgin <jdurgin@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-08-10 11:49:29 +02:00
Vegard Nossum
b3a061d1d8 RDS: fix rds_tcp_init() error path
commit 3dad5424ad upstream.

If register_pernet_subsys() fails, we shouldn't try to call
unregister_pernet_subsys().

Fixes: 467fa15356 ("RDS-TCP: Support multiple RDS-TCP listen endpoints, one per netns.")
Cc: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Acked-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-08-10 11:49:29 +02:00
Miklos Szeredi
0da3127a76 af_unix: fix hard linked sockets on overlay
commit eb0a4a47ae upstream.

Overlayfs uses separate inodes even in the case of hard links on the
underlying filesystems.  This is a problem for AF_UNIX socket
implementation which indexes sockets based on the inode.  This resulted in
hard linked sockets not working.

The fix is to use the real, underlying inode.

Test case follows:

-- ovl-sock-test.c --
#include <unistd.h>
#include <err.h>
#include <sys/socket.h>
#include <sys/un.h>

#define SOCK "test-sock"
#define SOCK2 "test-sock2"

int main(void)
{
	int fd, fd2;
	struct sockaddr_un addr = {
		.sun_family = AF_UNIX,
		.sun_path = SOCK,
	};
	struct sockaddr_un addr2 = {
		.sun_family = AF_UNIX,
		.sun_path = SOCK2,
	};

	unlink(SOCK);
	unlink(SOCK2);
	if ((fd = socket(AF_UNIX, SOCK_STREAM, 0)) == -1)
		err(1, "socket");
	if (bind(fd, (struct sockaddr *) &addr, sizeof(addr)) == -1)
		err(1, "bind");
	if (listen(fd, 0) == -1)
		err(1, "listen");
	if (link(SOCK, SOCK2) == -1)
		err(1, "link");
	if ((fd2 = socket(AF_UNIX, SOCK_STREAM, 0)) == -1)
		err(1, "socket");
	if (connect(fd2, (struct sockaddr *) &addr2, sizeof(addr2)) == -1)
		err (1, "connect");
	return 0;
}
----

Reported-by: Alexander Morozov <alexandr.morozov@docker.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-07-27 09:47:33 -07:00
Martin KaFai Lau
9c458a8668 ipv6: Fix mem leak in rt6i_pcpu
[ Upstream commit 903ce4abdf ]

It was first reported and reproduced by Petr (thanks!) in
https://bugzilla.kernel.org/show_bug.cgi?id=119581

free_percpu(rt->rt6i_pcpu) used to always happen in ip6_dst_destroy().

However, after fixing a deadlock bug in
commit 9c7370a166 ("ipv6: Fix a potential deadlock when creating pcpu rt"),
free_percpu() is not called before setting non_pcpu_rt->rt6i_pcpu to NULL.

It is worth to note that rt6i_pcpu is protected by table->tb6_lock.

kmemleak somehow did not report it.  We nailed it down by
observing the pcpu entries in /proc/vmallocinfo (first suggested
by Hannes, thanks!).

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Fixes: 9c7370a166 ("ipv6: Fix a potential deadlock when creating pcpu rt")
Reported-by: Petr Novopashenniy <pety@rusnet.ru>
Tested-by: Petr Novopashenniy <pety@rusnet.ru>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Cc: Petr Novopashenniy <pety@rusnet.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-07-27 09:47:31 -07:00
WANG Cong
2832302fc9 net_sched: fix mirrored packets checksum
[ Upstream commit 82a31b9231 ]

Similar to commit 9b368814b3 ("net: fix bridge multicast packet checksum validation")
we need to fixup the checksum for CHECKSUM_COMPLETE when
pushing skb on RX path. Otherwise we get similar splats.

Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Tom Herbert <tom@herbertland.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-07-27 09:47:31 -07:00
David S. Miller
424848bd98 packet: Use symmetric hash for PACKET_FANOUT_HASH.
[ Upstream commit eb70db8756 ]

People who use PACKET_FANOUT_HASH want a symmetric hash, meaning that
they want packets going in both directions on a flow to hash to the
same bucket.

The core kernel SKB hash became non-symmetric when the ipv6 flow label
and other entities were incorporated into the standard flow hash order
to increase entropy.

But there are no users of PACKET_FANOUT_HASH who want an assymetric
hash, they all want a symmetric one.

Therefore, use the flow dissector to compute a flat symmetric hash
over only the protocol, addresses and ports.  This hash does not get
installed into and override the normal skb hash, so this change has
no effect whatsoever on the rest of the stack.

Reported-by: Eric Leblond <eric@regit.org>
Tested-by: Eric Leblond <eric@regit.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-07-27 09:47:31 -07:00
J. Bruce Fields
58e9e70ab9 nfsd4/rpc: move backchannel create logic into rpc code
commit d50039ea5e upstream.

Also simplify the logic a bit.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Acked-by: Trond Myklebust <trondmy@primarydata.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-07-27 09:47:30 -07:00
Jouni Malinen
96d50acbd4 mac80211: Fix mesh estab_plinks counting in STA removal case
commit 126e755732 upstream.

If a user space program (e.g., wpa_supplicant) deletes a STA entry that
is currently in NL80211_PLINK_ESTAB state, the number of established
plinks counter was not decremented and this could result in rejecting
new plink establishment before really hitting the real maximum plink
limit. For !user_mpm case, this decrementation is handled by
mesh_plink_deactive().

Fix this by decrementing estab_plinks on STA deletion
(mesh_sta_cleanup() gets called from there) so that the counter has a
correct value and the Beacon frame advertisement in Mesh Configuration
element shows the proper value for capability to accept additional
peers.

Signed-off-by: Jouni Malinen <j@w1.fi>
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-07-27 09:47:27 -07:00
Bob Copeland
7b90e041d1 mac80211: mesh: flush mesh paths unconditionally
commit fe7a7c5762 upstream.

Currently, the mesh paths associated with a nexthop station are cleaned
up in the following code path:

    __sta_info_destroy_part1
    synchronize_net()
    __sta_info_destroy_part2
     -> cleanup_single_sta
       -> mesh_sta_cleanup
         -> mesh_plink_deactivate
           -> mesh_path_flush_by_nexthop

However, there are a couple of problems here:

1) the paths aren't flushed at all if the MPM is running in userspace
   (e.g. when using wpa_supplicant or authsae)

2) there is no synchronize_rcu between removing the path and readers
   accessing the nexthop, which means the following race is possible:

CPU0                            CPU1
~~~~                            ~~~~
                                sta_info_destroy_part1()
                                synchronize_net()
rcu_read_lock()
mesh_nexthop_resolve()
  mpath = mesh_path_lookup()
                                [...] -> mesh_path_flush_by_nexthop()
  sta = rcu_dereference(
    mpath->next_hop)
                                kfree(sta)
  access sta <-- CRASH

Fix both of these by unconditionally flushing paths before destroying
the sta, and by adding a synchronize_net() after path flush to ensure
no active readers can still dereference the sta.

Fixes this crash:

[  348.529295] BUG: unable to handle kernel paging request at 00020040
[  348.530014] IP: [<f929245d>] ieee80211_mps_set_frame_flags+0x40/0xaa [mac80211]
[  348.530014] *pde = 00000000
[  348.530014] Oops: 0000 [#1] PREEMPT
[  348.530014] Modules linked in: drbg ansi_cprng ctr ccm ppp_generic slhc ipt_MASQUERADE nf_nat_masquerade_ipv4 8021q ]
[  348.530014] CPU: 0 PID: 20597 Comm: wget Tainted: G           O 4.6.0-rc5-wt=V1 #1
[  348.530014] Hardware name: To Be Filled By O.E.M./To be filled by O.E.M., BIOS 080016  11/07/2014
[  348.530014] task: f64fa280 ti: f4f9c000 task.ti: f4f9c000
[  348.530014] EIP: 0060:[<f929245d>] EFLAGS: 00010246 CPU: 0
[  348.530014] EIP is at ieee80211_mps_set_frame_flags+0x40/0xaa [mac80211]
[  348.530014] EAX: f4ce63e0 EBX: 00000088 ECX: f3788416 EDX: 00020008
[  348.530014] ESI: 00000000 EDI: 00000088 EBP: f6409a4c ESP: f6409a40
[  348.530014]  DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 0068
[  348.530014] CR0: 80050033 CR2: 00020040 CR3: 33190000 CR4: 00000690
[  348.530014] Stack:
[  348.530014]  00000000 f4ce63e0 f5f9bd80 f6409a64 f9291d80 0000ce67 f5d51e00 f4ce63e0
[  348.530014]  f3788416 f6409a80 f9291dc1 f4ce8320 f4ce63e0 f5d51e00 f4ce63e0 f4ce8320
[  348.530014]  f6409a98 f9277f6f 00000000 00000000 0000007c 00000000 f6409b2c f9278dd1
[  348.530014] Call Trace:
[  348.530014]  [<f9291d80>] mesh_nexthop_lookup+0xbb/0xc8 [mac80211]
[  348.530014]  [<f9291dc1>] mesh_nexthop_resolve+0x34/0xd8 [mac80211]
[  348.530014]  [<f9277f6f>] ieee80211_xmit+0x92/0xc1 [mac80211]
[  348.530014]  [<f9278dd1>] __ieee80211_subif_start_xmit+0x807/0x83c [mac80211]
[  348.530014]  [<c04df012>] ? sch_direct_xmit+0xd7/0x1b3
[  348.530014]  [<c022a8c6>] ? __local_bh_enable_ip+0x5d/0x7b
[  348.530014]  [<f956870c>] ? nf_nat_ipv4_out+0x4c/0xd0 [nf_nat_ipv4]
[  348.530014]  [<f957e036>] ? iptable_nat_ipv4_fn+0xf/0xf [iptable_nat]
[  348.530014]  [<c04c6f45>] ? netif_skb_features+0x14d/0x30a
[  348.530014]  [<f9278e10>] ieee80211_subif_start_xmit+0xa/0xe [mac80211]
[  348.530014]  [<c04c769c>] dev_hard_start_xmit+0x1f8/0x267
[  348.530014]  [<c04c7261>] ?  validate_xmit_skb.isra.120.part.121+0x10/0x253
[  348.530014]  [<c04defc6>] sch_direct_xmit+0x8b/0x1b3
[  348.530014]  [<c04c7a9c>] __dev_queue_xmit+0x2c8/0x513
[  348.530014]  [<c04c7cfb>] dev_queue_xmit+0xa/0xc
[  348.530014]  [<f91bfc7a>] batadv_send_skb_packet+0xd6/0xec [batman_adv]
[  348.530014]  [<f91bfdc4>] batadv_send_unicast_skb+0x15/0x4a [batman_adv]
[  348.530014]  [<f91b5938>] batadv_dat_send_data+0x27e/0x310 [batman_adv]
[  348.530014]  [<f91c30b5>] ? batadv_tt_global_hash_find.isra.11+0x8/0xa [batman_adv]
[  348.530014]  [<f91b63f3>] batadv_dat_snoop_outgoing_arp_request+0x208/0x23d [batman_adv]
[  348.530014]  [<f91c0cd9>] batadv_interface_tx+0x206/0x385 [batman_adv]
[  348.530014]  [<c04c769c>] dev_hard_start_xmit+0x1f8/0x267
[  348.530014]  [<c04c7261>] ?  validate_xmit_skb.isra.120.part.121+0x10/0x253
[  348.530014]  [<c04defc6>] sch_direct_xmit+0x8b/0x1b3
[  348.530014]  [<c04c7a9c>] __dev_queue_xmit+0x2c8/0x513
[  348.530014]  [<f80cbd2a>] ? igb_xmit_frame+0x57/0x72 [igb]
[  348.530014]  [<c04c7cfb>] dev_queue_xmit+0xa/0xc
[  348.530014]  [<f843a326>] br_dev_queue_push_xmit+0xeb/0xfb [bridge]
[  348.530014]  [<f843a35f>] br_forward_finish+0x29/0x74 [bridge]
[  348.530014]  [<f843a23b>] ? deliver_clone+0x3b/0x3b [bridge]
[  348.530014]  [<f843a714>] __br_forward+0x89/0xe7 [bridge]
[  348.530014]  [<f843a336>] ? br_dev_queue_push_xmit+0xfb/0xfb [bridge]
[  348.530014]  [<f843a234>] deliver_clone+0x34/0x3b [bridge]
[  348.530014]  [<f843a68b>] ? br_flood+0x95/0x95 [bridge]
[  348.530014]  [<f843a66d>] br_flood+0x77/0x95 [bridge]
[  348.530014]  [<f843a809>] br_flood_forward+0x13/0x1a [bridge]
[  348.530014]  [<f843a68b>] ? br_flood+0x95/0x95 [bridge]
[  348.530014]  [<f843b877>] br_handle_frame_finish+0x392/0x3db [bridge]
[  348.530014]  [<c04e9b2b>] ? nf_iterate+0x2b/0x6b
[  348.530014]  [<f843baa6>] br_handle_frame+0x1e6/0x240 [bridge]
[  348.530014]  [<f843b4e5>] ? br_handle_local_finish+0x6a/0x6a [bridge]
[  348.530014]  [<c04c4ba0>] __netif_receive_skb_core+0x43a/0x66b
[  348.530014]  [<f843b8c0>] ? br_handle_frame_finish+0x3db/0x3db [bridge]
[  348.530014]  [<c023cea4>] ? resched_curr+0x19/0x37
[  348.530014]  [<c0240707>] ? check_preempt_wakeup+0xbf/0xfe
[  348.530014]  [<c0255dec>] ? ktime_get_with_offset+0x5c/0xfc
[  348.530014]  [<c04c4fc1>] __netif_receive_skb+0x47/0x55
[  348.530014]  [<c04c57ba>] netif_receive_skb_internal+0x40/0x5a
[  348.530014]  [<c04c61ef>] napi_gro_receive+0x3a/0x94
[  348.530014]  [<f80ce8d5>] igb_poll+0x6fd/0x9ad [igb]
[  348.530014]  [<c0242bd8>] ? swake_up_locked+0x14/0x26
[  348.530014]  [<c04c5d29>] net_rx_action+0xde/0x250
[  348.530014]  [<c022a743>] __do_softirq+0x8a/0x163
[  348.530014]  [<c022a6b9>] ? __hrtimer_tasklet_trampoline+0x19/0x19
[  348.530014]  [<c021100f>] do_softirq_own_stack+0x26/0x2c
[  348.530014]  <IRQ>
[  348.530014]  [<c022a957>] irq_exit+0x31/0x6f
[  348.530014]  [<c0210eb2>] do_IRQ+0x8d/0xa0
[  348.530014]  [<c058152c>] common_interrupt+0x2c/0x40
[  348.530014] Code: e7 8c 00 66 81 ff 88 00 75 12 85 d2 75 0e b2 c3 b8 83 e9 29 f9 e8 a7 5f f9 c6 eb 74 66 81 e3 8c 005
[  348.530014] EIP: [<f929245d>] ieee80211_mps_set_frame_flags+0x40/0xaa [mac80211] SS:ESP 0068:f6409a40
[  348.530014] CR2: 0000000000020040
[  348.530014] ---[ end trace 48556ac26779732e ]---
[  348.530014] Kernel panic - not syncing: Fatal exception in interrupt
[  348.530014] Kernel Offset: disabled

Reported-by: Fred Veldini <fred.veldini@gmail.com>
Tested-by: Fred Veldini <fred.veldini@gmail.com>
Signed-off-by: Bob Copeland <me@bobcopeland.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-07-27 09:47:27 -07:00
Felix Fietkau
9dcbb4d9fd mac80211: fix fast_tx header alignment
commit 6fe04128f1 upstream.

The header field is defined as u8[] but also accessed as struct
ieee80211_hdr. Enforce an alignment of 2 to prevent unnecessary
unaligned accesses, which can be very harmful for performance on many
platforms.

Fixes: e495c24731 ("mac80211: extend fast-xmit for more ciphers")
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-07-27 09:47:27 -07:00
Basil Gunn
eda8951310 AX.25: Close socket connection on session completion
[ Upstream commit 4a7d99ea1b ]

A socket connection made in ax.25 is not closed when session is
completed.  The heartbeat timer is stopped prematurely and this is
where the socket gets closed. Allow heatbeat timer to run to close
socket. Symptom occurs in kernels >= 4.2.0

Originally sent 6/15/2016. Resend with distribution list matching
scripts/maintainer.pl output.

Signed-off-by: Basil Gunn <basil@pacabunga.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-07-11 09:31:12 -07:00
Daniel Borkmann
b30cc5b14f bpf: try harder on clones when writing into skb
[ Upstream commit 3697649ff2 ]

When we're dealing with clones and the area is not writeable, try
harder and get a copy via pskb_expand_head(). Replace also other
occurences in tc actions with the new skb_try_make_writable().

Reported-by: Ashhad Sheikh <ashhadsheikh394@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-07-11 09:31:12 -07:00
David Barroso
f4de1e7458 neigh: Explicitly declare RCU-bh read side critical section in neigh_xmit()
[ Upstream commit b560f03ddf ]

neigh_xmit() expects to be called inside an RCU-bh read side critical
section, and while one of its two current callers gets this right, the
other one doesn't.

More specifically, neigh_xmit() has two callers, mpls_forward() and
mpls_output(), and while both callers call neigh_xmit() under
rcu_read_lock(), this provides sufficient protection for neigh_xmit()
only in the case of mpls_forward(), as that is always called from
softirq context and therefore doesn't need explicit BH protection,
while mpls_output() can be called from process context with softirqs
enabled.

When mpls_output() is called from process context, with softirqs
enabled, we can be preempted by a softirq at any time, and RCU-bh
considers the completion of a softirq as signaling the end of any
pending read-side critical sections, so if we do get a softirq
while we are in the part of neigh_xmit() that expects to be run inside
an RCU-bh read side critical section, we can end up with an unexpected
RCU grace period running right in the middle of that critical section,
making things go boom.

This patch fixes this impedance mismatch in the callee, by making
neigh_xmit() always take rcu_read_{,un}lock_bh() around the code that
expects to be treated as an RCU-bh read side critical section, as this
seems a safer option than fixing it in the callers.

Fixes: 4fd3d7d9e8 ("neigh: Add helper function neigh_xmit")
Signed-off-by: David Barroso <dbarroso@fastly.com>
Signed-off-by: Lennert Buytenhek <lbuytenhek@fastly.com>
Acked-by: David Ahern <dsa@cumulusnetworks.com>
Acked-by: Robert Shearman <rshearma@brocade.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-07-11 09:31:12 -07:00
daniel
688feac489 Bridge: Fix ipv6 mc snooping if bridge has no ipv6 address
[ Upstream commit 0888d5f3c0 ]

The bridge is falsly dropping ipv6 mulitcast packets if there is:
 1. No ipv6 address assigned on the brigde.
 2. No external mld querier present.
 3. The internal querier enabled.

When the bridge fails to build mld queries, because it has no
ipv6 address, it slilently returns, but keeps the local querier enabled.
This specific case causes confusing packet loss.

Ipv6 multicast snooping can only work if:
 a) An external querier is present
 OR
 b) The bridge has an ipv6 address an is capable of sending own queries

Otherwise it has to forward/flood the ipv6 multicast traffic,
because snooping cannot work.

This patch fixes the issue by adding a flag to the bridge struct that
indicates that there is currently no ipv6 address assinged to the bridge
and returns a false state for the local querier in
__br_multicast_querier_exists().

Special thanks to Linus Lüssing.

Fixes: d1d81d4c3d ("bridge: check return value of ipv6_dev_get_saddr()")
Signed-off-by: Daniel Danzberger <daniel@dd-wrt.com>
Acked-by: Linus Lüssing <linus.luessing@c0d3.blue>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-07-11 09:31:11 -07:00
Tom Goff
d05916086f ipmr/ip6mr: Initialize the last assert time of mfc entries.
[ Upstream commit 70a0dec451 ]

This fixes wrong-interface signaling on 32-bit platforms for entries
created when jiffies > 2^31 + MFC_ASSERT_THRESH.

Signed-off-by: Tom Goff <thomas.goff@ll.mit.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-07-11 09:31:11 -07:00
Eric Dumazet
6fb6914d5b netem: fix a use after free
[ Upstream commit 21de12ee55 ]

If the packet was dropped by lower qdisc, then we must not
access it later.

Save qdisc_pkt_len(skb) in a temp variable.

Fixes: 2ccccf5fb4 ("net_sched: update hierarchical backlog too")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: WANG Cong <xiyou.wangcong@gmail.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-07-11 09:31:11 -07:00
Herbert Xu
da7da39bdf esp: Fix ESN generation under UDP encapsulation
[ Upstream commit 962fcef33b ]

Blair Steven noticed that ESN in conjunction with UDP encapsulation
is broken because we set the temporary ESP header to the wrong spot.

This patch fixes this by first of all using the right spot, i.e.,
4 bytes off the real ESP header, and then saving this information
so that after encryption we can restore it properly.

Fixes: 7021b2e1cd ("esp4: Switch to new AEAD interface")
Reported-by: Blair Steven <Blair.Steven@alliedtelesis.co.nz>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-07-11 09:31:11 -07:00
Simon Horman
9d6814d7ae sit: correct IP protocol used in ipip6_err
[ Upstream commit d5d8760b78 ]

Since 32b8a8e59c ("sit: add IPv4 over IPv4 support")
ipip6_err() may be called for packets whose IP protocol is
IPPROTO_IPIP as well as those whose IP protocol is IPPROTO_IPV6.

In the case of IPPROTO_IPIP packets the correct protocol value is not
passed to ipv4_update_pmtu() or ipv4_redirect().

This patch resolves this problem by using the IP protocol of the packet
rather than a hard-coded value. This appears to be consistent
with the usage of the protocol of a packet by icmp_socket_deliver()
the caller of ipip6_err().

I was able to exercise the redirect case by using a setup where an ICMP
redirect was received for the destination of the encapsulated packet.
However, it appears that although incorrect the protocol field is not used
in this case and thus no problem manifests.  On inspection it does not
appear that a problem will manifest in the fragmentation needed/update pmtu
case either.

In short I believe this is a cosmetic fix. None the less, the use of
IPPROTO_IPV6 seems wrong and confusing.

Reviewed-by: Dinan Gunawardena <dinan.gunawardena@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-07-11 09:31:11 -07:00
Eric Dumazet
95e727af08 net_sched: fix pfifo_head_drop behavior vs backlog
[ Upstream commit 6c0d54f189 ]

When the qdisc is full, we drop a packet at the head of the queue,
queue the current skb and return NET_XMIT_CN

Now we track backlog on upper qdiscs, we need to call
qdisc_tree_reduce_backlog(), even if the qlen did not change.

Fixes: 2ccccf5fb4 ("net_sched: update hierarchical backlog too")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: WANG Cong <xiyou.wangcong@gmail.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-07-11 09:31:11 -07:00
Florian Westphal
e917563612 netfilter: x_tables: introduce and use xt_copy_counters_from_user
commit d7591f0c41 upstream.

The three variants use same copy&pasted code, condense this into a
helper and use that.

Make sure info.name is 0-terminated.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-24 10:18:24 -07:00
Florian Westphal
d69f93d059 netfilter: x_tables: do compat validation via translate_table
commit 09d9686047 upstream.

This looks like refactoring, but its also a bug fix.

Problem is that the compat path (32bit iptables, 64bit kernel) lacks a few
sanity tests that are done in the normal path.

For example, we do not check for underflows and the base chain policies.

While its possible to also add such checks to the compat path, its more
copy&pastry, for instance we cannot reuse check_underflow() helper as
e->target_offset differs in the compat case.

Other problem is that it makes auditing for validation errors harder; two
places need to be checked and kept in sync.

At a high level 32 bit compat works like this:
1- initial pass over blob:
   validate match/entry offsets, bounds checking
   lookup all matches and targets
   do bookkeeping wrt. size delta of 32/64bit structures
   assign match/target.u.kernel pointer (points at kernel
   implementation, needed to access ->compatsize etc.)

2- allocate memory according to the total bookkeeping size to
   contain the translated ruleset

3- second pass over original blob:
   for each entry, copy the 32bit representation to the newly allocated
   memory.  This also does any special match translations (e.g.
   adjust 32bit to 64bit longs, etc).

4- check if ruleset is free of loops (chase all jumps)

5-first pass over translated blob:
   call the checkentry function of all matches and targets.

The alternative implemented by this patch is to drop steps 3&4 from the
compat process, the translation is changed into an intermediate step
rather than a full 1:1 translate_table replacement.

In the 2nd pass (step #3), change the 64bit ruleset back to a kernel
representation, i.e. put() the kernel pointer and restore ->u.user.name .

This gets us a 64bit ruleset that is in the format generated by a 64bit
iptables userspace -- we can then use translate_table() to get the
'native' sanity checks.

This has two drawbacks:

1. we re-validate all the match and target entry structure sizes even
though compat translation is supposed to never generate bogus offsets.
2. we put and then re-lookup each match and target.

THe upside is that we get all sanity tests and ruleset validations
provided by the normal path and can remove some duplicated compat code.

iptables-restore time of autogenerated ruleset with 300k chains of form
-A CHAIN0001 -m limit --limit 1/s -j CHAIN0002
-A CHAIN0002 -m limit --limit 1/s -j CHAIN0003

shows no noticeable differences in restore times:
old:   0m30.796s
new:   0m31.521s
64bit: 0m25.674s

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-24 10:18:24 -07:00
Florian Westphal
3a69c0f048 netfilter: x_tables: xt_compat_match_from_user doesn't need a retval
commit 0188346f21 upstream.

Always returned 0.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-24 10:18:23 -07:00