Commit Graph

301880 Commits

Author SHA1 Message Date
Alex Elder
ed35fbcd3c ceph: use info returned by get_authorizer
(cherry picked from commit 8f43fb5389)

Rather than passing a bunch of arguments to be filled in with the
content of the ceph_auth_handshake buffer now returned by the
get_authorizer method, just use the returned information in the
caller, and drop the unnecessary arguments.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-11-26 11:38:08 -08:00
Alex Elder
4f33c7ed37 ceph: have get_authorizer methods return pointers
(cherry picked from commit a3530df33e)

Have the get_authorizer auth_client method return a ceph_auth
pointer rather than an integer, pointer-encoding any returned
error value.  This is to pave the way for making use of the
returned value in an upcoming patch.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-11-26 11:38:07 -08:00
Alex Elder
83d28f7956 ceph: ensure auth ops are defined before use
(cherry picked from commit a255651d4c)

In the create_authorizer method for both the mds and osd clients,
the auth_client->ops pointer is blindly dereferenced.  There is no
obvious guarantee that this pointer has been assigned.  And
furthermore, even if the ops pointer is non-null there is definitely
no guarantee that the create_authorizer or destroy_authorizer
methods are defined.

Add checks in both routines to make sure they are defined (non-null)
before use.  Add similar checks in a few other spots in these files
while we're at it.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-11-26 11:38:07 -08:00
Alex Elder
018a2a13f3 ceph: messenger: reduce args to create_authorizer
(cherry picked from commit 74f1869f76)

Make use of the new ceph_auth_handshake structure in order to reduce
the number of arguments passed to the create_authorizor method in
ceph_auth_client_ops.  Use a local variable of that type as a
shorthand in the get_authorizer method definitions.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-11-26 11:38:07 -08:00
Alex Elder
0f56a54fce ceph: define ceph_auth_handshake type
(cherry picked from commit 6c4a19158b)

The definitions for the ceph_mds_session and ceph_osd both contain
five fields related only to "authorizers."  Encapsulate those fields
into their own struct type, allowing for better isolation in some
upcoming patches.

Fix the #includes in "linux/ceph/osd_client.h" to lay out their more
complete canonical path.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-11-26 11:38:07 -08:00
Alex Elder
33f0577a99 ceph: messenger: check return from get_authorizer
(cherry picked from commit ed96af6460)

In prepare_connect_authorizer(), a connection's get_authorizer
method is called but ignores its return value.  This function can
return an error, so check for it and return it if that ever occurs.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-11-26 11:38:06 -08:00
Alex Elder
29e1a95eb5 ceph: messenger: rework prepare_connect_authorizer()
(cherry picked from commit b1c6b9803f)

Change prepare_connect_authorizer() so it returns without dropping
the connection mutex if the connection has no get_authorizer method.

Use the symbolic CEPH_AUTH_UNKNOWN instead of 0 when assigning
authorization protocols.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-11-26 11:38:06 -08:00
Alex Elder
59336e0846 ceph: messenger: check prepare_write_connect() result
(cherry picked from commit 5a0f8fdd8a)

prepare_write_connect() can return an error, but only one of its
callers checks for it.  All the rest are in functions that already
return errors, so it should be fine to return the error if one
gets returned.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-11-26 11:38:06 -08:00
Alex Elder
59a521ebc0 ceph: don't set WRITE_PENDING too early
(cherry picked from commit e10c758e40)

prepare_write_connect() prepares a connect message, then sets
WRITE_PENDING on the connection.  Then *after* this, it calls
prepare_connect_authorizer(), which updates the content of the
connection buffer already queued for sending.  It's also possible it
will result in prepare_write_connect() returning -EAGAIN despite the
WRITE_PENDING big getting set.

Fix this by preparing the connect authorizer first, setting the
WRITE_PENDING bit only after that is done.

Partially addresses http://tracker.newdream.net/issues/2424

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-11-26 11:38:05 -08:00
Alex Elder
7dd07ab6bd ceph: drop msgr argument from prepare_write_connect()
(cherry picked from commit e825a66df9)

In all cases, the value passed as the msgr argument to
prepare_write_connect() is just con->msgr.  Just get the msgr
value from the ceph connection and drop the unneeded argument.

The only msgr passed to prepare_write_banner() is also therefore
just the one from con->msgr, so change that function to drop the
msgr argument as well.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-11-26 11:38:05 -08:00
Alex Elder
d4ac74c322 ceph: messenger: send banner in process_connect()
(cherry picked from commit 41b90c0085)

prepare_write_connect() has an argument indicating whether a banner
should be sent out before sending out a connection message.  It's
only ever set in one of its callers, so move the code that arranges
to send the banner into that caller and drop the "include_banner"
argument from prepare_write_connect().

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-11-26 11:38:05 -08:00
Alex Elder
726644c635 ceph: messenger: reset connection kvec caller
(cherry picked from commit 84fb3adf64)

Reset a connection's kvec fields in the caller rather than in
prepare_write_connect().   This ends up repeating a few lines of
code but it's improving the separation between distinct operations
on the connection, which we can take advantage of later.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-11-26 11:38:05 -08:00
Alex Elder
6b7cc02849 libceph: don't reset kvec in prepare_write_banner()
(cherry picked from commit d329156f16)

Move the kvec reset for a connection out of prepare_write_banner and
into its only caller.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-11-26 11:38:04 -08:00
Alex Elder
1f7631fba2 ceph: messenger: change read_partial() to take "end" arg
(cherry picked from commit fd51653f78)

Make the second argument to read_partial() be the ending input byte
position rather than the beginning offset it now represents.  This
amounts to moving the addition "to + size" into the caller.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-11-26 11:38:04 -08:00
Alex Elder
dd22ce515d ceph: messenger: update "to" in read_partial() caller
(cherry picked from commit e6cee71fac)

read_partial() always increases whatever "to" value is supplied by
adding the requested size to it, and that's the only thing it does
with that pointed-to value.

Do that pointer advance in the caller (and then only when the
updated value will be subsequently used), and change the "to"
parameter to be an in-only and non-pointer value.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-11-26 11:38:04 -08:00
Alex Elder
d0d7af68e9 ceph: messenger: use read_partial() in read_partial_message()
(cherry picked from commit 57dac9d162)

There are two blocks of code in read_partial_message()--those that
read the header and footer of the message--that can be replaced by a
call to read_partial().  Do that.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-11-26 11:38:04 -08:00
Alex Elder
f77637d9f6 ceph: osd_client: fix endianness bug in osd_req_encode_op()
(cherry picked from commit 065a68f916)

From Al Viro <viro@zeniv.linux.org.uk>

Al Viro noticed that we were using a non-cpu-encoded value in
a switch statement in osd_req_encode_op().  The result would
clearly not work correctly on a big-endian machine.

Signed-off-by: Alex Elder <elder@dreamhost.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-11-26 11:38:04 -08:00
Sage Weil
cf34fc7d48 crush: fix memory leak when destroying tree buckets
(cherry picked from commit 6eb43f4b5a)

Reflects ceph.git commit 46d63d98434b3bc9dad2fc9ab23cbaedc3bcb0e4.

Reported-by: Alexander Lyakas <alex.bolshoy@gmail.com>
Reviewed-by: Alex Elder <elder@inktank.com>
Signed-off-by: Sage Weil <sage@inktank.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-11-26 11:38:03 -08:00
Sage Weil
9a0117ae53 crush: fix tree node weight lookup
(cherry picked from commit f671d4cd9b)

Fix the node weight lookup for tree buckets by using a correct accessor.

Reflects ceph.git commit d287ade5bcbdca82a3aef145b92924cf1e856733.

Reviewed-by: Alex Elder <elder@inktank.com>
Signed-off-by: Sage Weil <sage@inktank.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-11-26 11:38:03 -08:00
Sage Weil
20501b9e6e crush: be more tolerant of nonsensical crush maps
(cherry picked from commit a1f4895be8)

If we get a map that doesn't make sense, error out or ignore the badness
instead of BUGging out.  This reflects the ceph.git commits
9895f0bff7dc68e9b49b572613d242315fb11b6c and
8ded26472058d5205803f244c2f33cb6cb10de79.

Reviewed-by: Alex Elder <elder@inktank.com>
Signed-off-by: Sage Weil <sage@inktank.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-11-26 11:38:03 -08:00
Sage Weil
9926776ca5 crush: adjust local retry threshold
(cherry picked from commit c90f95ed46)

This small adjustment reflects a change that was made in ceph.git commit
af6a9f30696c900a2a8bd7ae24e8ed15fb4964bb, about 6 months ago.  An N-1
search is not exhaustive.  Fixed ceph.git bug #1594.

Reviewed-by: Alex Elder <elder@inktank.com>
Signed-off-by: Sage Weil <sage@inktank.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-11-26 11:38:03 -08:00
Sage Weil
506b4672ac crush: clean up types, const-ness
(cherry picked from commit 8b12d47b80)

Move various types from int -> __u32 (or similar), and add const as
appropriate.

This reflects changes that have been present in the userland implementation
for some time.

Reviewed-by: Alex Elder <elder@inktank.com>
Signed-off-by: Sage Weil <sage@inktank.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-11-26 11:38:03 -08:00
Dave Jones
5564921186 selinux: fix sel_netnode_insert() suspicious rcu dereference
commit 88a693b5c1 upstream.

===============================
[ INFO: suspicious RCU usage. ]
3.5.0-rc1+ #63 Not tainted
-------------------------------
security/selinux/netnode.c:178 suspicious rcu_dereference_check() usage!

other info that might help us debug this:

rcu_scheduler_active = 1, debug_locks = 0
1 lock held by trinity-child1/8750:
 #0:  (sel_netnode_lock){+.....}, at: [<ffffffff812d8f8a>] sel_netnode_sid+0x16a/0x3e0

stack backtrace:
Pid: 8750, comm: trinity-child1 Not tainted 3.5.0-rc1+ #63
Call Trace:
 [<ffffffff810cec2d>] lockdep_rcu_suspicious+0xfd/0x130
 [<ffffffff812d91d1>] sel_netnode_sid+0x3b1/0x3e0
 [<ffffffff812d8e20>] ? sel_netnode_find+0x1a0/0x1a0
 [<ffffffff812d24a6>] selinux_socket_bind+0xf6/0x2c0
 [<ffffffff810cd1dd>] ? trace_hardirqs_off+0xd/0x10
 [<ffffffff810cdb55>] ? lock_release_holdtime.part.9+0x15/0x1a0
 [<ffffffff81093841>] ? lock_hrtimer_base+0x31/0x60
 [<ffffffff812c9536>] security_socket_bind+0x16/0x20
 [<ffffffff815550ca>] sys_bind+0x7a/0x100
 [<ffffffff816c03d5>] ? sysret_check+0x22/0x5d
 [<ffffffff810d392d>] ? trace_hardirqs_on_caller+0x10d/0x1a0
 [<ffffffff8133b09e>] ? trace_hardirqs_on_thunk+0x3a/0x3f
 [<ffffffff816c03a9>] system_call_fastpath+0x16/0x1b

This patch below does what Paul McKenney suggested in the previous thread.

Signed-off-by: Dave Jones <davej@redhat.com>
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Paul Moore <paul@paul-moore.com>
Cc: Eric Paris <eparis@parisplace.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: James Morris <james.l.morris@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-11-26 11:38:02 -08:00
Jan Kara
a23d6310a6 reiserfs: Protect reiserfs_quota_write() with write lock
commit 361d94a338 upstream.

Calls into reiserfs journalling code and reiserfs_get_block() need to
be protected with write lock. We remove write lock around calls to high
level quota code in the next patch so these paths would suddently become
unprotected.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-11-26 11:38:02 -08:00
Jan Kara
8c7dcc4819 reiserfs: Move quota calls out of write lock
commit 7af1168693 upstream.

Calls into highlevel quota code cannot happen under the write lock. These
calls take dqio_mutex which ranks above write lock. So drop write lock
before calling back into quota code.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-11-26 11:38:02 -08:00
Jan Kara
8ed4d1ceb2 reiserfs: Protect reiserfs_quota_on() with write lock
commit b9e06ef2e8 upstream.

In reiserfs_quota_on() we do quite some work - for example unpacking
tail of a quota file. Thus we have to hold write lock until a moment
we call back into the quota code.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-11-26 11:38:02 -08:00
Jan Kara
394cbbc441 reiserfs: Fix lock ordering during remount
commit 3bb3e1fc47 upstream.

When remounting reiserfs dquot_suspend() or dquot_resume() can be called.
These functions take dqonoff_mutex which ranks above write lock so we have
to drop it before calling into quota code.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-11-26 11:38:01 -08:00
Bryan Schumaker
824904ce9e NFS: Wait for session recovery to finish before returning
commit 399f11c3d8 upstream.

Currently, we will schedule session recovery and then return to the
caller of nfs4_handle_exception.  This works for most cases, but causes
a hang on the following test case:

	Client				Server
	------				------
	Open file over NFS v4.1
	Write to file
					Expire client
	Try to lock file

The server will return NFS4ERR_BADSESSION, prompting the client to
schedule recovery.  However, the client will continue placing lock
attempts and the open recovery never seems to be scheduled.  The
simplest solution is to wait for session recovery to run before retrying
the lock.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-11-26 11:38:01 -08:00
Daniel Vetter
d1b28a26ed drm/i915: fix overlay on i830M
commit a9193983f4 upstream.

The overlay on the i830M has a peculiar failure mode: It works the
first time around after boot-up, but consistenly hangs the second time
it's used.

Chris Wilson has dug out a nice errata:

"1.5.12 Clock Gating Disable for Display Register
Address Offset:	06200h–06203h

"Bit 3
Ovrunit Clock Gating Disable.
0 = Clock gating controlled by unit enabling logic
1 = Disable clock gating function
DevALM Errata ALM049: Overlay Clock Gating Must be Disabled:  Overlay
& L2 Cache clock gating must be disabled in order to prevent device
hangs when turning off overlay.SW must turn off Ovrunit clock gating
(6200h) and L2 Cache clock gating (C8h)."

Now I've nowhere found that 0xc8 register and hence couldn't apply the
l2 cache workaround. But I've remembered that part of the magic that
the OVERLAY_ON/OFF commands are supposed to do is to rearrange cache
allocations so that the overlay scaler has some scratch space.

And while pondering how that could explain the hang the 2nd time we
enable the overlay, I've remembered that the old ums overlay code did
_not_ issue the OVERLAY_OFF cmd.

And indeed, disabling the OFF cmd results in the overlay working
flawlessly, so I guess we can workaround the lack of the above
workaround by simply never disabling the overlay engine once it's
enabled.

Note that we have the first part of the above w/a already implemented
in i830_init_clock_gating - leave that as-is to avoid surprises.

v2: Add a comment in the code.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=47827
Tested-by: Rhys <rhyspuk@gmail.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
[bwh: Backported to 3.2:
 - Adjust context
 - s/intel_ring_emit(ring, /OUT_RING(/]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-11-26 11:38:01 -08:00
Martin Schwidefsky
37a42f991f s390/signal: set correct address space control
commit fa968ee215 upstream.

If user space is running in primary mode it can switch to secondary
or access register mode, this is used e.g. in the clock_gettime code
of the vdso. If a signal is delivered to the user space process while
it has been running in access register mode the signal handler is
executed in access register mode as well which will result in a crash
most of the time.

Set the address space control bits in the PSW to the default for the
execution of the signal handler and make sure that the previous
address space control is restored on signal return. Take care
that user space can not switch to the kernel address space by
modifying the registers in the signal frame.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-11-26 11:38:01 -08:00
Mirko Lindner
39220c9e67 sky2: Fix for interrupt handler
commit d663d181b9 upstream.

Re-enable interrupts if it is not our interrupt

Signed-off-by: Mirko Lindner <mlindner@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cc: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-11-26 11:38:01 -08:00
Tim Sally
122fd46fcc eCryptfs: check for eCryptfs cipher support at mount
commit 5f5b331d5c upstream.

The issue occurs when eCryptfs is mounted with a cipher supported by
the crypto subsystem but not by eCryptfs. The mount succeeds and an
error does not occur until a write. This change checks for eCryptfs
cipher support at mount time.

Resolves Launchpad issue #338914, reported by Tyler Hicks in 03/2009.
https://bugs.launchpad.net/ecryptfs/+bug/338914

Signed-off-by: Tim Sally <tsally@atomicpeace.com>
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Cc: Herton Ronaldo Krzesinski <herton.krzesinski@canonical.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-11-26 11:37:49 -08:00
Tyler Hicks
c8a1ae7c00 eCryptfs: Copy up POSIX ACL and read-only flags from lower mount
commit 069ddcda37 upstream.

When the eCryptfs mount options do not include '-o acl', but the lower
filesystem's mount options do include 'acl', the MS_POSIXACL flag is not
flipped on in the eCryptfs super block flags. This flag is what the VFS
checks in do_last() when deciding if the current umask should be applied
to a newly created inode's mode or not. When a default POSIX ACL mask is
set on a directory, the current umask is incorrectly applied to new
inodes created in the directory. This patch ignores the MS_POSIXACL flag
passed into ecryptfs_mount() and sets the flag on the eCryptfs super
block depending on the flag's presence on the lower super block.

Additionally, it is incorrect to allow a writeable eCryptfs mount on top
of a read-only lower mount. This missing check did not allow writes to
the read-only lower mount because permissions checks are still performed
on the lower filesystem's objects but it is best to simply not allow a
rw mount on top of ro mount. However, a ro eCryptfs mount on top of a rw
mount is valid and still allowed.

https://launchpad.net/bugs/1009207

Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Reported-by: Stefan Beller <stefanbeller@googlemail.com>
Cc: John Johansen <john.johansen@canonical.com>
Cc: Herton Ronaldo Krzesinski <herton.krzesinski@canonical.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-11-26 11:37:49 -08:00
Jan Safrata
9dbd3418ff usb: use usb_serial_put in usb_serial_probe errors
commit 0658a3366d upstream.

The use of kfree(serial) in error cases of usb_serial_probe
was invalid - usb_serial structure allocated in create_serial()
gets reference of usb_device that needs to be put, so we need
to use usb_serial_put() instead of simple kfree().

Signed-off-by: Jan Safrata <jan.nikitenko@gmail.com>
Acked-by: Johan Hovold <jhovold@gmail.com>
Cc: Richard Retanubun <richardretanubun@ruggedcom.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-11-26 11:37:49 -08:00
Ulrich Weber
a39bdce2f2 netfilter: nf_nat: don't check for port change on ICMP tuples
commit 38fe36a248 upstream.

ICMP tuples have id in src and type/code in dst.
So comparing src.u.all with dst.u.all will always fail here
and ip_xfrm_me_harder() is called for every ICMP packet,
even if there was no NAT.

Signed-off-by: Ulrich Weber <ulrich.weber@sophos.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-11-26 11:37:48 -08:00
Jozsef Kadlecsik
b3e991ea92 netfilter: Mark SYN/ACK packets as invalid from original direction
commit 64f509ce71 upstream.

Clients should not send such packets. By accepting them, we open
up a hole by wich ephemeral ports can be discovered in an off-path
attack.

See: "Reflection scan: an Off-Path Attack on TCP" by Jan Wrobel,
http://arxiv.org/abs/1201.2074

Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-11-26 11:37:48 -08:00
Jozsef Kadlecsik
c581c7c77d netfilter: Validate the sequence number of dataless ACK packets as well
commit 4a70bbfaef upstream.

We spare nothing by not validating the sequence number of dataless
ACK packets and enabling it makes harder off-path attacks.

See: "Reflection scan: an Off-Path Attack on TCP" by Jan Wrobel,
http://arxiv.org/abs/1201.2074

Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-11-26 11:37:48 -08:00
Nathan Walp
1b10e0be50 r8169: allow multicast packets on sub-8168f chipset.
commit 0481776b7a upstream.

RTL_GIGA_MAC_VER_35 includes no multicast hardware filter.

Signed-off-by: Nathan Walp <faceprint@faceprint.com>
Suggested-by: Hayes Wang <hayeswang@realtek.com>
Acked-by: Francois Romieu <romieu@fr.zoreil.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-11-26 11:37:48 -08:00
Cyril Brulebois
cfc2c99614 r8169: Fix WoL on RTL8168d/8111d.
commit b00e69dee4 upstream.

This regression was spotted between Debian squeeze and Debian wheezy
kernels (respectively based on 2.6.32 and 3.2). More info about
Wake-on-LAN issues with Realtek's 816x chipsets can be found in the
following thread: http://marc.info/?t=132079219400004

Probable regression from d4ed95d796e5126bba51466dc07e287cebc8bd19;
more chipsets are likely affected.

Tested on top of a 3.2.23 kernel.

Reported-by: Florent Fourcot <florent.fourcot@enst-bretagne.fr>
Tested-by: Florent Fourcot <florent.fourcot@enst-bretagne.fr>
Hinted-by: Francois Romieu <romieu@fr.zoreil.com>
Signed-off-by: Cyril Brulebois <kibi@debian.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-11-26 11:37:47 -08:00
Mojiong Qiu
f5a93eaffa xen/events: fix RCU warning, or Call idle notifier after irq_enter()
commit 772aebcefe upstream.

exit_idle() should be called after irq_enter(), otherwise it throws:

[ INFO: suspicious RCU usage. ]
3.6.5 #1 Not tainted
-------------------------------
include/linux/rcupdate.h:725 rcu_read_lock() used illegally while idle!

other info that might help us debug this:

RCU used illegally from idle CPU!
rcu_scheduler_active = 1, debug_locks = 1
RCU used illegally from extended quiescent state!
1 lock held by swapper/0/0:
 #0:  (rcu_read_lock){......}, at: [<ffffffff810e9fe0>] __atomic_notifier_call_chain+0x0/0x140

stack backtrace:
Pid: 0, comm: swapper/0 Not tainted 3.6.5 #1
Call Trace:
 <IRQ>  [<ffffffff811259a2>] lockdep_rcu_suspicious+0xe2/0x130
 [<ffffffff810ea10c>] __atomic_notifier_call_chain+0x12c/0x140
 [<ffffffff810e9fe0>] ? atomic_notifier_chain_unregister+0x90/0x90
 [<ffffffff811216cd>] ? trace_hardirqs_off+0xd/0x10
 [<ffffffff810ea136>] atomic_notifier_call_chain+0x16/0x20
 [<ffffffff810777c3>] exit_idle+0x43/0x50
 [<ffffffff81568865>] xen_evtchn_do_upcall+0x25/0x50
 [<ffffffff81aa690e>] xen_do_hypervisor_callback+0x1e/0x30
 <EOI>  [<ffffffff810013aa>] ? hypercall_page+0x3aa/0x1000
 [<ffffffff810013aa>] ? hypercall_page+0x3aa/0x1000
 [<ffffffff81061540>] ? xen_safe_halt+0x10/0x20
 [<ffffffff81075cfa>] ? default_idle+0xba/0x570
 [<ffffffff810778af>] ? cpu_idle+0xdf/0x140
 [<ffffffff81a4d881>] ? rest_init+0x135/0x144
 [<ffffffff81a4d74c>] ? csum_partial_copy_generic+0x16c/0x16c
 [<ffffffff82520c45>] ? start_kernel+0x3db/0x3e8
 [<ffffffff8252066a>] ? repair_env_string+0x5a/0x5a
 [<ffffffff82520356>] ? x86_64_start_reservations+0x131/0x135
 [<ffffffff82524aca>] ? xen_start_kernel+0x465/0x46

Git commit 98ad1cc14a
Author: Frederic Weisbecker <fweisbec@gmail.com>
Date:   Fri Oct 7 18:22:09 2011 +0200

    x86: Call idle notifier after irq_enter()

did this, but it missed the Xen code.

Signed-off-by: Mojiong Qiu <mjqiu@tencent.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-11-26 11:37:47 -08:00
Michal Schmidt
56cb8f7e43 r8169: use unlimited DMA burst for TX
commit aee77e4acc upstream.

The r8169 driver currently limits the DMA burst for TX to 1024 bytes. I have
a box where this prevents the interface from using the gigabit line to its full
potential. This patch solves the problem by setting TX_DMA_BURST to unlimited.

The box has an ASRock B75M motherboard with on-board RTL8168evl/8111evl
(XID 0c900880). TSO is enabled.

I used netperf (TCP_STREAM test) to measure the dependency of TX throughput
on MTU. I did it for three different values of TX_DMA_BURST ('5'=512, '6'=1024,
'7'=unlimited). This chart shows the results:
http://michich.fedorapeople.org/r8169/r8169-effects-of-TX_DMA_BURST.png

Interesting points:
 - With the current DMA burst limit (1024):
   - at the default MTU=1500 I get only 842 Mbit/s.
   - when going from small MTU, the performance rises monotonically with
     increasing MTU only up to a peak at MTU=1076 (908 MBit/s). Then there's
     a sudden drop to 762 MBit/s from which the throughput rises monotonically
     again with further MTU increases.
 - With a smaller DMA burst limit (512):
   - there's a similar peak at MTU=1076 and another one at MTU=564.
 - With unlimited DMA burst:
   - at the default MTU=1500 I get nice 940 Mbit/s.
   - the throughput rises monotonically with increasing MTU with no strange
     peaks.

Notice that the peaks occur at MTU sizes that are multiples of the DMA burst
limit plus 52. Why 52? Because:
  20 (IP header) + 20 (TCP header) + 12 (TCP options) = 52

The Realtek-provided r8168 driver (v8.032.00) uses unlimited TX DMA burst too,
except for CFG_METHOD_1 where the TX DMA burst is set to 512 bytes.
CFG_METHOD_1 appears to be the oldest MAC version of "RTL8168B/8111B",
i.e. RTL_GIGA_MAC_VER_11 in r8169. Not sure if this MAC version really needs
the smaller burst limit, or if any other versions have similar requirements.

Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
Acked-by: Francois Romieu <romieu@fr.zoreil.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-11-26 11:37:47 -08:00
Hugh Dickins
ec5924204b tmpfs: change final i_blocks BUG to WARNING
commit 0f3c42f522 upstream.

Under a particular load on one machine, I have hit shmem_evict_inode()'s
BUG_ON(inode->i_blocks), enough times to narrow it down to a particular
race between swapout and eviction.

It comes from the "if (freed > 0)" asymmetry in shmem_recalc_inode(),
and the lack of coherent locking between mapping's nrpages and shmem's
swapped count.  There's a window in shmem_writepage(), between lowering
nrpages in shmem_delete_from_page_cache() and then raising swapped
count, when the freed count appears to be +1 when it should be 0, and
then the asymmetry stops it from being corrected with -1 before hitting
the BUG.

One answer is coherent locking: using tree_lock throughout, without
info->lock; reasonable, but the raw_spin_lock in percpu_counter_add() on
used_blocks makes that messier than expected.  Another answer may be a
further effort to eliminate the weird shmem_recalc_inode() altogether,
but previous attempts at that failed.

So far undecided, but for now change the BUG_ON to WARN_ON: in usual
circumstances it remains a useful consistency check.

Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-11-26 11:37:47 -08:00
Tom Herbert
f86c309e32 net-rps: Fix brokeness causing OOO packets
[ Upstream commit baefa31db2 ]

In commit c445477d74 which adds aRFS to the kernel, the CPU
selected for RFS is not set correctly when CPU is changing.
This is causing OOO packets and probably other issues.

Signed-off-by: Tom Herbert <therbert@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-11-26 11:37:47 -08:00
Jiri Pirko
4e95708469 net: correct check in dev_addr_del()
[ Upstream commit a652208e0b ]

Check (ha->addr == dev->dev_addr) is always true because dev_addr_init()
sets this. Correct the check to behave properly on addr removal.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-11-26 11:37:46 -08:00
Hannes Frederic Sowa
798f49e632 ipv6: setsockopt(IPIPPROTO_IPV6, IPV6_MINHOPCOUNT) forgot to set return value
[ Upstream commit d4596bad2a ]

Cc: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-11-26 11:37:46 -08:00
Xi Wang
0fa89335e4 ipv4: avoid undefined behavior in do_ip_setsockopt()
[ Upstream commit 0c9f79be29 ]

(1<<optname) is undefined behavior in C with a negative optname or
optname larger than 31.  In those cases the result of the shift is
not necessarily zero (e.g., on x86).

This patch simplifies the code with a switch statement on optname.
It also allows the compiler to generate better code (e.g., using a
64-bit mask).

Signed-off-by: Xi Wang <xi.wang@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-11-26 11:37:46 -08:00
Andreas Schwab
999d6a5194 m68k: fix sigset_t accessor functions
commit 34fa78b59c upstream.

The sigaddset/sigdelset/sigismember functions that are implemented with
bitfield insn cannot allow the sigset argument to be placed in a data
register since the sigset is wider than 32 bits.  Remove the "d"
constraint from the asm statements.

The effect of the bug is that sending RT signals does not work, the signal
number is truncated modulo 32.

Signed-off-by: Andreas Schwab <schwab@linux-m68k.org>
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-11-26 11:37:46 -08:00
Johannes Berg
dd361b4987 wireless: allow 40 MHz on world roaming channels 12/13
commit 43c771a196 upstream.

When in world roaming mode, allow 40 MHz to be used
on channels 12 and 13 so that an AP that is, e.g.,
using HT40+ on channel 9 (in the UK) can be used.

Reported-by: Eddie Chapman <eddie@ehuk.net>
Tested-by: Eddie Chapman <eddie@ehuk.net>
Acked-by: Luis R. Rodriguez <mcgrof@qca.qualcomm.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-11-26 11:37:46 -08:00
Michal Hocko
e5c4ee6a08 memcg: oom: fix totalpages calculation for memory.swappiness==0
commit 9a5a8f19b4 upstream.

oom_badness() takes a totalpages argument which says how many pages are
available and it uses it as a base for the score calculation.  The value
is calculated by mem_cgroup_get_limit which considers both limit and
total_swap_pages (resp.  memsw portion of it).

This is usually correct but since fe35004fbf ("mm: avoid swapping out
with swappiness==0") we do not swap when swappiness is 0 which means
that we cannot really use up all the totalpages pages.  This in turn
confuses oom score calculation if the memcg limit is much smaller than
the available swap because the used memory (capped by the limit) is
negligible comparing to totalpages so the resulting score is too small
if adj!=0 (typically task with CAP_SYS_ADMIN or non zero oom_score_adj).
A wrong process might be selected as result.

The problem can be worked around by checking mem_cgroup_swappiness==0
and not considering swap at all in such a case.

Signed-off-by: Michal Hocko <mhocko@suse.cz>
Acked-by: David Rientjes <rientjes@google.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-11-26 11:37:45 -08:00
Zhao Yakui
9098e87856 ttm: Clear the ttm page allocated from high memory zone correctly
commit ac207ed247 upstream.

The TTM page can be allocated from high memory. In such case it is
wrong to use the page_address(page) as the virtual address for the high memory
page.

bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=50241

Signed-off-by: Zhao Yakui <yakui.zhao@intel.com>
Reviewed-by: Thomas Hellstrom <thellstrom@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-11-26 11:37:45 -08:00