Commit Graph

8486 Commits

Author SHA1 Message Date
Ben Hutchings
8dfabfcf7b modules: Skip empty sections when exporting section notes
commit 10b465aaf9 upstream.

Commit 35dead4 "modules: don't export section names of empty sections
via sysfs" changed the set of sections that have attributes, but did
not change the iteration over these attributes in add_notes_attrs().
This can lead to add_notes_attrs() creating attributes with the wrong
names or with null name pointers.

Introduce a sect_empty() function and use it in both add_sect_attrs()
and add_notes_attrs().

Reported-by: Martin Michlmayr <tbm@cyrius.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Tested-by: Martin Michlmayr <tbm@cyrius.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-01-18 10:19:14 -08:00
Peter Zijlstra
fc310225df sched: Sched_rt_periodic_timer vs cpu hotplug
commit 047106adcc upstream.

Heiko reported a case where a timer interrupt managed to
reference a root_domain structure that was already freed by a
concurrent hot-un-plug operation.

Solve this like the regular sched_domain stuff is also
synchronized, by adding a synchronize_sched() stmt to the free
path, this ensures that a root_domain stays present for any
atomic section that could have observed it.

Reported-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Gregory Haskins <ghaskins@novell.com>
Cc: Siddha Suresh B <suresh.b.siddha@intel.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
LKML-Reference: <1258363873.26714.83.camel@laptop>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-01-06 15:05:17 -08:00
Peter Zijlstra
a09adfeb9e sched: Fix balance vs hotplug race
commit 6ad4c18884 upstream.

Since (e761b77: cpu hotplug, sched: Introduce cpu_active_map and redo
sched domain managment) we have cpu_active_mask which is suppose to rule
scheduler migration and load-balancing, except it never (fully) did.

The particular problem being solved here is a crash in try_to_wake_up()
where select_task_rq() ends up selecting an offline cpu because
select_task_rq_fair() trusts the sched_domain tree to reflect the
current state of affairs, similarly select_task_rq_rt() trusts the
root_domain.

However, the sched_domains are updated from CPU_DEAD, which is after the
cpu is taken offline and after stop_machine is done. Therefore it can
race perfectly well with code assuming the domains are right.

Cure this by building the domains from cpu_active_mask on
CPU_DOWN_PREPARE.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Holger Hoffstätte <holger.hoffstaette@googlemail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-01-06 15:04:49 -08:00
WANG Cong
4b6d263595 kernel/sysctl.c: fix the incomplete part of sysctl_max_map_count-should-be-non-negative.patch
commit 3e26120cc7 upstream.

It is a mistake that we used 'proc_dointvec', it should be
'proc_dointvec_minmax', as in the original patch.

Signed-off-by: WANG Cong <amwang@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-01-06 15:04:32 -08:00
Amerigo Wang
3ec268a682 'sysctl_max_map_count' should be non-negative
commit 70da2340fb upstream.

Jan Engelhardt reported we have this problem:

setting max_map_count to a value large enough results in programs dying at
first try.  This is on 2.6.31.6:

15:59 borg:/proc/sys/vm # echo $[1<<31-1] >max_map_count
15:59 borg:/proc/sys/vm # cat max_map_count
1073741824
15:59 borg:/proc/sys/vm # echo $[1<<31] >max_map_count
15:59 borg:/proc/sys/vm # cat max_map_count
Killed

This is because we have a chance to make 'max_map_count' negative.  but
it's meaningless.  Make it only accept non-negative values.

Reported-by: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: WANG Cong <amwang@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: James Morris <jmorris@namei.org>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-01-06 15:04:31 -08:00
David Howells
0399123f3d NOMMU: Optimise away the {dac_,}mmap_min_addr tests
commit 6e14154676 upstream.

In NOMMU mode clamp dac_mmap_min_addr to zero to cause the tests on it to be
skipped by the compiler.  We do this as the minimum mmap address doesn't make
any sense in NOMMU mode.

mmap_min_addr and round_hint_to_min() can be discarded entirely in NOMMU mode.

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Eric Paris <eparis@redhat.com>
Signed-off-by: James Morris <jmorris@namei.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-01-06 15:04:30 -08:00
Thomas Gleixner
fa3f5a5c1c clockevents: Prevent clockevent_devices list corruption on cpu hotplug
commit bb6eddf767 upstream.

Xiaotian Feng triggered a list corruption in the clock events list on
CPU hotplug and debugged the root cause.

If a CPU registers more than one per cpu clock event device, then only
the active clock event device is removed on CPU_DEAD. The unused
devices are kept in the clock events device list.

On CPU up the clock event devices are registered again, which means
that we list_add an already enqueued list_head. That results in list
corruption.

Resolve this by removing all devices which are associated to the dead
CPU on CPU_DEAD.

Reported-by: Xiaotian Feng <dfeng@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Xiaotian Feng <dfeng@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-01-06 15:03:21 -08:00
Peter Zijlstra
8e04c81a22 sched: Select_task_rq_fair() must honour SD_LOAD_BALANCE
commit e4f4288842 upstream.

We should skip !SD_LOAD_BALANCE domains.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
LKML-Reference: <20091216170517.653578430@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-01-06 15:03:20 -08:00
Peter Zijlstra
14ae082058 sched: Fix task_hot() test order
commit e6c8fba777 upstream.

Make sure not to access sched_fair fields before verifying it is
indeed a sched_fair task.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
LKML-Reference: <20091216170517.577998058@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-01-06 15:03:18 -08:00
Paul Mackerras
517361c51d perf_event: Fix incorrect range check on cpu number
commit 0f624e7e56 upstream.

It is quite legitimate for CPUs to be numbered sparsely, meaning
that it possible for an online CPU to have a number which is
greater than the total count of possible CPUs.

Currently find_get_context() has a sanity check on the cpu
number where it checks it against num_possible_cpus().  This
test can fail for a legitimate cpu number if the
cpu_possible_mask is sparsely populated.

This fixes the problem by checking the CPU number against
nr_cpumask_bits instead, since that is the appropriate check to
ensure that the cpu number is same to pass to cpu_isset()
subsequently.

Reported-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Tested-by: Michael Neuling <mikey@neuling.org>
Acked-by: Peter Zijlstra <peterz@infradead.org>
LKML-Reference: <20091215084032.GA18661@brick.ozlabs.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-18 14:06:01 -08:00
Andi Kleen
bb2fb5d6a1 futex: Take mmap_sem for get_user_pages in fault_in_user_writeable
commit 722d017237 upstream.

get_user_pages() must be called with mmap_sem held.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Andrew Morton <akpm@linuxfoundation.org>
Cc: Nick Piggin <npiggin@suse.de>
Cc: Darren Hart <dvhltc@us.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
LKML-Reference: <20091208121942.GA21298@basil.fritz.box>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-18 14:04:15 -08:00
Alexey Dobriyan
220d5b50ca bsdacct: fix uid/gid misreporting
commit 4b731d50ff upstream.

commit d8e180dcd5 "bsdacct: switch
credentials for writing to the accounting file" introduced credential
switching during final acct data collecting.  However, uid/gid pair
continued to be collected from current which became credentials of who
created acct file, not who exits.

Addresses http://bugzilla.kernel.org/show_bug.cgi?id=14676

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Reported-by: Juho K. Juopperi <jkj@kapsi.fi>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Acked-by: David Howells <dhowells@redhat.com>
Reviewed-by: Michal Schmidt <mschmidt@redhat.com>
Cc: James Morris <jmorris@namei.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-18 14:03:52 -08:00
Mike Galbraith
35c1ee3e78 sched: Fix and clean up rate-limit newidle code
commit eae0c9dfb5 upstream.

Commit 1b9508f, "Rate-limit newidle" has been confirmed to fix
the netperf UDP loopback regression reported by Alex Shi.

This is a cleanup and a fix:

 - moved to a more out of the way spot

 - fix to ensure that balancing doesn't try to balance
   runqueues which haven't gone online yet, which can
   mess up CPU enumeration during boot.

Reported-by: Alex Shi <alex.shi@intel.com>
Reported-by: Zhang, Yanmin <yanmin_zhang@linux.intel.com>
Signed-off-by: Mike Galbraith <efault@gmx.de>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1257821402.5648.17.camel@marge.simson.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-18 14:03:14 -08:00
Mike Galbraith
37ed2d7007 sched: Rate-limit newidle
commit 1b9508f683 upstream.

Rate limit newidle to migration_cost. It's a win for all
stages of sysbench oltp tests.

Signed-off-by: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-18 14:03:13 -08:00
Mike Galbraith
ece125459c sched: Fix affinity logic in select_task_rq_fair()
commit fd210738f6 upstream.

Ingo Molnar reported:

[   26.804000] BUG: using smp_processor_id() in preemptible [00000000] code: events/1/10
[   26.808000] caller is vmstat_update+0x26/0x70
[   26.812000] Pid: 10, comm: events/1 Not tainted 2.6.32-rc5 #6887
[   26.816000] Call Trace:
[   26.820000]  [<c1924a24>] ? printk+0x28/0x3c
[   26.824000]  [<c13258a0>] debug_smp_processor_id+0xf0/0x110
[   26.824000] mount used greatest stack depth: 1464 bytes left
[   26.828000]  [<c111d086>] vmstat_update+0x26/0x70
[   26.832000]  [<c1086418>] worker_thread+0x188/0x310
[   26.836000]  [<c10863b7>] ? worker_thread+0x127/0x310
[   26.840000]  [<c108d310>] ? autoremove_wake_function+0x0/0x60
[   26.844000]  [<c1086290>] ? worker_thread+0x0/0x310
[   26.848000]  [<c108cf0c>] kthread+0x7c/0x90
[   26.852000]  [<c108ce90>] ? kthread+0x0/0x90
[   26.856000]  [<c100c0a7>] kernel_thread_helper+0x7/0x10
[   26.860000] BUG: using smp_processor_id() in preemptible [00000000] code: events/1/10
[   26.864000] caller is vmstat_update+0x3c/0x70

Because this commit:

  a1f84a3: sched: Check for an idle shared cache in select_task_rq_fair()

broke ->cpus_allowed.

Signed-off-by: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: arjan@infradead.org
LKML-Reference: <1257415066.12867.1.camel@marge.simson.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-18 14:03:13 -08:00
Mike Galbraith
c2f1953231 sched: Check for an idle shared cache in select_task_rq_fair()
commit a1f84a3ab8 upstream.

When waking affine, check for an idle shared cache, and if
found, wake to that CPU/sibling instead of the waker's CPU.

This improves pgsql+oltp ramp up by roughly 8%. Possibly more
for other loads, depending on overlap. The trade-off is a
roughly 1% peak downturn if tasks are truly synchronous.

Signed-off-by: Mike Galbraith <efault@gmx.de>
Cc: Arjan van de Ven <arjan@infradead.org>
Cc: Peter Zijlstra <peterz@infradead.org>
LKML-Reference: <1256654138.17752.7.camel@marge.simson.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-18 14:03:12 -08:00
Kristian Høgsberg
9f3bb2b1e8 perf: Don't free perf_mmap_data until work has been done
commit ec70ccd806 upstream.

In the CONFIG_PERF_USE_VMALLOC case, perf_mmap_data_free() only
schedules the cleanup of the perf_mmap_data struct.  In that
case we have to wait until the work has been done before we free
data.

Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
Cc: David S. Miller <davem@davemloft.net>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <1259697901-1747-1-git-send-email-krh@bitplanet.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-18 14:03:08 -08:00
Xiao Guangrong
b7b4aef82a perf_event: Initialize data.period in perf_swevent_hrtimer()
commit 59d069eb5a upstream.

In current code in perf_swevent_hrtimer(), data.period is not
initialized, The result is obvious wrong:

 # ./perf record -f -e cpu-clock make
 # ./perf report
 # Samples: 1740
 #
 # Overhead   Command                                   ......
 # ........  ........  ..........................................
 #
   1025422183050275328.00%        sh  libc-2.9.90.so ...
   1025422183050275328.00%      perl  libperl.so     ...
   1025422168240043264.00%      perl  [kernel]       ...
   1025422030011210752.00%      perl  [kernel]       ...

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <4B14E220.2050107@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-18 14:03:08 -08:00
Paul E. McKenney
b87444a845 rcu: Remove inline from forward-referenced functions
commit dbe01350fa upstream.

Some variants of gcc are reputed to dislike forward references
to functions declared "inline".  Remove the "inline" keyword
from such functions.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: laijs@cn.fujitsu.com
Cc: dipankar@in.ibm.com
Cc: mathieu.desnoyers@polymtl.ca
Cc: josh@joshtriplett.org
Cc: dvhltc@us.ibm.com
Cc: niv@us.ibm.com
Cc: peterz@infradead.org
Cc: rostedt@goodmis.org
Cc: Valdis.Kletnieks@vt.edu
Cc: dhowells@redhat.com
Cc: Benjamin Gilbert <bgilbert@cs.cmu.edu>
LKML-Reference: <12578890422402-git-send-email->
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-18 14:03:04 -08:00
Paul E. McKenney
c541f7c75c rcu: Fix note_new_gpnum() uses of ->gpnum
commit 9160306e6f upstream.

Impose a clear locking design on the note_new_gpnum()
function's use of the ->gpnum counter.  This is done by updating
rdp->gpnum only from the corresponding leaf rcu_node structure's
rnp->gpnum field, and even then only under the protection of
that same rcu_node structure's ->lock field.  Performance and
scalability are maintained using a form of double-checked
locking, and excessive spinning is avoided by use of the
spin_trylock() function.  The use of spin_trylock() is safe due
to the fact that CPUs who fail to acquire this lock will try
again later. The hierarchical nature of the rcu_node data
structure limits contention (which could be limited further if
need be using the RCU_FANOUT kernel parameter).

Without this patch, obscure but quite possible races could
result in a quiescent state that occurred during one grace
period to be accounted to the following grace period, causing
this following grace period to end prematurely.  Not good!

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: laijs@cn.fujitsu.com
Cc: dipankar@in.ibm.com
Cc: mathieu.desnoyers@polymtl.ca
Cc: josh@joshtriplett.org
Cc: dvhltc@us.ibm.com
Cc: niv@us.ibm.com
Cc: peterz@infradead.org
Cc: rostedt@goodmis.org
Cc: Valdis.Kletnieks@vt.edu
Cc: dhowells@redhat.com
LKML-Reference: <12571987492350-git-send-email->
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-18 14:03:01 -08:00
Paul E. McKenney
1a0d7df335 rcu: Fix synchronization for rcu_process_gp_end() uses of ->completed counter
commit d09b62dfa3 upstream.

Impose a clear locking design on the rcu_process_gp_end()
function's use of the ->completed counter.  This is done by
creating a ->completed field in the rcu_node structure, which
can safely be accessed under the protection of that structure's
lock.  Performance and scalability are maintained by using a
form of double-checked locking, so that rcu_process_gp_end()
only acquires the leaf rcu_node structure's ->lock if a grace
period has recently ended.

This fix reduces rcutorture failure rate by at least two orders
of magnitude under heavy stress with force_quiescent_state()
being invoked artificially often.  Without this fix,
unsynchronized access to the ->completed field can cause
rcu_process_gp_end() to advance callbacks whose grace period has
not yet expired.  (Bad idea!)

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: laijs@cn.fujitsu.com
Cc: dipankar@in.ibm.com
Cc: mathieu.desnoyers@polymtl.ca
Cc: josh@joshtriplett.org
Cc: dvhltc@us.ibm.com
Cc: niv@us.ibm.com
Cc: peterz@infradead.org
Cc: rostedt@goodmis.org
Cc: Valdis.Kletnieks@vt.edu
Cc: dhowells@redhat.com
LKML-Reference: <12571987494069-git-send-email->
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-18 14:03:00 -08:00
Paul E. McKenney
1aab3863fe rcu: Prepare for synchronization fixes: clean up for non-NO_HZ handling of ->completed counter
commit 281d150c5f upstream.

Impose a clear locking design on non-NO_HZ handling of the
->completed counter.  This increases the distance between the
RCU and the CPU-hotplug mechanisms.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: laijs@cn.fujitsu.com
Cc: dipankar@in.ibm.com
Cc: mathieu.desnoyers@polymtl.ca
Cc: josh@joshtriplett.org
Cc: dvhltc@us.ibm.com
Cc: niv@us.ibm.com
Cc: peterz@infradead.org
Cc: rostedt@goodmis.org
Cc: Valdis.Kletnieks@vt.edu
Cc: dhowells@redhat.com
LKML-Reference: <12571987491353-git-send-email->
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-18 14:02:59 -08:00
Helge Deller
35dead4235 modules: don't export section names of empty sections via sysfs
On the parisc architecture we face for each and every loaded kernel module
this kernel "badness warning":
  sysfs: cannot create duplicate filename '/module/ac97_bus/sections/.text'
  Badness at fs/sysfs/dir.c:487

Reason for that is, that on parisc all kernel modules do have multiple
.text sections due to the usage of the -ffunction-sections compiler flag
which is needed to reach all jump targets on this platform.

An objdump on such a kernel module gives:
Sections:
Idx Name          Size      VMA       LMA       File off  Algn
  0 .note.gnu.build-id 00000024  00000000  00000000  00000034  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  1 .text         00000000  00000000  00000000  00000058  2**0
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
  2 .text.ac97_bus_match 0000001c  00000000  00000000  00000058  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
  3 .text         00000000  00000000  00000000  000000d4  2**0
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
...
Since the .text sections are empty (size of 0 bytes) and won't be
loaded by the kernel module loader anyway, I don't see a reason
why such sections need to be listed under
/sys/module/<module_name>/sections/<section_name> either.

The attached patch does solve this issue by not exporting section
names which are empty.

This fixes bugzilla http://bugzilla.kernel.org/show_bug.cgi?id=14703

Signed-off-by: Helge Deller <deller@gmx.de>
CC: rusty@rustcorp.com.au
CC: akpm@linux-foundation.org
CC: James.Bottomley@HansenPartnership.com
CC: roland@redhat.com
CC: dave@hiauly1.hia.nrc.ca
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-02 15:38:25 -08:00
David Howells
f13a48bd79 SLOW_WORK: Move slow_work's proc file to debugfs
Move slow_work's debugging proc file to debugfs.

Signed-off-by: David Howells <dhowells@redhat.com>
Requested-and-acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-01 08:20:31 -08:00
David Howells
fa1dae4906 SLOW_WORK: Fix the CONFIG_MODULES=n case
Commits 3d7a641 ("SLOW_WORK: Wait for outstanding work items belonging to a
module to clear") introduced some code to make sure that all of a module's
slow-work items were complete before that module was removed, and commit
3bde31a ("SLOW_WORK: Allow a requeueable work item to sleep till the thread is
needed") further extended that, breaking it in the process if CONFIG_MODULES=n:

    CC      kernel/slow-work.o
  kernel/slow-work.c: In function 'slow_work_execute':
  kernel/slow-work.c:313: error: 'slow_work_thread_processing' undeclared (first use in this function)
  kernel/slow-work.c:313: error: (Each undeclared identifier is reported only once
  kernel/slow-work.c:313: error: for each function it appears in.)
  kernel/slow-work.c: In function 'slow_work_wait_for_items':
  kernel/slow-work.c:950: error: 'slow_work_unreg_sync_lock' undeclared (first use in this function)
  kernel/slow-work.c:951: error: 'slow_work_unreg_wq' undeclared (first use in this function)
  kernel/slow-work.c:961: error: 'slow_work_unreg_work_item' undeclared (first use in this function)
  kernel/slow-work.c:974: error: 'slow_work_unreg_module' undeclared (first use in this function)
  kernel/slow-work.c:977: error: 'slow_work_thread_processing' undeclared (first use in this function)
  make[1]: *** [kernel/slow-work.o] Error 1

Fix this by:

 (1) Extracting the bits of slow_work_execute() that are contingent on
     CONFIG_MODULES, and the bits that should be, into inline functions and
     placing them into the #ifdef'd section that defines the relevant variables
     and adding stubs for moduleless kernels.  This allows the removal of some
     #ifdefs.

 (2) #ifdef'ing out the contents of slow_work_wait_for_items() in moduleless
     kernels.

The four functions related to handling module unloading synchronisation (and
their associated variables) could be offloaded into a separate .c file, but
each function is only used once and three of them are tiny, so doing so would
prevent them from being inlined.

Reported-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-01 07:35:11 -08:00
David Howells
3bde31a4ac SLOW_WORK: Allow a requeueable work item to sleep till the thread is needed
Add a function to allow a requeueable work item to sleep till the thread
processing it is needed by the slow-work facility to perform other work.

Sometimes a work item can't progress immediately, but must wait for the
completion of another work item that's currently being processed by another
slow-work thread.

In some circumstances, the waiting item could instead - theoretically - put
itself back on the queue and yield its thread back to the slow-work facility,
thus waiting till it gets processing time again before attempting to progress.
This would allow other work items processing time on that thread.

However, this only works if there is something on the queue for it to queue
behind - otherwise it will just get a thread again immediately, and will end
up cycling between the queue and the thread, eating up valuable CPU time.

So, slow_work_sleep_till_thread_needed() is provided such that an item can put
itself on a wait queue that will wake it up when the event it is actually
interested in occurs, then call this function in lieu of calling schedule().

This function will then sleep until either the item's event occurs or another
work item appears on the queue.  If another work item is queued, but the
item's event hasn't occurred, then the work item should requeue itself and
yield the thread back to the slow-work facility by returning.

This can be used by CacheFiles for an object that is being created on one
thread to wait for an object being deleted on another thread where there is
nothing on the queue for the creation to go and wait behind.  As soon as an
item appears on the queue that could be given thread time instead, CacheFiles
can stick the creating object back on the queue and return to the slow-work
facility - assuming the object deletion didn't also complete.

Signed-off-by: David Howells <dhowells@redhat.com>
2009-11-19 18:10:57 +00:00
David Howells
8fba10a42d SLOW_WORK: Allow the work items to be viewed through a /proc file
Allow the executing and queued work items to be viewed through a /proc file
for debugging purposes.  The contents look something like the following:

    THR PID   ITEM ADDR        FL MARK  DESC
    === ===== ================ == ===== ==========
      0  3005 ffff880023f52348  a 952ms FSC: OBJ17d3: LOOK
      1  3006 ffff880024e33668  2 160ms FSC: OBJ17e5 OP60d3b: Write1/Store fl=2
      2  3165 ffff8800296dd180  a 424ms FSC: OBJ17e4: LOOK
      3  4089 ffff8800262c8d78  a 212ms FSC: OBJ17ea: CRTN
      4  4090 ffff88002792bed8  2 388ms FSC: OBJ17e8 OP60d36: Write1/Store fl=2
      5  4092 ffff88002a0ef308  2 388ms FSC: OBJ17e7 OP60d2e: Write1/Store fl=2
      6  4094 ffff88002abaf4b8  2 132ms FSC: OBJ17e2 OP60d4e: Write1/Store fl=2
      7  4095 ffff88002bb188e0  a 388ms FSC: OBJ17e9: CRTN
    vsq     - ffff880023d99668  1 308ms FSC: OBJ17e0 OP60f91: Write1/EnQ fl=2
    vsq     - ffff8800295d1740  1 212ms FSC: OBJ16be OP4d4b6: Write1/EnQ fl=2
    vsq     - ffff880025ba3308  1 160ms FSC: OBJ179a OP58dec: Write1/EnQ fl=2
    vsq     - ffff880024ec83e0  1 160ms FSC: OBJ17ae OP599f2: Write1/EnQ fl=2
    vsq     - ffff880026618e00  1 160ms FSC: OBJ17e6 OP60d33: Write1/EnQ fl=2
    vsq     - ffff880025a2a4b8  1 132ms FSC: OBJ16a2 OP4d583: Write1/EnQ fl=2
    vsq     - ffff880023cbe6d8  9 212ms FSC: OBJ17eb: LOOK
    vsq     - ffff880024d37590  9 212ms FSC: OBJ17ec: LOOK
    vsq     - ffff880027746cb0  9 212ms FSC: OBJ17ed: LOOK
    vsq     - ffff880024d37ae8  9 212ms FSC: OBJ17ee: LOOK
    vsq     - ffff880024d37cb0  9 212ms FSC: OBJ17ef: LOOK
    vsq     - ffff880025036550  9 212ms FSC: OBJ17f0: LOOK
    vsq     - ffff8800250368e0  9 212ms FSC: OBJ17f1: LOOK
    vsq     - ffff880025036aa8  9 212ms FSC: OBJ17f2: LOOK

In the 'THR' column, executing items show the thread they're occupying and
queued threads indicate which queue they're on.  'PID' shows the process ID of
a slow-work thread that's executing something.  'FL' shows the work item flags.
'MARK' indicates how long since an item was queued or began executing.  Lastly,
the 'DESC' column permits the owner of an item to give some information.

Signed-off-by: David Howells <dhowells@redhat.com>
2009-11-19 18:10:51 +00:00
Jens Axboe
6b8268b17a SLOW_WORK: Add delayed_slow_work support
This adds support for starting slow work with a delay, similar
to the functionality we have for workqueues.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: David Howells <dhowells@redhat.com>
2009-11-19 18:10:47 +00:00
Jens Axboe
0160950297 SLOW_WORK: Add support for cancellation of slow work
Add support for cancellation of queued slow work and delayed slow work items.
The cancellation functions will wait for items that are pending or undergoing
execution to be discarded by the slow work facility.

Attempting to enqueue work that is in the process of being cancelled will
result in ECANCELED.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: David Howells <dhowells@redhat.com>
2009-11-19 18:10:43 +00:00
Jens Axboe
4d8bb2cbcc SLOW_WORK: Make slow_work_ops ->get_ref/->put_ref optional
Make the ability for the slow-work facility to take references on a work item
optional as not everyone requires this.

Even the internal slow-work stubs them out, so those can be got rid of too.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: David Howells <dhowells@redhat.com>
2009-11-19 18:10:39 +00:00
David Howells
3d7a641e54 SLOW_WORK: Wait for outstanding work items belonging to a module to clear
Wait for outstanding slow work items belonging to a module to clear when
unregistering that module as a user of the facility.  This prevents the put_ref
code of a work item from being taken away before it returns.

Signed-off-by: David Howells <dhowells@redhat.com>
2009-11-19 18:10:23 +00:00
Tejun Heo
9398180097 workqueue: fix race condition in schedule_on_each_cpu()
Commit 65a6446434 ("HWPOISON: Allow
schedule_on_each_cpu() from keventd") which allows schedule_on_each_cpu()
to be called from keventd added a race condition.  schedule_on_each_cpu()
may race with cpu hotplug and end up executing the function twice on a
cpu.

Fix it by moving direct execution into the section protected with
get/put_online_cpus().  While at it, update code such that direct
execution is done after works have been scheduled for all other cpus and
drop unnecessary cpu != orig test from flush loop.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Cc: Ingo Molnar <mingo@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-11-17 17:40:33 -08:00
Linus Torvalds
961767b75d Merge branch 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  highmem: Fix debug_kmap_atomic() to also handle KM_IRQ_PTE, KM_NMI, and KM_NMI_PTE
  highmem: Fix race in debug_kmap_atomic() which could cause warn_count to underflow
  rcu: Fix long-grace-period race between forcing and initialization
  uids: Prevent tear down race
2009-11-11 11:30:15 -08:00
Linus Torvalds
1fd18a871a Merge branch 'irq-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'irq-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  genirq: try_one_irq() must be called with irq disabled
2009-11-11 11:29:58 -08:00
Cyrill Gorcunov
e9036b36ee sched: Use root_task_group_empty only with FAIR_GROUP_SCHED
root_task_group_empty is used only with FAIR_GROUP_SCHED
so if we use other scheduler options we get:

  kernel/sched.c:314: warning: 'root_task_group_empty' defined but not used

So move CONFIG_FAIR_GROUP_SCHED up that it covers
root_task_group_empty().

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Peter Zijlstra <peterz@infradead.org>
LKML-Reference: <20091026192414.GB5321@lenovo>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-11-08 13:15:48 +01:00
Randy Dunlap
968c86458a sched: Fix kernel-doc function parameter name
Fix variable name in sched.c kernel-doc notation.

Fixes this DocBook warning:

 Warning(kernel/sched.c:2008): No description found for parameter
 'p' Warning(kernel/sched.c:2008): Excess function parameter 'k'
 description in 'kthread_bind'

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
LKML-Reference: <4AF4B1BC.8020604@oracle.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-11-08 11:26:25 +01:00
Yong Zhang
e7e7e0c084 genirq: try_one_irq() must be called with irq disabled
Prarit reported:
=================================
[ INFO: inconsistent lock state ]
2.6.32-rc5 #1
---------------------------------
inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage.
swapper/0 [HC0[0]:SC1[1]:HE1:SE0] takes:
 (&irq_desc_lock_class){?.-...}, at: [<ffffffff810c264e>] try_one_irq+0x32/0x138
{IN-HARDIRQ-W} state was registered at:
 [<ffffffff81095160>] __lock_acquire+0x2fc/0xd5d
 [<ffffffff81095cb4>] lock_acquire+0xf3/0x12d
 [<ffffffff814cdadd>] _spin_lock+0x40/0x89
 [<ffffffff810c3389>] handle_level_irq+0x30/0x105
 [<ffffffff81014e0e>] handle_irq+0x95/0xb7
 [<ffffffff810141bd>] do_IRQ+0x6a/0xe0
 [<ffffffff81012813>] ret_from_intr+0x0/0x16
irq event stamp: 195096
hardirqs last  enabled at (195096): [<ffffffff814cd7f7>] _spin_unlock_irq+0x3a/0x5c
hardirqs last disabled at (195095): [<ffffffff814cdbdd>] _spin_lock_irq+0x29/0x95
softirqs last  enabled at (195088): [<ffffffff81068c92>] __do_softirq+0x1c1/0x1ef
softirqs last disabled at (195093): [<ffffffff8101304c>] call_softirq+0x1c/0x30

other info that might help us debug this:
1 lock held by swapper/0:
 #0:  (kernel/irq/spurious.c:21){+.-...}, at: [<ffffffff81070cf2>]
run_timer_softirq+0x1a9/0x315

stack backtrace:
Pid: 0, comm: swapper Not tainted 2.6.32-rc5 #1
Call Trace:
 <IRQ>  [<ffffffff81093e94>] valid_state+0x187/0x1ae
 [<ffffffff81093fe4>] mark_lock+0x129/0x253
 [<ffffffff810951d4>] __lock_acquire+0x370/0xd5d
 [<ffffffff81095cb4>] lock_acquire+0xf3/0x12d
 [<ffffffff814cdadd>] _spin_lock+0x40/0x89
 [<ffffffff810c264e>] try_one_irq+0x32/0x138
 [<ffffffff810c2795>] poll_all_shared_irqs+0x41/0x6d
 [<ffffffff810c27dd>] poll_spurious_irqs+0x1c/0x49
 [<ffffffff81070d82>] run_timer_softirq+0x239/0x315
 [<ffffffff81068bd3>] __do_softirq+0x102/0x1ef
 [<ffffffff8101304c>] call_softirq+0x1c/0x30
 [<ffffffff81014b65>] do_softirq+0x59/0xca
 [<ffffffff810686ad>] irq_exit+0x58/0xae
 [<ffffffff81029b84>] smp_apic_timer_interrupt+0x94/0xba
 [<ffffffff81012a33>] apic_timer_interrupt+0x13/0x20

The reason is that try_one_irq() is called from hardirq context with
interrupts disabled and from softirq context (poll_all_shared_irqs())
with interrupts enabled.

Disable interrupts before calling it from poll_all_shared_irqs().

Reported-and-tested-by: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: Yong Zhang <yong.zhang0@gmail.com>
LKML-Reference: <1257563773-4620-1-git-send-email-yong.zhang0@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-11-07 21:44:45 +01:00
Linus Torvalds
608221fdf9 Merge branch 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  sched: Fix kthread_bind() by moving the body of kthread_bind() to sched.c
  sched: Disable SD_PREFER_LOCAL at node level
  sched: Fix boot crash by zalloc()ing most of the cpu masks
  sched: Strengthen buddies and mitigate buddy induced latencies
2009-11-05 10:56:47 -08:00
Linus Torvalds
72cc129e8d Merge branch 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  ftrace: Fix unmatched locking in ftrace_regex_write()
  ring-buffer: Synchronize resizing buffer with reader lock
2009-11-05 10:56:25 -08:00
Li Zefan
ed146b2594 ftrace: Fix unmatched locking in ftrace_regex_write()
When a command is passed to the set_ftrace_filter, then
the ftrace_regex_lock is still held going back to user space.

 # echo 'do_open : foo' > set_ftrace_filter
 (still holding ftrace_regex_lock when returning to user space!)

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
LKML-Reference: <4AEF7F8A.3080300@cn.fujitsu.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-11-04 01:42:10 -05:00
Lai Jiangshan
f7112949f6 ring-buffer: Synchronize resizing buffer with reader lock
We got a sudden panic when we reduced the size of the
ringbuffer.

We can reproduce the panic by the following steps:

echo 1 > events/sched/enable
cat trace_pipe > /dev/null &

while ((1))
do
echo 12000 > buffer_size_kb
echo 512 > buffer_size_kb
done

(not more than 5 seconds, panic ...)

Reported-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
LKML-Reference: <4AF01735.9060409@cn.fujitsu.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-11-04 00:04:20 -05:00
Linus Torvalds
38dc63459f Merge branch 'pm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6
* 'pm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6:
  PM: Remove some debug messages producing too much noise
  PM: Fix warning on suspend errors
  PM / Hibernate: Add newline to load_image() fail path
  PM / Hibernate: Fix error handling in save_image()
  PM / Hibernate: Fix blkdev refleaks
  PM / yenta: Split resume into early and late parts (rev. 4)
2009-11-03 07:52:57 -08:00
Ian Campbell
1d51075094 Correct nr_processes() when CPUs have been unplugged
nr_processes() returns the sum of the per cpu counter process_counts for
all online CPUs. This counter is incremented for the current CPU on
fork() and decremented for the current CPU on exit(). Since a process
does not necessarily fork and exit on the same CPU the process_count for
an individual CPU can be either positive or negative and effectively has
no meaning in isolation.

Therefore calculating the sum of process_counts over only the online
CPUs omits the processes which were started or stopped on any CPU which
has since been unplugged. Only the sum of process_counts across all
possible CPUs has meaning.

The only caller of nr_processes() is proc_root_getattr() which
calculates the number of links to /proc as
        stat->nlink = proc_root.nlink + nr_processes();

You don't have to be all that unlucky for the nr_processes() to return a
negative value leading to a negative number of links (or rather, an
apparently enormous number of links). If this happens then you can get
failures where things like "ls /proc" start to fail because they got an
-EOVERFLOW from some stat() call.

Example with some debugging inserted to show what goes on:
        # ps haux|wc -l
        nr_processes: CPU0:     90
        nr_processes: CPU1:     1030
        nr_processes: CPU2:     -900
        nr_processes: CPU3:     -136
        nr_processes: TOTAL:    84
        proc_root_getattr. nlink 12 + nr_processes() 84 = 96
        84
        # echo 0 >/sys/devices/system/cpu/cpu1/online
        # ps haux|wc -l
        nr_processes: CPU0:     85
        nr_processes: CPU2:     -901
        nr_processes: CPU3:     -137
        nr_processes: TOTAL:    -953
        proc_root_getattr. nlink 12 + nr_processes() -953 = -941
        75
        # stat /proc/
        nr_processes: CPU0:     84
        nr_processes: CPU2:     -901
        nr_processes: CPU3:     -137
        nr_processes: TOTAL:    -954
        proc_root_getattr. nlink 12 + nr_processes() -954 = -942
          File: `/proc/'
          Size: 0               Blocks: 0          IO Block: 1024   directory
        Device: 3h/3d   Inode: 1           Links: 4294966354
        Access: (0555/dr-xr-xr-x)  Uid: (    0/    root)   Gid: (    0/    root)
        Access: 2009-11-03 09:06:55.000000000 +0000
        Modify: 2009-11-03 09:06:55.000000000 +0000
        Change: 2009-11-03 09:06:55.000000000 +0000

I'm not 100% convinced that the per_cpu regions remain valid for offline
CPUs, although my testing suggests that they do. If not then I think the
correct solution would be to aggregate the process_count for a given CPU
into a global base value in cpu_down().

This bug appears to pre-date the transition to git and it looks like it
may even have been present in linux-2.6.0-test7-bk3 since it looks like
the code Rusty patched in http://lwn.net/Articles/64773/ was already
wrong.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-11-03 07:52:39 -08:00
Jiri Slaby
bf9fd67a03 PM / Hibernate: Add newline to load_image() fail path
Finish a line by \n when load_image fails in the middle of loading.

Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2009-11-03 11:03:09 +01:00
Jiri Slaby
4ff277f9e4 PM / Hibernate: Fix error handling in save_image()
There are too many retval variables in save_image(). Thus error return
value from snapshot_read_next() may be ignored and only part of the
snapshot (successfully) written.

Remove 'error' variable, invert the condition in the do-while loop
and convert the loop to use only 'ret' variable.

Switch the rest of the function to consider only 'ret'.

Also make sure we end printed line by \n if an error occurs.

Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2009-11-03 11:02:43 +01:00
Jiri Slaby
76b57e613f PM / Hibernate: Fix blkdev refleaks
While cruising through the swsusp code I found few blkdev reference
leaks of resume_bdev.

swsusp_read: remove blkdev_put altogether. Some fail paths do
             not do that.
swsusp_check: make sure we always put a reference on fail paths
software_resume: all fail paths between swsusp_check and swsusp_read
                 omit swsusp_close. Add it in those cases. And since
                 swsusp_read doesn't drop the reference anymore, do
                 it here unconditionally.

[rjw: Fixed a small coding style issue.]

Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2009-11-03 11:01:46 +01:00
Mike Galbraith
b84ff7d6f1 sched: Fix kthread_bind() by moving the body of kthread_bind() to sched.c
Eric Paris reported that commit
f685ceacab causes boot time
PREEMPT_DEBUG complaints.

 [    4.590699] BUG: using smp_processor_id() in preemptible [00000000] code: rmmod/1314
 [    4.593043] caller is task_hot+0x86/0xd0

Since kthread_bind() messes with scheduler internals, move the
body to sched.c, and lock the runqueue.

Reported-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Mike Galbraith <efault@gmx.de>
Tested-by: Eric Paris <eparis@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1256813310.7574.3.camel@marge.simson.net>
[ v2: fix !SMP build and clean up ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-11-03 07:25:00 +01:00
Linus Torvalds
3fe866ca6c Merge branch 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  futex: Fix spurious wakeup for requeue_pi really
2009-11-02 09:46:33 -08:00
Linus Torvalds
bce8fc4cb7 Merge branch 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  perf tools: Remove -Wcast-align
  perf tools: Fix compatibility with libelf 0.8 and autodetect
  perf events: Don't generate events for the idle task when exclude_idle is set
  perf events: Fix swevent hrtimer sampling by keeping track of remaining time when enabling/disabling swevent hrtimers
2009-11-02 09:46:06 -08:00
Linus Torvalds
a5e3013d66 Merge branch 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  tracing: Remove cpu arg from the rb_time_stamp() function
  tracing: Fix comment typo and documentation example
  tracing: Fix trace_seq_printf() return value
  tracing: Update *ppos instead of filp->f_pos
2009-11-02 09:45:44 -08:00
Paul E. McKenney
83f5b01ffb rcu: Fix long-grace-period race between forcing and initialization
Very long RCU read-side critical sections (50 milliseconds or
so) can cause a race between force_quiescent_state() and
rcu_start_gp() as follows on kernel builds with multi-level
rcu_node hierarchies:

1.	CPU 0 calls force_quiescent_state(), sees that there is a
	grace period in progress, and acquires ->fsqlock.

2.	CPU 1 detects the end of the grace period, and so
	cpu_quiet_msk_finish() sets rsp->completed to rsp->gpnum.
	This operation is carried out under the root rnp->lock,
	but CPU 0 has not yet acquired that lock.  Note that
	rsp->signaled is still RCU_SAVE_DYNTICK from the last
	grace period.

3.	CPU 1 calls rcu_start_gp(), but no one wants a new grace
	period, so it drops the root rnp->lock and returns.

4.	CPU 0 acquires the root rnp->lock and picks up rsp->completed
	and rsp->signaled, then drops rnp->lock.  It then enters the
	RCU_SAVE_DYNTICK leg of the switch statement.

5.	CPU 2 invokes call_rcu(), and now needs a new grace period.
	It calls rcu_start_gp(), which acquires the root rnp->lock, sets
	rsp->signaled to RCU_GP_INIT (too bad that CPU 0 is already in
	the RCU_SAVE_DYNTICK leg of the switch statement!)  and starts
	initializing the rcu_node hierarchy.  If there are multiple
	levels to the hierarchy, it will drop the root rnp->lock and
	initialize the lower levels of the hierarchy.

6.	CPU 0 notes that rsp->completed has not changed, which permits
        both CPU 2 and CPU 0 to try updating it concurrently.  If CPU 0's
	update prevails, later calls to force_quiescent_state() can
	count old quiescent states against the new grace period, which
	can in turn result in premature ending of grace periods.

	Not good.

This patch adds an RCU_GP_IDLE state for rsp->signaled that is
set initially at boot time and any time a grace period ends.
This prevents CPU 0 from getting into the workings of
force_quiescent_state() in step 4.  Additional locking and
checks prevent the concurrent update of rsp->signaled in step 6.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: laijs@cn.fujitsu.com
Cc: dipankar@in.ibm.com
Cc: mathieu.desnoyers@polymtl.ca
Cc: josh@joshtriplett.org
Cc: dvhltc@us.ibm.com
Cc: niv@us.ibm.com
Cc: peterz@infradead.org
Cc: rostedt@goodmis.org
Cc: Valdis.Kletnieks@vt.edu
Cc: dhowells@redhat.com
LKML-Reference: <1256742889199-git-send-email->
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-11-02 16:06:21 +01:00
Thomas Gleixner
b00bc0b237 uids: Prevent tear down race
Ingo triggered the following warning:

WARNING: at lib/debugobjects.c:255 debug_print_object+0x42/0x50()
Hardware name: System Product Name
ODEBUG: init active object type: timer_list
Modules linked in:
Pid: 2619, comm: dmesg Tainted: G        W  2.6.32-rc5-tip+ #5298
Call Trace:
 [<81035443>] warn_slowpath_common+0x6a/0x81
 [<8120e483>] ? debug_print_object+0x42/0x50
 [<81035498>] warn_slowpath_fmt+0x29/0x2c
 [<8120e483>] debug_print_object+0x42/0x50
 [<8120ec2a>] __debug_object_init+0x279/0x2d7
 [<8120ecb3>] debug_object_init+0x13/0x18
 [<810409d2>] init_timer_key+0x17/0x6f
 [<81041526>] free_uid+0x50/0x6c
 [<8104ed2d>] put_cred_rcu+0x61/0x72
 [<81067fac>] rcu_do_batch+0x70/0x121

debugobjects warns about an enqueued timer being initialized. If
CONFIG_USER_SCHED=y the user management code uses delayed work to
remove the user from the hash table and tear down the sysfs objects.

free_uid is called from RCU and initializes/schedules delayed work if
the usage count of the user_struct is 0. The init/schedule happens
outside of the uidhash_lock protected region which allows a concurrent
caller of find_user() to reference the about to be destroyed
user_struct w/o preventing the work from being scheduled. If the next
free_uid call happens before the work timer expired then the active
timer is initialized and the work scheduled again.

The race was introduced in commit 5cb350ba (sched: group scheduling,
sysfs tunables) and made more prominent by commit 3959214f (sched:
delayed cleanup of user_struct)

Move the init/schedule_delayed_work inside of the uidhash_lock
protected region to prevent the race.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Dhaval Giani <dhaval@linux.vnet.ibm.com>
Cc: Paul E. McKenney <paulmck@us.ibm.com>
Cc: Kay Sievers <kay.sievers@vrfy.org>
Cc: stable@kernel.org
2009-11-02 16:02:39 +01:00
Rusty Russell
49557e6203 sched: Fix boot crash by zalloc()ing most of the cpu masks
I got a boot crash when forcing cpumasks offstack on 32 bit,
because find_new_ilb() returned 3 on my UP system (nohz.cpu_mask
wasn't zeroed).

AFAICT the others need to be zeroed too: only
nohz.ilb_grp_nohz_mask is initialized before use.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Peter Zijlstra <peterz@infradead.org>
LKML-Reference: <200911022037.21282.rusty@rustcorp.com.au>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-11-02 15:48:54 +01:00
Linus Torvalds
8633322c5f Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu:
  sched: move rq_weight data array out of .percpu
  percpu: allow pcpu_alloc() to be called with IRQs off
2009-10-29 09:19:29 -07:00
Linus Torvalds
9532faeb29 Merge git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-param-fixes
* git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-param-fixes:
  param: fix setting arrays of bool
  param: fix NULL comparison on oom
  param: fix lots of bugs with writing charp params from sysfs, by leaking mem.
2009-10-29 09:18:20 -07:00
Linus Torvalds
3242f9804b Merge branch 'hwpoison-2.6.32' of git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-mce-2.6
* 'hwpoison-2.6.32' of git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-mce-2.6:
  HWPOISON: fix invalid page count in printk output
  HWPOISON: Allow schedule_on_each_cpu() from keventd
  HWPOISON: fix/proc/meminfo alignment
  HWPOISON: fix oops on ksm pages
  HWPOISON: Fix page count leak in hwpoison late kill in do_swap_page
  HWPOISON: return early on non-LRU pages
  HWPOISON: Add brief hwpoison description to Documentation
  HWPOISON: Clean up PR_MCE_KILL interface
2009-10-29 08:20:00 -07:00
Linus Torvalds
fefcfd431b Merge branch 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  futex: Move drop_futex_key_refs out of spinlock'ed region
  rcu: Fix TREE_PREEMPT_RCU CPU_HOTPLUG bad-luck hang
  rcu: Stopgap fix for synchronize_rcu_expedited() for TREE_PREEMPT_RCU
  rcu: Prevent RCU IPI storms in presence of high call_rcu() load
  futex: Check for NULL keys in match_futex
  futex: Handle spurious wake up
2009-10-29 08:12:20 -07:00
Linus Torvalds
37c2ca2411 Merge branch 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  perf timechart: Improve the visual appearance of scheduler delays
  perf timechart: Fix the wakeup-arrows that point to non-visible processes
  perf top: Fix --delay_secs 0 division by zero
  perf tools: Bump version to 0.0.2
  perf_event: Adjust frequency and unthrottle for non-group-leader events
2009-10-29 08:12:00 -07:00
Linus Torvalds
6e958d73c2 Merge branch 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  sched: Do less agressive buddy clearing
  sched: Disable SD_PREFER_LOCAL for MC/CPU domains
2009-10-29 08:10:38 -07:00
Alexey Dobriyan
8c85dd8730 sysctl: fix false positives when PROC_SYSCTL=n
Having ->procname but not ->proc_handler is valid when PROC_SYSCTL=n,
people use such combination to reduce ifdefs with non-standard handlers.

Addresses http://bugzilla.kernel.org/show_bug.cgi?id=14408

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Reported-by: Peter Teoh <htmldeveloper@gmail.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-10-29 07:39:30 -07:00
KOSAKI Motohiro
478988d3b2 cgroup: fix strstrip() misuse
cgroup_write_X64() and cgroup_write_string() ignore the return value of
strstrip().  it makes small inconsistent behavior.

example:
=========================
 # cd /mnt/cgroup/hoge
 # cat memory.swappiness
 60
 # echo "59 " > memory.swappiness
 # cat memory.swappiness
 59
 # echo " 58" > memory.swappiness
 bash: echo: write error: Invalid argument

This patch fixes it.

Cc: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Paul Menage <menage@google.com>
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-10-29 07:39:25 -07:00
Christian Borntraeger
0d0df599f1 connector: fix regression introduced by sid connector
Since commit 02b51df1b0 (proc connector: add
event for process becoming session leader) we have the following warning:

Badness at kernel/softirq.c:143
[...]
Krnl PSW : 0404c00180000000 00000000001481d4 (local_bh_enable+0xb0/0xe0)
[...]
Call Trace:
([<000000013fe04100>] 0x13fe04100)
 [<000000000048a946>] sk_filter+0x9a/0xd0
 [<000000000049d938>] netlink_broadcast+0x2c0/0x53c
 [<00000000003ba9ae>] cn_netlink_send+0x272/0x2b0
 [<00000000003baef0>] proc_sid_connector+0xc4/0xd4
 [<0000000000142604>] __set_special_pids+0x58/0x90
 [<0000000000159938>] sys_setsid+0xb4/0xd8
 [<00000000001187fe>] sysc_noemu+0x10/0x16
 [<00000041616cb266>] 0x41616cb266

The warning is
--->    WARN_ON_ONCE(in_irq() || irqs_disabled());

The network code must not be called with disabled interrupts but
sys_setsid holds the tasklist_lock with spinlock_irq while calling the
connector.

After a discussion we agreed that we can move proc_sid_connector from
__set_special_pids to sys_setsid.

We also agreed that it is sufficient to change the check from
task_session(curr) != pid into err > 0, since if we don't change the
session, this means we were already the leader and return -EPERM.

One last thing:
There is also daemonize(), and some people might want to get a
notification in that case. Since daemonize() is only needed if a user
space does kernel_thread this does not look important (and there seems
to be no consensus if this connector should be called in daemonize). If
we really want this, we can add proc_sid_connector to daemonize() in an
additional patch (Scott?)

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Scott James Remnant <scott@ubuntu.com>
Cc: Matt Helsley <matthltc@us.ibm.com>
Cc: David S. Miller <davem@davemloft.net>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Evgeniy Polyakov <zbr@ioremap.net>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-10-29 07:39:25 -07:00
Rusty Russell
3c7d76e371 param: fix setting arrays of bool
We create a dummy struct kernel_param on the stack for parsing each
array element, but we didn't initialize the flags word.  This matters
for arrays of type "bool", where the flag indicates if it really is
an array of bools or unsigned int (old-style).

Reported-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: stable@kernel.org
2009-10-29 08:56:20 +10:30
Rusty Russell
d553ad864e param: fix NULL comparison on oom
kp->arg is always true: it's the contents of that pointer we care about.

Reported-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: stable@kernel.org
2009-10-29 08:56:18 +10:30
Rusty Russell
65afac7d80 param: fix lots of bugs with writing charp params from sysfs, by leaking mem.
e180a6b775 "param: fix charp parameters set via sysfs" fixed the case
where charp parameters written via sysfs were freed, leaving drivers
accessing random memory.

Unfortunately, storing a flag in the kparam struct was a bad idea: it's
rodata so setting it causes an oops on some archs.  But that's not all:

1) module_param_array() on charp doesn't work reliably, since we use an
   uninitialized temporary struct kernel_param.
2) there's a fundamental race if a module uses this parameter and then
   it's changed: they will still access the old, freed, memory.

The simplest fix (ie. for 2.6.32) is to never free the memory.  This
prevents all these problems, at cost of a memory leak.  In practice, there
are only 18 places where a charp is writable via sysfs, and all are
root-only writable.

Reported-by: Takashi Iwai <tiwai@suse.de>
Cc: Sitsofe Wheeler <sitsofe@yahoo.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: stable@kernel.org
2009-10-29 08:56:17 +10:30
Thomas Gleixner
11df6dddcb futex: Fix spurious wakeup for requeue_pi really
The requeue_pi path doesn't use unqueue_me() (and the racy lock_ptr ==
NULL test) nor does it use the wake_list of futex_wake() which where
the reason for commit 41890f2 (futex: Handle spurious wake up)

See debugging discussing on LKML Message-ID: <4AD4080C.20703@us.ibm.com>

The changes in this fix to the wait_requeue_pi path were considered to
be a likely unecessary, but harmless safety net. But it turns out that
due to the fact that for unknown $@#!*( reasons EWOULDBLOCK is defined
as EAGAIN we built an endless loop in the code path which returns
correctly EWOULDBLOCK.

Spurious wakeups in wait_requeue_pi code path are unlikely so we do
the easy solution and return EWOULDBLOCK^WEAGAIN to user space and let
it deal with the spurious wakeup.

Cc: Darren Hart <dvhltc@us.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: John Stultz <johnstul@linux.vnet.ibm.com>
Cc: Dinakar Guniguntala <dino@in.ibm.com>
LKML-Reference: <4AE23C74.1090502@us.ibm.com>
Cc: stable@kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-10-28 20:34:34 +01:00
Jiri Kosina
4a6cc4bd32 sched: move rq_weight data array out of .percpu
Commit 34d76c41 introduced percpu array update_shares_data, size of which
being proportional to NR_CPUS. Unfortunately this blows up ia64 for large
NR_CPUS configuration, as ia64 allows only 64k for .percpu section.

Fix this by allocating this array dynamically and keep only pointer to it
percpu.

The per-cpu handling doesn't impose significant performance penalty on
potentially contented path in tg_shares_up().

...
ffffffff8104337c:       65 48 8b 14 25 20 cd    mov    %gs:0xcd20,%rdx
ffffffff81043383:       00 00
ffffffff81043385:       48 c7 c0 00 e1 00 00    mov    $0xe100,%rax
ffffffff8104338c:       48 c7 45 a0 00 00 00    movq   $0x0,-0x60(%rbp)
ffffffff81043393:       00
ffffffff81043394:       48 c7 45 a8 00 00 00    movq   $0x0,-0x58(%rbp)
ffffffff8104339b:       00
ffffffff8104339c:       48 01 d0                add    %rdx,%rax
ffffffff8104339f:       49 8d 94 24 08 01 00    lea    0x108(%r12),%rdx
ffffffff810433a6:       00
ffffffff810433a7:       b9 ff ff ff ff          mov    $0xffffffff,%ecx
ffffffff810433ac:       48 89 45 b0             mov    %rax,-0x50(%rbp)
ffffffff810433b0:       bb 00 04 00 00          mov    $0x400,%ebx
ffffffff810433b5:       48 89 55 c0             mov    %rdx,-0x40(%rbp)
...

After:

...
ffffffff8104337c:       65 8b 04 25 28 cd 00    mov    %gs:0xcd28,%eax
ffffffff81043383:       00
ffffffff81043384:       48 98                   cltq
ffffffff81043386:       49 8d bc 24 08 01 00    lea    0x108(%r12),%rdi
ffffffff8104338d:       00
ffffffff8104338e:       48 8b 15 d3 7f 76 00    mov    0x767fd3(%rip),%rdx        # ffffffff817ab368 <update_shares_data>
ffffffff81043395:       48 8b 34 c5 00 ee 6d    mov    -0x7e921200(,%rax,8),%rsi
ffffffff8104339c:       81
ffffffff8104339d:       48 c7 45 a0 00 00 00    movq   $0x0,-0x60(%rbp)
ffffffff810433a4:       00
ffffffff810433a5:       b9 ff ff ff ff          mov    $0xffffffff,%ecx
ffffffff810433aa:       48 89 7d c0             mov    %rdi,-0x40(%rbp)
ffffffff810433ae:       48 c7 45 a8 00 00 00    movq   $0x0,-0x58(%rbp)
ffffffff810433b5:       00
ffffffff810433b6:       bb 00 04 00 00          mov    $0x400,%ebx
ffffffff810433bb:       48 01 f2                add    %rsi,%rdx
ffffffff810433be:       48 89 55 b0             mov    %rdx,-0x50(%rbp)
...

Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Tejun Heo <tj@kernel.org>
2009-10-29 00:26:00 +09:00
Jiri Olsa
6d3f1e12f4 tracing: Remove cpu arg from the rb_time_stamp() function
The cpu argument is not used inside the rb_time_stamp() function.
Plus fix a typo.

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <20091023233647.118547500@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-24 11:07:51 +02:00
Jiri Olsa
67b394f7f2 tracing: Fix comment typo and documentation example
Trivial patch to fix a documentation example and to fix a
comment.

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <20091023233646.871719877@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-24 11:07:50 +02:00
Jiri Olsa
3e69533b51 tracing: Fix trace_seq_printf() return value
trace_seq_printf() return value is a little ambiguous. It
currently returns the length of the space available in the
buffer. printf usually returns the amount written. This is not
adequate here, because:

  trace_seq_printf(s, "");

is perfectly legal, and returning 0 would indicate that it
failed.

We can always see the amount written by looking at the before
and after values of s->len. This is not quite the same use as
printf. We only care if the string was successfully written to
the buffer or not.

Make trace_seq_printf() return 0 if the trace oversizes the
buffer's free space, 1 otherwise.

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <20091023233646.631787612@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-24 11:07:50 +02:00
Jiri Olsa
cf8517cf90 tracing: Update *ppos instead of filp->f_pos
Instead of directly updating filp->f_pos we should update the *ppos
argument. The filp->f_pos gets updated within the file_pos_write()
function called from sys_write().

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <20091023233646.399670810@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-24 11:07:49 +02:00
Mike Galbraith
f685ceacab sched: Strengthen buddies and mitigate buddy induced latencies
This patch restores the effectiveness of LAST_BUDDY in preventing
pgsql+oltp from collapsing due to wakeup preemption. It also
switches LAST_BUDDY to exclusively do what it does best, namely
mitigate the effects of aggressive wakeup preemption, which
improves vmark throughput markedly, and restores mysql+oltp
scalability.

Since buddies are about scalability, enable them beginning at the
point where we begin expanding sched_latency, namely
sched_nr_latency. Previously, buddies were cleared aggressively,
which seriously reduced their effectiveness. Not clearing
aggressively however, produces a small drop in mysql+oltp
throughput immediately after peak, indicating that LAST_BUDDY is
actually doing some harm. This is right at the point where X on the
desktop in competition with another load wants low latency service.
Ergo, do not enable until we need to scale.

To mitigate latency induced by buddies, or by a task just missing
wakeup preemption, check latency at tick time.

Last hunk prevents buddies from stymieing BALANCE_NEWIDLE via
CACHE_HOT_BUDDY.

Supporting performance tests:

 tip   = v2.6.32-rc5-1497-ga525b32
 tipx  = NO_GENTLE_FAIR_SLEEPERS NEXT_BUDDY granularity knobs = 31 knobs + 31 buddies
 tip+x = NO_GENTLE_FAIR_SLEEPERS granularity knobs = 31 knobs

(Three run averages except where noted.)

 vmark:
 ------
 tip           108466 messages per second
 tip+          125307 messages per second
 tip+x         125335 messages per second
 tipx          117781 messages per second
 2.6.31.3      122729 messages per second

 mysql+oltp:
 -----------
 clients          1        2        4        8       16       32       64        128    256
 ..........................................................................................
 tip        9949.89 18690.20 34801.24 34460.04 32682.88 30765.97 28305.27 25059.64 19548.08
 tip+      10013.90 18526.84 34900.38 34420.14 33069.83 32083.40 30578.30 28010.71 25605.47
 tipx       9698.71 18002.70 34477.56 33420.01 32634.30 31657.27 29932.67 26827.52 21487.18
 2.6.31.3   8243.11 18784.20 34404.83 33148.38 31900.32 31161.90 29663.81 25995.94 18058.86

 pgsql+oltp:
 -----------
 clients          1        2        4        8       16       32       64      128      256
 ..........................................................................................
 tip       13686.37 26609.25 51934.28 51347.81 49479.51 45312.65 36691.91 26851.57 24145.35
 tip+ (1x) 13907.85 27135.87 52951.98 52514.04 51742.52 50705.43 49947.97 48374.19 46227.94
 tip+x     13906.78 27065.81 52951.19 52542.59 52176.11 51815.94 50838.90 49439.46 46891.00
 tipx      13742.46 26769.81 52351.99 51891.73 51320.79 50938.98 50248.65 48908.70 46553.84
 2.6.31.3  13815.35 26906.46 52683.34 52061.31 51937.10 51376.80 50474.28 49394.47 47003.25

Signed-off-by: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-23 23:48:28 +02:00
Soeren Sandmann
54f4407608 perf events: Don't generate events for the idle task when exclude_idle is set
Getting samples for the idle task is often not interesting, so
don't generate them when exclude_idle is set for the event in
question.

Signed-off-by: Søren Sandmann Pedersen <sandmann@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
LKML-Reference: <ye8pr8fmlq7.fsf@camel16.daimi.au.dk>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-23 09:35:02 +02:00
Soeren Sandmann
721a669b72 perf events: Fix swevent hrtimer sampling by keeping track of remaining time when enabling/disabling swevent hrtimers
Make the hrtimer based events work for sysprof.

Whenever a swevent is scheduled out, the hrtimer is canceled.
When it is scheduled back in, the timer is restarted. This
happens every scheduler tick, which means the timer never
expired because it was getting repeatedly restarted over and
over with the same period.

To fix that, save the remaining time when disabling; when
reenabling, use that saved time as the period instead of the
user-specified sampling period.

Also, move the starting and stopping of the hrtimers to helper
functions instead of duplicating the code.

Signed-off-by: Søren Sandmann Pedersen <sandmann@redhat.com>
LKML-Reference: <ye8vdi7mluz.fsf@camel16.daimi.au.dk>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-23 09:35:02 +02:00
Rafael J. Wysocki
04bf7539c0 PM: Make warning in suspend_test_finish() less likely to happen
Increase TEST_SUSPEND_SECONDS to 10 so the warning in
suspend_test_finish() doesn't annoy the users of slower systems so much.

Also, make the warning print the suspend-resume cycle time, so that we
know why the warning actually triggered.

Patch prepared during the hacking session at the Kernel Summit in Tokyo.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-10-22 08:23:45 +09:00
Andi Kleen
65a6446434 HWPOISON: Allow schedule_on_each_cpu() from keventd
Right now when calling schedule_on_each_cpu() from keventd there
is a deadlock because it tries to schedule a work item on the current CPU
too. This happens via lru_add_drain_all() in hwpoison.

Just call the function for the current CPU in this case. This is actually
faster too.

Debugging with Fengguang Wu & Max Asbock

Signed-off-by: Andi Kleen <ak@linux.intel.com>
2009-10-19 07:29:22 +02:00
Darren Hart
89061d3d58 futex: Move drop_futex_key_refs out of spinlock'ed region
When requeuing tasks from one futex to another, the reference held
by the requeued task to the original futex location needs to be
dropped eventually.

Dropping the reference may ultimately lead to a call to
"iput_final" and subsequently call into filesystem- specific code -
which may be non-atomic.

It is therefore safer to defer this drop operation until after the
futex_hash_bucket spinlock has been dropped.

Originally-From: Helge Bahmann <hcb@chaoticmind.net>
Signed-off-by: Darren Hart <dvhltc@us.ibm.com>
Cc: <stable@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Dinakar Guniguntala <dino@in.ibm.com>
Cc: John Stultz <johnstul@linux.vnet.ibm.com>
Cc: Sven-Thorsten Dietrich <sdietrich@novell.com>
Cc: John Kacur <jkacur@redhat.com>
LKML-Reference: <4AD7A298.5040802@us.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-16 10:19:18 +02:00
Linus Torvalds
bd0704111e Merge the right tty-fixes branch
* branch 'tty-fixes'
  tty: use the new 'flush_delayed_work()' helper to do ldisc flush
  workqueue: add 'flush_delayed_work()' to run and wait for delayed work
  tty: Make flush_to_ldisc() locking more robust
2009-10-15 14:59:24 -07:00
Paul E. McKenney
237c80c5c8 rcu: Fix TREE_PREEMPT_RCU CPU_HOTPLUG bad-luck hang
If the following sequence of events occurs, then
TREE_PREEMPT_RCU will hang waiting for a grace period to
complete, eventually OOMing the system:

o	A TREE_PREEMPT_RCU build of the kernel is booted on a system
	with more than 64 physical CPUs present (32 on a 32-bit system).
	Alternatively, a TREE_PREEMPT_RCU build of the kernel is booted
	with RCU_FANOUT set to a sufficiently small value that the
	physical CPUs populate two or more leaf rcu_node structures.

o	A task is preempted in an RCU read-side critical section
	while running on a CPU corresponding to a given leaf rcu_node
	structure.

o	All CPUs corresponding to this same leaf rcu_node structure
	record quiescent states for the current grace period.

o	All of these same CPUs go offline (hence the need for enough
	physical CPUs to populate more than one leaf rcu_node structure).
	This causes the preempted task to be moved to the root rcu_node
	structure.

At this point, there is nothing left to cause the quiescent
state to be propagated up the rcu_node tree, so the current
grace period never completes.

The simplest fix, especially after considering the deadlock
possibilities, is to detect this situation when the last CPU is
offlined, and to set that CPU's ->qsmask bit in its leaf
rcu_node structure.  This will cause the next invocation of
force_quiescent_state() to end the grace period.

Without this fix, this hang can be triggered in an hour or so on
some machines with rcutorture and random CPU onlining/offlining.
With this fix, these same machines pass a full 10 hours of this
sort of abuse.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: laijs@cn.fujitsu.com
Cc: dipankar@in.ibm.com
Cc: mathieu.desnoyers@polymtl.ca
Cc: josh@joshtriplett.org
Cc: dvhltc@us.ibm.com
Cc: niv@us.ibm.com
Cc: peterz@infradead.org
Cc: rostedt@goodmis.org
Cc: Valdis.Kletnieks@vt.edu
Cc: dhowells@redhat.com
LKML-Reference: <20091015162614.GA19131@linux.vnet.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-15 20:33:01 +02:00
Paul E. McKenney
019129d595 rcu: Stopgap fix for synchronize_rcu_expedited() for TREE_PREEMPT_RCU
For the short term, map synchronize_rcu_expedited() to
synchronize_rcu() for TREE_PREEMPT_RCU and to
synchronize_sched_expedited() for TREE_RCU.

Longer term, there needs to be a real expedited grace period for
TREE_PREEMPT_RCU, but candidate patches to date are considerably
more complex and intrusive.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: laijs@cn.fujitsu.com
Cc: dipankar@in.ibm.com
Cc: mathieu.desnoyers@polymtl.ca
Cc: josh@joshtriplett.org
Cc: dvhltc@us.ibm.com
Cc: niv@us.ibm.com
Cc: peterz@infradead.org
Cc: rostedt@goodmis.org
Cc: Valdis.Kletnieks@vt.edu
Cc: dhowells@redhat.com
Cc: npiggin@suse.de
Cc: jens.axboe@oracle.com
LKML-Reference: <12555405592331-git-send-email->
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-15 11:17:17 +02:00
Paul E. McKenney
37c72e56f6 rcu: Prevent RCU IPI storms in presence of high call_rcu() load
As the number of callbacks on a given CPU rises, invoke
force_quiescent_state() only every blimit number of callbacks
(defaults to 10,000), and even then only if no other CPU has
invoked force_quiescent_state() in the meantime.

This should fix the performance regression reported by Nick.

Reported-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: laijs@cn.fujitsu.com
Cc: dipankar@in.ibm.com
Cc: mathieu.desnoyers@polymtl.ca
Cc: josh@joshtriplett.org
Cc: dvhltc@us.ibm.com
Cc: niv@us.ibm.com
Cc: peterz@infradead.org
Cc: rostedt@goodmis.org
Cc: Valdis.Kletnieks@vt.edu
Cc: dhowells@redhat.com
Cc: jens.axboe@oracle.com
LKML-Reference: <12555405592133-git-send-email->
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-15 11:17:16 +02:00
Linus Torvalds
d6047d79b9 Merge branch 'tty-fixes'
* branch 'tty-fixes':
  tty: use the new 'flush_delayed_work()' helper to do ldisc flush
  workqueue: add 'flush_delayed_work()' to run and wait for delayed work
  Make flush_to_ldisc properly handle parallel calls
2009-10-14 15:34:55 -07:00
Linus Torvalds
ee67e6cbe1 Merge branch 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  oprofile: warn on freeing event buffer too early
  oprofile: fix race condition in event_buffer free
  lockdep: Use cpu_clock() for lockstat
2009-10-14 15:25:35 -07:00
Linus Torvalds
f061d83a2b Merge branch 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  sched: Fix missing kernel-doc notation
  Revert "x86, timers: Check for pending timers after (device) interrupts"
  sched: Update the clock of runqueue select_task_rq() selected
2009-10-14 15:25:04 -07:00
Linus Torvalds
e345fe1ada Merge branch 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  tracing/filters: Fix memory leak when setting a filter
  tracing: fix trace_vprintk call
2009-10-14 15:24:51 -07:00
Linus Torvalds
8c53e46314 workqueue: add 'flush_delayed_work()' to run and wait for delayed work
It basically turns a delayed work into an immediate work, and then waits
for it to finish, thus allowing you to force (and wait for) an immediate
flush of a delayed work.

We'll want to use this in the tty layer to clean up tty_flush_to_ldisc().

Acked-by: Oleg Nesterov <oleg@redhat.com>
[ Fixed to use 'del_timer_sync()' as noted by Oleg ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-10-14 15:11:35 -07:00
Darren Hart
2bc872036e futex: Check for NULL keys in match_futex
If userspace tries to perform a requeue_pi on a non-requeue_pi waiter,
it will find the futex_q->requeue_pi_key to be NULL and OOPS.

Check for NULL in match_futex() instead of doing explicit NULL pointer
checks on all call sites.  While match_futex(NULL, NULL) returning
false is a little odd, it's still correct as we expect valid key
references.

Signed-off-by: Darren Hart <dvhltc@us.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@elte.hu>
CC: Eric Dumazet <eric.dumazet@gmail.com>
CC: Dinakar Guniguntala <dino@in.ibm.com>
CC: John Stultz <johnstul@us.ibm.com>
Cc: stable@kernel.org
LKML-Reference: <4AD60687.10306@us.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-10-14 22:00:14 +02:00
Linus Torvalds
43046b6066 workqueue: add 'flush_delayed_work()' to run and wait for delayed work
It basically turns a delayed work into an immediate work, and then waits
for it to finish.
2009-10-14 09:16:42 -07:00
Peter Zijlstra
92f6a5e37a sched: Do less agressive buddy clearing
Yanmin reported a hackbench regression due to:

 > commit de69a80be3
 > Author: Peter Zijlstra <a.p.zijlstra@chello.nl>
 > Date:   Thu Sep 17 09:01:20 2009 +0200
 >
 >     sched: Stop buddies from hogging the system

I really liked de69a80b, and it affecting hackbench shows I wasn't
crazy ;-)

So hackbench is a multi-cast, with one sender spraying multiple
receivers, who in their turn don't spray back.

This would be exactly the scenario that patch 'cures'. Previously
we would not clear the last buddy after running the next task,
allowing the sender to get back to work sooner than it otherwise
ought to have been, increasing latencies for other tasks.

Now, since those receivers don't poke back, they don't enforce the
buddy relation, which means there's nothing to re-elect the sender.

Cure this by less agressively clearing the buddy stats. Only clear
buddies when they were not chosen. It should still avoid a buddy
sticking around long after its served its time.

Reported-by: "Zhang, Yanmin" <yanmin_zhang@linux.intel.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
CC: Mike Galbraith <efault@gmx.de>
LKML-Reference: <1255084986.8802.46.camel@laptop>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-14 15:02:34 +02:00
Paul Mackerras
03541f8b69 perf_event: Adjust frequency and unthrottle for non-group-leader events
The loop in perf_ctx_adjust_freq checks the frequency of sampling
event counters, and adjusts the event interval and unthrottles the
event if required, and resets the interrupt count for the event.
However, at present it only looks at group leaders.

This means that a sampling event that is not a group leader will
eventually get throttled, once its interrupt count reaches
sysctl_perf_event_sample_rate/HZ --- and that is guaranteed to
happen, if the event is active for long enough, since the interrupt
count never gets reset.  Once it is throttled it never gets
unthrottled, so it basically just stops working at that point.

This fixes it by making perf_ctx_adjust_freq use ctx->event_list
rather than ctx->group_list.  The existing spin_lock/spin_unlock
around the loop makes it unnecessary to put rcu_read_lock/
rcu_read_unlock around the list_for_each_entry_rcu().

Reported-by: Mark W. Krentel <krentel@cs.rice.edu>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <19157.26731.855609.165622@cargo.ozlabs.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-14 08:39:32 +02:00
Thomas Gleixner
d58e6576b0 futex: Handle spurious wake up
The futex code does not handle spurious wake up in futex_wait and
futex_wait_requeue_pi.

The code assumes that any wake up which was not caused by futex_wake /
requeue or by a timeout was caused by a signal wake up and returns one
of the syscall restart error codes.

In case of a spurious wake up the signal delivery code which deals
with the restart error codes is not invoked and we return that error
code to user space. That causes applications which actually check the
return codes to fail. Blaise reported that on preempt-rt a python test
program run into a exception trap. -rt exposed that due to a built in
spurious wake up accelerator :)

Solve this by checking signal_pending(current) in the wake up path and
handle the spurious wake up case w/o returning to user space.

Reported-by: Blaise Gassend <blaise@willowgarage.com>
Debugged-by: Darren Hart <dvhltc@us.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: stable@kernel.org
LKML-Reference: <new-submission>
2009-10-13 20:40:43 +02:00
Linus Torvalds
80f506918f Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block
* 'for-linus' of git://git.kernel.dk/linux-2.6-block:
  cciss: Add cciss_allow_hpsa module parameter
  cciss: Fix multiple calls to pci_release_regions
  blk-settings: fix function parameter kernel-doc notation
  writeback: kill space in debugfs item name
  writeback: account IO throttling wait as iowait
  elv_iosched_store(): fix strstrip() misuse
  cfq-iosched: avoid probable slice overrun when idling
  cfq-iosched: apply bool value where we return 0/1
  cfq-iosched: fix think time allowed for seekers
  cfq-iosched: fix the slice residual sign
  cfq-iosched: abstract out the 'may this cfqq dispatch' logic
  block: use proper BLK_RW_ASYNC in blk_queue_start_tag()
  block: Seperate read and write statistics of in_flight requests v2
  block: get rid of kblock_schedule_delayed_work()
  cfq-iosched: fix possible problem with jiffies wraparound
  cfq-iosched: fix issue with rq-rq merging and fifo list ordering
2009-10-13 10:21:33 -07:00
Li Zefan
8ad807318f tracing/filters: Fix memory leak when setting a filter
Every time we set a filter, we leak memory allocated by
postfix_append_operand() and postfix_append_op().

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Tom Zanussi <tzanussi@gmail.com>
Cc: <stable@kernel.org> # for v2.6.31.x
LKML-Reference: <4AD3D7D9.4070400@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-13 08:05:17 +02:00
Randy Dunlap
e17b38bf9e sched: Fix missing kernel-doc notation
The following htmldocs warnings:

  Warning(kernel/sched.c:685): No description found for parameter 'cpu'
  Warning(kernel/sched.c:3676): No description found for parameter 'sd'

Trigger because new parameters were added to update_rq_clock() and
update_group_power() without updating the kernel-doc notation.

Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <4AD29070.7070002@oracle.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-12 10:50:06 +02:00
Alexey Dobriyan
d43c36dc6b headers: remove sched.h from interrupt.h
After m68k's task_thread_info() doesn't refer to current,
it's possible to remove sched.h from interrupt.h and not break m68k!
Many thanks to Heiko Carstens for allowing this.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
2009-10-11 11:20:58 -07:00
Mike Galbraith
f5dc37530b sched: Update the clock of runqueue select_task_rq() selected
In try_to_wake_up(), we update the runqueue clock, but
select_task_rq() may select a different runqueue than the one we
updated, leaving the new runqueue's clock stale for a bit.

This patch cures occasional huge latencies reported by latencytop
when coming out of idle on a mostly idle NO_HZ box.

Signed-off-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1255070103.7639.30.camel@marge.simson.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-09 15:58:11 +02:00
Peter Zijlstra
3365e77987 lockdep: Use cpu_clock() for lockstat
Some tracepoint magic (TRACE_EVENT(lock_acquired)) relies on
the fact that lock hold times are positive and uses div64 on
that. That triggered a build warning on MIPS, and probably
causes bad output in certain circumstances as well.

Make it truly positive.

Reported-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1254818502.21044.112.camel@laptop>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-09 15:56:44 +02:00
Wu Fengguang
d25105e891 writeback: account IO throttling wait as iowait
It makes sense to do IOWAIT when someone is blocked
due to IO throttle, as suggested by Kame and Peter.

There is an old comment for not doing IOWAIT on throttle,
however it has been mismatching the code for a long time.

If we stop accounting IOWAIT for 2.6.32, it could be an
undesirable behavior change. So restore the io_schedule.

CC: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
CC: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-10-09 12:40:42 +02:00
Steven Rostedt
a813a15976 tracing: fix trace_vprintk call
The addition of trace_array_{v}printk used the wrong function for
trace_vprintk to call. This broke trace_marker and trace_vprintk
itself. Although trace_printk may not have been affected by those
that end up calling trace_vbprintk.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-10-09 01:41:35 -04:00
Linus Torvalds
f579bbcd9b Merge branch 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  futex: fix requeue_pi key imbalance
  futex: Fix typo in FUTEX_WAIT/WAKE_BITSET_PRIVATE definitions
  rcu: Place root rcu_node structure in separate lockdep class
  rcu: Make hot-unplugged CPU relinquish its own RCU callbacks
  rcu: Move rcu_barrier() to rcutree
  futex: Move exit_pi_state() call to release_mm()
  futex: Nullify robust lists after cleanup
  futex: Fix locking imbalance
  panic: Fix panic message visibility by calling bust_spinlocks(0) before dying
  rcu: Replace the rcu_barrier enum with pointer to call_rcu*() function
  rcu: Clean up code based on review feedback from Josh Triplett, part 4
  rcu: Clean up code based on review feedback from Josh Triplett, part 3
  rcu: Fix rcu_lock_map build failure on CONFIG_PROVE_LOCKING=y
  rcu: Clean up code to address Ingo's checkpatch feedback
  rcu: Clean up code based on review feedback from Josh Triplett, part 2
  rcu: Clean up code based on review feedback from Josh Triplett
2009-10-08 12:16:35 -07:00