Commit Graph

13217 Commits

Author SHA1 Message Date
Peter Zijlstra
7490d0a4cf sched/nohz: Rewrite and fix load-avg computation -- again
commit 5167e8d541 upstream.

Thanks to Charles Wang for spotting the defects in the current code:

 - If we go idle during the sample window -- after sampling, we get a
   negative bias because we can negate our own sample.

 - If we wake up during the sample window we get a positive bias
   because we push the sample to a known active period.

So rewrite the entire nohz load-avg muck once again, now adding
copious documentation to the code.

Reported-and-tested-by: Doug Smythies <dsmythies@telus.net>
Reported-and-tested-by: Charles Wang <muming.wq@gmail.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/1340373782.18025.74.camel@twins
[ minor edits ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-07-19 08:58:56 -07:00
Eric Dumazet
2c07f25ea7 splice: fix racy pipe->buffers uses
commit 047fe36052 upstream.

Dave Jones reported a kernel BUG at mm/slub.c:3474! triggered
by splice_shrink_spd() called from vmsplice_to_pipe()

commit 35f3d14dbb (pipe: add support for shrinking and growing pipes)
added capability to adjust pipe->buffers.

Problem is some paths don't hold pipe mutex and assume pipe->buffers
doesn't change for their duration.

Fix this by adding nr_pages_max field in struct splice_pipe_desc, and
use it in place of pipe->buffers where appropriate.

splice_shrink_spd() loses its struct pipe_inode_info argument.

Reported-by: Dave Jones <davej@redhat.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Tom Herbert <therbert@google.com>
Tested-by: Dave Jones <davej@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
[bwh: Backported to 3.2:
 - Adjust context in vmsplice_to_pipe()
 - Update one more call to splice_shrink_spd(), from skb_splice_bits()]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-07-16 09:04:42 -07:00
Vaibhav Nagarnaik
4943d9cb7d tracing: change CPU ring buffer state from tracing_cpumask
commit 71babb2705 upstream.

According to Documentation/trace/ftrace.txt:

tracing_cpumask:

        This is a mask that lets the user only trace
        on specified CPUS. The format is a hex string
        representing the CPUS.

The tracing_cpumask currently doesn't affect the tracing state of
per-CPU ring buffers.

This patch enables/disables CPU recording as its corresponding bit in
tracing_cpumask is set/unset.

Link: http://lkml.kernel.org/r/1336096792-25373-3-git-send-email-vnagarnaik@google.com

Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Laurent Chavey <chavey@google.com>
Cc: Justin Teravest <teravest@google.com>
Cc: David Sharp <dhsharp@google.com>
Signed-off-by: Vaibhav Nagarnaik <vnagarnaik@google.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-07-16 09:04:40 -07:00
Konstantin Khlebnikov
21017faf87 mm: correctly synchronize rss-counters at exit/exec
commit 4fe7efdbdf upstream.

do_exit() and exec_mmap() call sync_mm_rss() before mm_release() does
put_user(clear_child_tid) which can update task->rss_stat and thus make
mm->rss_stat inconsistent.  This triggers the "BUG:" printk in check_mm().

Let's fix this bug in the safest way, and optimize/cleanup this later.

Reported-by: Markus Trippelsdorf <markus@trippelsdorf.de>
Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-07-16 09:04:08 -07:00
Richard Cochran
c17f648b6e ntp: Correct TAI offset during leap second
commit dd48d708ff upstream.

When repeating a UTC time value during a leap second (when the UTC
time should be 23:59:60), the TAI timescale should not stop. The kernel
NTP code increments the TAI offset one second too late. This patch fixes
the issue by incrementing the offset during the leap second itself.

Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-06-22 11:37:16 -07:00
Seiji Aguchi
d64cbbc960 kdump: Execute kmsg_dump(KMSG_DUMP_PANIC) after smp_send_stop()
commit 62be73eafa upstream.

This patch moves kmsg_dump(KMSG_DUMP_PANIC) below smp_send_stop(),
to serialize the crash-logging process via smp_send_stop() and to
thus retrieve a more stable crash image of all CPUs stopped.

Signed-off-by: Seiji Aguchi <seiji.aguchi@hds.com>
Acked-by: Don Zickus <dzickus@redhat.com>
Cc: dle-develop@lists.sourceforge.net <dle-develop@lists.sourceforge.net>
Cc: Satoru Moriya <satoru.moriya@hds.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: a.p.zijlstra@chello.nl <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/5C4C569E8A4B9B42A84A977CF070A35B2E4D7A5CE2@USINDEVS01.corp.hds.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-06-22 11:36:56 -07:00
Steven Rostedt
3993b24649 tracing: Have tracing_off() actually turn tracing off
commit f2bf1f6f5f upstream.

A recent update to have tracing_on/off() only affect the ftrace ring
buffers instead of all ring buffers had a cut and paste error.
The tracing_off() did the exact same thing as tracing_on() and
would not actually turn off tracing. Unfortunately, tracing_off()
is more important to be working than tracing_on() as this is a key
development tool, as it lets the developer turn off tracing as soon
as a problem is discovered. It is also used by panic and oops code.

This bug also breaks the 'echo func:traceoff > set_ftrace_filter'

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-06-22 11:36:55 -07:00
Dimitri Sivanich
8997b2223b sched: Fix the relax_domain_level boot parameter
commit a841f8cef4 upstream.

It does not get processed because sched_domain_level_max is 0 at the
time that setup_relax_domain_level() is run.

Simply accept the value as it is, as we don't know the value of
sched_domain_level_max until sched domain construction is completed.

Fix sched_relax_domain_level in cpuset.  The build_sched_domain() routine calls
the set_domain_attribute() routine prior to setting the sd->level, however,
the set_domain_attribute() routine relies on the sd->level to decide whether
idle load balancing will be off/on.

Signed-off-by: Dimitri Sivanich <sivanich@sgi.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20120605184436.GA15668@sgi.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-06-17 11:21:28 -07:00
John Stultz
d913c02b0a timekeeping: Fix CLOCK_MONOTONIC inconsistency during leapsecond
commit fad0c66c4b upstream.

Commit 6b43ae8a61 (ntp: Fix leap-second hrtimer livelock) broke the
leapsecond update of CLOCK_MONOTONIC. The missing leapsecond update to
wall_to_monotonic causes discontinuities in CLOCK_MONOTONIC.

Adjust wall_to_monotonic when NTP inserted a leapsecond.

Reported-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Tested-by: Richard Cochran <richardcochran@gmail.com>
Link: http://lkml.kernel.org/r/1338400497-12420-1-git-send-email-john.stultz@linaro.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-06-17 11:21:23 -07:00
Siddhesh Poyarekar
3876c72231 mm/fork: fix overflow in vma length when copying mmap on clone
commit 7edc8b0ac1 upstream.

The vma length in dup_mmap is calculated and stored in a unsigned int,
which is insufficient and hence overflows for very large maps (beyond
16TB). The following program demonstrates this:

#include <stdio.h>
#include <unistd.h>
#include <sys/mman.h>

#define GIG 1024 * 1024 * 1024L
#define EXTENT 16393

int main(void)
{
        int i, r;
        void *m;
        char buf[1024];

        for (i = 0; i < EXTENT; i++) {
                m = mmap(NULL, (size_t) 1 * 1024 * 1024 * 1024L,
                         PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, 0, 0);

                if (m == (void *)-1)
                        printf("MMAP Failed: %d\n", m);
                else
                        printf("%d : MMAP returned %p\n", i, m);

                r = fork();

                if (r == 0) {
                        printf("%d: successed\n", i);
                        return 0;
                } else if (r < 0)
                        printf("FORK Failed: %d\n", r);
                else if (r > 0)
                        wait(NULL);
        }
        return 0;
}

Increase the storage size of the result to unsigned long, which is
sufficient for storing the difference between addresses.

Signed-off-by: Siddhesh Poyarekar <siddhesh.poyarekar@gmail.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-06-10 00:36:06 +09:00
Tejun Heo
24312d34c9 workqueue: skip nr_running sanity check in worker_enter_idle() if trustee is active
commit 544ecf310f upstream.

worker_enter_idle() has WARN_ON_ONCE() which triggers if nr_running
isn't zero when every worker is idle.  This can trigger spuriously
while a cpu is going down due to the way trustee sets %WORKER_ROGUE
and zaps nr_running.

It first sets %WORKER_ROGUE on all workers without updating
nr_running, releases gcwq->lock, schedules, regrabs gcwq->lock and
then zaps nr_running.  If the last running worker enters idle
inbetween, it would see stale nr_running which hasn't been zapped yet
and trigger the WARN_ON_ONCE().

Fix it by performing the sanity check iff the trustee is idle.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-06-01 15:18:19 +08:00
Linus Torvalds
31ae98359d Merge branches 'perf-urgent-for-linus', 'x86-urgent-for-linus' and 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf, x86 and scheduler updates from Ingo Molnar.

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  tracing: Do not enable function event with enable
  perf stat: handle ENXIO error for perf_event_open
  perf: Turn off compiler warnings for flex and bison generated files
  perf stat: Fix case where guest/host monitoring is not supported by kernel
  perf build-id: Fix filename size calculation

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, kvm: KVM paravirt kernels don't check for CPUID being unavailable
  x86: Fix section annotation of acpi_map_cpu2node()
  x86/microcode: Ensure that module is only loaded on supported Intel CPUs

* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched: Fix KVM and ia64 boot crash due to sched_groups circular linked list assumption
2012-05-17 09:35:17 -07:00
Jiri Kosina
3911ff30f5 genirq: export handle_edge_irq() and irq_to_desc()
Export handle_edge_irq() and irq_to_desc() to modules to allow them to
do things such as

	__irq_set_handler_locked(...., handle_edge_irq);

This fixes

	ERROR: "handle_edge_irq" [drivers/gpio/gpio-pch.ko] undefined!
	ERROR: "irq_to_desc" [drivers/gpio/gpio-pch.ko] undefined!

when gpio-pch is being built as a module.

This was introduced by commit df9541a60a ("gpio: pch9: Use proper flow
type handlers") that added

	__irq_set_handler_locked(d->irq, handle_edge_irq);

but handle_edge_irq() was not exported for modules (and inlined
__irq_set_handler_locked() requires irq_to_desc() exported as well)

Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-15 08:10:07 -07:00
Mike Galbraith
5e2bf01422 namespaces, pid_ns: fix leakage on fork() failure
Fork() failure post namespace creation for a child cloned with
CLONE_NEWPID leaks pid_namespace/mnt_cache due to proc being mounted
during creation, but not unmounted during cleanup.  Call
pid_ns_release_proc() during cleanup.

Signed-off-by: Mike Galbraith <efault@gmx.de>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Louis Rilling <louis.rilling@kerlabs.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-10 15:06:44 -07:00
Steven Rostedt
9b63776fa3 tracing: Do not enable function event with enable
With the adding of function tracing event to perf, it caused a
side effect that produces the following warning when enabling all
events in ftrace:

 # echo 1 > /sys/kernel/debug/tracing/events/enable

[console]
event trace: Could not enable event function

This is because when enabling all events via the debugfs system
it ignores events that do not have a ->reg() function assigned.
This was to skip over the ftrace internal events (as they are
not TRACE_EVENTs). But as the ftrace function event now has
a ->reg() function attached to it for use with perf, it is no
longer ignored.

Worse yet, this ->reg() function is being called when it should
not be. It returns an error and causes the above warning to
be printed.

By adding a new event_call flag (TRACE_EVENT_FL_IGNORE_ENABLE)
and have all ftrace internel event structures have it set,
setting the events/enable will no longe try to incorrectly enable
the function event and does not warn.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2012-05-10 15:55:43 -04:00
Jan Kiszka
b7dafa0ef3 compat: Fix RT signal mask corruption via sigprocmask
compat_sys_sigprocmask reads a smaller signal mask from userspace than
sigprogmask accepts for setting.  So the high word of blocked.sig[0]
will be cleared, releasing any potentially blocked RT signal.

This was discovered via userspace code that relies on get/setcontext.
glibc's i386 versions of those functions use sigprogmask instead of
rt_sigprogmask to save/restore signal mask and caused RT signal
unblocking this way.

As suggested by Linus, this replaces the sys_sigprocmask based compat
version with one that open-codes the required logic, including the merge
of the existing blocked set with the new one provided on SIG_SETMASK.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-10 08:58:33 -07:00
Igor Mammedov
30b4e9eb78 sched: Fix KVM and ia64 boot crash due to sched_groups circular linked list assumption
If we have one cpu that failed to boot and boot cpu gave up on
waiting for it and then another cpu is being booted, kernel
might crash with following OOPS:

   BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
   IP: [<ffffffff812c3630>] __bitmap_weight+0x30/0x80
   Call Trace:
       [<ffffffff8108b9b6>] build_sched_domains+0x7b6/0xa50

The crash happens in init_sched_groups_power() that expects
sched_groups to be circular linked list. However it is not
always true, since sched_groups preallocated in __sdt_alloc are
initialized in build_sched_groups and it may exit early

        if (cpu != cpumask_first(sched_domain_span(sd)))
                return 0;

without initializing sd->groups->next field.

Fix bug by initializing next field right after sched_group was
allocated.

Also-Reported-by: Jiang Liu <liuj97@gmail.com>
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Cc: a.p.zijlstra@chello.nl
Cc: pjt@google.com
Cc: seto.hidetoshi@jp.fujitsu.com
Link: http://lkml.kernel.org/r/1336559908-32533-1-git-send-email-imammedo@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-05-09 12:27:35 +02:00
Linus Torvalds
6cfdd02b88 Power management fixes for 3.4
Fix for an issue causing hibernation to hang on systems with highmem (that
 practically means i386) due to broken memory management (bug introduced in 3.2,
 so -stable material) and PM documentation update making the freezer
 documentation follow the code again after some recent updates.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.18 (GNU/Linux)
 
 iQIcBAABAgAGBQJPnbeLAAoJEKhOf7ml8uNs4nkQAKwhfWWfbM7ZepPfT56A5NW1
 9vlfgO+1ibUgdjkO0hi1biCAbARTNVS5eCLyRJW0W/msGgL51nYleBeFmwwx5E6h
 m5Vwr/53cGeAeF0AOkrQkD45YKJaAlTmWF4/T2YWKLWNMgaVuLmGf7eyYZ6rP1NO
 rJxzUOMC6UrRjIA+S2anDU0CdMyqDHvV3OmY+InZBikFCk0YAtDYUYfNDNqQpEBG
 bzkG3SyaJeqnbQDkhme7U/uAPJCThSz2Z4gvvOxiXdB+I+yp6FhluhLSGxqMh/kj
 OUAJe9s6AAdKz+K62/OgowwucxvmeJRCyYWkN2ZEpsZLoqTEOqLNS4+eaUO6xS/2
 tq89LnfSIVFwRx23XeVr/oMfxUJZ8VKZENo5Pm6NjTAYykTeyD4ug/GAHqgXR0TT
 B+fvx8QmQ68R843aJsjR9h0AKsSeXfgCAROJt+x0ONYAvmJNV62nzs81broDEl4I
 BmWHpOWI7wlzMPt7bNWn4ev4K+WhbVsioFDS61he0Y47Rqt3yUJ8G2OfBq6JYndw
 As4ImoPOVGl0+TKcHJ9Y3bVPnsY7fJyF0GeG50NHxsVFsnTv+rYZ9K3GM9KExhO/
 5mCfoHNgkOJnhGHfZppbnQBHbmjH8EA3QUx57Abo+Q4wiPNNAVG9P1JpZeyGj8KF
 3YML5FjjGQHtYBWeH5WR
 =HaVL
 -----END PGP SIGNATURE-----

Merge tag 'pm-for-3.4-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management fixes from Rafael J. Wysocki:
 "Fix for an issue causing hibernation to hang on systems with highmem
  (that practically means i386) due to broken memory management (bug
  introduced in 3.2, so -stable material) and PM documentation update
  making the freezer documentation follow the code again after some
  recent updates."

* tag 'pm-for-3.4-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  PM / Freezer / Docs: Update documentation about freezing of tasks
  PM / Hibernate: fix the number of pages used for hibernate/thaw buffering
2012-04-29 15:00:44 -07:00
Linus Torvalds
78e97a4788 Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull RCU fix from Ingo Molnar.

* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  rcu: Permit call_rcu() from CPU_DYING notifiers
2012-04-27 19:40:56 -07:00
Linus Torvalds
daae677f56 Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fixes from Ingo Molnar.

* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched: Fix OOPS when build_sched_domains() percpu allocation fails
  sched: Fix more load-balancing fallout
2012-04-27 19:37:00 -07:00
Linus Torvalds
06fc5d3d24 Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar.

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf: Fix perf_event_for_each() to use sibling
  perf symbols: Read plt symbols from proper symtab_type binary
  tracing: Fix stacktrace of latency tracers (irqsoff and friends)
  perf tools: Add 'G' and 'H' modifiers to event parsing
  tracing: Fix regression with tracing_on
  perf tools: Drop CROSS_COMPILE from flex and bison calls
  perf report: Fix crash showing warning related to kernel maps
  tracing: Fix build breakage without CONFIG_PERF_EVENTS (again)
2012-04-27 19:35:50 -07:00
Linus Torvalds
f6072452c9 Merge branch 'for-v3.4-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux
Pull build fixes for less mainstream architectures from Paul Gortmaker:
 "These are fixes for frv(1), blackfin(2), powerpc(1) and xtensa(4).

  Fortunately the touches are nearly all specific to files just used by
  the arch in question.  The two touches to shared/common files
  [kernel/irq/debug.h and drivers/pci/Makefile] are trivial to assess as
  no risk to anyone.

  Half of them relate to xtensa directly.  It was only when I fixed the
  last xtensa issue that I realized that the arch has been broken for a
  significant time, and isn't a specific v3.4 regression.  So if you
  wanted, we could leave xtensa lying bleeding in the street for a
  couple more weeks and queue those for 3.5.  But given they are no risk
  to anyone outside of xtensa, I figured to just leave them in.

  If you are OK with taking the xtensa fixes, then please pull to get:

   - one last implicit include uncovered by system.h that is in a file
     specific to just one powerpc defconfig.  (I'd sync'd with BenH).

   - fix an oversight in the PCI makefile where shared code wasn't being
     compiled for ARCH=frv

   - fix a missing include for GPIO in blackfin framebuffer.

   - audit and tag endif in blackfin ezkit board file, in order to find
     and fix the misplaced endif masking a block of code.

   - fix irq/debug.h choice of temporary macro names to be more internal
     so they don't conflict with names used by xtensa.

   - fix a reference to an undeclared local var in xtensa's signal.c

   - fix an implicit bug.h usage in xtensa's asm/io.h uncovered by my
     removing bug.h from kernel.h

   - fix xtensa to properly indicate it is using asm-generic/hardirq.h
     in order to resolve the link error - undefined ack_bad_irq

  The xtensa still fails final link as my latest binutils does something
  evil when ld forward-relocates unlikely() blocks, but in theory people
  who have older/valid toolchains could now use the thing."

* 'for-v3.4-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux:
  xtensa: fix build fail on undefined ack_bad_irq
  blackfin: fix ifdef fustercluck in mach-bf538/boards/ezkit.c
  blackfin: fix compile error in bfin-lq035q1-fb.c
  pci: frv architecture needs generic setup-bus infrastructure
  irq: hide debug macros so they don't collide with others.
  xtensa: fix build error in xtensa/include/asm/io.h
  xtensa: fix build failure in xtensa/kernel/signal.c
  powerpc: fix system.h fallout in sysdev/scom.c [chroma_defconfig]
2012-04-27 19:32:37 -07:00
Michael Ellerman
724b6daa13 perf: Fix perf_event_for_each() to use sibling
In perf_event_for_each() we call a function on an event, and then
iterate over the siblings of the event.

However we don't call the function on the siblings, we call it
repeatedly on the original event - it seems "obvious" that we should
be calling it with sibling as the argument.

It looks like this broke in commit 75f937f24b ("Fix ctx->mutex
vs counter->mutex inversion").

The only effect of the bug is that the PERF_IOC_FLAG_GROUP parameter
to the ioctls doesn't work.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1334109253-31329-1-git-send-email-michael@ellerman.id.au
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-04-26 13:51:31 +02:00
he, bo
fb2cf2c660 sched: Fix OOPS when build_sched_domains() percpu allocation fails
Under extreme memory used up situations, percpu allocation
might fail. We hit it when system goes to suspend-to-ram,
causing a kworker panic:

 EIP: [<c124411a>] build_sched_domains+0x23a/0xad0
 Kernel panic - not syncing: Fatal exception
 Pid: 3026, comm: kworker/u:3
 3.0.8-137473-gf42fbef #1

 Call Trace:
  [<c18cc4f2>] panic+0x66/0x16c
  [...]
  [<c1244c37>] partition_sched_domains+0x287/0x4b0
  [<c12a77be>] cpuset_update_active_cpus+0x1fe/0x210
  [<c123712d>] cpuset_cpu_inactive+0x1d/0x30
  [...]

With this fix applied build_sched_domains() will return -ENOMEM and
the suspend attempt fails.

Signed-off-by: he, bo <bo.he@intel.com>
Reviewed-by: Zhang, Yanmin <yanmin.zhang@intel.com>
Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: <stable@kernel.org>
Link: http://lkml.kernel.org/r/1335355161.5892.17.camel@hebo
[ So, we fail to deallocate a CPU because we cannot allocate RAM :-/
  I don't like that kind of sad behavior but nevertheless it should
  not crash under high memory load. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-04-26 12:54:53 +02:00
Peter Zijlstra
eb95308ee2 sched: Fix more load-balancing fallout
Commits 367456c756 ("sched: Ditch per cgroup task lists for
load-balancing") and 5d6523ebd ("sched: Fix load-balance wreckage")
left some more wreckage.

By setting loop_max unconditionally to ->nr_running load-balancing
could take a lot of time on very long runqueues (hackbench!). So keep
the sysctl as max limit of the amount of tasks we'll iterate.

Furthermore, the min load filter for migration completely fails with
cgroups since inequality in per-cpu state can easily lead to such
small loads :/

Furthermore the change to add new tasks to the tail of the queue
instead of the head seems to have some effect.. not quite sure I
understand why.

Combined these fixes solve the huge hackbench regression reported by
Tim when hackbench is ran in a cgroup.

Reported-by: Tim Chen <tim.c.chen@linux.intel.com>
Acked-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/1335365763.28150.267.camel@twins
[ got rid of the CONFIG_PREEMPT tuning and made small readability edits ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-04-26 12:54:52 +02:00
Ingo Molnar
c716ef56f1 Merge branch 'tip/perf/urgent-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace into perf/urgent 2012-04-25 12:33:24 +02:00
Ingo Molnar
4d8cd7e780 Merge branch 'rcu/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/urgent 2012-04-25 09:42:49 +02:00
Bojan Smojver
f8262d4768 PM / Hibernate: fix the number of pages used for hibernate/thaw buffering
Hibernation regression fix, since 3.2.

Calculate the number of required free pages based on non-high memory
pages only, because that is where the buffers will come from.

Commit 081a9d043c introduced a new buffer
page allocation logic during hibernation, in order to improve the
performance. The amount of pages allocated was calculated based on total
amount of pages available, although only non-high memory pages are
usable for this purpose. This caused hibernation code to attempt to over
allocate pages on platforms that have high memory, which led to hangs.

Signed-off-by: Bojan Smojver <bojan@rexursive.com>
Signed-off-by: Rafael J. Wysocki <rjw@suse.de>
2012-04-24 23:53:28 +02:00
Paul Gortmaker
9f3045eca8 irq: hide debug macros so they don't collide with others.
The file kernel/irq/debug.h temporarily defines P, PS, PD
and then undefines them.  However these names aren't really
"internal" enough, and collide with other more legit users
such as the ones in the xtensa arch, causing:

In file included from kernel/irq/internals.h:58:0,
                 from kernel/irq/irqdesc.c:18:
kernel/irq/debug.h:8:0: warning: "PS" redefined [enabled by default]
arch/xtensa/include/asm/regs.h:59:0: note: this is the location of the previous definition

Add a handful of underscores to do a better job of hiding these
temporary macros.

Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2012-04-23 12:30:03 -04:00
Steven Rostedt
db4c75cbeb tracing: Fix stacktrace of latency tracers (irqsoff and friends)
While debugging a latency with someone on IRC (mirage335) on #linux-rt (OFTC),
we discovered that the stacktrace output of the latency tracers
(preemptirqsoff) was empty.

This bug was caused by the creation of the dynamic length stack trace
again (like commit 12b5da3 "tracing: Fix ent_size in trace output" was).

This bug is caused by the latency tracers requiring the next event
to determine the time between the current event and the next. But by
grabbing the next event, the iter->ent_size is set to the next event
instead of the current one. As the stacktrace event is the last event,
this makes the ent_size zero and causes nothing to be printed for
the stack trace. The dynamic stacktrace uses the ent_size to determine
how much of the stack can be printed. The ent_size of zero means
no stack.

The simple fix is to save the iter->ent_size before finding the next event.

Note, mirage335 asked to remain anonymous from LKML and git, so I will
not add the Reported-by and Tested-by tags, even though he did report
the issue and tested the fix.

Cc: stable@vger.kernel.org # 3.1+
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2012-04-19 17:00:13 -04:00
Suresh Siddha
a6371f8023 tick: Fix the spurious broadcast timer ticks after resume
During resume, tick_resume_broadcast() programs the broadcast timer in
oneshot mode unconditionally. On the platforms where broadcast timer
is not really required, this will generate spurious broadcast timer
ticks upon resume. For example, on the always running apic timer
platforms with HPET, I see spurious hpet tick once every ~5minutes
(which is the 32-bit hpet counter wraparound time).

Similar to boot time, during resume make the oneshot mode setting of
the broadcast clock event device conditional on the state of active
broadcast users.

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Tested-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
Tested-by: svenjoac@gmx.de
Cc: torvalds@linux-foundation.org
Cc: rjw@sisk.pl
Link: http://lkml.kernel.org/r/1334802459.28674.209.camel@sbsiddha-desk.sc.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2012-04-19 21:27:50 +02:00
Thomas Gleixner
b9a6a23566 tick: Ensure that the broadcast device is initialized
Santosh found another trap when we avoid to initialize the broadcast
device in the switch_to_oneshot code. The broadcast device might be
still in SHUTDOWN state when we actually need to use it. That
obviously breaks, as set_next_event() is called on a shutdown
device. This did not break on x86, but Suresh analyzed it:

From the review, most likely on Sven's system we are force enabling
the hpet using the pci quirk's method very late. And in this case,
hpet_clockevent (which will be global_clock_event) handler can be
null, specifically as this platform might not be using deeper c-states
and using the reliable APIC timer.

Prior to commit 'fa4da365bc7772c', that handler will be set to
'tick_handle_oneshot_broadcast' when we switch the broadcast timer to
oneshot mode, even though we don't use it. Post commit
'fa4da365bc7772c', we stopped switching the broadcast mode to oneshot
as this is not really needed and his platform's global_clock_event's
handler will remain null. While on my SNB laptop, same is set to
'clockevents_handle_noop' because hpet gets enabled very early. (noop
handler on my platform set when the early enabled hpet timer gets
replaced by the lapic timer).

But the commit 'fa4da365bc7772c' tracked the broadcast timer mode in
the SW as oneshot, even though it didn't touch the HW timer. During
resume however, tick_resume_broadcast() saw the SW broadcast mode as
oneshot and actually programmed the broadcast device also into oneshot
mode. So this triggered the null pointer de-reference after the hpet
wraps around and depending on what the hpet counter is set to. On the
normal platforms where hpet gets enabled early we should be seeing a
spurious interrupt (in my SNB laptop I see one spurious interrupt
after around 5 minutes ;) which is 32-bit hpet counter wraparound
time), but that's a separate issue.

Enforce the mode setting when trying to set an event.

Reported-and-tested-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: torvalds@linux-foundation.org
Cc: svenjoac@gmx.de
Cc: rjw@sisk.pl
Link: http://lkml.kernel.org/r/alpine.LFD.2.02.1204181723350.2542@ionos
2012-04-19 21:27:35 +02:00
Thomas Gleixner
b435092f70 tick: Fix oneshot broadcast setup really
Sven Joachim reported, that suspend/resume on rc3 trips over a NULL
pointer dereference. Linus spotted the clockevent handler being NULL.

commit fa4da365b(clockevents: tTack broadcast device mode change in
tick_broadcast_switch_to_oneshot()) tried to fix a problem with the
broadcast device setup, which was introduced in commit 77b0d60c5(
clockevents: Leave the broadcast device in shutdown mode when not
needed).

The initial commit avoided to set up the broadcast device when no
broadcast request bits were set, but that left the broadcast device
disfunctional. In consequence deep idle states which need the
broadcast device were not woken up.

commit fa4da365b tried to fix that by initializing the state of the
broadcast facility, but that missed the fact, that nothing initializes
the event handler and some other state of the underlying clock event
device.

The fix is to revert both commits and make only the mode setting of
the clock event device conditional on the state of active broadcast
users. 

That initializes everything except the low level device mode, but this
happens when the broadcast functionality is invoked by deep idle.

Reported-and-tested-by: Sven Joachim <svenjoac@gmx.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Link: http://lkml.kernel.org/r/alpine.LFD.2.02.1204181205540.2542@ionos
2012-04-18 14:00:56 +02:00
Paul E. McKenney
92c38702e9 rcu: Permit call_rcu() from CPU_DYING notifiers
As of:

  29494be71a ("rcu,cleanup: simplify the code when cpu is dying")

RCU adopts callbacks from the dying CPU in its CPU_DYING notifier,
which means that any callbacks posted by later CPU_DYING notifiers
are ignored until the CPU comes back online.

A WARN_ON_ONCE() was added to __call_rcu() by:

  e560140008 ("rcu: Simplify offline processing")

to check for this condition.  Although this condition did not trigger
(at least as far as I know) during -next testing, it did recently
trigger in mainline:

  https://lkml.org/lkml/2012/4/2/34

What is needed longer term is for RCU's CPU_DEAD notifier to adopt any
callbacks that were posted by CPU_DYING notifiers, however, the Linux
kernel has been running with this sort of thing happening for quite
some time.  So the only thing that qualifies as a regression is the
WARN_ON_ONCE(), which this commit removes.

Making RCU's CPU_DEAD notifier adopt callbacks posted by CPU_DYING
notifiers is a topic for the 3.5 release of the Linux kernel.

Reported-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2012-04-17 07:30:54 -07:00
Steven Rostedt
348f0fc238 tracing: Fix regression with tracing_on
The change to make tracing_on affect only the ftrace ring buffer, caused
a bug where it wont affect any ring buffer. The problem was that the buffer
of the trace_array was passed to the write function and not the trace array
itself.

The trace_array can change the buffer when running a latency tracer. If this
happens, then the buffer being disabled may not be the buffer currently used
by ftrace. This will cause the tracing_on file to become useless.

The simple fix is to pass the trace_array to the write function instead of
the buffer. Then the actual buffer may be changed.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2012-04-16 15:41:28 -04:00
Linus Torvalds
668ce0ac70 Merge branch 'systemh-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux
Pull system.h fixups for less common arch's from Paul Gortmaker:
 "Here is what is hopefully the last of the system.h related fixups.

  The fixes for Alpha and ia64 are code relocations consistent with what
  was done for the more mainstream architectures.  Note that the
  diffstat lines removed vs lines added are not the same since I've
  fixed some of the whitespace issues in the relocated code blocks.
  However they are functionally the same.  Compile tested locally, plus
  these two have been in linux-next for a while.

  There is also a trivial one line system.h related fix for the Tilera
  arch from Chris Metcalf to fix an implict include.."

* 'systemh-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux:
  irq_work: fix compile failure on tile from missing include
  ia64: populate the cmpxchg header with appropriate code
  alpha: fix build failures from system.h dismemberment
2012-04-13 19:44:36 -07:00
Mark Brown
6e48b550d1 tracing: Fix build breakage without CONFIG_PERF_EVENTS (again)
Today's -next fails to link for me:

kernel/built-in.o:(.data+0x178e50): undefined reference to `perf_ftrace_event_register'

It looks like multiple fixes have been merged for the issue fixed by
commit fa73dc9 (tracing: Fix build breakage without CONFIG_PERF_EVENTS)
though I can't identify the other changes that have gone in at the
minute, it's possible that the changes which caused the breakage fixed
by the previous commit got dropped but the fix made it in.

Link: http://lkml.kernel.org/r/1334307179-21255-1-git-send-email-broonie@opensource.wolfsonmicro.com

Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2012-04-13 21:37:04 -04:00
Chris Metcalf
ef1f098254 irq_work: fix compile failure on tile from missing include
Building with IRQ_WORK configured results in

kernel/irq_work.c: In function ‘irq_work_run’:
kernel/irq_work.c:110: error: implicit declaration of function ‘irqs_disabled’

The appropriate header just needs to be included.

Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2012-04-13 13:15:16 -04:00
Linus Torvalds
b3dfd76c94 irqdomain bug fixes for v3.4-rc3
Format string bug fix for irqdomain debug output on 64 bit platforms
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABAgAGBQJPh1bxAAoJEEFnBt12D9kBbJYP/1QVBjTuObbdoI4UQ8TMTueO
 6Wh0hZ0zxRF+lznPJKJQdurIKJBtgM2m+M+HZl1fIrhIQbwzASc3whztqc30n1rj
 qnqjzeGAdQv8NWvABgjZJM0s8SuCFlwvnm0BfdXGwe4Uh/E761rs3oz0YtZcXUC8
 XXiWjY6FNExZ8dKFv94SDmFS8FGjz72gQW5rGB8wtyD/sl7rs59K6h2eOBm5HhUT
 DDjsIlyUGev7QYMJNFRfYDEFKBXFH63v1q69kroOxEgd2CMwD2WfAguwBdFKhOrF
 aWfOUJZaOkglGOfeGulEs0lohgWeehSZYwKNTDZh/FPqQmSXhixN9PIc4iYBXlqa
 ZgyUF3Tt3BQ+s8rHNTk1psWxvzYvHzfA6+KGRdPmZ9fOmqdfCoAfj2wh5oWmSKsJ
 ZwQygeU1ziI/deBRVL08qW1NeeYFf3iGY68wIV338XGBmMYxpWwzWjoWO3nKkSxm
 nMUiiOyEVulLdzeXy+JCL39IbZ8atiDj/012CIsiHhssZRtoQNt2wyBxZ4yditze
 6gZWtak1dn9ZAIZiGfzPh5SbbPOGjykqt0VSoyhKU1XEVAsGByWwZvqLFWwMRSeD
 KIKp7zINy5p/ftoBhe3dKgDNw83FJF+IqubK5k6m/AtDY14WOoUbVIkfbxXhhXLK
 Di5uxHBxhRIL5jhj1v27
 =7g0b
 -----END PGP SIGNATURE-----

Merge tag 'irqdomain-for-linus' of git://git.secretlab.ca/git/linux-2.6

Pull a fix for the recent irqdomain bug fixes from Grant Likely:
 "I flubbed one patch in the last pull request which broke a format
  string on 64 bit platforms.  Here's the fix."

* tag 'irqdomain-for-linus' of git://git.secretlab.ca/git/linux-2.6:
  irq_domain: fix type mismatch in debugfs output format
2012-04-12 15:33:16 -07:00
Grant Likely
5269a9ab7d irq_domain: fix type mismatch in debugfs output format
sizeof(void*) returns an unsigned long, but it was being used as a width parameter to a "%-*s" format string which requires an int.  On 64 bit platforms this causes a type mismatch:

    linux/kernel/irq/irqdomain.c:575: warning: field width should have type
    'int', but argument 6 has type 'long unsigned int'

This change casts the size to an int so printf gets the right data type.

Reported-by: Andreas Schwab <schwab@linux-m68k.org>
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Cc: David Daney <david.daney@cavium.com>
2012-04-12 16:25:48 -06:00
Linus Torvalds
ccb1ec95e9 Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fixes from Thomas Gleixner:
 "The itimer removal one is not strictly a fix, but I really wanted to
  avoid a rebase of the urgent ones."

* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  Revert "clocksource: Load the ACPI PM clocksource asynchronously"
  clockevents: tTack broadcast device mode change in tick_broadcast_switch_to_oneshot()
  itimer: Use printk_once instead of WARN_ONCE
  nohz: Fix stale jiffies update in tick_nohz_restart()
  tick: Document TICK_ONESHOT config option
  proc: stats: Use arch_idle_time for idle and iowait times if available
  itimer: Schedule silent NULL pointer fixup in setitimer() for removal
2012-04-12 15:16:26 -07:00
Linus Torvalds
ecca5c3acc Merge branch 'akpm' (Andrew's patch-bomb)
Merge fixes from Andrew Morton.

* emailed from Andrew Morton <akpm@linux-foundation.org>: (14 patches)
  panic: fix stack dump print on direct call to panic()
  drivers/rtc/rtc-pl031.c: enable clock on all ST variants
  Revert "mm: vmscan: fix misused nr_reclaimed in shrink_mem_cgroup_zone()"
  hugetlb: fix race condition in hugetlb_fault()
  drivers/rtc/rtc-twl.c: use static register while reading time
  drivers/rtc/rtc-s3c.c: add placeholder for driver private data
  drivers/rtc/rtc-s3c.c: fix compilation error
  MAINTAINERS: add PCDP console maintainer
  memcg: do not open code accesses to res_counter members
  drivers/rtc/rtc-efi.c: fix section mismatch warning
  drivers/rtc/rtc-r9701.c: reset registers if invalid values are detected
  drivers/char/random.c: fix boot id uniqueness race
  memcg: fix broken boolen expression
  memcg: fix up documentation on global LRU
2012-04-12 14:15:21 -07:00
Jason Wessel
026ee1f66a panic: fix stack dump print on direct call to panic()
Commit 6e6f0a1f0f ("panic: don't print redundant backtraces on oops")
causes a regression where no stack trace will be printed at all for the
case where kernel code calls panic() directly while not processing an
oops, and of course there are 100's of instances of this type of call.

The original commit executed the check (!oops_in_progress), but this will
always be false because just before the dump_stack() there is a call to
bust_spinlocks(1), which does the following:

  void __attribute__((weak)) bust_spinlocks(int yes)
  {
	if (yes) {
		++oops_in_progress;

The proper way to resolve the problem that original commit tried to
solve is to avoid printing a stack dump from panic() when the either of
the following conditions is true:

  1) TAINT_DIE has been set (this is done by oops_end())
     This indicates and oops has already been printed.
  2) oops_in_progress > 1
     This guards against the rare case where panic() is invoked
     a second time, or in between oops_begin() and oops_end()

Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: <stable@vger.kernel.org>	[3.3+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-04-12 13:12:12 -07:00
Linus Torvalds
7e06648972 irqdomain bug fixes for v3.4-rc3
This branch fixes a bug in irq_create_mapping() where an error return
 from irq_alloc_desc_from() gets ignored.  It also removes irq_virq_count
 to fix a bug on powerpc where the irqdomain code does not find irqs
 allocated above the CONFIG_NR_IRQS boundary.  The remaining patches get
 rid of an completely pointless export and fix some minor bugs in the
 irqdomain debug output.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABAgAGBQJPhni4AAoJEEFnBt12D9kBA/cP/jv3ENYDy2/g1/eE6W1aSkUf
 /7FlfpXsufS0Bl+wfk7sN8D1NLoB/36bLVU0TStup90vL03WT9A+BHl9tjogpZVz
 oDuLFYHSuVVOK40SSrcnOUc6rncKAni9tGjVjFCxVAx3FlqebTHWDu/Cl4BAaWBo
 +j2u4HHelHgr8oXCY5avWS0cOn3L7rIoJ54/Jqpn10OooqH2cgz9xYMb+1/ORfz1
 xjpJ4OiXKnSvuG7WD0S1EKPMbaiyak+jBoHYYNpEOriTMtcOTNg5hjz7b3jDfOrm
 gkNReffdDXCnsCPj/1gEhJlB4i+iTES0lTBVfOZ8M2luhF6wuGUYeRaiy+/m00DZ
 qYFXD5TaVM0+2USCeo71DPfag8now6YrJNIv93CGEY0fLGDJJg2yJI3oUN728p9a
 E88JLPs8f//8rxQaBatGtHmReD4wKwCevciVekSWZSROnPxnIP8PvBPq8e4Bf04r
 q+VBmr+gJh+oaDAZrIaRPsRCidHhwzIrexa4cv7rt84vnx2Hltq75ijaPNlR3JU7
 FFhZj1l8185HxXEsTJHEmiKN0J/drVIu/beGgHD7NbWWIdt8tqgtNOEUudVTisfM
 VgBdgjjbKFwQDuOxgaYgERwCkb1YXFT/kDKpgKaYnxl0yGaALjxO+ISd2fIJOuKO
 fzeVN4LDvVCysAQ/SeOG
 =6Ejq
 -----END PGP SIGNATURE-----

Merge tag 'irqdomain-for-linus' of git://git.secretlab.ca/git/linux-2.6

Pull irqdomain bug fixes from Grant Likely:
 "This branch fixes a bug in irq_create_mapping() where an error return
  from irq_alloc_desc_from() gets ignored.

  It also removes irq_virq_count to fix a bug on powerpc where the
  irqdomain code does not find irqs allocated above the CONFIG_NR_IRQS
  boundary.

  The remaining patches get rid of an completely pointless export and
  fix some minor bugs in the irqdomain debug output."

* tag 'irqdomain-for-linus' of git://git.secretlab.ca/git/linux-2.6:
  irq_domain: Move irq_virq_count into NOMAP revmap
  irqdomain: Fix debugfs formatting
  irq_domain: correct the debugfs file name
  irq: Kill pointless irqd_to_hw export
  irq/irq_domain: Quit ignoring error returns from irq_alloc_desc_from().
2012-04-12 12:49:56 -07:00
Grant Likely
6fa6c8e25e irq_domain: Move irq_virq_count into NOMAP revmap
This patch replaces the old global setting of irq_virq_count that is only
used by the NOMAP mapping and instead uses a revmap_data property so that
the maximum NOMAP allocation can be set per NOMAP irq_domain.

There is exactly one user of irq_virq_count in-tree right now: PS3.
Also, irq_virq_count is only useful for the NOMAP mapping.  So,
instead of having a single global irq_virq_count values, this change
drops it entirely and added a max_irq argument to irq_domain_add_nomap().
That makes it a property of an individual nomap irq domain instead of
a global system settting.

Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Tested-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Milton Miller <miltonm@bga.com>
2012-04-12 00:37:48 -06:00
Oleg Nesterov
79549c6dfd cred: copy_process() should clear child->replacement_session_keyring
keyctl_session_to_parent(task) sets ->replacement_session_keyring,
it should be processed and cleared by key_replace_session_keyring().

However, this task can fork before it notices TIF_NOTIFY_RESUME and
the new child gets the bogus ->replacement_session_keyring copied by
dup_task_struct(). This is obviously wrong and, if nothing else, this
leads to put_cred(already_freed_cred).

change copy_creds() to clear this member. If copy_process() fails
before this point the wrong ->replacement_session_keyring doesn't
matter, exit_creds() won't be called.

Cc: <stable@vger.kernel.org>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-04-11 08:20:11 -07:00
Grant Likely
15e06bf64f irqdomain: Fix debugfs formatting
This patch fixes the irq_domain_mapping debugfs output to pad pointer
values with leading zeros so that pointer values are displayed
correctly.  Otherwise you get output similar to "0x 5e0000000000000".
Also, when the irq_domain is set to 'null'

Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Cc: David Daney <david.daney@cavium.com>
Cc: Mika Westerberg <mika.westerberg@linux.intel.com>
2012-04-11 01:01:45 -06:00
Mika Westerberg
ac5830a33f irq_domain: correct the debugfs file name
The actual name of the irq_domain mapping debugfs file is
"irq_domain_mapping" not "virq_mapping".

Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
2012-04-10 22:39:17 -06:00
David Daney
5b7526e3a6 irq/irq_domain: Quit ignoring error returns from irq_alloc_desc_from().
In commit 4bbdd45a (irq_domain/powerpc: eliminate irq_map; use
irq_alloc_desc() instead) code was added that ignores error returns
from irq_alloc_desc_from() by (silently) casting the return value to
unsigned.  The negitive value error return now suddenly looks like a
valid irq number.

Commits cc79ca69 (irq_domain: Move irq_domain code from powerpc to
kernel/irq) and 1bc04f2c (irq_domain: Add support for base irq and
hwirq in legacy mappings) move this code to its current location in
irqdomain.c

The result of all of this is a null pointer dereference OOPS if one of
the error cases is hit.

The fix: Don't cast away the negativeness of the return value and then
check for errors.

Signed-off-by: David Daney <david.daney@cavium.com>
Acked-by: Rob Herring <rob.herring@calxeda.com>
[grant.likely: dropped addition of new 'irq' variable]
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
2012-04-10 22:39:16 -06:00
Suresh Siddha
fa4da365bc clockevents: tTack broadcast device mode change in tick_broadcast_switch_to_oneshot()
In the commit 77b0d60c5a,
"clockevents: Leave the broadcast device in shutdown mode when not needed",
we were bailing out too quickly in tick_broadcast_switch_to_oneshot(),
with out tracking the broadcast device mode change to 'TICKDEV_MODE_ONESHOT'.

This breaks the platforms which need broadcast device oneshot services during
deep idle states. tick_broadcast_oneshot_control() thinks that it is
in periodic mode and fails to take proper decisions based on the
CLOCK_EVT_NOTIFY_BROADCAST_[ENTER, EXIT] notifications during deep
idle entry/exit.

Fix this by tracking the broadcast device mode as 'TICKDEV_MODE_ONESHOT',
before leaving the broadcast HW device in shutdown mode if there are no active
requests for the moment.

Reported-and-tested-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: johnstul@us.ibm.com
Link: http://lkml.kernel.org/r/1334011304.12400.81.camel@sbsiddha-desk.sc.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2012-04-10 11:42:07 +02:00