linux/kernel
Anton Blanchard 277899ef0a sched: cpuacct: Use bigger percpu counter batch values for stats counters
commit fa535a77bd upstream

When CONFIG_VIRT_CPU_ACCOUNTING and CONFIG_CGROUP_CPUACCT are
enabled we can call cpuacct_update_stats with values much larger
than percpu_counter_batch.  This means the call to
percpu_counter_add will always add to the global count which is
protected by a spinlock and we end up with a global spinlock in
the scheduler.

Based on an idea by KOSAKI Motohiro, this patch scales the batch
value by cputime_one_jiffy such that we have the same batch
limit as we would if CONFIG_VIRT_CPU_ACCOUNTING was disabled.
His patch did this once at boot but that initialisation happened
too early on PowerPC (before time_init) and it was never updated
at runtime as a result of a hotplug cpu add/remove.

This patch instead scales percpu_counter_batch by
cputime_one_jiffy at runtime, which keeps the batch correct even
after cpu hotplug operations.  We cap it at INT_MAX in case of
overflow.

For architectures that do not support
CONFIG_VIRT_CPU_ACCOUNTING, cputime_one_jiffy is the constant 1
and gcc is smart enough to optimise min(s32
percpu_counter_batch, INT_MAX) to just percpu_counter_batch at
least on x86 and PowerPC.  So there is no need to add an #ifdef.

On a 64 thread PowerPC box with CONFIG_VIRT_CPU_ACCOUNTING and
CONFIG_CGROUP_CPUACCT enabled, a context switch microbenchmark
is 234x faster and almost matches a CONFIG_CGROUP_CPUACCT
disabled kernel:

 CONFIG_CGROUP_CPUACCT disabled:   16906698 ctx switches/sec
 CONFIG_CGROUP_CPUACCT enabled:       61720 ctx switches/sec
 CONFIG_CGROUP_CPUACCT + patch:	   16663217 ctx switches/sec

Tested with:

 wget http://ozlabs.org/~anton/junkcode/context_switch.c
 make context_switch
 for i in `seq 0 63`; do taskset -c $i ./context_switch & done
 vmstat 1

Signed-off-by: Anton Blanchard <anton@samba.org>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Tested-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Tony Luck <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-09-20 13:18:12 -07:00
..
gcov gcov: fix null-pointer dereference for certain module types 2010-09-20 13:17:53 -07:00
irq irq: Add new IRQ flag IRQF_NO_SUSPEND 2010-08-13 13:19:50 -07:00
power Freezer: Fix buggy resume test for tasks frozen with cgroup freezer 2010-04-26 07:41:17 -07:00
time timekeeping: Fix clock_gettime vsyscall time warp 2010-08-13 13:20:13 -07:00
trace tracing: t_start: reset FTRACE_ITER_HASH in case of seek/pread 2010-09-20 13:17:52 -07:00
.gitignore
acct.c bsdacct: fix uid/gid misreporting 2009-12-18 14:03:52 -08:00
async.c
audit_tree.c fix more leaks in audit_tree.c tag_chunk() 2010-01-18 10:19:50 -08:00
audit_watch.c
audit.c
audit.h
auditfilter.c
auditsc.c
backtracetest.c
bounds.c
capability.c
cgroup_freezer.c Freezer: Fix buggy resume test for tasks frozen with cgroup freezer 2010-04-26 07:41:17 -07:00
cgroup.c cgroups: fix 2.6.32 regression causing BUG_ON() in cgroup_diput() 2010-01-18 10:19:32 -08:00
compat.c compat: Make compat_alloc_user_space() incorporate the access_ok() 2010-09-20 13:17:57 -07:00
configs.c
cpu.c sched: _cpu_down(): Don't play with current->cpus_allowed 2010-09-20 13:18:08 -07:00
cpuset.c sched: Make select_fallback_rq() cpuset friendly 2010-09-20 13:18:08 -07:00
cred-internals.h
cred.c CRED: Fix a race in creds_are_invalid() in credentials debugging 2010-05-12 14:57:10 -07:00
delayacct.c
dma.c
exec_domain.c
exit.c sched, cputime: Introduce thread_group_times() 2010-08-13 13:20:14 -07:00
extable.c
fork.c sched: Fix fork vs hotplug vs cpuset namespaces 2010-09-20 13:18:02 -07:00
freezer.c
futex_compat.c
futex.c futex: futex_find_get_task remove credentails check 2010-08-02 10:21:24 -07:00
groups.c kernel/groups.c: fix integer overflow in groups_search 2010-09-20 13:17:54 -07:00
hrtimer.c hrtimer: Tune hrtimer_interrupt hang logic 2010-04-01 15:58:14 -07:00
hung_task.c
itimer.c
kallsyms.c
Kconfig.freezer
Kconfig.hz
Kconfig.preempt
kexec.c
kfifo.c
kgdb.c
kmod.c
kprobes.c
ksysfs.c
kthread.c cpuset: fix the problem that cpuset_mem_spread_node() returns an offline node 2010-04-01 15:58:46 -07:00
latencytop.c
lockdep_internals.h
lockdep_proc.c
lockdep_states.h
lockdep.c Revert "lockdep: fix incorrect percpu usage" 2010-06-01 09:45:46 -07:00
Makefile SLOW_WORK: Move slow_work's proc file to debugfs 2009-12-01 08:20:31 -08:00
module.c dynamic debug: move ddebug_remove_module() down into free_module() 2010-08-02 10:20:47 -07:00
mutex-debug.c
mutex-debug.h
mutex.c mutex: Fix optimistic spinning vs. BKL 2010-07-05 11:10:31 -07:00
mutex.h
notifier.c
ns_cgroup.c
nsproxy.c
panic.c
params.c
perf_event.c Fix racy use of anon_inode_getfd() in perf_event.c 2010-07-05 11:10:30 -07:00
pid_namespace.c
pid.c
pm_qos_params.c
posix-cpu-timers.c
posix-timers.c posix_timer: Fix error path in timer_create 2010-07-05 11:10:30 -07:00
printk.c
profile.c profile: fix stats and data leakage 2010-05-26 14:29:18 -07:00
ptrace.c
rcupdate.c
rcutorture.c
rcutree_plugin.h rcu: Remove inline from forward-referenced functions 2009-12-18 14:03:04 -08:00
rcutree_trace.c
rcutree.c rcu: Fix note_new_gpnum() uses of ->gpnum 2009-12-18 14:03:01 -08:00
rcutree.h rcu: Remove inline from forward-referenced functions 2009-12-18 14:03:04 -08:00
relay.c
res_counter.c
resource.c
rtmutex_common.h
rtmutex-debug.c
rtmutex-debug.h
rtmutex-tester.c
rtmutex.c
rtmutex.h
rwsem.c
sched_clock.c sched: Fix cpu_clock() in NMIs, on !CONFIG_HAVE_UNSTABLE_SCHED_CLOCK 2010-01-22 15:18:30 -08:00
sched_cpupri.c
sched_cpupri.h
sched_debug.c sched: Remove forced2_migrations stats 2010-09-20 13:17:59 -07:00
sched_fair.c sched: Fix select_idle_sibling() logic in select_task_rq_fair() 2010-09-20 13:18:12 -07:00
sched_features.h
sched_idletask.c sched: Fix TASK_WAKING vs fork deadlock 2010-09-20 13:18:09 -07:00
sched_rt.c sched: Fix TASK_WAKING vs fork deadlock 2010-09-20 13:18:09 -07:00
sched_stats.h
sched.c sched: cpuacct: Use bigger percpu counter batch values for stats counters 2010-09-20 13:18:12 -07:00
seccomp.c
semaphore.c
signal.c signals: check_kill_permission(): don't check creds if same_thread_group() 2010-07-05 11:10:56 -07:00
slow-work-debugfs.c SLOW_WORK: Move slow_work's proc file to debugfs 2009-12-01 08:20:31 -08:00
slow-work.c slow-work: use get_ref wrapper instead of directly calling get_ref 2010-08-10 10:20:45 -07:00
slow-work.h SLOW_WORK: Move slow_work's proc file to debugfs 2009-12-01 08:20:31 -08:00
smp.c
softirq.c
softlockup.c softlockup: Stop spurious softlockup messages due to overflow 2010-04-01 15:58:47 -07:00
spinlock.c
srcu.c
stacktrace.c
stop_machine.c
sys_ni.c
sys.c sched, cputime: Introduce thread_group_times() 2010-08-13 13:20:14 -07:00
sysctl_check.c NET: fix oops at bootime in sysctl code 2010-02-09 04:51:02 -08:00
sysctl.c kernel/sysctl.c: fix stable merge error in NOMMU mmap_min_addr 2010-01-18 10:19:49 -08:00
taskstats.c
test_kprobes.c
time.c
timeconst.pl
timer.c
tracepoint.c
tsacct.c
uid16.c
up.c
user_namespace.c
user.c uids: Prevent tear down race 2009-11-02 16:02:39 +01:00
utsname_sysctl.c
utsname.c
wait.c
workqueue.c workqueue: fix race condition in schedule_on_each_cpu() 2009-11-17 17:40:33 -08:00