linux/include
Hidetoshi Seto 19eb722b76 sched, cputime: Introduce thread_group_times()
commit 0cf55e1ec0 upstream.

This is a real fix for problem of utime/stime values decreasing
described in the thread:

   http://lkml.org/lkml/2009/11/3/522

Now cputime is accounted in the following way:

 - {u,s}time in task_struct are increased every time when the thread
   is interrupted by a tick (timer interrupt).

 - When a thread exits, its {u,s}time are added to signal->{u,s}time,
   after adjusted by task_times().

 - When all threads in a thread_group exits, accumulated {u,s}time
   (and also c{u,s}time) in signal struct are added to c{u,s}time
   in signal struct of the group's parent.

So {u,s}time in task struct are "raw" tick count, while
{u,s}time and c{u,s}time in signal struct are "adjusted" values.

And accounted values are used by:

 - task_times(), to get cputime of a thread:
   This function returns adjusted values that originates from raw
   {u,s}time and scaled by sum_exec_runtime that accounted by CFS.

 - thread_group_cputime(), to get cputime of a thread group:
   This function returns sum of all {u,s}time of living threads in
   the group, plus {u,s}time in the signal struct that is sum of
   adjusted cputimes of all exited threads belonged to the group.

The problem is the return value of thread_group_cputime(),
because it is mixed sum of "raw" value and "adjusted" value:

  group's {u,s}time = foreach(thread){{u,s}time} + exited({u,s}time)

This misbehavior can break {u,s}time monotonicity.
Assume that if there is a thread that have raw values greater
than adjusted values (e.g. interrupted by 1000Hz ticks 50 times
but only runs 45ms) and if it exits, cputime will decrease (e.g.
-5ms).

To fix this, we could do:

  group's {u,s}time = foreach(t){task_times(t)} + exited({u,s}time)

But task_times() contains hard divisions, so applying it for
every thread should be avoided.

This patch fixes the above problem in the following way:

 - Modify thread's exit (= __exit_signal()) not to use task_times().
   It means {u,s}time in signal struct accumulates raw values instead
   of adjusted values.  As the result it makes thread_group_cputime()
   to return pure sum of "raw" values.

 - Introduce a new function thread_group_times(*task, *utime, *stime)
   that converts "raw" values of thread_group_cputime() to "adjusted"
   values, in same calculation procedure as task_times().

 - Modify group's exit (= wait_task_zombie()) to use this introduced
   thread_group_times().  It make c{u,s}time in signal struct to
   have adjusted values like before this patch.

 - Replace some thread_group_cputime() by thread_group_times().
   This replacements are only applied where conveys the "adjusted"
   cputime to users, and where already uses task_times() near by it.
   (i.e. sys_times(), getrusage(), and /proc/<PID>/stat.)

This patch have a positive side effect:

 - Before this patch, if a group contains many short-life threads
   (e.g. runs 0.9ms and not interrupted by ticks), the group's
   cputime could be invisible since thread's cputime was accumulated
   after adjusted: imagine adjustment function as adj(ticks, runtime),
     {adj(0, 0.9) + adj(0, 0.9) + ....} = {0 + 0 + ....} = 0.
   After this patch it will not happen because the adjustment is
   applied after accumulated.

v2:
 - remove if()s, put new variables into signal_struct.

Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Spencer Candland <spencer@bluehost.com>
Cc: Americo Wang <xiyou.wangcong@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Stanislaw Gruszka <sgruszka@redhat.com>
LKML-Reference: <4B162517.8040909@jp.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-13 13:20:14 -07:00
..
acpi ACPI: skip checking BM_STS if the BIOS doesn't ask for it 2010-08-02 10:21:24 -07:00
asm-generic dma-mapping: fix dma_sync_single_range_* 2010-05-26 14:29:14 -07:00
crypto Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 2009-09-11 09:38:37 -07:00
drm drm/radeon/kms: add FireMV 2400 PCI ID. 2010-04-26 07:41:26 -07:00
keys RxRPC: Use uX/sX rather than uintX_t/intX_t types 2009-09-16 00:01:13 -07:00
linux sched, cputime: Introduce thread_group_times() 2010-08-13 13:20:14 -07:00
math-emu math-emu: correct test for downshifting fraction in _FP_FROM_INT() 2010-08-02 10:20:44 -07:00
media V4L/DVB (13019): video: initial support for ADV7180 2009-09-19 00:53:39 -03:00
mtd
net sctp: Fix skb_over_panic resulting from multiple invalid parameter errors (CVE-2010-1173) (v4) 2010-07-05 11:11:14 -07:00
pcmcia PM / yenta: Split resume into early and late parts (rev. 4) 2009-11-03 10:54:58 +01:00
rdma trivial: fix typo "to to" in multiple files 2009-09-21 15:14:55 +02:00
rxrpc
scsi SCSI: fc-transport: Use packed modifier for fc_bsg_request structure. 2010-04-26 07:41:30 -07:00
sound Merge branch 'topic/ymfpci' into for-linus 2009-09-10 15:33:09 +02:00
trace tracing: Fix ftrace_event_call alignment for use with gcc 4.5 2010-05-12 14:57:14 -07:00
video davinci-fb-frame-buffer-driver-for-ti-da8xx-omap-l1xx-v4 2009-09-23 07:39:51 -07:00
xen
Kbuild