linux/kernel/sched
Peter Zijlstra 630948e7ac BACKPORT: sched/fair: Fix PELT integrity for new tasks
Vincent and Yuyang found another few scenarios in which entity
tracking goes wobbly.

The scenarios are basically due to the fact that new tasks are not
immediately attached and thereby differ from the normal situation -- a
task is always attached to a cfs_rq load average (such that it
includes its blocked contribution) and is explicitly
detached/attached on migration to another cfs_rq.

Scenario 1: switch to fair class

  p->sched_class = fair_class;
  if (queued)
    enqueue_task(p);
      ...
        enqueue_entity()
          enqueue_entity_load_avg()
            migrated = !sa->last_update_time (true)
            if (migrated)
              attach_entity_load_avg()
  check_class_changed()
    switched_from() (!fair)
    switched_to()   (fair)
      switched_to_fair()
        attach_entity_load_avg()

If @p is a new task that hasn't been fair before, it will have
!last_update_time and, per the above, end up in
attach_entity_load_avg() _twice_.
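
The double attach in scenario 1 can be illustrated with a small
user-space model. This is only a sketch: the toy_* types and functions
are hypothetical stand-ins, not the kernel structures, and they model
just the "zero timestamp means migrated" test from the trace above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-ins for the entity and runqueue averages (hypothetical). */
struct toy_sa { uint64_t last_update_time; long load_avg; };
struct toy_cfs_rq { uint64_t now; long load_avg; };

/* Add the entity's contribution and stamp it as attached. */
static void toy_attach(struct toy_cfs_rq *rq, struct toy_sa *sa)
{
	assert(rq && sa);
	rq->load_avg += sa->load_avg;
	sa->last_update_time = rq->now;
}

/* Enqueue path before the fix: a zero timestamp reads as "migrated". */
static void toy_enqueue(struct toy_cfs_rq *rq, struct toy_sa *sa)
{
	bool migrated = !sa->last_update_time;

	if (migrated)
		toy_attach(rq, sa);
}

/* Scenario 1: enqueue first, then the class-switch hook attaches again. */
long toy_switch_to_fair_new_task(long task_load)
{
	struct toy_cfs_rq rq = { .now = 1000, .load_avg = 0 };
	struct toy_sa sa = { .last_update_time = 0, .load_avg = task_load };

	toy_enqueue(&rq, &sa);	/* first attach, via the migrated test */
	toy_attach(&rq, &sa);	/* second attach, as switched_to_fair() would */
	return rq.load_avg;
}
```

In this model a new task with a load of 100 leaves the runqueue average
at 200, i.e. its contribution is counted twice.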

Scenario 2: change between cgroups

  sched_move_group(p)
    if (queued)
      dequeue_task()
    task_move_group_fair()
      detach_task_cfs_rq()
        detach_entity_load_avg()
      set_task_rq()
      attach_task_cfs_rq()
        attach_entity_load_avg()
    if (queued)
      enqueue_task();
        ...
          enqueue_entity()
            enqueue_entity_load_avg()
              migrated = !sa->last_update_time (true)
              if (migrated)
                attach_entity_load_avg()

As in scenario 1, if @p is a new task, it will have
!last_update_time and we'll end up in attach_entity_load_avg()
_twice_.

Furthermore, notice how we do a detach_entity_load_avg() on something
that wasn't attached to begin with.
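
The stray detach can be modelled the same way. The sketch below is a
hypothetical user-space model, not kernel code; in particular, clamping
the average at zero is an assumption of the sketch. It shows that
detaching a contribution that was never added removes load that belongs
to other tasks.

```c
#include <assert.h>

/* Toy runqueue average (hypothetical, not the kernel's cfs_rq). */
struct toy_rq { long load_avg; };

/* Remove a contribution; clamping at zero is an assumption here. */
static void toy_detach(struct toy_rq *rq, long contrib)
{
	rq->load_avg -= contrib;
	if (rq->load_avg < 0)
		rq->load_avg = 0;
}

/*
 * Returns the old group's load after moving a task whose contribution
 * was never attached; others_load is what the other tasks contributed.
 */
long toy_move_unattached(long others_load, long task_load)
{
	struct toy_rq old_rq = { .load_avg = others_load };

	assert(others_load >= 0 && task_load >= 0);
	toy_detach(&old_rq, task_load);
	return old_rq.load_avg;
}
```

With 300 units of load from other tasks and an unattached task of 100,
the old group ends up at 200 although nothing of @p was ever in it.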

As stated above, the problem is that the new task isn't yet attached
to the load tracking and thereby violates the invariant assumption.

This patch remedies this by ensuring a new task is indeed properly
attached to the load tracking on creation, through
post_init_entity_util_avg().
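
A minimal sketch of why attaching at creation restores the invariant,
again as a user-space model. toy_post_init() is a hypothetical stand-in
for the creation-time hook and does not reproduce what
post_init_entity_util_avg() actually computes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-ins for the entity and runqueue averages (hypothetical). */
struct toy_sa { uint64_t last_update_time; long load_avg; };
struct toy_cfs_rq { uint64_t now; long load_avg; };

/* Add the entity's contribution and stamp it as attached. */
static void toy_attach(struct toy_cfs_rq *rq, struct toy_sa *sa)
{
	assert(rq->now != 0);	/* a zero stamp would still read as new */
	rq->load_avg += sa->load_avg;
	sa->last_update_time = rq->now;
}

/* Hypothetical creation-time hook: attach exactly once. */
static void toy_post_init(struct toy_cfs_rq *rq, struct toy_sa *sa)
{
	toy_attach(rq, sa);
}

/* Enqueue path: only attaches when the timestamp is still zero. */
static void toy_enqueue(struct toy_cfs_rq *rq, struct toy_sa *sa)
{
	bool migrated = !sa->last_update_time;

	if (migrated)
		toy_attach(rq, sa);
}

/* Create a task, attach it at creation, then enqueue it. */
long toy_new_task_then_enqueue(long task_load)
{
	struct toy_cfs_rq rq = { .now = 1000, .load_avg = 0 };
	struct toy_sa sa = { .last_update_time = 0, .load_avg = task_load };

	toy_post_init(&rq, &sa);	/* timestamp becomes non-zero */
	toy_enqueue(&rq, &sa);		/* no second attach */
	return rq.load_avg;
}
```

Because the timestamp is non-zero after creation, the later enqueue no
longer mistakes the task for a migrated one and the load is counted
once.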

Of course, this isn't entirely as straightforward as one might think,
since the task is hashed before we call wake_up_new_task() and thus
can be poked at. We avoid this by adding TASK_NEW and teaching
cpu_cgroup_can_attach() to refuse such tasks.
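
The guard can be sketched as a state check. Everything below is a
hypothetical model: the toy_* names, the numeric state values and the
-EINVAL return code are assumptions of the sketch, not taken from this
commit message.

```c
#include <assert.h>
#include <errno.h>

/* State values chosen for the sketch only. */
#define TOY_TASK_RUNNING	0
#define TOY_TASK_NEW		2048

struct toy_task { long state; };

/* Refuse a cgroup move while the task is still being set up. */
int toy_can_attach(const struct toy_task *t)
{
	assert(t);
	if (t->state == TOY_TASK_NEW)
		return -EINVAL;	/* error code is an assumption */
	return 0;
}

/* Once the new task is woken it leaves the new state and may be moved. */
void toy_wake_up_new_task(struct toy_task *t)
{
	assert(t);
	t->state = TOY_TASK_RUNNING;
}
```

In this model a task is rejected until toy_wake_up_new_task() has run,
which closes the window in which a visible but not yet attached task
could be moved between groups.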

.:: BACKPORT

Complicated by the fact that many of the lines changed by the original
of this commit were then changed by:

df217913e7 sched/fair: Factorize attach/detach entity <Vincent Guittot>

and then

d31b1a66cb sched/fair: Factorize PELT update <Vincent Guittot>

both of which have already been backported here.

Reported-by: Yuyang Du <yuyang.du@intel.com>
Reported-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit 7dc603c902)
Change-Id: Ibc59eb52310a62709d49a744bd5a24e8b97c4ae8
Signed-off-by: Brendan Jackman <brendan.jackman@arm.com>
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
2017-11-20 21:15:59 +05:30
auto_group.c sched/autogroup: Fix autogroup_move_group() to never skip sched_move_task() 2017-10-27 10:23:17 +02:00
auto_group.h
clock.c treewide: Remove old email address 2015-11-23 09:44:58 +01:00
completion.c
core.c BACKPORT: sched/fair: Fix PELT integrity for new tasks 2017-11-20 21:15:59 +05:30
cpuacct.c
cpuacct.h
cpudeadline.c sched/deadline: Unify dl_time_before() usage 2015-09-23 09:51:25 +02:00
cpudeadline.h sched/deadline: Unify dl_time_before() usage 2015-09-23 09:51:25 +02:00
cpufreq_sched.c cpufreq/sched: Use cpu max freq rather than policy max 2017-11-20 21:15:59 +05:30
cpufreq_schedutil.c cpufreq: schedutil: clamp util to CPU maximum capacity 2017-11-20 21:15:59 +05:30
cpufreq.c sched: backport cpufreq hooks from 4.9-rc4 2017-06-21 16:34:04 +05:30
cpupri.c
cpupri.h
cputime.c Merge branch 'linux-linaro-lsk-v4.4' into linux-linaro-lsk-v4.4-android 2016-09-20 15:18:54 +08:00
deadline.c sched: Update task->on_rq when tasks are moving between runqueues 2017-11-20 21:15:59 +05:30
debug.c sched/fair: Add eas (& cas) specific rq, sd and task stats 2017-06-21 16:37:38 +05:30
energy.c sched: Support for extracting EAS energy costs from DT 2016-09-14 14:48:50 +05:30
fair.c BACKPORT: sched/fair: Fix PELT integrity for new tasks 2017-11-20 21:15:59 +05:30
features.h sched: Add Kconfig option DEFAULT_USE_ENERGY_AWARE to set ENERGY_AWARE feature flag 2016-10-12 17:34:22 +05:30
idle_task.c sched: Make sched_class::set_cpus_allowed() unconditional 2015-08-12 12:06:09 +02:00
idle.c vmstat: make vmstat_updater deferrable again and shut down on idle 2016-09-14 15:02:22 +05:30
loadavg.c sched/loadavg: Avoid loadavg spikes caused by delayed NO_HZ accounting 2017-07-05 14:37:21 +02:00
Makefile BACKPORT: cpufreq: schedutil: New governor based on scheduler utilization data 2017-06-21 16:34:04 +05:30
rt.c sched: Update task->on_rq when tasks are moving between runqueues 2017-11-20 21:15:59 +05:30
sched.h BACKPORT: sched/cgroup: Fix cpu_cgroup_fork() handling 2017-11-20 21:15:59 +05:30
stats.c schedstats/eas: guard properly to avoid breaking non-smp schedstats users 2017-06-21 16:37:49 +05:30
stats.h sched/stat: Simplify the sched_info accounting dependency 2015-07-04 10:04:30 +02:00
stop_task.c sched: Introduce Window Assisted Load Tracking (WALT) 2016-09-14 15:02:22 +05:30
tune.c Revert "ANDROID: sched/tune: Initialize raw_spin_lock in boosted_groups" 2017-10-15 23:21:09 +05:30
tune.h sched/tune: Introducing a new schedtune attribute prefer_idle 2016-09-14 15:02:22 +05:30
wait.c sched/wait: Fix the signal handling fix 2015-12-13 14:30:59 -08:00
walt.c sched: walt: Leverage existing helper APIs to apply invariance 2017-11-20 21:15:59 +05:30
walt.h sched/fair: streamline find_best_target heuristics 2017-08-11 19:31:04 +05:30