linux/kernel/sched
Patrick Bellasi 19718921c3 UPSTREAM: sched/uclamp: Extend CPU's cgroup controller
The cgroup CPU bandwidth controller allows assigning a specified
(maximum) bandwidth to the tasks of a group. However, this bandwidth is
defined and enforced only on a temporal basis, without considering the
actual frequency a CPU is running at. Thus, the amount of computation
completed by a task within an allocated bandwidth can be very different
depending on the actual frequency at which the CPU runs that task.
The amount of computation can also be affected by the specific CPU a
task is running on, especially on asymmetric capacity systems like
Arm's big.LITTLE.

With the availability of schedutil, the scheduler is now able
to drive frequency selections based on actual task utilization.
Moreover, the utilization clamping support provides a mechanism to
bias the frequency selection operated by schedutil depending on
constraints assigned to the tasks currently RUNNABLE on a CPU.

Given the mechanisms described above, it is now possible to extend the
cpu controller to specify the minimum (or maximum) utilization which
should be considered for tasks RUNNABLE on a cpu.
This makes it possible to better define the actual computational
power assigned to task groups, thus improving the cgroup CPU bandwidth
controller which is currently based just on time constraints.

Extend the CPU controller with a couple of new attributes uclamp.{min,max}
which allow enforcing utilization boosting and capping for all the
tasks in a group.

Specifically:

- uclamp.min: defines the minimum utilization which should be considered
	      i.e. the RUNNABLE tasks of this group will run at least at a
	      minimum frequency which corresponds to the uclamp.min
	      utilization

- uclamp.max: defines the maximum utilization which should be considered
	      i.e. the RUNNABLE tasks of this group will run up to a
	      maximum frequency which corresponds to the uclamp.max
	      utilization
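
As an illustration of the boosting and capping semantics described above,
here is a minimal sketch of how a utilization value could be restricted to
a group's [uclamp.min, uclamp.max] range before it drives frequency
selection. The helper name clamp_group_util() is hypothetical and is not
taken from this patch; the sketch assumes uclamp_min <= uclamp_max and
that all values share the same utilization scale.

```c
/*
 * Hypothetical helper (not part of this patch): restrict a utilization
 * value to the [uclamp_min, uclamp_max] range of a task group.
 * Assumes uclamp_min <= uclamp_max and a common utilization scale.
 */
static unsigned int clamp_group_util(unsigned int util,
				     unsigned int uclamp_min,
				     unsigned int uclamp_max)
{
	if (util < uclamp_min)
		return uclamp_min;	/* boosting: raise to the minimum */
	if (util > uclamp_max)
		return uclamp_max;	/* capping: limit to the maximum */
	return util;			/* already inside the range */
}
```

With uclamp_min = 200 and uclamp_max = 800, a utilization of 100 would be
treated as 200, a utilization of 900 as 800, and a utilization of 500
would be left unchanged.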

These attributes:

a) are available only for non-root nodes, both on default and legacy
   hierarchies, while system wide clamps are defined by a generic
   interface which does not depend on cgroups. This system wide
   interface enforces constraints on tasks in the root node.

b) enforce effective constraints at each level of the hierarchy which
   are a restriction of the group requests considering its parent's
   effective constraints. Root group effective constraints are defined
   by the system wide interface.
   This mechanism allows each (non-root) level of the hierarchy to:
   - request whatever clamp values it would like to get
   - effectively get only up to the maximum amount allowed by its parent

c) have higher priority than task-specific clamps, defined via
   sched_setattr(), thus making it possible to control and restrict
   task requests.
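
The restriction described in point b) can be sketched as follows. This is
only an illustration under stated assumptions, not the code of this patch,
which deliberately leaves "effective" values out: struct clamp_pair and
effective_clamps() are hypothetical names, and the final step that keeps
min <= max is an assumption of the sketch.

```c
/* Hypothetical pair of clamp values for one group (not a kernel type). */
struct clamp_pair {
	unsigned int min;
	unsigned int max;
};

/* Return the smaller of two values. */
static unsigned int min_uint(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/*
 * Hypothetical helper: a group may request any clamp values, but it
 * effectively gets at most what its parent's effective values allow.
 */
static struct clamp_pair effective_clamps(struct clamp_pair requested,
					  struct clamp_pair parent_eff)
{
	struct clamp_pair eff;

	eff.min = min_uint(requested.min, parent_eff.min);
	eff.max = min_uint(requested.max, parent_eff.max);

	/* Assumption of this sketch: never let min exceed max. */
	if (eff.min > eff.max)
		eff.min = eff.max;

	return eff;
}
```

For example, a child requesting {min = 400, max = 1024} under a parent
whose effective values are {min = 200, max = 600} would end up with
{min = 200, max = 600}.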

Add two new attributes to the cpu controller to collect "requested"
clamp values. Allow that at each non-root level of the hierarchy.
Keep it simple by not caring now about "effective" values computation
and propagation along the hierarchy.

Update sysctl_sched_uclamp_handler() to use the newly introduced
uclamp_mutex so that we serialize system default updates with cgroup
related updates.
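
The serialization idea can be sketched in user space as below. This is an
analogy only, assuming a POSIX threads environment: it does not reproduce
the kernel code, and clamp_lock, set_system_default_max() and
set_group_requested_max() are hypothetical names. The point it shows is
that both update paths take the same lock, so a system default update and
a group update never interleave.

```c
#include <pthread.h>

/* Assumption: clamp values live on a 0..1024 utilization scale. */
#define CLAMP_SCALE 1024u

/* Single lock shared by both update paths (user-space stand-in). */
static pthread_mutex_t clamp_lock = PTHREAD_MUTEX_INITIALIZER;

static unsigned int sys_default_max = CLAMP_SCALE;
static unsigned int group_requested_max = CLAMP_SCALE;

/* Hypothetical system wide update path. Returns 0 or -1 on bad input. */
static int set_system_default_max(unsigned int value)
{
	if (value > CLAMP_SCALE)
		return -1;

	pthread_mutex_lock(&clamp_lock);
	sys_default_max = value;
	pthread_mutex_unlock(&clamp_lock);
	return 0;
}

/* Hypothetical group update path, serialized by the same lock. */
static int set_group_requested_max(unsigned int value)
{
	if (value > CLAMP_SCALE)
		return -1;

	pthread_mutex_lock(&clamp_lock);
	group_requested_max = value;
	pthread_mutex_unlock(&clamp_lock);
	return 0;
}
```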

Bug: 120440300
Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Michal Koutny <mkoutny@suse.com>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: Alessio Balsini <balsini@android.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Joel Fernandes <joelaf@google.com>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Morten Rasmussen <morten.rasmussen@arm.com>
Cc: Paul Turner <pjt@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Perret <quentin.perret@arm.com>
Cc: Rafael J . Wysocki <rafael.j.wysocki@intel.com>
Cc: Steve Muckle <smuckle@google.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Todd Kjos <tkjos@google.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Link: https://lkml.kernel.org/r/20190822132811.31294-2-patrick.bellasi@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit 2480c09313)
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Change-Id: I0285c44910bf073b80d7996361e6698bc5aedfae
Signed-off-by: Quentin Perret <qperret@google.com>
2020-02-01 15:03:14 +00:00
..
autogroup.c ANDROID: sched/autogroup: Define autogroup_path() for !CONFIG_SCHED_DEBUG 2018-10-26 12:44:03 +01:00
autogroup.h
clock.c sched/clock: Disable interrupts when calling generic_sched_clock_init() 2018-07-30 19:33:35 +02:00
completion.c sched/Documentation: Update wake_up() & co. memory-barrier guarantees 2018-07-17 09:30:34 +02:00
core.c UPSTREAM: sched/uclamp: Extend CPU's cgroup controller 2020-02-01 15:03:14 +00:00
cpuacct.c
cpudeadline.c
cpudeadline.h
cpufreq_schedutil.c BACKPORT: sched/uclamp: Add uclamp support to energy_compute() 2020-02-01 15:03:14 +00:00
cpufreq.c cpufreq: Avoid leaving stale IRQ work items during CPU offline 2019-12-31 16:36:22 +01:00
cpupri.c
cpupri.h
cputime.c This is the 4.19.82 stable release 2019-11-06 13:21:58 +01:00
deadline.c This is the 4.19.77 stable release 2019-10-06 11:27:45 +02:00
debug.c jump_label: move 'asm goto' support test to Kconfig 2019-06-04 08:02:34 +02:00
fair.c BACKPORT: sched/uclamp: Add uclamp support to energy_compute() 2020-02-01 15:03:14 +00:00
features.h ANDROID: sched: Disable find_best_target() by default 2019-03-27 15:58:02 +00:00
idle.c This is the 4.19.77 stable release 2019-10-06 11:27:45 +02:00
isolation.c
loadavg.c UPSTREAM: sched: loadavg: make calc_load_n() public 2019-03-21 16:25:27 -07:00
Makefile UPSTREAM: psi: pressure stall information for CPU, memory, and IO 2019-03-21 16:25:27 -07:00
membarrier.c sched/membarrier: Fix private expedited registration check 2019-10-11 18:21:22 +02:00
pelt.c UPSTREAM: sched/fair: Update scale invariance of PELT 2019-03-26 14:22:50 +00:00
pelt.h UPSTREAM: sched/fair: Update scale invariance of PELT 2019-03-26 14:22:50 +00:00
psi.c UPSTREAM: psi: get poll_work to run when calling poll syscall next time 2019-09-19 23:59:24 +00:00
rt.c BACKPORT: sched/cpufreq, sched/uclamp: Add clamps for FAIR and RT tasks 2020-02-01 14:39:38 +00:00
sched-pelt.h sched/fair: Fix "runnable_avg_yN_inv" not used warnings 2019-07-26 09:14:08 +02:00
sched.h UPSTREAM: sched/uclamp: Extend CPU's cgroup controller 2020-02-01 15:03:14 +00:00
stats.c
stats.h UPSTREAM: psi: make disabling/enabling easier for vendor kernels 2019-03-21 16:25:27 -07:00
stop_task.c FROMLIST: sched/fair: Use wake_q length as a hint for wake_wide 2018-10-26 12:15:52 +01:00
swait.c sched/swait: Rename to exclusive 2018-06-20 11:35:56 +02:00
topology.c This is the 4.19.87 stable release 2019-12-01 09:53:43 +01:00
tune.c ANDROID: Add hold functionality to schedtune CPU boost 2018-10-26 12:44:06 +01:00
tune.h ANDROID: sched/rt: Add schedtune accounting to rt task enqueue/dequeue 2018-10-26 12:44:06 +01:00
wait_bit.c
wait.c sched/wait: assert the wait_queue_head lock is held in __wake_up_common 2018-08-22 10:52:47 -07:00