linux/kernel
Patrick Bellasi 19718921c3 UPSTREAM: sched/uclamp: Extend CPU's cgroup controller
The cgroup CPU bandwidth controller allows to assign a specified
(maximum) bandwidth to the tasks of a group. However this bandwidth is
defined and enforced only on a temporal base, without considering the
actual frequency a CPU is running on. Thus, the amount of computation
completed by a task within an allocated bandwidth can be very different
depending on the actual frequency the CPU is running that task.
The amount of computation can be affected also by the specific CPU a
task is running on, especially when running on asymmetric capacity
systems like Arm's big.LITTLE.

With the availability of schedutil, the scheduler is now able
to drive frequency selections based on actual task utilization.
Moreover, the utilization clamping support provides a mechanism to
bias the frequency selection operated by schedutil depending on
constraints assigned to the tasks currently RUNNABLE on a CPU.

Giving the mechanisms described above, it is now possible to extend the
cpu controller to specify the minimum (or maximum) utilization which
should be considered for tasks RUNNABLE on a cpu.
This makes it possible to better defined the actual computational
power assigned to task groups, thus improving the cgroup CPU bandwidth
controller which is currently based just on time constraints.

Extend the CPU controller with a couple of new attributes uclamp.{min,max}
which allow to enforce utilization boosting and capping for all the
tasks in a group.

Specifically:

- uclamp.min: defines the minimum utilization which should be considered
	      i.e. the RUNNABLE tasks of this group will run at least at a
	      minimum frequency which corresponds to the uclamp.min
	      utilization

- uclamp.max: defines the maximum utilization which should be considered
	      i.e. the RUNNABLE tasks of this group will run up to a
	      maximum frequency which corresponds to the uclamp.max
	      utilization

These attributes:

a) are available only for non-root nodes, both on default and legacy
   hierarchies, while system wide clamps are defined by a generic
   interface which does not depends on cgroups. This system wide
   interface enforces constraints on tasks in the root node.

b) enforce effective constraints at each level of the hierarchy which
   are a restriction of the group requests considering its parent's
   effective constraints. Root group effective constraints are defined
   by the system wide interface.
   This mechanism allows each (non-root) level of the hierarchy to:
   - request whatever clamp values it would like to get
   - effectively get only up to the maximum amount allowed by its parent

c) have higher priority than task-specific clamps, defined via
   sched_setattr(), thus allowing to control and restrict task requests.

Add two new attributes to the cpu controller to collect "requested"
clamp values. Allow that at each non-root level of the hierarchy.
Keep it simple by not caring now about "effective" values computation
and propagation along the hierarchy.

Update sysctl_sched_uclamp_handler() to use the newly introduced
uclamp_mutex so that we serialize system default updates with cgroup
relate updates.

Bug: 120440300
Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Michal Koutny <mkoutny@suse.com>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: Alessio Balsini <balsini@android.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Joel Fernandes <joelaf@google.com>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Morten Rasmussen <morten.rasmussen@arm.com>
Cc: Paul Turner <pjt@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Perret <quentin.perret@arm.com>
Cc: Rafael J . Wysocki <rafael.j.wysocki@intel.com>
Cc: Steve Muckle <smuckle@google.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Todd Kjos <tkjos@google.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Link: https://lkml.kernel.org/r/20190822132811.31294-2-patrick.bellasi@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit 2480c09313)
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Change-Id: I0285c44910bf073b80d7996361e6698bc5aedfae
Signed-off-by: Quentin Perret <qperret@google.com>
2020-02-01 15:03:14 +00:00
..
bpf This is the 4.19.99 stable release 2020-01-27 15:55:44 +01:00
cgroup UPSTREAM: cgroup: add cgroup_parse_float() 2020-02-01 14:39:37 +00:00
configs kconfig: tinyconfig: remove stale stack protector fixups 2018-06-15 07:15:28 +09:00
debug This is the 4.19.99 stable release 2020-01-27 15:55:44 +01:00
dma dma-debug: add a schedule point in debug_dma_dump_mappings() 2020-01-04 19:12:43 +01:00
events This is the 4.19.99 stable release 2020-01-27 15:55:44 +01:00
gcov UPSTREAM: gcov: clang support 2019-05-17 16:05:09 -07:00
irq irqdomain: Add the missing assignment of domain->fwnode for named fwnode 2020-01-27 14:51:09 +01:00
livepatch livepatch: Nullify obj->mod in klp_module_coming()'s error path 2019-10-07 18:57:10 +02:00
locking locking/spinlock/debug: Fix various data races 2020-01-12 12:17:05 +01:00
power This is the 4.19.94 stable release 2020-01-09 16:14:43 +01:00
printk printk: fix integer overflow in setup_log_buf() 2019-12-01 09:16:14 +01:00
rcu rcuperf: Fix cleanup path for invalid perf_type strings 2019-05-31 06:46:30 -07:00
sched UPSTREAM: sched/uclamp: Extend CPU's cgroup controller 2020-02-01 15:03:14 +00:00
time This is the 4.19.98 stable release 2020-01-23 08:36:16 +01:00
trace This is the 4.19.100 stable release 2020-01-29 17:10:45 +01:00
.gitignore BACKPORT: Provide in-kernel headers to make extending kernel easier 2019-06-12 12:33:20 +00:00
acct.c acct_on(): don't mess with freeze protection 2019-05-31 06:46:05 -07:00
async.c
audit_fsnotify.c
audit_tree.c audit: Embed key into chunk 2019-12-13 08:51:11 +01:00
audit_watch.c audit_get_nd(): don't unlock parent too early 2019-12-13 08:51:02 +01:00
audit.c audit: use ktime_get_coarse_real_ts64() for timestamps 2018-07-17 14:45:08 -04:00
audit.h
auditfilter.c audit: fix a memory leak bug 2019-05-31 06:46:17 -07:00
auditsc.c audit: print empty EXECVE args 2019-12-01 09:17:17 +01:00
backtracetest.c
bounds.c kbuild: fix kernel/bounds.c 'W=1' warning 2018-11-13 11:08:47 -08:00
capability.c LSM: generalize flag passing to security_capable 2020-01-23 08:21:29 +01:00
cfi.c ANDROID: add support for clang Control Flow Integrity (CFI) 2019-05-20 17:46:32 -07:00
compat.c BACKPORT: make 'user_access_begin()' do 'access_ok()' 2019-09-12 11:28:03 +00:00
configs.c
context_tracking.c
cpu_pm.c
cpu.c This is the 4.19.86 stable release 2019-11-25 10:00:06 +01:00
crash_core.c kernel/crash_core.c: print timestamp using time64_t 2018-08-22 10:52:47 -07:00
crash_dump.c
cred.c memcg: account security cred as well to kmemcg 2020-01-09 10:19:00 +01:00
delayacct.c UPSTREAM: delayacct: track delays from thrashing cache pages 2019-03-21 16:25:26 -07:00
dma.c
elfcore.c kernel/elfcore.c: include proper prototypes 2019-10-11 18:21:23 +02:00
exec_domain.c
exit.c exit: panic before exit_mm() on global init exit 2020-01-09 10:19:02 +01:00
extable.c
fail_function.c bpf/error-inject/kprobes: Clear current_kprobe and enable preempt in kprobe 2018-06-21 12:33:19 +02:00
fork.c This is the 4.19.99 stable release 2020-01-27 15:55:44 +01:00
freezer.c PM / reboot: Eliminate race between reboot and suspend 2018-08-06 12:35:20 +02:00
futex.c futex: Prevent robust futex exit race 2019-12-01 09:17:38 +01:00
gen_kheaders.sh BACKPORT: kheaders: Do not regenerate archive if config is not changed 2019-06-12 12:35:31 +00:00
groups.c
hung_task.c kernel: hung_task.c: disable on suspend 2019-04-20 09:16:02 +02:00
iomem.c
irq_work.c irq_work: Do not raise an IPI when queueing work on the local CPU 2019-05-31 06:46:19 -07:00
jump_label.c jump_label: move 'asm goto' support test to Kconfig 2019-06-04 08:02:34 +02:00
kallsyms.c ANDROID: kallsyms: strip hashes from function names with ThinLTO 2020-01-31 16:50:06 +00:00
kcmp.c
Kconfig.freezer
Kconfig.hz
Kconfig.locks
Kconfig.preempt kconfig: include kernel/Kconfig.preempt from init/Kconfig 2018-08-02 08:06:54 +09:00
kcov.c UPSTREAM: kcov: remote coverage support 2020-01-15 14:51:23 +00:00
kexec_core.c kexec: Allocate decrypted control pages for kdump if SME is enabled 2019-11-24 08:20:29 +01:00
kexec_file.c treewide: Use array_size() in vzalloc() 2018-06-12 16:19:22 -07:00
kexec_internal.h
kexec.c kexec: add call to LSM hook in original kexec_load syscall 2018-07-16 12:31:57 -07:00
kheaders.c BACKPORT: kheaders: Move from proc to sysfs 2019-06-12 12:33:54 +00:00
kmod.c
kprobes.c kprobes: Blacklist symbols in arch-defined prohibited area 2019-12-05 09:20:26 +01:00
ksysfs.c
kthread.c FROMLIST: refactor header includes to allow kthread.h inclusion in psi_types.h 2019-03-22 23:07:04 +00:00
latencytop.c
Makefile This is the 4.19.87 stable release 2019-12-01 09:53:43 +01:00
memremap.c mm/memory_hotplug: shrink zones when offlining memory 2020-01-29 16:43:27 +01:00
module_signing.c modsign: log module name in the event of an error 2018-07-02 11:36:17 +02:00
module-internal.h modsign: log module name in the event of an error 2018-07-02 11:36:17 +02:00
module.c This is the 4.19.90 stable release 2019-12-18 09:03:30 +01:00
notifier.c
nsproxy.c
padata.c padata: use smp_mb in padata_reorder to avoid orphaned padata jobs 2019-07-26 09:14:25 +02:00
panic.c kernel/panic.c: do not append newline to the stack protector panic string 2019-12-01 09:17:10 +01:00
params.c
pid_namespace.c signal/pid_namespace: Fix reboot_pid_ns to use send_sig not force_sig 2019-07-26 09:14:01 +02:00
pid.c UPSTREAM: pid: add pidfd_open() 2019-08-12 13:36:37 -04:00
profile.c
ptrace.c ptrace: reintroduce usage of subjective credentials in ptrace_has_cap() 2020-01-23 08:21:29 +01:00
range.c
reboot.c PM / reboot: Eliminate race between reboot and suspend 2018-08-06 12:35:20 +02:00
relay.c relay: check return of create_buf_file() properly 2019-03-13 14:02:35 -07:00
resource.c resource: fix locking in find_next_iomem_res() 2019-09-16 08:22:20 +02:00
rseq.c rseq: uapi: Declare rseq_cs field as union, update includes 2018-07-10 22:18:52 +02:00
scs.c FROMLIST: scs: add support for stack usage debugging 2019-11-27 12:37:25 -08:00
seccomp.c LSM: generalize flag passing to security_capable 2020-01-23 08:21:29 +01:00
signal.c This is the 4.19.99 stable release 2020-01-27 15:55:44 +01:00
smp.c cpu/hotplug: Fix "SMT disabled by BIOS" detection for KVM 2019-02-12 19:47:25 +01:00
smpboot.c smpboot: Remove cpumask from the API 2018-07-03 09:20:44 +02:00
smpboot.h
softirq.c nohz: Fix missing tick reprogram when interrupting an inline softirq 2018-08-03 15:52:10 +02:00
stacktrace.c
stop_machine.c Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2018-08-13 11:25:07 -07:00
sys_ni.c UPSTREAM: signal: support CLONE_PIDFD with pidfd_send_signal 2019-08-12 13:36:37 -04:00
sys.c UPSTREAM: arm64: Tighten the PR_{SET, GET}_TAGGED_ADDR_CTRL prctl() unused arguments 2019-10-07 15:27:39 -04:00
sysctl_binary.c
sysctl.c UPSTREAM: sched/uclamp: Add system default clamps 2020-02-01 14:39:37 +00:00
task_work.c
taskstats.c taskstats: fix data-race 2020-01-09 10:18:59 +01:00
test_kprobes.c kprobes: Remove jprobe API implementation 2018-06-21 12:33:05 +02:00
torture.c torture: Keep old-school dmesg format 2018-06-25 11:30:10 -07:00
tracepoint.c tracepoint: Fix tracepoint array element size mismatch 2018-10-17 15:35:29 -04:00
tsacct.c
ucount.c
uid16.c
uid16.h
umh.c umh: fix race condition 2018-06-07 16:56:28 -04:00
up.c
user_namespace.c userns: also map extents in the reverse map to kernel IDs 2018-11-13 11:09:00 -08:00
user-return-notifier.c
user.c ANDROID: proc: Add /proc/uid directory 2019-03-06 15:59:21 +00:00
utsname_sysctl.c sys: don't hold uts_sem while accessing userspace memory 2018-08-11 02:05:53 -05:00
utsname.c
watchdog_hld.c watchdog: Mark watchdog touch functions as notrace 2018-08-30 12:56:40 +02:00
watchdog.c watchdog: Respect watchdog cpumask on CPU hotplug 2019-04-03 06:26:29 +02:00
workqueue_internal.h UPSTREAM: psi: fix aggregation idle shut-off 2019-03-21 16:25:27 -07:00
workqueue.c This is the 4.19.90 stable release 2019-12-18 09:03:30 +01:00