linux

mirror of https://github.com/torvalds/linux.git synced 2026-06-13 01:08:08 +02:00


Morten Rasmussen 2802bf3cd9 sched/fair: Add over-utilization/tipping point indicator

Energy-aware scheduling is only meant to be active while the system is _not_ over-utilized. That is, there are spare cycles available to shift tasks around based on their actual utilization to get a more energy-efficient task distribution without depriving any tasks. When above the tipping point, task placement is done the traditional way based on load_avg, spreading the tasks across as many cpus as possible based on priority-scaled load to preserve smp_nice. Below the tipping point we want to use util_avg instead. We need to define a criterion for when we make the switch.

The util_avg for each cpu converges towards 100% regardless of how many additional tasks we may put on it. If we define over-utilized as:

    sum_{cpus}(rq.cfs.avg.util_avg) + margin > sum_{cpus}(rq.capacity)

some individual cpus may be over-utilized running multiple tasks even when the above condition is false. That should be okay as long as we try to spread the tasks out to avoid per-cpu over-utilization as much as possible and if all tasks have the _same_ priority. If the latter isn't true, we have to consider priority to preserve smp_nice.

For example, we could have n_cpus nice=-10 util_avg=55% tasks and n_cpus/2 nice=0 util_avg=60% tasks. Balancing based on util_avg, we are likely to end up with nice=-10 tasks sharing cpus and nice=0 tasks getting their own, as we have 1.5*n_cpus tasks in total and 55%+55% is less over-utilized than 55%+60% for those cpus that have to be shared. The system utilization is only 85% of the system capacity ((n_cpus*55% + n_cpus/2*60%) / n_cpus), but we are breaking smp_nice.

To be sure not to break smp_nice, we have instead defined over-utilization conservatively as when any cpu in the system is fully utilized at its highest frequency:

    cpu_rq(any).cfs.avg.util_avg + margin > cpu_rq(any).capacity

IOW, as soon as one cpu is (nearly) 100% utilized, we switch to load_avg to factor in priority to preserve smp_nice.
With this definition, we can skip periodic load-balance as no cpu has an always-running task when the system is not over-utilized. All tasks will be periodic and we can balance them at wake-up. This conservative condition does, however, mean that some scenarios that could benefit from energy-aware decisions even if one cpu is fully utilized would not get those benefits.

For systems where some cpus might have reduced capacity (RT-pressure and/or big.LITTLE), we want periodic load-balance checks as soon as a single cpu is fully utilized, as it might be one of those with reduced capacity, and in that case we want to migrate the task.

[ peterz: Added a comment explaining why new tasks are not accounted during overutilization detection. ]

Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com>
Signed-off-by: Quentin Perret <quentin.perret@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: adharmap@codeaurora.org
Cc: chris.redpath@arm.com
Cc: currojerez@riseup.net
Cc: dietmar.eggemann@arm.com
Cc: edubezval@gmail.com
Cc: gregkh@linuxfoundation.org
Cc: javi.merino@kernel.org
Cc: joel@joelfernandes.org
Cc: juri.lelli@redhat.com
Cc: patrick.bellasi@arm.com
Cc: pkondeti@codeaurora.org
Cc: rjw@rjwysocki.net
Cc: skannan@codeaurora.org
Cc: smuckle@google.com
Cc: srinivas.pandruvada@linux.intel.com
Cc: thara.gopinath@linaro.org
Cc: tkjos@google.com
Cc: valentin.schneider@arm.com
Cc: vincent.guittot@linaro.org
Cc: viresh.kumar@linaro.org
Link: https://lkml.kernel.org/r/20181203095628.11858-13-quentin.perret@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>

2018-12-11 15:17:01 +01:00
bpf	bpf, ppc64: generalize fetching subprog into bpf_jit_get_func_addr	2018-11-26 17:34:24 -08:00
cgroup	for-linus-20181102	2018-11-02 11:25:48 -07:00
configs	kvm_config: add CONFIG_VIRTIO_MENU	2018-10-24 20:55:56 -04:00
debug	kdb: kdb_support: mark expected switch fall-throughs	2018-11-13 20:38:50 +00:00
dma	swiotlb: Skip cache maintenance on map error	2018-11-21 18:47:58 +01:00
events	uprobes: Fix handle_swbp() vs. unregister() + register() race once more	2018-11-23 08:31:19 +01:00
gcov	gcov: remove CONFIG_GCOV_FORMAT_AUTODETECT	2018-06-08 18:56:02 +09:00
irq	irq/matrix: Fix memory overallocation	2018-11-01 10:00:38 +01:00
livepatch	Merge branch 'for-4.19/upstream' into for-linus	2018-08-20 18:33:50 +02:00
locking	mm: remove include/linux/bootmem.h	2018-10-31 08:54:16 -07:00
power	PM: Introduce an Energy Model management framework	2018-12-11 15:16:58 +01:00
printk	mm: remove include/linux/bootmem.h	2018-10-31 08:54:16 -07:00
rcu	Merge branches 'doc.2018.08.30a', 'dynticks.2018.08.30b', 'srcu.2018.08.30b' and 'torture.2018.08.29a' into HEAD	2018-08-30 16:12:53 -07:00
sched	sched/fair: Add over-utilization/tipping point indicator	2018-12-11 15:17:01 +01:00
time	posix-cpu-timers: Remove useless call to check_dl_overrun()	2018-11-08 07:43:35 +01:00
trace	This includes two more fixes:	2018-11-30 10:40:11 -08:00
.gitignore
acct.c	kernel/acct.c: fix the acct->needcheck check in check_free_space()	2018-01-04 16:45:09 -08:00
async.c	kernel/async.c: revert "async: simplify lowest_in_progress()"	2018-02-06 18:32:44 -08:00
audit_fsnotify.c	fsnotify: add fsnotify_add_inode_mark() wrappers	2018-05-18 14:58:22 +02:00
audit_tree.c		2018-08-17 09:41:28 -07:00
audit_watch.c	audit: fix use-after-free in audit_add_watch	2018-07-18 11:43:36 -04:00
audit.c	audit: use ktime_get_coarse_real_ts64() for timestamps	2018-07-17 14:45:08 -04:00
audit.h	audit: track the owner of the command mutex ourselves	2018-02-23 11:22:22 -05:00
auditfilter.c	audit: rename FILTER_TYPE to FILTER_EXCLUDE	2018-06-19 10:39:54 -04:00
auditsc.c	audit/stable-4.18 PR 20180814	2018-08-15 10:46:54 -07:00
backtracetest.c
bounds.c	kbuild: fix kernel/bounds.c 'W=1' warning	2018-10-31 08:54:14 -07:00
capability.c
compat.c	y2038: globally rename compat_time to old_time32	2018-08-27 14:48:48 +02:00
configs.c
context_tracking.c
cpu_pm.c
cpu.c	x86/speculation: Rework SMT state change	2018-11-28 11:57:07 +01:00
crash_core.c	kernel/crash_core.c: print timestamp using time64_t	2018-08-22 10:52:47 -07:00
crash_dump.c
cred.c
delayacct.c	delayacct: track delays from thrashing cache pages	2018-10-26 16:26:32 -07:00
dma.c	proc: introduce proc_create_single{,_data}	2018-05-16 07:23:35 +02:00
elfcore.c
exec_domain.c	proc: introduce proc_create_single{,_data}	2018-05-16 07:23:35 +02:00
exit.c	signal: Pass pid type into group_send_sig_info	2018-07-21 12:57:35 -05:00
extable.c	extable: Make init_kernel_text() global	2018-02-21 16:54:06 +01:00
fail_function.c	kernel/fail_function.c: remove meaningless null pointer check before debugfs_remove_recursive	2018-10-31 08:54:12 -07:00
fork.c	New gcc plugin: stackleak	2018-11-01 11:46:27 -07:00
freezer.c	PM / reboot: Eliminate race between reboot and suspend	2018-08-06 12:35:20 +02:00
futex_compat.c	y2038: globally rename compat_time to old_time32	2018-08-27 14:48:48 +02:00
futex.c	mm: remove include/linux/bootmem.h	2018-10-31 08:54:16 -07:00
groups.c
hung_task.c	kernel: hung_task.c: disable on suspend	2018-10-25 18:45:08 +02:00
iomem.c	memremap: split devm_memremap_pages() and memremap() infrastructure	2018-05-15 23:08:33 -07:00
irq_work.c	irq/work: Improve the flag definitions	2018-01-08 19:43:15 +01:00
jump_label.c	Merge branch 'x86/build' into locking/core, to pick up dependent patches and unify jump-label work	2018-10-16 17:30:11 +02:00
kallsyms.c	kallsyms: reduce size a little on 64-bit	2018-09-10 22:54:33 +09:00
kcmp.c
Kconfig.freezer
Kconfig.hz
Kconfig.locks
Kconfig.preempt	kconfig: include kernel/Kconfig.preempt from init/Kconfig	2018-08-02 08:06:54 +09:00
kcov.c	kernel/kcov.c: mark funcs in __sanitizer_cov_trace_pc() as notrace	2018-11-30 14:56:14 -08:00
kexec_core.c	kexec: Allocate decrypted control pages for kdump if SME is enabled	2018-10-06 12:01:51 +02:00
kexec_file.c	kernel/kexec_file.c: remove some duplicated includes	2018-11-03 10:09:37 -07:00
kexec_internal.h
kexec.c	kexec: add call to LSM hook in original kexec_load syscall	2018-07-16 12:31:57 -07:00
kmod.c
kprobes.c	kprobes: Don't call BUG_ON() if there is a kprobe in use on free list	2018-09-12 08:01:16 +02:00
ksysfs.c
kthread.c	Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip	2018-08-13 11:25:07 -07:00
latencytop.c
Makefile	x86/entry: Add STACKLEAK erasing the kernel stack at the end of syscalls	2018-09-04 10:35:47 -07:00
memremap.c	Merge branch 'xarray' of git://git.infradead.org/users/willy/linux-dax	2018-10-28 11:35:40 -07:00
module_signing.c	modsign: log module name in the event of an error	2018-07-02 11:36:17 +02:00
module-internal.h	modsign: log module name in the event of an error	2018-07-02 11:36:17 +02:00
module.c	jump_table: Move entries into ro_after_init region	2018-09-27 17:56:49 +02:00
notifier.c
nsproxy.c
padata.c	padata: add SPDX identifier	2018-01-05 18:43:00 +11:00
panic.c	kernel/panic.c: filter out a potential trailing newline	2018-10-31 08:54:14 -07:00
params.c	kernel/params.c: downgrade warning for unsafe parameters	2018-04-11 10:28:37 -07:00
pid_namespace.c	signal: Use group_send_sig_info to kill all processes in a pid namespace	2018-09-16 16:08:25 +02:00
pid.c	mm: remove include/linux/bootmem.h	2018-10-31 08:54:16 -07:00
profile.c	mm: remove include/linux/bootmem.h	2018-10-31 08:54:16 -07:00
ptrace.c	ptrace: Remove unused ptrace_may_access_sched() and MODE_IBRS	2018-11-28 11:57:11 +01:00
range.c
reboot.c	kernel/reboot.c: export pm_power_off_prepare	2018-09-11 16:13:24 +01:00
relay.c	kernel/relay.c: change return type to vm_fault_t	2018-06-15 07:55:24 +09:00
resource.c	resource/docs: Complete kernel-doc style function documentation	2018-11-07 16:47:47 +01:00
rseq.c	rseq: uapi: Declare rseq_cs field as union, update includes	2018-07-10 22:18:52 +02:00
seccomp.c	Merge branch 'next-general' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security	2018-10-24 11:49:35 +01:00
signal.c	kernel/signal.c: fix a comment error	2018-10-31 08:54:14 -07:00
smp.c	smp,cpumask: introduce on_each_cpu_cond_mask	2018-10-09 16:51:11 +02:00
smpboot.c	smpboot: Remove cpumask from the API	2018-07-03 09:20:44 +02:00
smpboot.h
softirq.c	Merge branch 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip	2018-10-25 11:43:47 -07:00
stackleak.c	stackleak: Disable function tracing and kprobes for stackleak_erase()	2018-11-30 09:05:07 -08:00
stacktrace.c
stop_machine.c	Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip	2018-08-13 11:25:07 -07:00
sys_ni.c	Merge branch 'core-rseq-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip	2018-06-10 10:17:09 -07:00
sys.c	kernel/sys.c: remove duplicated include	2018-09-20 22:01:11 +02:00
sysctl_binary.c	staging: irda: remove remaining remants of irda code removal	2018-04-16 11:26:49 +02:00
sysctl.c	kernel/sysctl.c: remove duplicated include	2018-11-03 10:09:37 -07:00
task_work.c
taskstats.c	pids: introduce find_get_task_by_vpid() helper	2018-02-06 18:32:46 -08:00
test_kprobes.c	kprobes: Remove jprobe API implementation	2018-06-21 12:33:05 +02:00
torture.c	rcutorture: Check GP completion at stutter end	2018-08-29 09:20:48 -07:00
tracepoint.c	tracepoint: Fix tracepoint array element size mismatch	2018-10-17 15:35:29 -04:00
tsacct.c
ucount.c	headers: untangle kmemleak.h from mm.h	2018-04-05 21:36:27 -07:00
uid16.c	fs: add do_fchownat(), ksys_fchown() helpers and ksys_{,l}chown() wrappers	2018-04-02 20:15:59 +02:00
uid16.h	kernel: provide ksys_*() wrappers for syscalls called by kernel/uid16.c	2018-04-02 20:15:30 +02:00
umh.c	umh: Add command line to user mode helpers	2018-10-22 19:37:36 -07:00
up.c	smp,cpumask: introduce on_each_cpu_cond_mask	2018-10-09 16:51:11 +02:00
user_namespace.c	userns: also map extents in the reverse map to kernel IDs	2018-11-07 23:51:16 -06:00
user-return-notifier.c
user.c	userns: use irqsave variant of refcount_dec_and_lock()	2018-08-22 10:52:47 -07:00
utsname_sysctl.c	sys: don't hold uts_sem while accessing userspace memory	2018-08-11 02:05:53 -05:00
utsname.c	uts: create "struct uts_namespace" from kmem_cache	2018-04-11 10:28:35 -07:00
watchdog_hld.c	watchdog: Mark watchdog touch functions as notrace	2018-08-30 12:56:40 +02:00
watchdog.c	watchdog: Mark watchdog touch functions as notrace	2018-08-30 12:56:40 +02:00
workqueue_internal.h	workqueue: Set worker->desc to workqueue name by default	2018-05-18 08:47:13 -07:00
workqueue.c	watchdog: Mark watchdog touch functions as notrace	2018-08-30 12:56:40 +02:00