linux/kernel
Rohit Jain 247f2f6f3c sched/core: Don't schedule threads on pre-empted vCPUs
In paravirt configurations today, spinlocks figure out whether a vCPU is
running to determine whether or not spinlock should bother spinning. We
can use the same logic to prioritize CPUs when scheduling threads. If a
vCPU has been pre-empted, it will incur the extra cost of VMENTER and
the time it actually spends to be running on the host CPU. If we had
other vCPUs which were actually running on the host CPU and idle we
should schedule threads there.

Performance numbers:

Note: With patch is referred to as Paravirt in the following and without
patch is referred to as Base.

1) When only 1 VM is running:

    a) Hackbench test on KVM 8 vCPUs, 10,000 loops (lower is better):

	+-------+-----------------+----------------+
	|Number |Paravirt         |Base            |
	|of     +---------+-------+-------+--------+
	|Threads|Average  |Std Dev|Average| Std Dev|
	+-------+---------+-------+-------+--------+
	|1      |1.817    |0.076  |1.721  | 0.067  |
	|2      |3.467    |0.120  |3.468  | 0.074  |
	|4      |6.266    |0.035  |6.314  | 0.068  |
	|8      |11.437   |0.105  |11.418 | 0.132  |
	|16     |21.862   |0.167  |22.161 | 0.129  |
	|25     |33.341   |0.326  |33.692 | 0.147  |
	+-------+---------+-------+-------+--------+

2) When two VMs are running with same CPU affinities:

    a) tbench test on VM 8 cpus

    Base:

	VM1:

	Throughput 220.59 MB/sec   1 clients  1 procs  max_latency=12.872 ms
	Throughput 448.716 MB/sec  2 clients  2 procs  max_latency=7.555 ms
	Throughput 861.009 MB/sec  4 clients  4 procs  max_latency=49.501 ms
	Throughput 1261.81 MB/sec  7 clients  7 procs  max_latency=76.990 ms

	VM2:

	Throughput 219.937 MB/sec  1 clients  1 procs  max_latency=12.517 ms
	Throughput 470.99 MB/sec   2 clients  2 procs  max_latency=12.419 ms
	Throughput 841.299 MB/sec  4 clients  4 procs  max_latency=37.043 ms
	Throughput 1240.78 MB/sec  7 clients  7 procs  max_latency=77.489 ms

    Paravirt:

	VM1:

	Throughput 222.572 MB/sec  1 clients  1 procs  max_latency=7.057 ms
	Throughput 485.993 MB/sec  2 clients  2 procs  max_latency=26.049 ms
	Throughput 947.095 MB/sec  4 clients  4 procs  max_latency=45.338 ms
	Throughput 1364.26 MB/sec  7 clients  7 procs  max_latency=145.124 ms

	VM2:

	Throughput 224.128 MB/sec  1 clients  1 procs  max_latency=4.564 ms
	Throughput 501.878 MB/sec  2 clients  2 procs  max_latency=11.061 ms
	Throughput 965.455 MB/sec  4 clients  4 procs  max_latency=45.370 ms
	Throughput 1359.08 MB/sec  7 clients  7 procs  max_latency=168.053 ms

    b) Hackbench with 4 fd 1,000,000 loops

	+-------+--------------------------------------+----------------------------------------+
	|Number |Paravirt                              |Base                                    |
	|of     +----------+--------+---------+--------+----------+--------+---------+----------+
	|Threads|Average1  |Std Dev1|Average2 | Std Dev|Average1  |Std Dev1|Average2 | Std Dev 2|
	+-------+----------+--------+---------+--------+----------+--------+---------+----------+
	|  1    | 3.748    | 0.620  | 3.576   | 0.432  | 4.006    | 0.395  | 3.446   | 0.787    |
	+-------+----------+--------+---------+--------+----------+--------+---------+----------+

    Note that this test was run just to show the interference effect
    over-subscription can have in baseline

    c) schbench results with 2 message groups on 8 vCPU VMs

	+-----------+-------+---------------+--------------+------------+
	|           |       | Paravirt      | Base         |            |
	+-----------+-------+-------+-------+-------+------+------------+
	|           |Threads| VM1   | VM2   |  VM1  | VM2  |%Improvement|
	+-----------+-------+-------+-------+-------+------+------------+
	|50.0000th  |    1  | 52    | 53    |  58   | 54   |  +6.25%    |
	|75.0000th  |    1  | 69    | 61    |  83   | 59   |  +8.45%    |
	|90.0000th  |    1  | 80    | 80    |  89   | 83   |  +6.98%    |
	|95.0000th  |    1  | 83    | 83    |  93   | 87   |  +7.78%    |
	|*99.0000th |    1  | 92    | 94    |  99   | 97   |  +5.10%    |
	|99.5000th  |    1  | 95    | 100   |  102  | 103  |  +4.88%    |
	|99.9000th  |    1  | 107   | 123   |  105  | 203  |  +25.32%   |
	+-----------+-------+-------+-------+-------+------+------------+
	|50.0000th  |    2  | 56    | 62    |  67   | 59   |  +6.35%    |
	|75.0000th  |    2  | 69    | 75    |  80   | 71   |  +4.64%    |
	|90.0000th  |    2  | 80    | 82    |  90   | 81   |  +5.26%    |
	|95.0000th  |    2  | 85    | 87    |  97   | 91   |  +8.51%    |
	|*99.0000th |    2  | 98    | 99    |  107  | 109  |  +8.79%    |
	|99.5000th  |    2  | 107   | 105   |  109  | 116  |  +5.78%    |
	|99.9000th  |    2  | 9968  | 609   |  875  | 3116 | -165.02%   |
	+-----------+-------+-------+-------+-------+------+------------+
	|50.0000th  |    4  | 78    | 77    |  78   | 79   |  +1.27%    |
	|75.0000th  |    4  | 98    | 106   |  100  | 104  |   0.00%    |
	|90.0000th  |    4  | 987   | 1001  |  995  | 1015 |  +1.09%    |
	|95.0000th  |    4  | 4136  | 5368  |  5752 | 5192 |  +13.16%   |
	|*99.0000th |    4  | 11632 | 11344 |  11024| 10736|  -5.59%    |
	|99.5000th  |    4  | 12624 | 13040 |  12720| 12144|  -3.22%    |
	|99.9000th  |    4  | 13168 | 18912 |  14992| 17824|  +2.24%    |
	+-----------+-------+-------+-------+-------+------+------------+

    Note: Improvement is measured for (VM1+VM2)

Signed-off-by: Rohit Jain <rohit.k.jain@oracle.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: dhaval.giani@oracle.com
Cc: matt@codeblueprint.co.uk
Cc: steven.sistare@oracle.com
Cc: subhra.mazumdar@oracle.com
Link: http://lkml.kernel.org/r/1525294330-7759-1-git-send-email-rohit.k.jain@oracle.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-05-04 10:00:09 +02:00
..
bpf bpf: sockmap remove dead check 2018-04-20 22:09:51 +02:00
cgroup Merge branch 'for-4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq 2018-04-03 18:00:13 -07:00
configs KVM changes for 4.16 2018-02-10 13:16:35 -08:00
debug * Fix 2032 time access issues and new compiler warnings 2018-04-12 10:21:19 -07:00
events Various fixes in tracing: 2018-05-02 17:38:37 -10:00
gcov License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
irq genirq/affinity: Spread irq vectors among present CPUs as far as possible 2018-04-06 12:19:51 +02:00
livepatch livepatch: Allow to call a custom callback when freeing shadow variables 2018-04-17 13:42:48 +02:00
locking locking/rwsem: Add DEBUG_RWSEMS to look for lock/unlock mismatches 2018-03-31 07:30:50 +02:00
power PM / QoS: mark expected switch fall-throughs 2018-04-09 13:49:40 +02:00
printk New features: 2018-04-10 11:27:30 -07:00
rcu Merge branches 'fixes.2018.02.23a', 'srcu.2018.02.20a' and 'torture.2018.02.20a' into HEAD 2018-02-23 15:15:41 -08:00
sched sched/core: Don't schedule threads on pre-empted vCPUs 2018-05-04 10:00:09 +02:00
time Revert: Unify CLOCK_MONOTONIC and CLOCK_BOOTTIME 2018-04-26 14:53:32 +02:00
trace Various fixes in tracing: 2018-05-02 17:38:37 -10:00
.gitignore
acct.c kernel/acct.c: fix the acct->needcheck check in check_free_space() 2018-01-04 16:45:09 -08:00
async.c kernel/async.c: revert "async: simplify lowest_in_progress()" 2018-02-06 18:32:44 -08:00
audit_fsnotify.c
audit_tree.c audit: track the owner of the command mutex ourselves 2018-02-23 11:22:22 -05:00
audit_watch.c
audit.c audit/stable-4.17 PR 20180403 2018-04-06 15:01:25 -07:00
audit.h audit: track the owner of the command mutex ourselves 2018-02-23 11:22:22 -05:00
auditfilter.c audit: deprecate the AUDIT_FILTER_ENTRY filter 2018-02-15 14:36:29 -05:00
auditsc.c audit: bail before bug check if audit disabled 2018-02-15 14:40:25 -05:00
backtracetest.c
bounds.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
capability.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
compat.c mm: add kernel_move_pages() helper, move compat syscall to mm/migrate.c 2018-04-02 20:15:32 +02:00
configs.c
context_tracking.c
cpu_pm.c
cpu.c cpu/hotplug: Fix unused function warning 2018-03-15 20:34:40 +01:00
crash_core.c kexec: export PG_swapbacked to VMCOREINFO 2018-04-13 17:10:27 -07:00
crash_dump.c
cred.c
delayacct.c delayacct: Account blkio completion on the correct task 2018-01-16 03:29:36 +01:00
dma.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
elfcore.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
exec_domain.c get rid of pointless includes of fs_struct.h 2018-02-22 14:28:50 -05:00
exit.c kernel: use kernel_wait4() instead of sys_wait4() 2018-04-02 20:14:51 +02:00
extable.c extable: Make init_kernel_text() global 2018-02-21 16:54:06 +01:00
fail_function.c error-injection: Fix to prohibit jump optimization 2018-03-12 16:16:00 +01:00
fork.c fork: unconditionally clear stack on fork 2018-04-20 17:18:35 -07:00
freezer.c
futex_compat.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
futex.c pids: introduce find_get_task_by_vpid() helper 2018-02-06 18:32:46 -08:00
groups.c kernel: make groups_sort calling a responsibility group_info allocators 2017-12-14 16:00:49 -08:00
hung_task.c
irq_work.c irq/work: Improve the flag definitions 2018-01-08 19:43:15 +01:00
jump_label.c jump_label: Disable jump labels in __exit code 2018-03-20 08:57:17 +01:00
kallsyms.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk 2018-02-01 13:36:15 -08:00
kcmp.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
Kconfig.freezer
Kconfig.hz
Kconfig.locks
Kconfig.preempt
kcov.c kcov: detect double association with a single task 2018-02-06 18:32:46 -08:00
kexec_core.c
kexec_file.c kernel/kexec_file.c: allow archs to set purgatory load address 2018-04-13 17:10:28 -07:00
kexec_internal.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
kexec.c kexec: call do_kexec_load() in compat syscall directly 2018-04-02 20:15:01 +02:00
kmod.c
kprobes.c kprobes: Fix random address output of blacklist file 2018-04-25 10:27:56 -04:00
ksysfs.c
kthread.c kthread, sched/wait: Fix kthread_parkme() completion issue 2018-05-03 07:38:05 +02:00
latencytop.c
Makefile error-injection: Support fault injection framework 2018-01-12 17:33:38 -08:00
memremap.c kernel/memremap: Remove stale devres_free() call 2018-03-06 10:58:54 -08:00
module_signing.c
module-internal.h
module.c module: Fix display of wrong module .text address 2018-04-18 22:59:46 +02:00
notifier.c
nsproxy.c
padata.c padata: add SPDX identifier 2018-01-05 18:43:00 +11:00
panic.c taint: add taint for randstruct 2018-04-11 10:28:35 -07:00
params.c kernel/params.c: downgrade warning for unsafe parameters 2018-04-11 10:28:37 -07:00
pid_namespace.c Merge branch 'userns-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace 2018-04-03 19:15:32 -07:00
pid.c xarray: add the xa_lock to the radix_tree_root 2018-04-11 10:28:39 -07:00
profile.c
ptrace.c pids: introduce find_get_task_by_vpid() helper 2018-02-06 18:32:46 -08:00
range.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
reboot.c kernel/reboot.c: add devm_register_reboot_notifier() 2017-11-17 16:10:04 -08:00
relay.c kernel/relay.c: limit kmalloc size to KMALLOC_MAX_SIZE 2018-02-21 15:35:43 -08:00
resource.c resource: fix integer overflow at reallocation 2018-04-13 17:10:27 -07:00
seccomp.c - Fix seccomp GET_METADATA to deal with field sizes correctly (Tycho Andersen) 2018-02-22 10:50:24 -08:00
signal.c sched/core: Introduce set_special_state() 2018-05-04 07:54:54 +02:00
smp.c smp/core: Use lockdep to assert IRQs are disabled/enabled 2017-11-08 11:13:50 +01:00
smpboot.c watchdog/core, powerpc: Lock cpus across reconfiguration 2017-10-04 10:53:54 +02:00
smpboot.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
softirq.c softirq: Consolidate common code in tasklet_[hi]_action() 2018-03-09 11:50:55 +01:00
stacktrace.c
stop_machine.c stop_machine, sched: Fix migrate_swap() vs. active_balance() deadlock 2018-05-03 07:38:03 +02:00
sys_ni.c syscalls/core: Prepare CONFIG_ARCH_HAS_SYSCALL_WRAPPER=y for compat syscalls 2018-04-05 16:59:38 +02:00
sys.c kernel: add ksys_setsid() helper; remove in-kernel call to sys_setsid() 2018-04-02 20:16:06 +02:00
sysctl_binary.c staging: irda: remove remaining remants of irda code removal 2018-04-16 11:26:49 +02:00
sysctl.c kernel/sysctl.c: add kdoc comments to do_proc_do{u}intvec_minmax_conv_param 2018-04-11 10:28:38 -07:00
task_work.c locking/barriers: Convert users of lockless_dereference() to READ_ONCE() 2017-12-17 13:57:15 +01:00
taskstats.c pids: introduce find_get_task_by_vpid() helper 2018-02-06 18:32:46 -08:00
test_kprobes.c kprobes: Disable the jprobes test code 2017-10-20 11:02:54 +02:00
torture.c torture: Save a line in stutter_wait(): while -> for 2017-12-11 09:18:30 -08:00
tracepoint.c tracepoint: Do not warn on ENOMEM 2018-04-30 12:09:56 -04:00
tsacct.c
ucount.c headers: untangle kmemleak.h from mm.h 2018-04-05 21:36:27 -07:00
uid16.c fs: add do_fchownat(), ksys_fchown() helpers and ksys_{,l}chown() wrappers 2018-04-02 20:15:59 +02:00
uid16.h kernel: provide ksys_*() wrappers for syscalls called by kernel/uid16.c 2018-04-02 20:15:30 +02:00
umh.c kernel: use kernel_wait4() instead of sys_wait4() 2018-04-02 20:14:51 +02:00
up.c
user_namespace.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace 2017-11-16 12:20:15 -08:00
user-return-notifier.c
user.c efivarfs: Limit the rate for non-root to read files 2018-02-22 10:21:02 -08:00
utsname_sysctl.c
utsname.c uts: create "struct uts_namespace" from kmem_cache 2018-04-11 10:28:35 -07:00
watchdog_hld.c Merge branch 'linus' into core/urgent, to pick up dependent commits 2017-11-04 08:53:04 +01:00
watchdog.c Merge branch 'linus' into sched/core, to pick up fixes 2017-11-08 10:17:15 +01:00
workqueue_internal.h Merge branch 'for-4.14-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq 2017-11-06 12:26:49 -08:00
workqueue.c Merge branch 'for-4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq 2018-04-03 18:00:13 -07:00