linux/arch/arm64/kernel
Valentin Schneider f3dc5ac401 arm64: topology: Stop using MPIDR for topology information
[ Upstream commit 3102bc0e6a ]

In the absence of ACPI or DT topology data, we fall back to haphazardly
decoding *something* out of MPIDR. Sadly, the contents of that register are
mostly unusable due to implementation leniency and things like Aff0
having to be capped to 15 (despite being encoded on 8 bits).

Consider a simple system with a single package of 32 cores, all under the
same LLC. We ought to be shoving them in the same core_sibling mask, but
MPIDR is going to look like:

  | CPU  | 0 | ... | 15 | 16 | ... | 31 |
  |------+---+-----+----+----+-----+----+
  | Aff0 | 0 | ... | 15 |  0 | ... | 15 |
  | Aff1 | 0 | ... |  0 |  1 | ... |  1 |
  | Aff2 | 0 | ... |  0 |  0 | ... |  0 |

Which will eventually yield

  core_sibling(0-15)  == 0-15
  core_sibling(16-31) == 16-31
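
To make the table above concrete, here is a minimal userspace sketch, not the
kernel code. It assumes the standard MPIDR layout (Aff0 in bits [7:0], Aff1 in
[15:8], Aff2 in [23:16]; Aff3 is ignored here) and treats CPUs with equal
Aff1/Aff2 as core siblings. example_mpidr() and mpidr_core_siblings() are
hypothetical helpers invented for this illustration.

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS 32	/* the 32-core example system */

/* Extract affinity level 0..2 from an MPIDR value (8 bits per level). */
static unsigned int mpidr_aff(uint64_t mpidr, unsigned int level)
{
	assert(level <= 2);
	return (mpidr >> (8 * level)) & 0xff;
}

/* Hypothetical MPIDR for the example: Aff0 wraps after 15, Aff1 counts up. */
static uint64_t example_mpidr(unsigned int cpu)
{
	assert(cpu < NR_CPUS);
	return ((uint64_t)(cpu / 16) << 8) | (cpu % 16);
}

/* Bitmask of CPUs that share Aff1 and Aff2 with @cpu (assumed grouping). */
static uint32_t mpidr_core_siblings(unsigned int cpu)
{
	uint64_t mine = example_mpidr(cpu);
	uint32_t mask = 0;

	for (unsigned int other = 0; other < NR_CPUS; other++) {
		uint64_t theirs = example_mpidr(other);

		if (mpidr_aff(mine, 1) == mpidr_aff(theirs, 1) &&
		    mpidr_aff(mine, 2) == mpidr_aff(theirs, 2))
			mask |= UINT32_C(1) << other;
	}
	return mask;
}
```

With these assumptions, CPUs 0-15 and CPUs 16-31 land in two separate sibling
masks even though all 32 cores share one LLC.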

NUMA woes
=========

If we try to play games with this and set up NUMA boundaries within those
groups of 16 cores via e.g. QEMU:

  # Node0: 0-9; Node1: 10-19
  $ qemu-system-aarch64 <blah> \
    -smp 20 -numa node,cpus=0-9,nodeid=0 -numa node,cpus=10-19,nodeid=1

The scheduler's MC domain (all CPUs with same LLC) is going to be built via

  arch_topology.c::cpu_coregroup_mask()

In there we try to figure out a sensible mask out of the topology
information we have. In short, here we'll pick the smaller of the NUMA
node mask and the core sibling mask.

  node_mask(CPU9)    == 0-9
  core_sibling(CPU9) == 0-15

MC mask for CPU9 will thus be 0-9, not a problem.

  node_mask(CPU10)    == 10-19
  core_sibling(CPU10) == 0-15

MC mask for CPU10 will thus be 10-19, not a problem.

  node_mask(CPU16)    == 10-19
  core_sibling(CPU16) == 16-19

MC mask for CPU16 will thus be 16-19... Uh oh. CPUs 16-19 are in two
different unique MC spans, and the scheduler has no idea what to make of
that. That triggers the WARN_ON() added by commit

  ccf74128d6 ("sched/topology: Assert non-NUMA topology masks don't (partially) overlap")
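
The walk-through above can be replayed with a small userspace model. This is a
sketch under assumptions, not the body of cpu_coregroup_mask(): it models
"pick the smaller mask" as "use the core sibling mask only if it fits inside
the node mask", and hard-codes the 20-CPU QEMU layout. All helper names
(cpu_range(), mc_mask(), partially_overlap()) are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS 20	/* -smp 20 from the QEMU example */

/* Bitmask covering CPUs first..last inclusive. */
static uint32_t cpu_range(unsigned int first, unsigned int last)
{
	uint32_t mask = 0;

	assert(first <= last && last < 32);
	for (unsigned int cpu = first; cpu <= last; cpu++)
		mask |= UINT32_C(1) << cpu;
	return mask;
}

/* Node0: 0-9; Node1: 10-19 */
static uint32_t node_mask(unsigned int cpu)
{
	assert(cpu < NR_CPUS);
	return cpu < 10 ? cpu_range(0, 9) : cpu_range(10, 19);
}

/* MPIDR-derived siblings: groups of 16, truncated to the CPUs present. */
static uint32_t core_sibling(unsigned int cpu)
{
	assert(cpu < NR_CPUS);
	return cpu < 16 ? cpu_range(0, 15) : cpu_range(16, 19);
}

/* Assumed selection rule: siblings if they fit in the node, else the node. */
static uint32_t mc_mask(unsigned int cpu)
{
	uint32_t node = node_mask(cpu);
	uint32_t sib = core_sibling(cpu);

	return (sib & ~node) == 0 ? sib : node;
}

/* Two MC spans are broken if they intersect without being identical. */
static int partially_overlap(uint32_t a, uint32_t b)
{
	return a != b && (a & b) != 0;
}
```

In this model mc_mask(10) is 10-19 while mc_mask(16) is 16-19, so
partially_overlap() is true for that pair, which is the situation the
scheduler assertion complains about.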

Fixing MPIDR-derived topology
=============================

We could try to come up with some cleverer scheme to figure out which of
the available masks to pick, but really if one of those masks resulted from
MPIDR then it should be discarded because it's bound to be bogus.

I was hoping to give MPIDR a chance for SMT, to figure out which threads are
in the same core using Aff1-3 as core ID, but Sudeep and Robin pointed out
to me that there are systems out there where *all* cores have non-zero
values in their higher affinity fields (e.g. RK3288 has "5" in all of its
cores' MPIDR.Aff1), which would expose a bogus core ID to userspace.

Stop using MPIDR for topology information. When no other source of topology
information is available, mark each CPU as its own core and its NUMA node
as its LLC domain.
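
As a rough illustration of that fallback (again a userspace sketch, not the
patch itself), each CPU gets its own core ID and inherits its NUMA node as LLC
domain. struct fallback_topology, its field names and cpu_to_node_example()
are assumptions made up for this example, reusing the 20-CPU QEMU layout.

```c
#include <assert.h>

#define NR_CPUS 20	/* same 20-CPU layout as the QEMU example */

/* Hypothetical per-CPU record for the no-ACPI/no-DT fallback. */
struct fallback_topology {
	int core_id;	/* each CPU is its own core */
	int llc_id;	/* LLC domain is the CPU's NUMA node */
};

/* Node0: 0-9; Node1: 10-19 */
static int cpu_to_node_example(unsigned int cpu)
{
	assert(cpu < NR_CPUS);
	return cpu < 10 ? 0 : 1;
}

/* Build the fallback record without looking at MPIDR at all. */
static struct fallback_topology fallback_topology_for(unsigned int cpu)
{
	struct fallback_topology topo = {
		.core_id = (int)cpu,
		.llc_id = cpu_to_node_example(cpu),
	};

	return topo;
}

/* Two CPUs end up in the same MC span iff they share a NUMA node. */
static int share_llc(unsigned int a, unsigned int b)
{
	return fallback_topology_for(a).llc_id == fallback_topology_for(b).llc_id;
}
```

Under this scheme CPUs 10 and 16 share one LLC domain (node 1), so the MC
spans follow the node boundaries and no longer partially overlap.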

Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
Link: https://lore.kernel.org/r/20200829130016.26106-1-valentin.schneider@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-11-05 11:08:41 +01:00
probes arm64: debug: Ensure debug handlers check triggering exception level 2019-03-23 20:10:08 +01:00
vdso arm64: vdso: Fix clock_getres() for CLOCK_REALTIME 2019-05-31 06:46:21 -07:00
.gitignore
acpi_numa.c arm64: numa: rework ACPI NUMA initialization 2018-07-09 18:21:40 +01:00
acpi_parking_protocol.c
acpi.c acpi/arm64: ignore 5.1 FADTs that are reported as 5.0 2019-07-26 09:14:10 +02:00
alternative.c arm64/alternatives: don't patch up internal branches 2020-07-22 09:32:01 +02:00
arm64ksyms.c arm64: lib: use C string functions with KASAN enabled 2019-12-01 09:17:01 +01:00
armv8_deprecated.c arm64: armv8_deprecated: Fix undef_hook mask for thumb setend 2020-04-17 10:48:50 +02:00
asm-offsets.c arm64: vdso: Fix clock_getres() for CLOCK_REALTIME 2019-05-31 06:46:21 -07:00
cacheinfo.c arm64: Add support for ACPI based firmware tables 2018-05-17 17:28:09 +01:00
cpu_errata.c arm64: Run ARCH_WORKAROUND_1 enabling code on all CPUs 2020-11-05 11:08:31 +01:00
cpu_ops.c arm64: cpu_ops: fix a leaked reference by adding missing of_node_put 2019-05-31 06:46:27 -07:00
cpu-reset.h arm64: kexec: always reset to EL2 if present 2018-07-04 18:34:24 +01:00
cpu-reset.S
cpufeature.c arm64/cpufeature: Drop TraceFilt feature exposure from ID_DFR0 register 2020-10-01 13:14:45 +02:00
cpuidle.c
cpuinfo.c arm64: cpufeature: Detect SSBS and advertise to userspace 2019-10-11 18:21:10 +02:00
crash_core.c arm64: kernel: arch_crash_save_vmcoreinfo() should depend on CONFIG_CRASH_CORE 2018-09-11 11:08:49 +01:00
crash_dump.c
debug-monitors.c arm64: Use test_tsk_thread_flag() for checking TIF_SINGLESTEP 2020-07-29 10:16:54 +02:00
efi-entry.S
efi-header.S
efi-rt-wrapper.S
efi.c
entry-fpsimd.S arm64/sve: Write ZCR_EL1 on context switch only if changed 2018-05-17 18:19:53 +01:00
entry-ftrace.S arm64: ftrace: don't adjust the LR value 2019-02-12 19:47:04 +01:00
entry.S Revert "arm64: preempt: Fix big-endian when checking preempt count in assembly" 2019-12-21 10:57:20 +01:00
fpsimd.c arm64: sve: Fix build failure when ARM64_SVE=y and SYSCTL=n 2020-06-30 23:17:15 -04:00
ftrace.c arm64: ftrace: Ensure synchronisation in PLT setup for Neoverse-N1 #1542419 2019-11-06 13:05:47 +01:00
head.S arm64: Fix size of __early_cpu_boot_status 2020-04-13 10:45:16 +02:00
hibernate-asm.S
hibernate.c arm64: hibernate: check pgd table allocation 2020-01-27 14:51:19 +01:00
hw_breakpoint.c arm64: hw_breakpoint: Don't invoke overflow handler on uaccess watchpoints 2020-06-25 15:33:06 +02:00
hyp-stub.S arm64: hyp-stub: Forbid kprobing of the hyp-stub 2019-02-06 17:30:12 +01:00
image.h arm64/efi: Mark __efistub_stext_offset as an absolute symbol explicitly 2019-07-26 09:13:56 +02:00
insn.c arm64: insn: Fix two bugs in encoding 32-bit logical immediates 2020-06-22 09:05:10 +02:00
io.c
irq.c arm64: Fix HCR.TGE status for NMI contexts 2019-03-23 20:10:08 +01:00
jump_label.c jump_label: move 'asm goto' support test to Kconfig 2019-06-04 08:02:34 +02:00
kaslr.c arm64/kernel: kaslr: reduce module randomization range to 2 GB 2019-05-31 06:46:01 -07:00
kgdb.c arm64: kgdb: Fix single-step exception handling oops 2020-07-16 08:17:24 +02:00
kuser32.S
machine_kexec.c arm64: fix the flush_icache_range arguments in machine_kexec 2020-05-20 08:18:43 +02:00
Makefile arm64: kernel: arch_crash_save_vmcoreinfo() should depend on CONFIG_CRASH_CORE 2018-09-11 11:08:49 +01:00
module-plts.c arm64/kernel: rename module_emit_adrp_veneer->module_emit_veneer_for_adrp 2018-04-24 19:07:35 +01:00
module.c arm64: kaslr: keep modules inside module region when KASAN is enabled 2019-07-10 09:53:43 +02:00
module.lds
paravirt.c
pci.c
perf_callchain.c
perf_event.c arm64: perf: set suppress_bind_attrs flag to true 2019-01-26 09:32:35 +01:00
perf_regs.c arm64: perf: Report the PC value in REGS_ABI_32 mode 2020-06-30 23:17:17 -04:00
process.c arm64: ssbs: Fix context-switch when SSBS is present on all CPUs 2020-02-19 19:51:56 +01:00
psci.c arm64: psci: Reduce the waiting time for cpu_psci_cpu_kill() 2019-12-31 16:35:26 +01:00
ptrace.c arm64: ptrace: Consistently use pseudo-singlestep exceptions 2020-07-22 09:32:12 +02:00
reloc_test_core.c
reloc_test_syms.S
relocate_kernel.S
return_address.c arm64: unwind: Prohibit probing on return_address() 2019-08-25 10:47:56 +02:00
sdei.c arm64: fix wrong check of on_sdei_stack in nmi context 2019-05-08 07:21:49 +02:00
setup.c arm64: Fix /proc/iomem for reserved but not memory regions 2018-10-12 15:25:16 +01:00
signal.c arm64: ptrace: Consistently use pseudo-singlestep exceptions 2020-07-22 09:32:12 +02:00
signal32.c arm64: use {COMPAT,}SYSCALL_DEFINE0 for sigreturn 2018-07-12 14:49:48 +01:00
sleep.S
smccc-call.S
smp_spin_table.c
smp.c arm64: smp: fix crash_smp_send_stop() behaviour 2020-03-25 08:06:15 +01:00
ssbd.c arm64: ssbd: Add support for PSTATE.SSBS rather than trapping to EL3 2019-10-11 18:21:30 +02:00
stacktrace.c arm64: unwind: Prohibit probing on return_address() 2019-08-25 10:47:56 +02:00
suspend.c arm64: ssbd: Restore mitigation status on CPU resume 2018-05-31 17:35:19 +01:00
sys_compat.c arm64: Silence clang warning on mismatched value/register sizes 2020-04-29 16:31:09 +02:00
sys.c arm64: use the correct function type for __arm64_sys_ni_syscall 2019-06-22 08:15:20 +02:00
sys32.c arm64: use the correct function type for __arm64_sys_ni_syscall 2019-06-22 08:15:20 +02:00
syscall.c arm64: compat: Ensure upper 32 bits of x0 are zero on syscall return 2020-07-22 09:32:13 +02:00
time.c
topology.c arm64: topology: Stop using MPIDR for topology information 2020-11-05 11:08:41 +01:00
trace-events-emulation.h
traps.c arm64: Fake the IminLine size on systems affected by Neoverse-N1 #1542419 2020-04-29 16:31:09 +02:00
vdso.c arm64/vdso: don't leak kernel addresses 2020-01-27 14:50:36 +01:00
vmlinux.lds.S KVM: arm64: Add kvm_extable for vaxorcism code 2020-09-09 19:04:31 +02:00