perf/core: Try to allocate task_ctx_data quickly

attach_global_ctx_data() uses an O(N^2) algorithm to allocate the
context data for each thread.  This caused performance problems on
large systems with on the order of 100k threads.

Because kmalloc(GFP_KERNEL) can sleep, it cannot be called under the
RCU lock.  So try GFP_NOWAIT first, so that the loop can proceed in
the normal case.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20260211223222.3119790-3-namhyung@kernel.org
Namhyung Kim 2026-02-11 14:32:20 -08:00 committed by Peter Zijlstra
parent 28c75fbfec
commit bec2ee2390

@@ -5489,6 +5489,12 @@ attach_global_ctx_data(struct kmem_cache *ctx_cache)
 				cd = NULL;
 		}
 		if (!cd) {
+			/*
+			 * Try to allocate context quickly before
+			 * traversing the whole thread list again.
+			 */
+			if (!attach_task_ctx_data(p, ctx_cache, true, GFP_NOWAIT))
+				continue;
 			get_task_struct(p);
 			goto alloc;
 		}
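
The fast-path idea in the hunk above can be illustrated with a small userspace sketch. This is not the kernel code: struct fake_task, alloc_nowait(), alloc_may_sleep() and attach_all() are hypothetical names made up for the illustration. The "budget" argument is an assumed stand-in for a non-sleeping allocation that sometimes fails. The sketch only shows why trying a non-sleeping allocation inside the scan avoids restarting the traversal for every task.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a task; not the kernel's task_struct. */
struct fake_task {
	void *ctx_data;		/* NULL until context data is attached */
};

/*
 * Hypothetical non-sleeping allocator, loosely modelling GFP_NOWAIT:
 * it fails once the caller-supplied budget is used up.
 */
static void *alloc_nowait(size_t size, int *budget)
{
	if (*budget <= 0)
		return NULL;
	(*budget)--;
	return malloc(size);
}

/* Hypothetical sleeping allocator, loosely modelling GFP_KERNEL. */
static void *alloc_may_sleep(size_t size)
{
	return malloc(size);
}

/*
 * Attach context data to every task.  Returns how many times the scan
 * had to restart from the beginning, or -1 if the slow allocation failed.
 */
static int attach_all(struct fake_task *tasks, size_t n, int nowait_budget)
{
	int restarts = 0;
	size_t i;

	assert(tasks != NULL || n == 0);
again:
	/* Models the locked section: only non-sleeping allocation here. */
	for (i = 0; i < n; i++) {
		if (tasks[i].ctx_data)
			continue;	/* already attached */
		tasks[i].ctx_data = alloc_nowait(64, &nowait_budget);
		if (tasks[i].ctx_data)
			continue;	/* fast path: keep scanning */
		break;			/* must leave the locked section */
	}
	if (i == n)
		return restarts;

	/* Models the unlocked section: a sleeping allocation is allowed. */
	tasks[i].ctx_data = alloc_may_sleep(64);
	if (!tasks[i].ctx_data)
		return -1;
	restarts++;
	goto again;			/* slow path: rescan the whole list */
}
```

In this sketch, a budget of zero forces one full rescan per task, which is the quadratic behaviour described in the commit message. With a budget that covers every task, and assuming malloc() itself does not fail, the scan completes in a single pass.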