linux/kernel
Steven Rostedt edca33a562 tracing: Fix failure to read user space from system call trace events
The system call trace events call trace_user_fault_read() to read the user
space part of some system calls. This is done by grabbing a per-cpu
buffer, disabling migration, enabling preemption, calling
copy_from_user(), disabling preemption, enabling migration and checking if
the task was preempted while preemption was enabled. If it was, the buffer
is considered corrupted and it tries again.

There's a safety mechanism that will fail out of this loop if it fails 100
times (with a warning). That warning message was triggered in some
pi_futex stress tests. Enabling the sched_switch trace event and
traceoff_on_warning, showed the problem:

 pi_mutex_hammer-1375    [006] d..21   138.981648: sched_switch: prev_comm=pi_mutex_hammer prev_pid=1375 prev_prio=95 prev_state=R+ ==> next_comm=migration/6 next_pid=47 next_prio=0
     migration/6-47      [006] d..2.   138.981651: sched_switch: prev_comm=migration/6 prev_pid=47 prev_prio=0 prev_state=S ==> next_comm=pi_mutex_hammer next_pid=1375 next_prio=95
 pi_mutex_hammer-1375    [006] d..21   138.981656: sched_switch: prev_comm=pi_mutex_hammer prev_pid=1375 prev_prio=95 prev_state=R+ ==> next_comm=migration/6 next_pid=47 next_prio=0
     migration/6-47      [006] d..2.   138.981659: sched_switch: prev_comm=migration/6 prev_pid=47 prev_prio=0 prev_state=S ==> next_comm=pi_mutex_hammer next_pid=1375 next_prio=95
 pi_mutex_hammer-1375    [006] d..21   138.981664: sched_switch: prev_comm=pi_mutex_hammer prev_pid=1375 prev_prio=95 prev_state=R+ ==> next_comm=migration/6 next_pid=47 next_prio=0
     migration/6-47      [006] d..2.   138.981667: sched_switch: prev_comm=migration/6 prev_pid=47 prev_prio=0 prev_state=S ==> next_comm=pi_mutex_hammer next_pid=1375 next_prio=95
 pi_mutex_hammer-1375    [006] d..21   138.981671: sched_switch: prev_comm=pi_mutex_hammer prev_pid=1375 prev_prio=95 prev_state=R+ ==> next_comm=migration/6 next_pid=47 next_prio=0
     migration/6-47      [006] d..2.   138.981675: sched_switch: prev_comm=migration/6 prev_pid=47 prev_prio=0 prev_state=S ==> next_comm=pi_mutex_hammer next_pid=1375 next_prio=95
 pi_mutex_hammer-1375    [006] d..21   138.981679: sched_switch: prev_comm=pi_mutex_hammer prev_pid=1375 prev_prio=95 prev_state=R+ ==> next_comm=migration/6 next_pid=47 next_prio=0
     migration/6-47      [006] d..2.   138.981682: sched_switch: prev_comm=migration/6 prev_pid=47 prev_prio=0 prev_state=S ==> next_comm=pi_mutex_hammer next_pid=1375 next_prio=95
 pi_mutex_hammer-1375    [006] d..21   138.981687: sched_switch: prev_comm=pi_mutex_hammer prev_pid=1375 prev_prio=95 prev_state=R+ ==> next_comm=migration/6 next_pid=47 next_prio=0
     migration/6-47      [006] d..2.   138.981690: sched_switch: prev_comm=migration/6 prev_pid=47 prev_prio=0 prev_state=S ==> next_comm=pi_mutex_hammer next_pid=1375 next_prio=95
 pi_mutex_hammer-1375    [006] d..21   138.981695: sched_switch: prev_comm=pi_mutex_hammer prev_pid=1375 prev_prio=95 prev_state=R+ ==> next_comm=migration/6 next_pid=47 next_prio=0
     migration/6-47      [006] d..2.   138.981698: sched_switch: prev_comm=migration/6 prev_pid=47 prev_prio=0 prev_state=S ==> next_comm=pi_mutex_hammer next_pid=1375 next_prio=95
 pi_mutex_hammer-1375    [006] d..21   138.981703: sched_switch: prev_comm=pi_mutex_hammer prev_pid=1375 prev_prio=95 prev_state=R+ ==> next_comm=migration/6 next_pid=47 next_prio=0
     migration/6-47      [006] d..2.   138.981706: sched_switch: prev_comm=migration/6 prev_pid=47 prev_prio=0 prev_state=S ==> next_comm=pi_mutex_hammer next_pid=1375 next_prio=95
 pi_mutex_hammer-1375    [006] d..21   138.981711: sched_switch: prev_comm=pi_mutex_hammer prev_pid=1375 prev_prio=95 prev_state=R+ ==> next_comm=migration/6 next_pid=47 next_prio=0
     migration/6-47      [006] d..2.   138.981714: sched_switch: prev_comm=migration/6 prev_pid=47 prev_prio=0 prev_state=S ==> next_comm=pi_mutex_hammer next_pid=1375 next_prio=95
 pi_mutex_hammer-1375    [006] d..21   138.981719: sched_switch: prev_comm=pi_mutex_hammer prev_pid=1375 prev_prio=95 prev_state=R+ ==> next_comm=migration/6 next_pid=47 next_prio=0
     migration/6-47      [006] d..2.   138.981722: sched_switch: prev_comm=migration/6 prev_pid=47 prev_prio=0 prev_state=S ==> next_comm=pi_mutex_hammer next_pid=1375 next_prio=95
 pi_mutex_hammer-1375    [006] d..21   138.981727: sched_switch: prev_comm=pi_mutex_hammer prev_pid=1375 prev_prio=95 prev_state=R+ ==> next_comm=migration/6 next_pid=47 next_prio=0
     migration/6-47      [006] d..2.   138.981730: sched_switch: prev_comm=migration/6 prev_pid=47 prev_prio=0 prev_state=S ==> next_comm=pi_mutex_hammer next_pid=1375 next_prio=95
 pi_mutex_hammer-1375    [006] d..21   138.981735: sched_switch: prev_comm=pi_mutex_hammer prev_pid=1375 prev_prio=95 prev_state=R+ ==> next_comm=migration/6 next_pid=47 next_prio=0
     migration/6-47      [006] d..2.   138.981738: sched_switch: prev_comm=migration/6 prev_pid=47 prev_prio=0 prev_state=S ==> next_comm=pi_mutex_hammer next_pid=1375 next_prio=95

What happened was the task 1375 was flagged to be migrated. When
preemption was enabled, the migration thread woke up to migrate that task,
but failed because migration for that task was disabled. This caused the
loop to fail to exit because the task scheduled out while trying to read
user space.

Every time the task enabled preemption the migration thread would schedule
in, try to migrate the task, fail and let the task continue. But because
the loop would only enable preemption with migration disabled, it would
always fail because each time it enabled preemption to read user space,
the migration thread would try to migrate it.

To solve this, when the loop fails to read user space without being
scheduled out, enabled and disable preemption with migration enabled. This
will allow the migration task to successfully migrate the task and the
next loop should succeed to read user space without being scheduled out.

Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://patch.msgid.link/20260316130734.1858a998@gandalf.local.home
Fixes: 64cf7d058a ("tracing: Have trace_marker use per-cpu data to read user space")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2026-03-21 16:42:36 -04:00
..
bpf bpf-fixes 2026-03-07 12:20:37 -08:00
cgroup cgroup: Don't expose dead tasks in cgroup 2026-03-06 12:43:25 -10:00
configs Remove WARN_ALL_UNSEEDED_RANDOM kernel config option 2026-02-23 11:18:48 -08:00
debug treewide: Replace kmalloc with kmalloc_obj for non-scalar types 2026-02-21 01:02:28 -08:00
dma dma-mapping: avoid random addr value print out on error path 2026-02-23 08:26:54 +01:00
entry Merge branch 'core/entry' into sched/core 2026-01-30 15:40:05 +01:00
events perf: Fix __perf_event_overflow() vs perf_remove_from_context() race 2026-02-25 15:02:34 +01:00
futex Convert 'alloc_obj' family to use the new default GFP_KERNEL argument 2026-02-21 17:09:51 -08:00
gcov Convert more 'alloc_obj' cases to default GFP_KERNEL arguments 2026-02-21 20:03:00 -08:00
irq Convert 'alloc_flex' family to use the new default GFP_KERNEL argument 2026-02-21 17:09:51 -08:00
kcsan kcsan: test: Adjust "expect" allocation type for kmalloc_obj 2026-02-26 09:54:08 -08:00
livepatch Convert 'alloc_obj' family to use the new default GFP_KERNEL argument 2026-02-21 17:09:51 -08:00
liveupdate liveupdate: luo_file: remember retrieve() status 2026-02-24 11:13:26 -08:00
locking Convert remaining multi-line kmalloc_obj/flex GFP_KERNEL uses 2026-02-22 08:26:33 -08:00
module module: Fix kernel panic when a symbol st_shndx is out of bounds 2026-02-23 19:37:28 +00:00
power Convert 'alloc_obj' family to use the new default GFP_KERNEL argument 2026-02-21 17:09:51 -08:00
printk Convert 'alloc_obj' family to use the new default GFP_KERNEL argument 2026-02-21 17:09:51 -08:00
rcu Convert remaining multi-line kmalloc_obj/flex GFP_KERNEL uses 2026-02-22 08:26:33 -08:00
sched More MM-CID fixes, mostly fixing hangs/races: 2026-03-15 10:49:47 -07:00
time time/jiffies: Mark jiffies_64_to_clock_t() notrace 2026-03-11 10:33:12 +01:00
trace tracing: Fix failure to read user space from system call trace events 2026-03-21 16:42:36 -04:00
unwind Convert more 'alloc_obj' cases to default GFP_KERNEL arguments 2026-02-21 20:03:00 -08:00
.gitignore
acct.c Convert 'alloc_obj' family to use the new default GFP_KERNEL argument 2026-02-21 17:09:51 -08:00
async.c Convert 'alloc_obj' family to use the new default GFP_KERNEL argument 2026-02-21 17:09:51 -08:00
audit_fsnotify.c Convert 'alloc_obj' family to use the new default GFP_KERNEL argument 2026-02-21 17:09:51 -08:00
audit_tree.c Convert 'alloc_flex' family to use the new default GFP_KERNEL argument 2026-02-21 17:09:51 -08:00
audit_watch.c Convert 'alloc_obj' family to use the new default GFP_KERNEL argument 2026-02-21 17:09:51 -08:00
audit.c Convert 'alloc_flex' family to use the new default GFP_KERNEL argument 2026-02-21 17:09:51 -08:00
audit.h audit: fix comment misindentation in audit.h 2025-10-22 19:28:06 -04:00
auditfilter.c Convert 'alloc_flex' family to use the new default GFP_KERNEL argument 2026-02-21 17:09:51 -08:00
auditsc.c Convert 'alloc_obj' family to use the new default GFP_KERNEL argument 2026-02-21 17:09:51 -08:00
backtracetest.c
bounds.c x86/asm: Remove ANNOTATE_DATA_SPECIAL usage 2025-12-03 16:53:19 +01:00
capability.c
cfi.c
compat.c
configs.c
context_tracking.c context_tracking: Remove rcu_task_trace_heavyweight_{enter,exit}() 2026-01-01 16:39:46 +08:00
cpu_pm.c syscore: Pass context data to callbacks 2025-11-14 10:01:52 +01:00
cpu.c SPDX updates for 7.0-rc1 2026-02-17 09:46:03 -08:00
crash_core_test.c
crash_core.c Convert 'alloc_obj' family to use the new default GFP_KERNEL argument 2026-02-21 17:09:51 -08:00
crash_dump_dm_crypt.c Convert 'alloc_obj' family to use the new default GFP_KERNEL argument 2026-02-21 17:09:51 -08:00
crash_reserve.c crash: let architecture decide crash memory export to iomem_resource 2025-11-12 10:00:15 -08:00
cred.c cred: remove unused set_security_override_from_ctx() 2026-01-06 20:52:57 -05:00
delayacct.c delayacct: fix uapi timespec64 definition 2026-02-08 00:13:32 -08:00
dma.c
elfcorehdr.c
exec_domain.c
exit.c kthread: consolidate kthread exit paths to prevent use-after-free 2026-02-26 10:45:49 +01:00
exit.h
extable.c
fail_function.c Convert 'alloc_obj' family to use the new default GFP_KERNEL argument 2026-02-21 17:09:51 -08:00
fork.c sched/mmcid: Avoid full tasklist walks 2026-03-11 12:01:07 +01:00
freezer.c freezer: Clarify that only cgroup1 freezer uses PM freezer 2025-10-30 20:10:27 +01:00
gen_kheaders.sh
groups.c treewide: Replace kmalloc with kmalloc_obj for non-scalar types 2026-02-21 01:02:28 -08:00
hung_task.c hung_task: add hung_task_sys_info sysctl to dump sys info on task-hung 2025-11-20 14:03:43 -08:00
iomem.c
irq_work.c
jump_label.c
kallsyms_internal.h kallsyms: Get rid of kallsyms relative base 2026-01-22 15:58:22 -07:00
kallsyms_selftest.c Convert 'alloc_obj' family to use the new default GFP_KERNEL argument 2026-02-21 17:09:51 -08:00
kallsyms_selftest.h
kallsyms.c mm.git review status for linus..mm-nonmm-stable 2026-02-12 12:13:01 -08:00
kcmp.c
Kconfig.freezer
Kconfig.hz
Kconfig.kexec liveupdate: kho: move to kernel/liveupdate 2025-11-27 14:24:33 -08:00
Kconfig.locks
Kconfig.preempt sched: Further restrict the preemption modes 2026-01-08 12:43:57 +01:00
kcov.c Convert 'alloc_obj' family to use the new default GFP_KERNEL argument 2026-02-21 17:09:51 -08:00
kexec_core.c Convert 'alloc_obj' family to use the new default GFP_KERNEL argument 2026-02-21 17:09:51 -08:00
kexec_elf.c
kexec_file.c kexec: derive purgatory entry from symbol 2026-01-31 16:16:07 -08:00
kexec_internal.h
kexec.c Convert 'alloc_obj' family to use the new default GFP_KERNEL argument 2026-02-21 17:09:51 -08:00
kheaders.c
kprobes.c kprobes: Remove unneeded warnings from __arm_kprobe_ftrace() 2026-03-13 23:15:26 +09:00
kstack_erase.c sysctl: remove __user qualifier from stack_erasing_sysctl buffer argument 2025-11-27 15:44:53 +01:00
ksyms_common.c
ksysfs.c kexec: move sysfs entries to /sys/kernel/kexec 2025-11-27 14:24:42 -08:00
kthread.c kthread: consolidate kthread exit paths to prevent use-after-free 2026-02-26 10:45:49 +01:00
latencytop.c
Makefile kcov: Enable context analysis 2026-01-05 16:43:34 +01:00
module_signature.c
notifier.c
nscommon.c nsfs: tighten permission checks for ns iteration ioctls 2026-02-27 22:00:08 +01:00
nsproxy.c nsproxy: fix free_nsproxy() and simplify create_new_namespaces() 2025-11-14 13:10:38 +01:00
nstree.c nstree: tighten permission checks for listing 2026-02-27 22:00:11 +01:00
padata.c Convert more 'alloc_obj' cases to default GFP_KERNEL arguments 2026-02-21 20:03:00 -08:00
panic.c panic: add panic_force_cpu= parameter to redirect panic to a specific CPU 2026-02-03 08:21:26 -08:00
params.c Convert more 'alloc_obj' cases to default GFP_KERNEL arguments 2026-02-21 20:03:00 -08:00
pid_namespace.c pid: rely on common reference count behavior 2025-11-11 10:01:32 +01:00
pid_sysctl.h
pid.c Revert "pid: make __task_pid_nr_ns(ns => NULL) safe for zombie callers" 2026-02-10 11:39:30 +01:00
profile.c
ptrace.c rseq: Introduce struct rseq_data 2025-11-04 08:30:50 +01:00
range.c
reboot.c treewide: Replace kmalloc with kmalloc_obj for non-scalar types 2026-02-21 01:02:28 -08:00
regset.c
relay.c Convert 'alloc_obj' family to use the new default GFP_KERNEL argument 2026-02-21 17:09:51 -08:00
resource_kunit.c Convert 'alloc_obj' family to use the new default GFP_KERNEL argument 2026-02-21 17:09:51 -08:00
resource.c Convert 'alloc_obj' family to use the new default GFP_KERNEL argument 2026-02-21 17:09:51 -08:00
rseq.c rseq: slice ext: Ensure rseq feature size differs from original rseq size 2026-02-23 11:19:19 +01:00
scftorture.c Convert 'alloc_obj' family to use the new default GFP_KERNEL argument 2026-02-21 17:09:51 -08:00
scs.c scs: fix a wrong parameter in __scs_magic 2025-11-12 10:00:13 -08:00
seccomp.c Convert 'alloc_obj' family to use the new default GFP_KERNEL argument 2026-02-21 17:09:51 -08:00
signal.c compiler-context-analysis: Remove __cond_lock() function-like helper 2026-01-05 16:43:33 +01:00
smp.c smp: Introduce a helper function to check for pending IPIs 2025-11-19 18:06:50 +01:00
smpboot.c
smpboot.h
softirq.c
stacktrace.c
static_call_inline.c Convert 'alloc_obj' family to use the new default GFP_KERNEL argument 2026-02-21 17:09:51 -08:00
static_call.c
stop_machine.c
sys_ni.c rseq: Implement sys_rseq_slice_yield() 2026-01-22 11:11:17 +01:00
sys.c RISC-V updates for v7.0 2026-02-12 19:17:44 -08:00
sysctl-test.c
sysctl.c sysctl: replace SYSCTL_INT_CONV_CUSTOM macro with functions 2026-01-06 11:27:10 +01:00
task_work.c task_work: Fix NMI race condition 2025-10-29 10:29:54 +01:00
taskstats.c
torture.c Convert 'alloc_obj' family to use the new default GFP_KERNEL argument 2026-02-21 17:09:51 -08:00
tracepoint.c Convert 'alloc_flex' family to use the new default GFP_KERNEL argument 2026-02-21 17:09:51 -08:00
tsacct.c tsacct: skip all kernel threads 2026-01-26 19:07:13 -08:00
ucount.c Convert 'alloc_obj' family to use the new default GFP_KERNEL argument 2026-02-21 17:09:51 -08:00
uid16.c
uid16.h
umh.c treewide: Replace kmalloc with kmalloc_obj for non-scalar types 2026-02-21 01:02:28 -08:00
up.c
user_namespace.c Convert remaining multi-line kmalloc_obj/flex GFP_KERNEL uses 2026-02-22 08:26:33 -08:00
user-return-notifier.c
user.c ns: drop custom reference count initialization for initial namespaces 2025-11-11 10:01:32 +01:00
utsname_sysctl.c
utsname.c namespace-6.18-rc1 2025-09-29 11:20:29 -07:00
vhost_task.c Convert 'alloc_obj' family to use the new default GFP_KERNEL argument 2026-02-21 17:09:51 -08:00
vmcore_info.c mm.git review status for linus..mm-nonmm-stable 2026-02-12 12:13:01 -08:00
watch_queue.c Convert 'alloc_flex' family to use the new default GFP_KERNEL argument 2026-02-21 17:09:51 -08:00
watchdog_buddy.c
watchdog_perf.c watchdog/hardlockup: simplify perf event probe and remove per-cpu dependency 2026-02-08 00:13:35 -08:00
watchdog.c watchdog/softlockup: fix sample ring index wrap in need_counting_irqs() 2026-02-08 00:13:34 -08:00
workqueue_internal.h workqueue: Show in-flight work item duration in stall diagnostics 2026-03-05 07:27:48 -10:00
workqueue.c workqueue: Rename show_cpu_pool{s,}_hog{s,}() to reflect broadened scope 2026-03-06 06:38:16 -10:00