linux/kernel
Jiri Olsa 4f0336a8a0 perf/ring_buffer: Prevent concurent ring buffer access
[ Upstream commit cd6fb677ce ]

Some of the scheduling tracepoints allow the perf_tp_event
code to write to ring buffer under different cpu than the
code is running on.

This results in corrupted ring buffer data demonstrated in
following perf commands:

  # perf record -e 'sched:sched_switch,sched:sched_wakeup' perf bench sched messaging
  # Running 'sched/messaging' benchmark:
  # 20 sender and receiver processes per group
  # 10 groups == 400 processes run

       Total time: 0.383 [sec]
  [ perf record: Woken up 8 times to write data ]
  0x42b890 [0]: failed to process type: -1765585640
  [ perf record: Captured and wrote 4.825 MB perf.data (29669 samples) ]

  # perf report --stdio
  0x42b890 [0]: failed to process type: -1765585640

The reason for the corruption are some of the scheduling tracepoints,
that have __perf_task dfined and thus allow to store data to another
cpu ring buffer:

  sched_waking
  sched_wakeup
  sched_wakeup_new
  sched_stat_wait
  sched_stat_sleep
  sched_stat_iowait
  sched_stat_blocked

The perf_tp_event function first store samples for current cpu
related events defined for tracepoint:

    hlist_for_each_entry_rcu(event, head, hlist_entry)
      perf_swevent_event(event, count, &data, regs);

And then iterates events of the 'task' and store the sample
for any task's event that passes tracepoint checks:

  ctx = rcu_dereference(task->perf_event_ctxp[perf_sw_context]);

  list_for_each_entry_rcu(event, &ctx->event_list, event_entry) {
    if (event->attr.type != PERF_TYPE_TRACEPOINT)
      continue;
    if (event->attr.config != entry->type)
      continue;

    perf_swevent_event(event, count, &data, regs);
  }

Above code can race with same code running on another cpu,
ending up with 2 cpus trying to store under the same ring
buffer, which is specifically not allowed.

This patch prevents the problem, by allowing only events with the same
current cpu to receive the event.

NOTE: this requires the use of (per-task-)per-cpu buffers for this
feature to work; perf-record does this.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
[peterz: small edits to Changelog]
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andrew Vagin <avagin@openvz.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Fixes: e6dab5ffab ("perf/trace: Add ability to set a target task for events")
Link: http://lkml.kernel.org/r/20180923161343.GB15054@krava
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2018-11-10 07:41:34 -08:00
..
bpf bpf: fix references to free_bpf_prog_info() in comments 2018-08-06 16:24:37 +02:00
configs kconfig: tinyconfig: provide whole choice blocks to avoid warnings 2016-09-24 10:07:42 +02:00
debug kdb: make "mdr" command repeat 2018-05-30 07:49:17 +02:00
events perf/ring_buffer: Prevent concurent ring buffer access 2018-11-10 07:41:34 -08:00
gcov gcov: disable for COMPILE_TEST 2018-01-23 19:50:10 +01:00
irq genirq: Delay incrementing interrupt count if it's disabled/pending 2018-09-15 09:40:40 +02:00
livepatch livepatch: x86: fix relocation computation with kASLR 2015-11-11 17:36:04 +01:00
locking locking/osq_lock: Fix osq_lock queue corruption 2018-09-19 22:48:56 +02:00
power PM / sleep: wakeup: Fix build error caused by missing SRCU support 2018-09-09 20:04:34 +02:00
printk braille-console: Fix value returned by _braille_console_setup 2018-03-22 09:23:23 +01:00
rcu rcu: Allow for page faults in NMI handlers 2017-10-18 09:20:41 +02:00
sched sched/rt: Fix rq->clock_update_flags < RQCF_ACT_SKIP warning 2018-05-30 07:49:08 +02:00
time alarmtimer: Prevent overflow for relative nanosleep 2018-10-10 08:52:05 +02:00
trace ring-buffer: Allow for rescheduling when removing pages 2018-09-29 03:08:52 -07:00
.gitignore certs: add .gitignore to stop git nagging about x509_certificate_list 2015-10-21 15:18:35 +01:00
acct.c kernel/acct.c: fix the acct->needcheck check in check_free_space() 2018-01-10 09:27:08 +01:00
async.c kernel/async.c: revert "async: simplify lowest_in_progress()" 2018-02-16 20:09:45 +01:00
audit_fsnotify.c audit: clean simple fsnotify implementation 2015-08-06 16:14:53 -04:00
audit_tree.c audit: audit_tree_match can be boolean 2015-11-04 08:23:51 -05:00
audit_watch.c audit: fix use-after-free in audit_add_watch 2018-09-26 08:35:08 +02:00
audit.c audit: return on memory error to avoid null pointer dereference 2018-05-30 07:49:16 +02:00
audit.h audit: audit_tree_match can be boolean 2015-11-04 08:23:51 -05:00
auditfilter.c audit: allow not equal op for audit by executable 2018-08-06 16:24:38 +02:00
auditsc.c audit: allow not equal op for audit by executable 2018-08-06 16:24:38 +02:00
backtracetest.c
bounds.c
capability.c exec: Ensure mm->user_ns contains the execed files 2017-01-06 11:16:14 +01:00
cgroup_freezer.c cgroup: fix handling of multi-destination migration from subtree_control enabling 2015-12-03 10:18:21 -05:00
cgroup_pids.c cgroup_pids: don't account for the root cgroup 2015-12-03 10:18:21 -05:00
cgroup.c cgroup: Fix deadlock in cpu hotplug path 2018-10-13 09:11:33 +02:00
compat.c compat: cleanup coding in compat_get_bitmap() and compat_put_bitmap() 2015-06-04 23:57:18 +02:00
configs.c
context_tracking.c context_tracking: avoid irq_save/irq_restore on guest entry and exit 2015-11-10 12:06:23 +01:00
cpu_pm.c kernel/cpu_pm: fix cpu_cluster_pm_exit comment 2015-09-03 02:42:20 +02:00
cpu.c stable-fixup: hotplug: fix unused function warning 2017-01-12 11:22:48 +01:00
cpuset.c sched/cpuset/pm: Fix cpuset vs. suspend-resume bugs 2017-10-12 11:27:35 +02:00
crash_dump.c
cred.c cred: Reject inodes with invalid ids in set_create_file_as() 2016-09-15 08:27:49 +02:00
delayacct.c
dma.c
elfcore.c
exec_domain.c Remove rest of exec domains. 2015-04-12 21:03:31 +02:00
exit.c kernel/exit.c: avoid undefined behaviour when calling wait4() 2018-05-26 08:48:51 +02:00
extable.c kernel/extable.c: mark core_kernel_text notrace 2017-07-21 07:44:56 +02:00
fork.c kthread: Fix use-after-free if kthread fork fails 2018-09-19 22:48:55 +02:00
freezer.c
futex_compat.c ptrace: use fsuid, fsgid, effective creds for fs access checks 2016-02-25 12:01:16 -08:00
futex.c futex: futex_wake_op, fix sign_extend32 sign bits 2018-05-26 08:48:51 +02:00
groups.c kernel: make groups_sort calling a responsibility group_info allocators 2018-01-10 09:27:10 +01:00
hung_task.c kernel/hung_task.c: change hung_task.c to use for_each_process_thread() 2015-04-15 16:35:22 -07:00
irq_work.c treewide: Remove old email address 2015-11-23 09:44:58 +01:00
jump_label.c jump_label: Invoke jump_label_test() via early_initcall() 2017-12-16 10:33:55 +01:00
kallsyms.c
kcmp.c ptrace: use fsuid, fsgid, effective creds for fs access checks 2016-02-25 12:01:16 -08:00
Kconfig.freezer
Kconfig.hz
Kconfig.locks locking/qrwlock: Rename QUEUE_RWLOCK to QUEUED_RWLOCKS 2015-05-12 09:46:00 +02:00
Kconfig.preempt
kexec_core.c kexec: use file name as the output message prefix 2015-11-06 17:50:42 -08:00
kexec_file.c kexec: fix double-free when failing to relocate the purgatory 2016-09-24 10:07:36 +02:00
kexec_internal.h kexec: split kexec_file syscall code to kexec_file.c 2015-09-10 13:29:01 -07:00
kexec.c kexec: use file name as the output message prefix 2015-11-06 17:50:42 -08:00
kmod.c kmod: don't run async usermode helper as a child of kworker thread 2015-10-23 17:55:10 +09:00
kprobes.c kprobes: Make list and blacklist root user read only 2018-09-05 09:18:40 +02:00
ksysfs.c kexec: split kexec_load syscall from kexec core code 2015-09-10 13:29:01 -07:00
kthread.c kthread, tracing: Don't expose half-written comm when creating kthreads 2018-09-09 20:04:34 +02:00
latencytop.c
Makefile sys_membarrier(): system-wide memory barrier (generic, x86) 2015-09-11 15:21:34 -07:00
membarrier.c Fix: Disable sys_membarrier when nohz_full is enabled 2017-03-12 06:37:26 +01:00
memremap.c mm: fix devm_memremap_pages crash, use mem_hotplug_{begin, done} 2017-01-19 20:17:18 +01:00
module_signing.c KEYS: Merge the type-specific data with the payload data 2015-10-21 15:18:36 +01:00
module-internal.h
module.c module: exclude SHN_UNDEF symbols from kallsyms api 2018-10-10 08:52:07 +02:00
notifier.c Merge branch 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2015-09-01 08:40:25 -07:00
nsproxy.c bury struct proc_ns in fs/proc 2014-12-04 14:34:54 -05:00
padata.c padata: free correct variable 2017-05-20 14:27:02 +02:00
panic.c kernel/panic.c: add missing \n 2017-07-05 14:37:19 +02:00
params.c Nothing exciting, minor tweaks and cleanups. 2015-11-09 15:53:39 -08:00
pid_namespace.c pid_ns: Sleep in TASK_INTERRUPTIBLE in zap_pid_ns_processes 2017-05-25 14:30:11 +02:00
pid.c pidns: disable pid allocation if pid_ns_prepare_proc() is failed in alloc_pid() 2018-04-13 19:50:03 +02:00
profile.c profile: hide unused functions when !CONFIG_PROC_FS 2018-02-25 11:03:44 +01:00
ptrace.c ptrace: Properly initialize ptracer_cred on fork 2017-06-14 13:16:20 +02:00
range.c kernel: avoid overflow in cmp_range 2015-01-17 10:02:23 +13:00
reboot.c kexec: split kexec_load syscall from kexec core code 2015-09-10 13:29:01 -07:00
relay.c kernel/relay.c: limit kmalloc size to KMALLOC_MAX_SIZE 2018-05-30 07:49:00 +02:00
resource.c resource: fix integer overflow at reallocation 2018-04-24 09:32:05 +02:00
seccomp.c seccomp: Move speculation migitation control to arch code 2018-07-25 10:18:27 +02:00
signal.c kernel/signal.c: avoid undefined behaviour in kill_something_info 2018-05-30 07:48:52 +02:00
smp.c mm, page_alloc: distinguish between being unable to sleep, unwilling to sleep and avoiding waking kswapd 2015-11-06 17:50:42 -08:00
smpboot.c stop_machine: Kill smp_hotplug_thread->pre_unpark, introduce stop_machine_unpark() 2015-10-20 10:23:55 +02:00
smpboot.h
softirq.c Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2015-02-09 15:24:03 -08:00
stacktrace.c stacktrace: introduce snprint_stack_trace for buffer output 2014-12-13 12:42:48 -08:00
stop_machine.c kernel: remove stop_machine() Kconfig dependency 2015-12-12 10:15:34 -08:00
sys_ni.c mm: mlock: add new mlock system call 2015-11-05 19:34:48 -08:00
sys.c sys: don't hold uts_sem while accessing userspace memory 2018-09-09 20:04:35 +02:00
sysctl_binary.c fs/coredump: prevent fsuid=0 dumps into user-controlled directories 2016-04-12 09:08:58 -07:00
sysctl.c sched/sysctl: Check user input value of sysctl_sched_time_avg 2018-09-05 09:18:33 +02:00
task_work.c task_work: remove fifo ordering guarantee 2015-09-05 13:46:58 -07:00
taskstats.c netlink: make nlmsg_end() and genlmsg_end() void 2015-01-18 01:03:45 -05:00
test_kprobes.c
torture.c torture: Consolidate cond_resched_rcu_qs() into stutter_wait() 2015-10-06 11:25:01 -07:00
tracepoint.c tracepoint: Do not warn on ENOMEM 2018-05-16 10:06:47 +02:00
tsacct.c
uid16.c kernel: make groups_sort calling a responsibility group_info allocators 2018-01-10 09:27:10 +01:00
up.c
user_namespace.c userns: move user access out of the mutex 2018-09-09 20:04:35 +02:00
user-return-notifier.c
user.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace 2014-12-17 12:31:40 -08:00
utsname_sysctl.c sys: don't hold uts_sem while accessing userspace memory 2018-09-09 20:04:35 +02:00
utsname.c copy address of proc_ns_ops into ns_common 2014-12-04 14:34:47 -05:00
watchdog.c kernel/watchdog: use nmi registers snapshot in hardlockup handler 2017-01-06 11:16:16 +01:00
workqueue_internal.h workqueue: Fix NULL pointer dereference 2017-11-15 17:13:11 +01:00
workqueue.c workqueue: use put_device() instead of kfree() 2018-05-30 07:49:04 +02:00