linux/kernel
David Hildenbrand 6e3092d788 kernel/events/uprobes: uprobe_write_opcode() rewrite
uprobe_write_opcode() does some pretty low-level things that really, it
shouldn't be doing: for example, manually breaking COW by allocating
anonymous folios and replacing mapped pages.

Further, it does seem to do some shaky things: for example, writing to
possible COW-shared anonymous pages or zapping anonymous pages that might
be pinned.  We're also not taking care of uffd, uffd-wp, softdirty ... 
although rather corner cases here.  Let's just get it right like ordinary
ptrace writes would.

Let's rewrite the code, leaving COW-breaking to core-MM, triggered by
FOLL_FORCE|FOLL_WRITE (note that the code was already using FOLL_FORCE).

We'll use GUP to lookup/faultin the page and break COW if required.  Then,
we'll walk the page tables using a folio_walk to perform our page
modification atomically by temporarily unmap the PTE + flushing the TLB.

Likely, we could avoid the temporary unmap in case we can just atomically
write the instruction, but that will be a separate project.

Unfortunately, we still have to implement the zapping logic manually,
because we only want to zap in specific circumstances (e.g., page content
identical).

Note that we can now handle large folios (compound pages) and the shared
zeropage just fine, so drop these checks.

Link: https://lkml.kernel.org/r/20250321113713.204682-4-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <olsajiri@gmail.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: "Masami Hiramatsu (Google)" <mhiramat@kernel.org>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Namhyung kim <namhyung@kernel.org>
Cc: Russel King <linux@armlinux.org.uk>
Cc: tongtiangen <tongtiangen@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-11 17:48:18 -07:00
..
bpf bpf: Add namespace to BPF internal symbols 2025-04-25 09:21:23 -07:00
cgroup cgroup/cpuset-v1: Add missing support for cpuset_v2_mode 2025-04-17 07:32:53 -10:00
configs - The 6 patch series "Enable strict percpu address space checks" from 2025-04-01 09:29:18 -07:00
debug TTY/Serial driver updates for 6.15-rc1 2025-04-02 18:17:33 -07:00
dma dma-coherent: Warn if OF reserved memory is beyond current coherent DMA mask 2025-04-22 17:44:09 +02:00
entry Objtool changes for v6.15: 2025-03-24 21:18:05 -07:00
events kernel/events/uprobes: uprobe_write_opcode() rewrite 2025-05-11 17:48:18 -07:00
futex
gcov
irq genirq/msi: Prevent NULL pointer dereference in msi_domain_debug_show() 2025-04-30 23:25:10 +02:00
kcsan treewide: Switch/rename to timer_delete[_sync]() 2025-04-05 10:30:12 +02:00
livepatch Modules changes for 6.15-rc1 2025-03-30 15:44:36 -07:00
locking - The 7 patch series "powerpc/crash: use generic crashkernel 2025-04-01 10:06:52 -07:00
module ring-buffer updates for v6.15 2025-03-31 13:37:22 -07:00
power This update includes the following changes: 2025-03-29 10:01:55 -07:00
printk printk changes for 6.15 2025-03-27 19:22:24 -07:00
rcu treewide: Switch/rename to timer_delete[_sync]() 2025-04-05 10:30:12 +02:00
sched Fix sporadic crashes in dequeue_entities() due to ... bad math. 2025-04-26 09:23:20 -07:00
time timekeeping: Prevent coarse clocks going backwards 2025-04-28 11:17:29 +02:00
trace tracing: Do not take trace_event_sem in print_event_fields() 2025-05-01 22:44:52 -04:00
.gitignore
acct.c
async.c
audit_fsnotify.c
audit_tree.c
audit_watch.c fs: add kern_path_locked_negative() 2025-04-15 11:32:34 +02:00
audit.c
audit.h
auditfilter.c
auditsc.c fs: dedup handling of struct filename init and refcounts bumps 2025-03-18 15:34:27 +01:00
backtracetest.c
bounds.c
capability.c capability: Remove unused has_capability 2025-03-07 22:03:09 -06:00
cfi.c Modules changes for 6.15-rc1 2025-03-30 15:44:36 -07:00
compat.c
configs.c
context_tracking.c context_tracking: Make RCU watch ct_kernel_exit_state() warning 2025-03-04 18:44:29 -08:00
cpu_pm.c
cpu.c hyperv-next for 6.15 2025-03-25 14:47:04 -07:00
crash_core.c
crash_reserve.c crash: remove an unused argument from reserve_crashkernel_generic() 2025-03-16 22:30:47 -07:00
cred.c
delayacct.c
dma.c
elfcorehdr.c
exec_domain.c
exit.c exit: fix the usage of delay_group_leader->exit_code in do_notify_parent() and pidfs_exit() 2025-03-25 15:56:22 +01:00
exit.h
extable.c
fail_function.c
fork.c kernel/fork: only call untrack_pfn_clear() on VMAs duplicated for fork() 2025-05-11 17:26:06 -07:00
freezer.c
gen_kheaders.sh Revert "kheaders: Ignore silly-rename files" 2025-03-15 21:22:52 +09:00
groups.c
hung_task.c hung_task: show the blocker task if the task is hung on mutex 2025-03-21 22:10:04 -07:00
iomem.c
irq_work.c
jump_label.c jump_label: Use RCU in all users of __module_text_address(). 2025-03-10 11:54:46 +01:00
kallsyms_internal.h
kallsyms_selftest.c
kallsyms_selftest.h
kallsyms.c
kcmp.c
Kconfig.freezer
Kconfig.hz
Kconfig.kexec
Kconfig.locks
Kconfig.preempt
kcov.c
kexec_core.c - The 7 patch series "powerpc/crash: use generic crashkernel 2025-04-01 10:06:52 -07:00
kexec_elf.c kexec: initialize ELF lowest address to ULONG_MAX 2025-03-16 22:30:47 -07:00
kexec_file.c crash: let arch decide usable memory range in reserved area 2025-03-16 22:30:47 -07:00
kexec_internal.h
kexec.c
kheaders.c
kprobes.c kprobes: Use RCU in all users of __module_text_address(). 2025-03-10 11:54:46 +01:00
ksyms_common.c
ksysfs.c
kthread.c treewide: Switch/rename to timer_delete[_sync]() 2025-04-05 10:30:12 +02:00
latencytop.c
Makefile tracing: Disable branch profiling in noinstr code 2025-03-22 09:49:26 +01:00
module_signature.c
notifier.c
nsproxy.c
padata.c
panic.c These are objtool fixes and updates by Josh Poimboeuf, centered 2025-04-02 10:30:10 -07:00
params.c module: ensure that kobject_put() is safe for module type kobjects 2025-05-07 20:24:59 +02:00
pid_namespace.c pid: Do not set pid_max in new pid namespaces 2025-03-06 10:18:36 +01:00
pid_sysctl.h
pid.c kernel-6.15-rc1.tasklist_lock 2025-03-24 13:39:27 -07:00
profile.c
ptrace.c ptrace: introduce PTRACE_SET_SYSCALL_INFO request 2025-05-11 17:48:15 -07:00
range.c
reboot.c - The 7 patch series "powerpc/crash: use generic crashkernel 2025-04-01 10:06:52 -07:00
regset.c
relay.c relay: use kasprintf() instead of fixed buffer formatting 2025-03-21 22:10:05 -07:00
resource_kunit.c
resource.c resource: replace open coded variant of DEFINE_RES() 2025-03-21 22:10:05 -07:00
rseq.c rseq: Fix segfault on registration when rseq_cs is non-zero 2025-03-06 22:26:49 +01:00
scftorture.c
scs.c
seccomp.c
signal.c vfs-6.15-rc1.fixes 2025-04-02 16:05:21 -07:00
smp.c
smpboot.c
smpboot.h
softirq.c lockdep: Fix wait context check on softirq for PREEMPT_RT 2025-03-25 10:46:44 +01:00
stackleak.c
stacktrace.c
static_call_inline.c Modules changes for 6.15-rc1 2025-03-30 15:44:36 -07:00
static_call.c
stop_machine.c stop-machine: Add comment for rcu_momentary_eqs() 2025-03-11 10:15:52 -07:00
sys_ni.c
sys.c Updates for the core time/timer subsystem: 2025-03-25 10:33:23 -07:00
sysctl-test.c
sysctl.c s390 updates for 6.15 merge window 2025-03-29 11:59:43 -07:00
task_work.c
taskstats.c
torture.c
tracepoint.c tracepoint: Print the function symbol when tracepoint_debug is set 2025-03-21 15:30:10 -04:00
tsacct.c
ucount.c ucount: use rcuref_t for reference counting 2025-03-16 22:30:50 -07:00
uid16.c
uid16.h
umh.c
up.c
user_namespace.c
user-return-notifier.c
user.c
usermode_driver.c
utsname_sysctl.c
utsname.c
vhost_task.c vhost_task: fix vhost_task_create() documentation 2025-04-18 10:08:11 -04:00
vmcore_info.c
watch_queue.c vfs-6.15-rc1.pipe 2025-03-24 09:52:37 -07:00
watchdog_buddy.c
watchdog_perf.c - The 7 patch series "powerpc/crash: use generic crashkernel 2025-04-01 10:06:52 -07:00
watchdog.c A treewide hrtimer timer cleanup 2025-03-25 10:54:15 -07:00
workqueue_internal.h
workqueue.c treewide: Switch/rename to timer_delete[_sync]() 2025-04-05 10:30:12 +02:00