linux

mirror of https://github.com/torvalds/linux.git synced 2026-05-12 16:18:45 +02:00

Author	SHA1	Message	Date
Thomas Gleixner	010b7723c0	rseq: Don't advertise time slice extensions if disabled If time slice extensions have been disabled on the kernel command line, then advertising them in RSEQ flags is wrong. Adjust the conditionals to reflect reality, fixup the misleading comments about the gap of these flags and the rseq::flags field. Fixes: `d6200245c7` ("rseq: Allow registering RSEQ with slice extension") Signed-off-by: Thomas Gleixner <tglx@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Dmitry Vyukov <dvyukov@google.com> Tested-by: Dmitry Vyukov <dvyukov@google.com> Link: https://patch.msgid.link/20260428224427.437059375%40kernel.org Cc: stable@vger.kernel.org	2026-05-01 21:32:20 +02:00
Thomas Gleixner	2cb68e4512	rseq: Set rseq::cpu_id_start to 0 on unregistration The RSEQ rework changed that to RSEQ_CPU_UNINITILIZED, which is obviously incompatible. Revert back to the original behavior. Fixes: `0f085b4188` ("rseq: Provide and use rseq_set_ids()") Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Thomas Gleixner <tglx@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Dmitry Vyukov <dvyukov@google.com> Tested-by: Dmitry Vyukov <dvyukov@google.com> Link: https://patch.msgid.link/20260428224427.271566313%40kernel.org Cc: stable@vger.kernel.org	2026-05-01 21:32:20 +02:00
Zicheng Qu	3da56dc063	sched/fair: Clear rel_deadline when initializing forked entities A yield-triggered crash can happen when a newly forked sched_entity enters the fair class with se->rel_deadline unexpectedly set. The failing sequence is: 1. A task is forked while se->rel_deadline is still set. 2. __sched_fork() initializes vruntime, vlag and other sched_entity state, but does not clear rel_deadline. 3. On the first enqueue, enqueue_entity() calls place_entity(). 4. Because se->rel_deadline is set, place_entity() treats se->deadline as a relative deadline and converts it to an absolute deadline by adding the current vruntime. 5. However, the forked entity's deadline is not a valid inherited relative deadline for this new scheduling instance, so the conversion produces an abnormally large deadline. 6. If the task later calls sched_yield(), yield_task_fair() advances se->vruntime to se->deadline. 7. The inflated vruntime is then used by the following enqueue path, where the vruntime-derived key can overflow when multiplied by the entity weight. 8. This corrupts cfs_rq->sum_w_vruntime, breaks EEVDF eligibility calculation, and can eventually make all entities appear ineligible. pick_next_entity() may then return NULL unexpectedly, leading to a later NULL dereference. A captured trace shows the effect clearly. Before yield, the entity's vruntime was around: 9834017729983308 After yield_task_fair() executed: se->vruntime = se->deadline the vruntime jumped to: 19668035460670230 and the deadline was later advanced further to: 19668035463470230 This shows that the deadline had already become abnormally large before yield_task_fair() copied it into vruntime. rel_deadline is only meaningful when se->deadline really carries a relative deadline that still needs to be placed against vruntime. A freshly forked sched_entity should not inherit or retain this state. Clear se->rel_deadline in __sched_fork(), together with the other sched_entity runtime state, so that the first enqueue does not interpret the new entity's deadline as a stale relative deadline. Fixes: `82e9d0456e` ("sched/fair: Avoid re-setting virtual deadline on 'migrations'") Analyzed-by: Hui Tang <tanghui20@huawei.com> Analyzed-by: Zhang Qiao <zhangqiao22@huawei.com> Signed-off-by: Zicheng Qu <quzicheng@huawei.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://patch.msgid.link/20260424071113.1199600-1-quzicheng@huawei.com	2026-04-28 09:19:54 +02:00
Vincent Guittot	ac8e69e693	sched/fair: Fix wakeup_preempt_fair() vs delayed dequeue Similar to how pick_next_entity() must dequeue delayed entities, so too must wakeup_preempt_fair(). Any delayed task being found means it is eligible and hence past the 0-lag point, ready for removal. Worse, by not removing delayed entities from consideration, it can skew the preemption decision, with the end result that a short slice wakeup will not result in a preemption. tip/sched/core tip/sched/core +this patch cyclictest slice (ms) (default)2.8 8 8 hackbench slice (ms) (default)2.8 20 20 Total Samples \| 22559 22595 22683 Average (us) \| 157 64( 59%) 59( 8%) Median (P50) (us) \| 57 57( 0%) 58(- 2%) 90th Percentile (us) \| 64 60( 6%) 60( 0%) 99th Percentile (us) \| 2407 67( 97%) 67( 0%) 99.9th Percentile (us) \| 3400 2288( 33%) 727( 68%) Maximum (us) \| 5037 9252(-84%) 7461( 19%) Fixes: `f12e148892` ("sched/fair: Prepare pick_next_task() for delayed dequeue") Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://patch.msgid.link/20260422093400.319251-1-vincent.guittot@linaro.org	2026-04-28 09:19:54 +02:00
Peter Zijlstra	c5cd6fd75b	sched/fair: Fix the negative lag increase fix Vincent reported that my rework of his original patch lost a little something. Specifically it got the return value wrong; it should not compare against the old se->vlag, but rather against the current value. Since the thing that matters is if the effective vruntime of an entity is affected and the thing needs repositioning or not. Fixes: `059258b0d4` ("sched/fair: Prevent negative lag increase during delayed dequeue") Reported-by: Vincent Guittot <vincent.guittot@linaro.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Vincent Guittot <vincent.guittot@linaro.org> Link: https://patch.msgid.link/20260423094107.GT3102624%40noisy.programming.kicks-ass.net	2026-04-28 09:19:54 +02:00
Linus Torvalds	27d128c1cf	ring-buffer fix for 7.1: - Fix accounting of persistent ring buffer rewind On boot up, the head page is moved back to the earliest point of the saved ring buffer. This is because the ring buffer being read by user space on a crash may not save the part it read. Rewinding the head page back to the earliest saved position helps keep those events from being lost. The number of events is also read during boot up and displayed in the stats file in the tracefs directory. It's also used for other accounting as well. On boot up, the "reader page" is accounted for but a rewind may put it back into the buffer and then the reader page may be accounted for again. Save off the original reader page and skip accounting it when scanning the pages in the ring buffer. -----BEGIN PGP SIGNATURE----- iIoEABYKADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCaevLRxQccm9zdGVkdEBn b29kbWlzLm9yZwAKCRAp5XQQmuv6qsNBAQCO9T+3kbDLD/DxRBYjeMXhTg2iY2OV SHV9jEzcYTahmwEAqVCFY9sNxVYXait/C924SmHCSi1N0HrIuEzIEd29iwk= =bv+t -----END PGP SIGNATURE----- Merge tag 'trace-ring-buffer-v7.1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull ring-buffer fix from Steven Rostedt: - Fix accounting of persistent ring buffer rewind On boot up, the head page is moved back to the earliest point of the saved ring buffer. This is because the ring buffer being read by user space on a crash may not save the part it read. Rewinding the head page back to the earliest saved position helps keep those events from being lost. The number of events is also read during boot up and displayed in the stats file in the tracefs directory. It's also used for other accounting as well. On boot up, the "reader page" is accounted for but a rewind may put it back into the buffer and then the reader page may be accounted for again. Save off the original reader page and skip accounting it when scanning the pages in the ring buffer. * tag 'trace-ring-buffer-v7.1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: ring-buffer: Do not double count the reader_page	2026-04-24 15:17:23 -07:00
Masami Hiramatsu (Google)	92d5a60672	ring-buffer: Do not double count the reader_page Since the cpu_buffer->reader_page is updated if there are unwound pages. After that update, we should skip the page if it is the original reader_page, because the original reader_page is already checked. Cc: stable@vger.kernel.org Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Ian Rogers <irogers@google.com> Link: https://patch.msgid.link/177701353063.2223789.1471163147644103306.stgit@mhiramat.tok.corp.google.com Fixes: `ca296d32ec` ("tracing: ring_buffer: Rewind persistent ring buffer on reboot") Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2026-04-24 15:34:39 -04:00
Linus Torvalds	892c894b4b	Misc locking fixes: - Fix ww_mutex regression, which caused hangs/pauses in some DRM drivers - Fix rtmutex proxy-rollback bug Signed-off-by: Ingo Molnar <mingo@kernel.org> -----BEGIN PGP SIGNATURE----- iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmnrdJERHG1pbmdvQGtl cm5lbC5vcmcACgkQEnMQ0APhK1jnJw//bi0bzZnBYeU4ukrqnyVWUZS7lKCnl+3M Ys31Ls38nXTQPhs+qbgUlqLzybJ/96s7uattPIW8+0Jkr+PfQguCvT+QiXs0SFzC aEp6PcZAD1z+QQY1zj2c1TM414IdyWbkShRxsPULb1j8nYJ6mO50UzP4zFdl7QwH gwZRgPkriUhD3vfgtHX3U0rshrvFHu9Ed6uuMkLFxaEiHn0h/4rZa0q9vonxw+cU 64MabrxtPyQA97Mjr/YVsJjodt5bzkBmKNDCu9YxfqFQ0LTXyi1DZG/zdFHmT1Kf WcRIXbCx27HcR+NOWRze4rby2SR27NnPAZC/rR7Y6MX0IHGg1m2tM9DIGnDw71Su 3HkQ1W3KhQFO0uvbhkKCXk1SAfBrJwh0g3p91sgj/tZyaXXdEYRChhpie5jMy3eg 3k49QfYHigQZCZa8IH0PjrxLzDHo8S2tB75X6dWlxC/UeN/6X1G+Z9a52G9kZrFJ ZhrxY5/WVF4LbeIYupFbjN393jVN4WPEJKN3Esouuc+bluqMjUjl65v12yTQZeFy 7pObrM9nJ4FijWjt+OCHUUZLwt+k5zttdUjgEh0zonx94JtpDA0qjkpPtbYgCFak WQxwD+qs1r8yv+msqkQ65/ir5GT6GyAzQnQAQbAQyB4itC9iyDhoS6mzkFBsw3bk TxfGszSV5ng= =6wf3 -----END PGP SIGNATURE----- Merge tag 'locking-urgent-2026-04-24' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking fixes from Ingo Molnar: - Fix ww_mutex regression, which caused hangs/pauses in some DRM drivers - Fix rtmutex proxy-rollback bug * tag 'locking-urgent-2026-04-24' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: locking/mutex: Fix ww_mutex wait_list operations rtmutex: Use waiter::task instead of current in remove_waiter()	2026-04-24 10:14:29 -07:00
Peter Zijlstra	0adc92b910	locking/mutex: Fix ww_mutex wait_list operations Chaitanya, John and Mikhail reported commit `25500ba7e7` ("locking/mutex: Remove the list_head from struct mutex") wrecked ww_mutex. Specifically there were 2 issues: - __ww_waiter_prev() had the termination condition wrong; it would terminate when the previous entry was the first, which results in a truncated iteration: W3, W2, (no W1). - __mutex_add_waiter(@pos != NULL), as used by __ww_waiter_add() / __ww_mutex_add_waiter(); this inserts @waiter before @pos (which is what list_add_tail() does). But this should then also update lock->first_waiter. Much thanks to Prateek for spotting the __mutex_add_waiter() issue! Fixes: `25500ba7e7` ("locking/mutex: Remove the list_head from struct mutex") Reported-by: "Borah, Chaitanya Kumar" <chaitanya.kumar.borah@intel.com> Closes: https://lore.kernel.org/r/af005996-05e9-4336-8450-d14ca652ba5d%40intel.com Reported-by: John Stultz <jstultz@google.com> Closes: https://lore.kernel.org/r/CANDhNCq%3Doizzud3hH3oqGzTrcjB8OwGeineJ3mwZuGdDWG8fRQ%40mail.gmail.com Reported-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com> Closes: https://lore.kernel.org/r/CABXGCsO5fKq2nD9nO8yO1z50ZzgCPWqueNXHANjntaswoOh2Dg@mail.gmail.com Debugged-by: K Prateek Nayak <kprateek.nayak@amd.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: K Prateek Nayak <kprateek.nayak@amd.com> Tested-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com> Link: https://patch.msgid.link/20260422092335.GH3102924%40noisy.programming.kicks-ass.net	2026-04-23 10:05:49 +02:00
Linus Torvalds	1e18ed5727	ring-buffer fix for v7.1 - Make undefsyms_base.c into a real file The file undefsyms_base.c is used to catch any symbols used by a remote ring buffer that is made for use of a pKVM hypervisor. As it doesn't share the same text as the rest of the kernel, referencing any symbols within the kernel will make it fail to be built for the standalone hypervisor. A file was created by the Makefile that checked for any symbols that could cause issues. There's no reason to have this file created by the Makefile, just create it as a normal file instead. -----BEGIN PGP SIGNATURE----- iIoEABYKADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCaekbsxQccm9zdGVkdEBn b29kbWlzLm9yZwAKCRAp5XQQmuv6qr0MAQCHMFK56uzmMvQJg5Tfhtb+grY7ryo4 j0rjLkNQBovmxwD/XylBDkhxr8IQ72jU04e+xuXzrqoT18U7XvQEVpPgkQQ= =dQi0 -----END PGP SIGNATURE----- Merge tag 'trace-ring-buffer-v7.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull ring-buffer fix from Steven Rostedt: - Make undefsyms_base.c into a real file The file undefsyms_base.c is used to catch any symbols used by a remote ring buffer that is made for use of a pKVM hypervisor. As it doesn't share the same text as the rest of the kernel, referencing any symbols within the kernel will make it fail to be built for the standalone hypervisor. A file was created by the Makefile that checked for any symbols that could cause issues. There's no reason to have this file created by the Makefile, just create it as a normal file instead. * tag 'trace-ring-buffer-v7.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: tracing: Make undefsyms_base.c a first-class citizen	2026-04-22 14:47:52 -07:00
Linus Torvalds	38ee6e1fb6	kgdb patches for 7.1 Only a very small update for kgdb this cycle: a single patch from Kexin Sun that fixes some outdated comments. Signed-off-by: Daniel Thompson (RISCstar) <danielt@kernel.org> -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEELzVBU1D3lWq6cKzwfOMlXTn3iKEFAmno2OcACgkQfOMlXTn3 iKHVNw/+IXsSmpUTllkwlzmib+D7zYVEoyLOdoBRQyJDjjT9Qe811budkIs02sIk NUgMBnffInU73lCw2TbRpKag9+dl8ZglKFvM3UpElFzY5ePnUc/fHeFtsXdzVjSK PVzL3fCtN66R1MuMeCdMNjhtSVHNJCrclj0We39JRgli7kWK1zH6kY8X/j/ExBTR 9S5e3/9Yn/V9Z2Z33lwBMCKLHyd/5r0ezKbvfVGmqT/sTE/IbTcTfVMipEHGtHfA JHs5PC+1obJ9jeq/1c1fV8tZ8kO+VjGI0yUibbWEBmPx0pVUW/FG1z+WwhlbcyOl cyhKQ7d4h4e718M+bWAXeoRGk9bXyAWHqWojM98Xh+TvR2H20qAzUC44qeT9tt7B SOfvZM4Xwsn+uYqmthtMqLMqe8VIX2WbUVjsntYZ46fRTq8RAUY3z7yLv6bm7N4l p4i7/cdCsbzi+tN6uaKANywvt9ygEAcPILGfMhfIS4Pa7aQ/JVjKgxKhP1bUuH9u ftqIcJwvX3d8MDQY4FsLpsRToj3rADfr35HnQCc7fmqRIFi7++fXzPerktdF/5Sd 6NMR0+mdFewdW9JfYKGZMTybf9XXyC+F6HlKgNOBddPje7qQlLUIQBajn71W753i TNKI5Go6idMgWEukNVXGD2XqObyKD2eBgCj/42Kc8bDfo7D66UA= =MegD -----END PGP SIGNATURE----- Merge tag 'kgdb-7.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/danielt/linux Pull kgdb update from Daniel Thompson: "Only a very small update for kgdb this cycle: a single patch from Kexin Sun that fixes some outdated comments" * tag 'kgdb-7.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/danielt/linux: kgdb: update outdated references to kgdb_wait()	2026-04-22 14:26:58 -07:00
Paolo Bonzini	5335e318ad	tracing: Make undefsyms_base.c a first-class citizen Linus points out that dumping undefsyms_base.c form the Makefile is rather ugly, and that a much better course of action would be to have this file as a first-class citizen in the git tree. This allows some extra cleanup in the Makefile, and the removal of the .gitignore file in kernel/trace. Cc: Marc Zyngier <maz@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Link: https://lore.kernel.org/r/CAHk-=wieqGd_XKpu8UxDoyADZx8TDe8CF3RmkUXt5N_9t5Pf_w@mail.gmail.com Link: https://lore.kernel.org/all/20260421095446.2951646-1-maz@kernel.org/ Link: https://patch.msgid.link/20260421100455.324333-1-pbonzini@redhat.com Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Reviewed-by: Nathan Chancellor <nathan@kernel.org> Tested-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2026-04-22 11:24:41 -04:00
Masami Hiramatsu (Google)	476c5bbae6	tracing/fprobe: Fix to unregister ftrace_ops if it is empty on module unloading Fix fprobe to unregister ftrace_ops if corresponding type of fprobe does not exist on the fprobe_ip_table and it is expected to be empty when unloading modules. Since ftrace thinks that the empty hash means everything to be traced, if we set fprobes only on the unloaded module, all functions are traced unexpectedly after unloading module. e.g. # modprobe xt_LOG.ko # echo 'f:test log_tg*' > dynamic_events # echo 1 > events/fprobes/test/enable # cat enabled_functions log_tg [xt_LOG] (1) tramp: 0xffffffffa0004000 (fprobe_ftrace_entry+0x0/0x490) ->fprobe_ftrace_entry+0x0/0x490 log_tg_check [xt_LOG] (1) tramp: 0xffffffffa0004000 (fprobe_ftrace_entry+0x0/0x490) ->fprobe_ftrace_entry+0x0/0x490 log_tg_destroy [xt_LOG] (1) tramp: 0xffffffffa0004000 (fprobe_ftrace_entry+0x0/0x490) ->fprobe_ftrace_entry+0x0/0x490 # rmmod xt_LOG # wc -l enabled_functions 34085 enabled_functions Link: https://lore.kernel.org/all/177669368776.132053.10042301916765771279.stgit@mhiramat.tok.corp.google.com/ Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>	2026-04-22 09:24:13 +09:00
Kexin Sun	256e5254ef	kgdb: update outdated references to kgdb_wait() The function kgdb_wait() was folded into the static function kgdb_cpu_enter() by commit `62fae31219` ("kgdb: eliminate kgdb_wait(), all cpus enter the same way"). Update the four stale references accordingly: - include/linux/kgdb.h and arch/x86/kernel/kgdb.c: the kgdb_roundup_cpus() kdoc describes what other CPUs are rounded up to call. Because kgdb_cpu_enter() is static, the correct public entry point is kgdb_handle_exception(); also fix a pre-existing grammar error ("get them be" -> "get them into") and reflow the text. - kernel/debug/debug_core.c: replace with the generic description "the debug trap handler", since the actual entry path is architecture-specific. - kernel/debug/gdbstub.c: kgdb_cpu_enter() is correct here (it describes internal state, not a call target); add the missing parentheses. Suggested-by: Daniel Thompson <daniel@riscstar.com> Assisted-by: unnamed:deepseek-v3.2 coccinelle Signed-off-by: Kexin Sun <kexinsun@smail.nju.edu.cn>	2026-04-21 16:41:54 +01:00
Masami Hiramatsu (Google)	0ac0058a74	tracing/fprobe: Check the same type fprobe on table as the unregistered one Commit `2c67dc457b` ("tracing: fprobe: optimization for entry only case") introduced a different ftrace_ops for entry-only fprobes. However, when unregistering an fprobe, the kernel only checks if another fprobe exists at the same address, without checking which type of fprobe it is. If different fprobes are registered at the same address, the same address will be registered in both fgraph_ops and ftrace_ops, but only one of them will be deleted when unregistering. (the one removed first will not be deleted from the ops). This results in junk entries remaining in either fgraph_ops or ftrace_ops. For example: ======= cd /sys/kernel/tracing # 'Add entry and exit events on the same place' echo 'f:event1 vfs_read' >> dynamic_events echo 'f:event2 vfs_read%return' >> dynamic_events # 'Enable both of them' echo 1 > events/fprobes/enable cat enabled_functions vfs_read (2) ->arch_ftrace_ops_list_func+0x0/0x210 # 'Disable and remove exit event' echo 0 > events/fprobes/event2/enable echo -:event2 >> dynamic_events # 'Disable and remove all events' echo 0 > events/fprobes/enable echo > dynamic_events # 'Add another event' echo 'f:event3 vfs_open%return' > dynamic_events cat dynamic_events f:fprobes/event3 vfs_open%return echo 1 > events/fprobes/enable cat enabled_functions vfs_open (1) tramp: 0xffffffffa0001000 (ftrace_graph_func+0x0/0x60) ->ftrace_graph_func+0x0/0x60 subops: {ent:fprobe_fgraph_entry+0x0/0x620 ret:fprobe_return+0x0/0x150} vfs_read (1) tramp: 0xffffffffa0001000 (ftrace_graph_func+0x0/0x60) ->ftrace_graph_func+0x0/0x60 subops: {ent:fprobe_fgraph_entry+0x0/0x620 ret:fprobe_return+0x0/0x150} ======= As you can see, an entry for the vfs_read remains. To fix this issue, when unregistering, the kernel should also check if there is the same type of fprobes still exist at the same address, and if not, delete its entry from either fgraph_ops or ftrace_ops. Link: https://lore.kernel.org/all/177669367993.132053.10553046138528674802.stgit@mhiramat.tok.corp.google.com/ Fixes: `2c67dc457b` ("tracing: fprobe: optimization for entry only case") Cc: stable@vger.kernel.org Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>	2026-04-22 00:03:10 +09:00
Masami Hiramatsu (Google)	aa72812b49	tracing/fprobe: Avoid kcalloc() in rcu_read_lock section fprobe_remove_node_in_module() is called under RCU read locked, but this invokes kcalloc() if there are more than 8 fprobes installed on the module. Sashiko warns it because kcalloc() can sleep [1]. [1] https://sashiko.dev/#/patchset/177552432201.853249.5125045538812833325.stgit%40mhiramat.tok.corp.google.com To fix this issue, expand the batch size to 128 and do not expand the fprobe_addr_list, but just cancel walking on fprobe_ip_table, update fgraph/ftrace_ops and retry the loop again. Link: https://lore.kernel.org/all/177669367206.132053.1493637946869032744.stgit@mhiramat.tok.corp.google.com/ Fixes: `0de4c70d04` ("tracing: fprobe: use rhltable for fprobe_ip_table") Cc: stable@vger.kernel.org Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>	2026-04-22 00:02:59 +09:00
Masami Hiramatsu (Google)	845947aca6	tracing/fprobe: Remove fprobe from hash in failure path When register_fprobe_ips() fails, it tries to remove a list of fprobe_hash_node from fprobe_ip_table, but it missed to remove fprobe itself from fprobe_table. Moreover, when removing the fprobe_hash_node which is added to rhltable once, it must use kfree_rcu() after removing from rhltable. To fix these issues, this reuses unregister_fprobe() internal code to rollback the half-way registered fprobe. Link: https://lore.kernel.org/all/177669366417.132053.17874946321744910456.stgit@mhiramat.tok.corp.google.com/ Fixes: `4346ba1604` ("fprobe: Rewrite fprobe on function-graph tracer") Cc: stable@vger.kernel.org Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>	2026-04-21 23:59:57 +09:00
Masami Hiramatsu (Google)	1aec9e5c3e	tracing/fprobe: Unregister fprobe even if memory allocation fails unregister_fprobe() can fail under memory pressure because of memory allocation failure, but this maybe called from module unloading, and usually there is no way to retry it. Moreover. trace_fprobe does not check the return value. To fix this problem, unregister fprobe and fprobe_hash_node even if working memory allocation fails. Anyway, if the last fprobe is removed, the filter will be freed. Link: https://lore.kernel.org/all/177669365629.132053.8433032896213721288.stgit@mhiramat.tok.corp.google.com/ Fixes: `4346ba1604` ("fprobe: Rewrite fprobe on function-graph tracer") Cc: stable@vger.kernel.org Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>	2026-04-21 23:59:39 +09:00
Masami Hiramatsu (Google)	6ad51ada17	tracing/fprobe: Reject registration of a registered fprobe before init Reject registration of a registered fprobe which is on the fprobe hash table before initializing fprobe. The add_fprobe_hash() checks this re-register fprobe, but since fprobe_init() clears hlist_array field, it is too late to check it. It has to check the re-registration before touncing fprobe. Link: https://lore.kernel.org/all/177669364845.132053.18375367916162315835.stgit@mhiramat.tok.corp.google.com/ Fixes: `4346ba1604` ("fprobe: Rewrite fprobe on function-graph tracer") Cc: stable@vger.kernel.org Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>	2026-04-21 23:59:29 +09:00
Linus Torvalds	b4e07588e7	tracing: tell git to ignore the generated 'undefsyms_base.c' file This odd file was added to automatically figure out tool-generated symbols. Honestly, it should have been just a real honest-to-goodness regular file in git, instead of having strange code to generate it in the Makefile, but that is not how that silly thing works. So now we need to ignore it explicitly. Fixes: `1211907ac0` ("tracing: Generate undef symbols allowlist for simple_ring_buffer") Cc: Vincent Donnefort <vdonnefort@google.com> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Steven Rostedt (Google) <rostedt@goodmis.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Marc Zyngier <maz@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2026-04-20 17:25:56 -07:00
Linus Torvalds	b66cb4f156	printk changes for 7.1 -----BEGIN PGP SIGNATURE----- iQJPBAABCAA5FiEESH4wyp42V4tXvYsjUqAMR0iAlPIFAmnmEggbFIAAAAAABAAO bWFudTIsMi41KzEuMTIsMiwyAAoJEFKgDEdIgJTyjQwP/j2a9EzdI4+DfGrmA56m /heo44ObpFJppYaWEyAN7xX8Xpm7ErokLjZxVxhgQY7hGU5WLx1CmslnJbfWYkWy r7q92Kd/QIDmsvwHzlE0xdaX3rp8AXp3O2iIhcDfv9FRe+UultrESBpw9JbRYXXQ SLAFU1iOFMxyuzvhW/2l/B+PQDm0uoRMpMWTsLXo2JtNOsIGewNyV/7dhOumm+RD /0lgVA9jMAQA33j4Hkr38REe8lYH7aGl66x1mZhDg+kYb7w7ndKW/QN21OJiP+2D s4moi2/VLmC/UcxoDAOQTArKYkrYc2nzLMJRMaIun8jcNfCHEqaQAW2MTzSDAC78 +atdlWrfIxykORA31lTtjh4o5vg43sQPjFZCJr3RxNLfxFy15NULhT75QFrxe9sA W3gs4Rz+LeynoQbJNCn8hgK0xcpKLtzC4dMY7dfQo6uyq2YnJauEvioJKbUYyoDh 3l3uYfnzHcfRUb+yNtIkNCiYXwrpTAnSnifCbWO3smWYxhCdAT14Rval/fxtTjSe sTphd7m32U56WpWX0lYQbY001nMzso+gN/eQSK20IzrhvghwYUqvCKNm+fIM9tjR nQBxka+B7XhO27hNV4QIT8ZQRUkabQQAseEFhLQI2Ptgw3eV0s85gMP60Lg4gpZ0 7OGA/VM+xrLqXvsFeDDWccSg =O6QR -----END PGP SIGNATURE----- Merge tag 'printk-for-7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux Pull printk updates from Petr Mladek: - Fix printk ring buffer initialization and sanity checks - Workaround printf kunit test compilation with gcc < 12.1 - Add IPv6 address printf format tests - Misc code and documentation cleanup * tag 'printk-for-7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux: printf: Compile the kunit test with DISABLE_BRANCH_PROFILING DISABLE_BRANCH_PROFILING lib/vsprintf: use bool for local decode variable lib/hexdump: print_hex_dump_bytes() calls print_hex_dump_debug() printk: ringbuffer: fix errors in comments printk_ringbuffer: Add sanity check for 0-size data printk_ringbuffer: Fix get_data() size sanity check printf: add IPv6 address format tests printk: Fix _DESCS_COUNT type for 64-bit systems	2026-04-20 15:42:18 -07:00
Linus Torvalds	ccbc9fdb32	Fix timer stalls caused by incorrect handling of the dev->next_event_forced flag. Signed-off-by: Ingo Molnar <mingo@kernel.org> -----BEGIN PGP SIGNATURE----- iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmnl1/4RHG1pbmdvQGtl cm5lbC5vcmcACgkQEnMQ0APhK1g/ww/7BI4CyQUJLSCpYMjvkj+87Rrfd5u6FGqt jz0dGeQpY0LvRGSqASwICe+1r0zwHF+xDsUfJvA13mRaPM6D6bEU+JE6ffK8B6T9 EIyYwEwQ2a7DrdIu9+FCXTwqXDUoGLFsguD50b4qupQKFcDlwgZbg4UAWi/ptau9 Ww+5T/+sfw/SMR9EwXBSKH79N0gOOGpDNfGtpDv+0X0qPQvo9QGAxMfIUgMf7ZaA y55agXi5iOdM+mAIrE69WLinBzrBvXHWNr66/SaMadQs93I1hU54sLpir4ft1yCs WnDtTRWG11Y0HBHUqqgbnN8BR/2VIFDVe9BtRDoDUD70iEJ8TGqJjOvF8v7C00MK ets0zNel9Rqbz9wjrjTekPYUHfC/t9qqzV77c0TdU1IR6FArf/OT9Ge34AVr60EX a5s4aX7ECLjwuTwgQPLXsSedOD0eQndf/VYdEQ86fTUfyyujVg2NCxbFEfDr3eho SbjcNv1UQ1WY/7miJzYaiA2aVNtwuX25YNI+t3f3pX/1tGqmx9oB1tNzJqgGfuN9 3/Rx3uP0kH+gpbw1lKAugFJazOEHDLJG8LBgF0PYbmlVdIGn57IuBlQL5GDLe77O G+sFUhrLNpkrIJVhNWODkM+K/z9vvKzENiRgG4hB0oAbQEUuqTA6ppayKWWs8KDb 9fdgLSkdjtI= =GPvm -----END PGP SIGNATURE----- Merge tag 'timers-urgent-2026-04-20' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer fix from Ingo Molnar: "Fix timer stalls caused by incorrect handling of the dev->next_event_forced flag" * tag 'timers-urgent-2026-04-20' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: clockevents: Add missing resets of the next_event_forced flag	2026-04-20 15:30:08 -07:00
Keenan Dong	3bfdc63936	rtmutex: Use waiter::task instead of current in remove_waiter() remove_waiter() is used by the slowlock paths, but it is also used for proxy-lock rollback in rt_mutex_start_proxy_lock() when invoked from futex_requeue(). In the latter case waiter::task is not current, but remove_waiter() operates on current for the dequeue operation. That results in several problems: 1) the rbtree dequeue happens without waiter::task::pi_lock being held 2) the waiter task's pi_blocked_on state is not cleared, which leaves a dangling pointer primed for UAF around. 3) rt_mutex_adjust_prio_chain() operates on the wrong top priority waiter task Use waiter::task instead of current in all related operations in remove_waiter() to cure those problems. [ tglx: Fixup rt_mutex_adjust_prio_chain(), add a comment and amend the changelog ] Fixes: `8161239a8b` ("rtmutex: Simplify PI algorithm and make highest prio task get lock") Reported-by: Yuan Tan <yuantan098@gmail.com> Reported-by: Yifan Wu <yifanwucs@gmail.com> Reported-by: Juefei Pu <tomapufckgml@gmail.com> Reported-by: Xin Liu <bird@lzu.edu.cn> Signed-off-by: Keenan Dong <keenanat2000@gmail.com> Signed-off-by: Thomas Gleixner <tglx@kernel.org> Cc: stable@vger.kernel.org	2026-04-21 00:22:31 +02:00
Petr Mladek	add9d911be	Merge branch 'rework/prb-fixes' into for-linus	2026-04-20 13:42:01 +02:00
Linus Torvalds	40735a683b	mm.git review status for linus..mm-stable Everything: Total patches: 121 Reviews/patch: 2.11 Reviewed rate: 90% Excluding DAMON: Total patches: 113 Reviews/patch: 2.25 Reviewed rate: 96% - The 33 patch series "Eliminate Dying Memory Cgroup" from Qi Zheng and Muchun Song addresses the longstanding "dying memcg problem". A situation wherein a no-longer-used memory control group will hang around for an extended period pointlessly consuming memory. The [0/N] changelog has a good overview of this work. - The 3 patch series "fix unexpected type conversions and potential overflows" from Qi Zheng fixes a couple of potential 32-bit/64-bit issues which were identified during review of the "Eliminate Dying Memory Cgroup" series. - The 6 patch series "kho: history: track previous kernel version and kexec boot count" from Breno Leitao uses Kexec Handover (KHO) to pass the previous kernel's version string and the number of kexec reboots since the last cold boot to the next kernel, and prints it at boot time. - The 4 patch series "liveupdate: prevent double preservation" from Pasha Tatashin teaches LUO to avoid managing the same file across different active sessions. - The 10 patch series "liveupdate: Fix module unloading and unregister API" from Pasha Tatashin addresses an issue with how LUO handles module reference counting and unregistration during module unloading. - The 2 patch series "zswap pool per-CPU acomp_ctx simplifications" from Kanchana Sridhar simplifies and cleans up the zswap crypto compression handling and improves the lifecycle management of zswap pool's per-CPU acomp_ctx resources. - The 2 patch series "mm/damon/core: fix damon_call()/damos_walk() vs kdmond exit race" from SeongJae Park addresses unlikely but possible leaks and deadlocks in damon_call() and damon_walk(). - The 2 patch series "mm/damon/core: validate damos_quota_goal->nid" from SeongJae Park fixes a couple of root-only wild pointer dereferences. - The 2 patch series "Docs/admin-guide/mm/damon: warn commit_inputs vs other params race" from SeongJae Park updates the DAMON documentation to warn operators about potential races which can occur if the commit_inputs parameter is altered at the wrong time. - The 3 patch series "Minor hmm_test fixes and cleanups" from Alistair Popple implements two bugfixes a cleanup for the HMM kernel selftests. - The 6 patch series "Modify memfd_luo code" from Chenghao Duan provides cleanups, simplifications and speedups in the memfd_lou code. - The 4 patch series "mm, kvm: allow uffd support in guest_memfd" from Mike Rapoport enables support for userfaultfd in guest_memfd. - The 6 patch series "selftests/mm: skip several tests when thp is not available" from Chunyu Hu fixes several issues in the selftests code which were causing breakage when the tests were run on CONFIG_THP=n kernels. - The 2 patch series "mm/mprotect: micro-optimization work" from Pedro Falcato implements a couple of nice speedups for mprotect(). - The 3 patch series "MAINTAINERS: update KHO and LIVE UPDATE entries" from Pratyush Yadav reflects upcoming changes in the maintenance of KHO, LUO, memfd_luo, kexec, crash, kdump and probably other kexec-based things - they are being moved out of mm.git and into a new git tree. -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCaeNL/wAKCRDdBJ7gKXxA jt7EAQCEEQvYYTjld+8HJKsCbavY4pEfci7z4SBiQyIPjRracQD/ZfjXnzL7ucc1 b6q6G4TcslvIDBgzVkk9G2BVn2oCoAg= =3ozv -----END PGP SIGNATURE----- Merge tag 'mm-stable-2026-04-18-02-14' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull more MM updates from Andrew Morton: - "Eliminate Dying Memory Cgroup" (Qi Zheng and Muchun Song) Address the longstanding "dying memcg problem". A situation wherein a no-longer-used memory control group will hang around for an extended period pointlessly consuming memory - "fix unexpected type conversions and potential overflows" (Qi Zheng) Fix a couple of potential 32-bit/64-bit issues which were identified during review of the "Eliminate Dying Memory Cgroup" series - "kho: history: track previous kernel version and kexec boot count" (Breno Leitao) Use Kexec Handover (KHO) to pass the previous kernel's version string and the number of kexec reboots since the last cold boot to the next kernel, and print it at boot time - "liveupdate: prevent double preservation" (Pasha Tatashin) Teach LUO to avoid managing the same file across different active sessions - "liveupdate: Fix module unloading and unregister API" (Pasha Tatashin) Address an issue with how LUO handles module reference counting and unregistration during module unloading - "zswap pool per-CPU acomp_ctx simplifications" (Kanchana Sridhar) Simplify and clean up the zswap crypto compression handling and improve the lifecycle management of zswap pool's per-CPU acomp_ctx resources - "mm/damon/core: fix damon_call()/damos_walk() vs kdmond exit race" (SeongJae Park) Address unlikely but possible leaks and deadlocks in damon_call() and damon_walk() - "mm/damon/core: validate damos_quota_goal->nid" (SeongJae Park) Fix a couple of root-only wild pointer dereferences - "Docs/admin-guide/mm/damon: warn commit_inputs vs other params race" (SeongJae Park) Update the DAMON documentation to warn operators about potential races which can occur if the commit_inputs parameter is altered at the wrong time - "Minor hmm_test fixes and cleanups" (Alistair Popple) Bugfixes and a cleanup for the HMM kernel selftests - "Modify memfd_luo code" (Chenghao Duan) Cleanups, simplifications and speedups to the memfd_lou code - "mm, kvm: allow uffd support in guest_memfd" (Mike Rapoport) Support for userfaultfd in guest_memfd - "selftests/mm: skip several tests when thp is not available" (Chunyu Hu) Fix several issues in the selftests code which were causing breakage when the tests were run on CONFIG_THP=n kernels - "mm/mprotect: micro-optimization work" (Pedro Falcato) A couple of nice speedups for mprotect() - "MAINTAINERS: update KHO and LIVE UPDATE entries" (Pratyush Yadav) Document upcoming changes in the maintenance of KHO, LUO, memfd_luo, kexec, crash, kdump and probably other kexec-based things - they are being moved out of mm.git and into a new git tree * tag 'mm-stable-2026-04-18-02-14' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (121 commits) MAINTAINERS: add page cache reviewer mm/vmscan: avoid false-positive -Wuninitialized warning MAINTAINERS: update Dave's kdump reviewer email address MAINTAINERS: drop include/linux/liveupdate from LIVE UPDATE MAINTAINERS: drop include/linux/kho/abi/ from KHO MAINTAINERS: update KHO and LIVE UPDATE maintainers MAINTAINERS: update kexec/kdump maintainers entries mm/migrate_device: remove dead migration entry check in migrate_vma_collect_huge_pmd() selftests: mm: skip charge_reserved_hugetlb without killall userfaultfd: allow registration of ranges below mmap_min_addr mm/vmstat: fix vmstat_shepherd double-scheduling vmstat_update mm/hugetlb: fix early boot crash on parameters without '=' separator zram: reject unrecognized type= values in recompress_store() docs: proc: document ProtectionKey in smaps mm/mprotect: special-case small folios when applying permissions mm/mprotect: move softleaf code out of the main function mm: remove '!root_reclaim' checking in should_abort_scan() mm/sparse: fix comment for section map alignment mm/page_io: use sio->len for PSWPIN accounting in sio_read_complete() selftests/mm: transhuge_stress: skip the test when thp not available ...	2026-04-19 08:01:17 -07:00
Linus Torvalds	9055c64567	memblock: updates for 7.0-rc1 * improve debugability of reserve_mem kernel parameter handling with print outs in case of a failure and debugfs info showing what was actually reserved * Make memblock_free_late() and free_reserved_area() use the same core logic for freeing the memory to buddy and ensure it takes care of updating memblock arrays when ARCH_KEEP_MEMBLOCK is enabled. -----BEGIN PGP SIGNATURE----- iQFEBAABCgAuFiEEeOVYVaWZL5900a/pOQOGJssO/ZEFAmnjRmsQHHJwcHRAa2Vy bmVsLm9yZwAKCRA5A4Ymyw79kYh0CAC4NpZGFqpEBep1eQcfqsPH05dvp1LUXDNk i5GwS2ht/F5D9GcD+EyoYRQjRM8k+XZyOe3sqEF01Uav/rHAv3XrITg/pfiA92AR K7CvQv4NvyQqUNcv/mEb+P8niriJ4oHRXCag9inop1jo/x3Mym07oEy73rknAx9r ZQKwoFNOM/QQGVb9hZUANKCkE8cAsUXG89yEOH0n17FOahC0PZbK/vxjeO+br3IL HxEoC5l1j4cUauf8XEhsVXXdch0iqit/fB3ROePYFNCx7koVYHk6Yl1w++AM0RUA ypOmfPsSiqLY2ciuTIAnpTeMfQkkhEmMI3mp6T5BUBwSKJxLRaSM =c1xd -----END PGP SIGNATURE----- Merge tag 'memblock-v7.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock Pull memblock updates from Mike Rapoport: - improve debuggability of reserve_mem kernel parameter handling with print outs in case of a failure and debugfs info showing what was actually reserved - Make memblock_free_late() and free_reserved_area() use the same core logic for freeing the memory to buddy and ensure it takes care of updating memblock arrays when ARCH_KEEP_MEMBLOCK is enabled. * tag 'memblock-v7.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock: x86/alternative: delay freeing of smp_locks section memblock: warn when freeing reserved memory before memory map is initialized memblock, treewide: make memblock_free() handle late freeing memblock: make free_reserved_area() update memblock if ARCH_KEEP_MEMBLOCK=y memblock: extract page freeing from free_reserved_area() into a helper memblock: make free_reserved_area() more robust mm: move free_reserved_area() to mm/memblock.c powerpc: opal-core: pair alloc_pages_exact() with free_pages_exact() powerpc: fadump: pair alloc_pages_exact() with free_pages_exact() memblock: reserve_mem: fix end caclulation in reserve_mem_release_by_name() memblock: move reserve_bootmem_range() to memblock.c and make it static memblock: Add reserve_mem debugfs info memblock: Print out errors on reserve_mem parser	2026-04-18 11:29:14 -07:00
Pasha Tatashin	68750e820b	liveupdate: defer file handler module refcounting to active sessions Stop pinning modules indefinitely upon file handler registration. Instead, dynamically increment the module reference count only when a live update session actively uses the file handler (e.g., during preservation or deserialization), and release it when the session ends. This allows modules providing live update handlers to be gracefully unloaded when no live update is in progress. Link: https://lore.kernel.org/20260327033335.696621-11-pasha.tatashin@soleen.com Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com> Reviewed-by: Pratyush Yadav (Google) <pratyush@kernel.org> Cc: David Matlack <dmatlack@google.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Samiullah Khawaja <skhawaja@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-04-18 00:10:50 -07:00
Pasha Tatashin	2ab7207e7e	liveupdate: make unregister functions return void Change liveupdate_unregister_file_handler and liveupdate_unregister_flb to return void instead of an error code. This follows the design principle that unregistration during module unload should not fail, as the unload cannot be stopped at that point. Link: https://lore.kernel.org/20260327033335.696621-10-pasha.tatashin@soleen.com Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com> Reviewed-by: Pratyush Yadav (Google) <pratyush@kernel.org> Cc: David Matlack <dmatlack@google.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Samiullah Khawaja <skhawaja@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-04-18 00:10:50 -07:00
Pasha Tatashin	074488008d	liveupdate: remove liveupdate_test_unregister() Now that file handler unregistration automatically unregisters all associated file handlers (FLBs), the liveupdate_test_unregister() function is no longer needed. Remove it along with its usages and declarations. Link: https://lore.kernel.org/20260327033335.696621-9-pasha.tatashin@soleen.com Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com> Reviewed-by: Pratyush Yadav (Google) <pratyush@kernel.org> Cc: David Matlack <dmatlack@google.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Samiullah Khawaja <skhawaja@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-04-18 00:10:50 -07:00
Pasha Tatashin	5ee1c7d641	liveupdate: auto unregister FLBs on file handler unregistration To ensure that unregistration is always successful and doesn't leave dangling resources, introduce auto-unregistration of FLBs: when a file handler is unregistered, all FLBs associated with it are automatically unregistered. Introduce a new helper luo_flb_unregister_all() which unregisters all FLBs linked to the given file handler. Link: https://lore.kernel.org/20260327033335.696621-8-pasha.tatashin@soleen.com Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com> Reviewed-by: Pratyush Yadav (Google) <pratyush@kernel.org> Cc: David Matlack <dmatlack@google.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Samiullah Khawaja <skhawaja@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-04-18 00:10:50 -07:00
Pasha Tatashin	118c390824	liveupdate: remove luo_session_quiesce() Now that FLB module references are handled dynamically during active sessions, we can safely remove the luo_session_quiesce() and luo_session_resume() mechanism. Link: https://lore.kernel.org/20260327033335.696621-7-pasha.tatashin@soleen.com Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com> Reviewed-by: Pratyush Yadav (Google) <pratyush@kernel.org> Cc: David Matlack <dmatlack@google.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Samiullah Khawaja <skhawaja@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-04-18 00:10:50 -07:00
Pasha Tatashin	76be9983df	liveupdate: defer FLB module refcounting to active sessions Stop pinning modules indefinitely upon FLB registration. Instead, dynamically take a module reference when the FLB is actively used in a session (e.g., during preserve and retrieve) and release it when the session concludes. This allows modules providing FLB operations to be cleanly unloaded when not in active use by the live update orchestrator. Link: https://lore.kernel.org/20260327033335.696621-6-pasha.tatashin@soleen.com Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com> Reviewed-by: Samiullah Khawaja <skhawaja@google.com> Reviewed-by: Pratyush Yadav (Google) <pratyush@kernel.org> Cc: David Matlack <dmatlack@google.com> Cc: Mike Rapoport <rppt@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-04-18 00:10:50 -07:00
Pasha Tatashin	6b2b22f7c8	liveupdate: protect FLB lists with luo_register_rwlock Because liveupdate FLB objects will soon drop their persistent module references when registered, list traversals must be protected against concurrent module unloading. To provide this protection, utilize the global luo_register_rwlock. It protects the global registry of FLBs and the handler's specific list of FLB dependencies. Read locks are used during concurrent list traversals (e.g., during preservation and serialization). Write locks are taken during registration and unregistration. Link: https://lore.kernel.org/20260327033335.696621-5-pasha.tatashin@soleen.com Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com> Reviewed-by: Pratyush Yadav (Google) <pratyush@kernel.org> Cc: David Matlack <dmatlack@google.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Samiullah Khawaja <skhawaja@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-04-18 00:10:49 -07:00
Pasha Tatashin	9e1e185845	liveupdate: protect file handler list with rwsem Because liveupdate file handlers will no longer hold a module reference when registered, we must ensure that the access to the handler list is protected against concurrent module unloading. Utilize the global luo_register_rwlock to protect the global registry of file handlers. Read locks are taken during list traversals in luo_preserve_file() and luo_file_deserialize(). Write locks are taken during registration and unregistration. Link: https://lore.kernel.org/20260327033335.696621-4-pasha.tatashin@soleen.com Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com> Reviewed-by: Pratyush Yadav (Google) <pratyush@kernel.org> Cc: David Matlack <dmatlack@google.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Samiullah Khawaja <skhawaja@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-04-18 00:10:49 -07:00
Pasha Tatashin	38fb71ace2	liveupdate: synchronize lazy initialization of FLB private state The luo_flb_get_private() function, which is responsible for lazily initializing the private state of FLB objects, can be called concurrently from multiple threads. This creates a data race on the 'initialized' flag and can lead to multiple executions of mutex_init() and INIT_LIST_HEAD() on the same memory. Introduce a static spinlock (luo_flb_init_lock) local to the function to synchronize the initialization path. Use smp_load_acquire() and smp_store_release() for memory ordering between the fast path and the slow path. Link: https://lore.kernel.org/20260327033335.696621-3-pasha.tatashin@soleen.com Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com> Reviewed-by: Pratyush Yadav <pratyush@kernel.org> Cc: David Matlack <dmatlack@google.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Samiullah Khawaja <skhawaja@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-04-18 00:10:49 -07:00
Pasha Tatashin	277f4e5e39	liveupdate: safely print untrusted strings Patch series "liveupdate: Fix module unloading and unregister API", v3. This patch series addresses an issue with how LUO handles module reference counting and unregistration during a module unload (e.g., via rmmod). Currently, modules that register live update file handlers are pinned for the entire duration they are registered. This prevents the modules from being unloaded gracefully, even when no live update session is in progress. Furthermore, if a module is forcefully unloaded, the unregistration functions return an error (e.g. -EBUSY) if a session is active, which is ignored by the kernel's module unload path, leaving dangling pointers in the LUO global lists. To resolve these issues, this series introduces the following changes: 1. Adds a global read-write semaphore (luo_register_rwlock) to protect the registration lists for both file handlers and FLBs. 2. Reduces the scope of module reference counting for file handlers and FLBs. Instead of pinning modules indefinitely upon registration, references are now taken only when they are actively used in a live update session (e.g., during preservation, retrieval, or deserialization). 3. Removes the global luo_session_quiesce() mechanism since module unload behavior now handles active sessions implicitly. 4. Introduces auto-unregistration of FLBs during file handler unregistration to prevent leaving dangling resources. 5. Changes the unregistration functions to return void instead of an error code. 6. Fixes a data race in luo_flb_get_private() by introducing a spinlock for thread-safe lazy initialization. 7. Strengthens security by using %.s when printing untrusted deserialized compatible strings and session names to prevent out-of-bounds reads. This patch (of 10): Deserialized strings from KHO data (such as file handler compatible strings and session names) are provided by the previous kernel and might not be null-terminated if the data is corrupted or maliciously crafted. When printing these strings in error messages, use the %.s format specifier with the maximum buffer size to prevent out-of-bounds reads into adjacent kernel memory. Link: https://lore.kernel.org/20260327033335.696621-1-pasha.tatashin@soleen.com Link: https://lore.kernel.org/20260327033335.696621-2-pasha.tatashin@soleen.com Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com> Reviewed-by: Pratyush Yadav (Google) <pratyush@kernel.org> Cc: David Matlack <dmatlack@google.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Samiullah Khawaja <skhawaja@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-04-18 00:10:49 -07:00
Pasha Tatashin	00d0b37237	liveupdate: prevent double management of files Patch series "liveupdate: prevent double preservation", v4. Currently, LUO does not prevent the same file from being managed twice across different active sessions. Because LUO preserves files of absolutely different types: memfd, and upcoming vfiofd [1], iommufd [2], guestmefd (and possible kvmfd/cpufd). There is no common private data or guarantee on how to prevent that the same file is not preserved twice beside using inode or some slower and expensive method like hashtables. This patch (of 4) Currently, LUO does not prevent the same file from being managed twice across different active sessions. Use a global xarray luo_preserved_files to keep track of file identifiers being preserved by LUO. Update luo_preserve_file() to check and insert the file identifier into this xarray when it is preserved, and erase it in luo_file_unpreserve_files() when it is released. To allow handlers to define what constitutes a "unique" file (e.g., different struct file objects pointing to the same hardware resource), add a get_id() callback to struct liveupdate_file_ops. If not provided, the default identifier is the struct file pointer itself. This ensures that the same file (or resource) cannot be managed by multiple sessions. If another session attempts to preserve an already managed file, it will now fail with -EBUSY. Link: https://lore.kernel.org/20260326163943.574070-1-pasha.tatashin@soleen.com Link: https://lore.kernel.org/20260326163943.574070-2-pasha.tatashin@soleen.com Link: https://lore.kernel.org/all/20260129212510.967611-1-dmatlack@google.com [1] Link: https://lore.kernel.org/all/20260203220948.2176157-1-skhawaja@google.com [2] Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com> Reviewed-by: Samiullah Khawaja <skhawaja@google.com> Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Cc: David Matlack <dmatlack@google.com> Cc: Pratyush Yadav <pratyush@kernel.org> Cc: Shuah Khan <shuah@kernel.org> Cc: Christian Brauner <brauner@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-04-18 00:10:49 -07:00
Breno Leitao	76aa46b9e4	kho: kexec-metadata: track previous kernel chain Use Kexec Handover (KHO) to pass the previous kernel's version string and the number of kexec reboots since the last cold boot to the next kernel, and print it at boot time. Example output: [ 0.000000] KHO: exec from: 6.19.0-rc4-next-20260107 (count 1) Motivation ========== Bugs that only reproduce when kexecing from specific kernel versions are difficult to diagnose. These issues occur when a buggy kernel kexecs into a new kernel, with the bug manifesting only in the second kernel. Recent examples include the following commits: * commit `eb22663125` ("x86/boot: Fix page table access in 5-level to 4-level paging transition") * commit `77d48d39e9` ("efistub/tpm: Use ACPI reclaim memory for event log to avoid corruption") * commit `64b45dd46e` ("x86/efi: skip memattr table on kexec boot") As kexec-based reboots become more common, these version-dependent bugs are appearing more frequently. At scale, correlating crashes to the previous kernel version is challenging, especially when issues only occur in specific transition scenarios. Implementation ============== The kexec metadata is stored as a plain C struct (struct kho_kexec_metadata) rather than FDT format, for simplicity and direct field access. It is registered via kho_add_subtree() as a separate subtree, keeping it independent from the core KHO ABI. This design choice: - Keeps the core KHO ABI minimal and stable - Allows the metadata format to evolve independently - Avoids requiring version bumps for all KHO consumers (LUO, etc.) when the metadata format changes The struct kho_kexec_metadata contains two fields: - previous_release: The kernel version that initiated the kexec - kexec_count: Number of kexec boots since last cold boot On cold boot, kexec_count starts at 0 and increments with each kexec. The count helps identify issues that only manifest after multiple consecutive kexec reboots. [leitao@debian.org: call kho_kexec_metadata_init() for both boot paths] Link: https://lore.kernel.org/all/20260309-kho-v8-5-c3abcf4ac750@debian.org/ [1] Link: https://lore.kernel.org/20260409-kho_fix_merge_issue-v1-1-710c84ceaa85@debian.org Link: https://lore.kernel.org/20260316-kho-v9-5-ed6dcd951988@debian.org Signed-off-by: Breno Leitao <leitao@debian.org> Acked-by: SeongJae Park <sj@kernel.org> Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Reviewed-by: Pratyush Yadav <pratyush@kernel.org> Cc: Alexander Graf <graf@amazon.com> Cc: David Hildenbrand <david@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com> Cc: Lorenzo Stoakes <ljs@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Pasha Tatashin <pasha.tatashin@soleen.com> Cc: Shuah Khan <skhan@linuxfoundation.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-04-18 00:10:48 -07:00
Breno Leitao	062dd306d9	kho: fix kho_in_debugfs_init() to handle non-FDT blobs kho_in_debugfs_init() calls fdt_totalsize() to determine blob sizes, which assumes all blobs are FDTs. This breaks for non-FDT blobs like struct kho_kexec_metadata. Fix this by reading the "blob-size" property from the FDT (persisted by kho_add_subtree()) instead of calling fdt_totalsize(). Also rename local variables from fdt_phys/sub_fdt to blob_phys/blob for consistency with the non-FDT-specific naming. Link: https://lore.kernel.org/20260316-kho-v9-4-ed6dcd951988@debian.org Signed-off-by: Breno Leitao <leitao@debian.org> Reviewed-by: Pratyush Yadav <pratyush@kernel.org> Cc: Alexander Graf <graf@amazon.com> Cc: David Hildenbrand <david@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com> Cc: Lorenzo Stoakes <ljs@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport (Microsoft) <rppt@kernel.org> Cc: Pasha Tatashin <pasha.tatashin@soleen.com> Cc: SeongJae Park <sj@kernel.org> Cc: Shuah Khan <skhan@linuxfoundation.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-04-18 00:10:48 -07:00
Breno Leitao	85e4139282	kho: persist blob size in KHO FDT kho_add_subtree() accepts a size parameter but only forwards it to debugfs. The size is not persisted in the KHO FDT, so it is lost across kexec. This makes it impossible for the incoming kernel to determine the blob size without understanding the blob format. Store the blob size as a "blob-size" property in the KHO FDT alongside the "preserved-data" physical address. This allows the receiving kernel to recover the size for any blob regardless of format. Also extend kho_retrieve_subtree() with an optional size output parameter so callers can learn the blob size without needing to understand the blob format. Update all callers to pass NULL for the new parameter. Link: https://lore.kernel.org/20260316-kho-v9-3-ed6dcd951988@debian.org Signed-off-by: Breno Leitao <leitao@debian.org> Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Reviewed-by: Pratyush Yadav <pratyush@kernel.org> Cc: Alexander Graf <graf@amazon.com> Cc: David Hildenbrand <david@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com> Cc: Lorenzo Stoakes <ljs@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Pasha Tatashin <pasha.tatashin@soleen.com> Cc: SeongJae Park <sj@kernel.org> Cc: Shuah Khan <skhan@linuxfoundation.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-04-18 00:10:48 -07:00
Breno Leitao	4916ae3867	kho: rename fdt parameter to blob in kho_add/remove_subtree() Since kho_add_subtree() now accepts arbitrary data blobs (not just FDTs), rename the parameter from 'fdt' to 'blob' to better reflect its purpose. Apply the same rename to kho_remove_subtree() for consistency. Also rename kho_debugfs_fdt_add() and kho_debugfs_fdt_remove() to kho_debugfs_blob_add() and kho_debugfs_blob_remove() respectively, with the same parameter rename from 'fdt' to 'blob'. Link: https://lore.kernel.org/20260316-kho-v9-2-ed6dcd951988@debian.org Signed-off-by: Breno Leitao <leitao@debian.org> Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Reviewed-by: Pratyush Yadav <pratyush@kernel.org> Cc: Alexander Graf <graf@amazon.com> Cc: David Hildenbrand <david@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com> Cc: Lorenzo Stoakes <ljs@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Pasha Tatashin <pasha.tatashin@soleen.com> Cc: SeongJae Park <sj@kernel.org> Cc: Shuah Khan <skhan@linuxfoundation.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-04-18 00:10:48 -07:00
Breno Leitao	d9e4142e76	kho: add size parameter to kho_add_subtree() Patch series "kho: history: track previous kernel version and kexec boot count", v9. Use Kexec Handover (KHO) to pass the previous kernel's version string and the number of kexec reboots since the last cold boot to the next kernel, and print it at boot time. Example ======= [ 0.000000] Linux version 6.19.0-rc3-upstream-00047-ge5d992347849 ... [ 0.000000] KHO: exec from: 6.19.0-rc4-next-20260107upstream-00004-g3071b0dc4498 (count 1) Motivation ========== Bugs that only reproduce when kexecing from specific kernel versions are difficult to diagnose. These issues occur when a buggy kernel kexecs into a new kernel, with the bug manifesting only in the second kernel. Recent examples include: * `eb22663125` ("x86/boot: Fix page table access in 5-level to 4-level paging transition") * `77d48d39e9` ("efistub/tpm: Use ACPI reclaim memory for event log to avoid corruption") * `64b45dd46e` ("x86/efi: skip memattr table on kexec boot") As kexec-based reboots become more common, these version-dependent bugs are appearing more frequently. At scale, correlating crashes to the previous kernel version is challenging, especially when issues only occur in specific transition scenarios. Some bugs manifest only after multiple consecutive kexec reboots. Tracking the kexec count helps identify these cases (this metric is already used by live update sub-system). KHO provides a reliable mechanism to pass information between kernels. By carrying the previous kernel's release string and kexec count forward, we can print this context at boot time to aid debugging. The goal of this feature is to have this information being printed in early boot, so, users can trace back kernel releases in kexec. Systemd is not helpful because we cannot assume that the previous kernel has systemd or even write access to the disk (common when using Linux as bootloaders) This patch (of 6): kho_add_subtree() assumes the fdt argument is always an FDT and calls fdt_totalsize() on it in the debugfs code path. This assumption will break if a caller passes arbitrary data instead of an FDT. When CONFIG_KEXEC_HANDOVER_DEBUGFS is enabled, kho_debugfs_fdt_add() calls __kho_debugfs_fdt_add(), which executes: f->wrapper.size = fdt_totalsize(fdt); Fix this by adding an explicit size parameter to kho_add_subtree() so callers specify the blob size. This allows subtrees to contain arbitrary data formats, not just FDTs. Update all callers: - memblock.c: use fdt_totalsize(fdt) - luo_core.c: use fdt_totalsize(fdt_out) - test_kho.c: use fdt_totalsize() - kexec_handover.c (root fdt): use fdt_totalsize(kho_out.fdt) Also update __kho_debugfs_fdt_add() to receive the size explicitly instead of computing it internally via fdt_totalsize(). In kho_in_debugfs_init(), pass fdt_totalsize() for the root FDT and sub-blobs since all current users are FDTs. A subsequent patch will persist the size in the KHO FDT so the incoming side can handle non-FDT blobs correctly. Link: https://lore.kernel.org/20260323110747.193569-1-duanchenghao@kylinos.cn Link: https://lore.kernel.org/20260316-kho-v9-1-ed6dcd951988@debian.org Signed-off-by: Breno Leitao <leitao@debian.org> Suggested-by: Pratyush Yadav <pratyush@kernel.org> Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Reviewed-by: Pratyush Yadav <pratyush@kernel.org> Cc: Alexander Graf <graf@amazon.com> Cc: David Hildenbrand <david@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com> Cc: Lorenzo Stoakes <ljs@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Pasha Tatashin <pasha.tatashin@soleen.com> Cc: SeongJae Park <sj@kernel.org> Cc: Shuah Khan <skhan@linuxfoundation.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-04-18 00:10:48 -07:00
Qi Zheng	8285917d6f	mm: memcontrol: prepare for reparenting non-hierarchical stats To resolve the dying memcg issue, we need to reparent LRU folios of child memcg to its parent memcg. This could cause problems for non-hierarchical stats. As Yosry Ahmed pointed out: In short, if memory is charged to a dying cgroup at the time of reparenting, when the memory gets uncharged the stats updates will occur at the parent. This will update both hierarchical and non-hierarchical stats of the parent, which would corrupt the parent's non-hierarchical stats (because those counters were never incremented when the memory was charged). Now we have the following two types of non-hierarchical stats, and they are only used in CONFIG_MEMCG_V1: a. memcg->vmstats->state_local[i] b. pn->lruvec_stats->state_local[i] To ensure that these non-hierarchical stats work properly, we need to reparent these non-hierarchical stats after reparenting LRU folios. To this end, this commit makes the following preparations: 1. implement reparent_state_local() to reparent non-hierarchical stats 2. make css_killed_work_fn() to be called in rcu work, and implement get_non_dying_memcg_start() and get_non_dying_memcg_end() to avoid race between mod_memcg_state()/mod_memcg_lruvec_state() and reparent_state_local() Link: https://lore.kernel.org/e862995c45a7101a541284b6ebee5e5c32c89066.1772711148.git.zhengqi.arch@bytedance.com Co-developed-by: Yosry Ahmed <yosry@kernel.org> Signed-off-by: Yosry Ahmed <yosry@kernel.org> Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com> Acked-by: Shakeel Butt <shakeel.butt@linux.dev> Cc: Allen Pais <apais@linux.microsoft.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: Baoquan He <bhe@redhat.com> Cc: Chengming Zhou <chengming.zhou@linux.dev> Cc: Chen Ridong <chenridong@huawei.com> Cc: David Hildenbrand <david@kernel.org> Cc: Hamza Mahfooz <hamzamahfooz@linux.microsoft.com> Cc: Harry Yoo <harry.yoo@oracle.com> Cc: Hugh Dickins <hughd@google.com> Cc: Imran Khan <imran.f.khan@oracle.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kamalesh Babulal <kamalesh.babulal@oracle.com> Cc: Lance Yang <lance.yang@linux.dev> Cc: Liam Howlett <Liam.Howlett@oracle.com> Cc: Lorenzo Stoakes (Oracle) <ljs@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Michal Koutný <mkoutny@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Muchun Song <songmuchun@bytedance.com> Cc: Nhat Pham <nphamcs@gmail.com> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Usama Arif <usamaarif642@gmail.com> Cc: Vlastimil Babka <vbabka@kernel.org> Cc: Wei Xu <weixugc@google.com> Cc: Yuanchu Xie <yuanchu@google.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-04-18 00:10:47 -07:00
Linus Torvalds	eb0d6d97c2	bpf-fixes -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEE+soXsSLHKoYyzcli6rmadz2vbToFAmnihOkACgkQ6rmadz2v bTqjQA/+K6R/teQRwVmP1GDrfBjz2TXUzCN1WQQLzbnJNR96Mzq72+aTWjza89BK yEUP379qiOeUfEyyV7DNfHh8hAclUAMKuvI3T3pshLQhpOS0+YcpfbakEZbos+My AzEGhGl2nhT7S5twHFznCpuSaLgqldHkdAy4BZIiFkOS5lPBX9CU++OAslFPM+f8 R28JQYWuv2/b1mRsz8zDmQQXxwH/Rpz9hdJKcpm/kCYYBay3cAFV7ArFJfn+Y5se 9I6mTwNQ+xtSxtsmR/lftlGo1Vv9ah6qM9gKwgju0SkNrS+9UBlNUSmTrJk1fz+d SxdppCrqxwHY3UVd62eF4fWWgusC+oMuKzTh6d+D/ZkKvnEjdAx5XQ7uUQyYhKil G12vvKWcHit0Qz9RAhqlEEZ+GIpFTtLql6aW7pRmQKE8/vmQwAD1HBqNqWYKjokW btlJ3fUOGu8VHtnYbI3FN6VsK8BU9t/xMny9Fys9X4KmtWBLsm4udmiorV9uC+w6 xV2s+x+ahythTEzVICB6BlQotSRyMd9kR5qisJsetWk+7NBY0Bwn7C0kfVGepHh0 WerFSYdSifTvBWQjXnvqmAX7YspmpZvevw8PCtoPq1xq5d1FrYu1K5GX/xzpy+pH p13afkbN7Mk6OwteFefD1B0ofug3V9sx3HBI72ENs1Z+hh1KdOQ= =79I2 -----END PGP SIGNATURE----- Merge tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf Pull bpf fixes from Alexei Starovoitov: "Most of the diff stat comes from Xu Kuohai's fix to emit ENDBR/BTI, since all JITs had to be touched to move constant blinding out and pass bpf_verifier_env in. - Fix use-after-free in arena_vm_close on fork (Alexei Starovoitov) - Dissociate struct_ops program with map if map_update fails (Amery Hung) - Fix out-of-range and off-by-one bugs in arm64 JIT (Daniel Borkmann) - Fix precedence bug in convert_bpf_ld_abs alignment check (Daniel Borkmann) - Fix arg tracking for imprecise/multi-offset in BPF_ST/STX insns (Eduard Zingerman) - Copy token from main to subprogs to fix missing kallsyms (Eduard Zingerman) - Prevent double close and leak of btf objects in libbpf (Jiri Olsa) - Fix af_unix null-ptr-deref in sockmap (Michal Luczaj) - Fix NULL deref in map_kptr_match_type for scalar regs (Mykyta Yatsenko) - Avoid unnecessary IPIs. Remove redundant bpf_flush_icache() in arm64 and riscv JITs (Puranjay Mohan) - Fix out of bounds access. Validate node_id in arena_alloc_pages() (Puranjay Mohan) - Reject BPF-to-BPF calls and callbacks in arm32 JIT (Puranjay Mohan) - Refactor all JITs to pass bpf_verifier_env to emit ENDBR/BTI for indirect jump targets on x86-64, arm64 JITs (Xu Kuohai) - Allow UTF-8 literals in bpf_bprintf_prepare() (Yihan Ding)" * tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: (32 commits) bpf, arm32: Reject BPF-to-BPF calls and callbacks in the JIT bpf: Dissociate struct_ops program with map if map_update fails bpf: Validate node_id in arena_alloc_pages() libbpf: Prevent double close and leak of btf objects selftests/bpf: cover UTF-8 trace_printk output bpf: allow UTF-8 literals in bpf_bprintf_prepare() selftests/bpf: Reject scalar store into kptr slot bpf: Fix NULL deref in map_kptr_match_type for scalar regs bpf: Fix precedence bug in convert_bpf_ld_abs alignment check bpf, arm64: Emit BTI for indirect jump target bpf, x86: Emit ENDBR for indirect jump targets bpf: Add helper to detect indirect jump targets bpf: Pass bpf_verifier_env to JIT bpf: Move constants blinding out of arch-specific JITs bpf, sockmap: Take state lock for af_unix iter bpf, sockmap: Fix af_unix null-ptr-deref in proto update selftests/bpf: Extend bpf_iter_unix to attempt deadlocking bpf, sockmap: Fix af_unix iter deadlock bpf, sockmap: Annotate af_unix sock:: Sk_state data-races selftests/bpf: verify kallsyms entries for token-loaded subprograms ...	2026-04-17 15:58:22 -07:00
Amery Hung	f75aeb2de8	bpf: Dissociate struct_ops program with map if map_update fails Currently, when bpf_struct_ops_map_update_elem() fails, the programs' st_ops_assoc will remain set. They may become dangling pointers if the map is freed later, but they will never be dereferenced since the struct_ops attachment did not succeed. However, if one of the programs is subsequently attached as part of another struct_ops map, its st_ops_assoc will be poisoned even though its old st_ops_assoc was stale from a failed attachment. Fix the spurious poisoned st_ops_assoc by dissociating struct_ops programs with a map if the attachment fails. Move bpf_prog_assoc_struct_ops() to after *plink++ to make sure bpf_prog_disassoc_struct_ops() will not miss a program when iterating st_map->links. Note that, dissociating a program from a map requires some attention as it must not reset a poisoned st_ops_assoc or a st_ops_assoc pointing to another map. The former is already guarded in bpf_prog_disassoc_struct_ops(). The latter also will not happen since st_ops_assoc of programs in st_map->links are set by bpf_prog_assoc_struct_ops(), which can only be poisoned or pointing to the current map. Signed-off-by: Amery Hung <ameryhung@gmail.com> Link: https://lore.kernel.org/r/20260417174900.2895486-1-ameryhung@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2026-04-17 12:04:14 -07:00
Linus Torvalds	87768582a4	dma-mapping updates for Linux 7.0: - added support for batched cache sync, what improves performance of dma_map/unmap_sg() operations on ARM64 architecture (Barry Song) - introduced DMA_ATTR_CC_SHARED attribute for explicitly shared memory used in confidential computing (Jiri Pirko) - refactored spaghetti-like code in drivers/of/of_reserved_mem.c and its clients (Marek Szyprowski, shared branch with device-tree updates to avoid merge conflicts) - prepared Contiguous Memory Allocator related code for making dma-buf drivers modularized (Maxime Ripard) - added support for benchmarking dma_map_sg() calls to tools/dma utility (Qinxin Xia) -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQSrngzkoBtlA8uaaJ+Jp1EFxbsSRAUCaeCbdQAKCRCJp1EFxbsS RHbWAQCt70dzrU0lu0omTR1HdDP4GTYfuM6nZR91e8/itGN1+QD/XH4I/0wuybzk v5uxbIC6lR3abQRc3YNRXfi+i5j26A4= =Oee2 -----END PGP SIGNATURE----- Merge tag 'dma-mapping-7.1-2026-04-16' of git://git.kernel.org/pub/scm/linux/kernel/git/mszyprowski/linux Pull dma-mapping updates from Marek Szyprowski: - added support for batched cache sync, what improves performance of dma_map/unmap_sg() operations on ARM64 architecture (Barry Song) - introduced DMA_ATTR_CC_SHARED attribute for explicitly shared memory used in confidential computing (Jiri Pirko) - refactored spaghetti-like code in drivers/of/of_reserved_mem.c and its clients (Marek Szyprowski, shared branch with device-tree updates to avoid merge conflicts) - prepared Contiguous Memory Allocator related code for making dma-buf drivers modularized (Maxime Ripard) - added support for benchmarking dma_map_sg() calls to tools/dma utility (Qinxin Xia) * tag 'dma-mapping-7.1-2026-04-16' of git://git.kernel.org/pub/scm/linux/kernel/git/mszyprowski/linux: (24 commits) dma-buf: heaps: system: document system_cc_shared heap dma-buf: heaps: system: add system_cc_shared heap for explicitly shared memory dma-mapping: introduce DMA_ATTR_CC_SHARED for shared memory mm: cma: Export cma_alloc(), cma_release() and cma_get_name() dma: contiguous: Export dev_get_cma_area() dma: contiguous: Make dma_contiguous_default_area static dma: contiguous: Make dev_get_cma_area() a proper function dma: contiguous: Turn heap registration logic around of: reserved_mem: rework fdt_init_reserved_mem_node() of: reserved_mem: clarify fdt_scan_reserved_mem*() functions of: reserved_mem: rearrange code a bit of: reserved_mem: replace CMA quirks by generic methods of: reserved_mem: switch to ops based OF_DECLARE() of: reserved_mem: use -ENODEV instead of -ENOENT of: reserved_mem: remove fdt node from the structure dma-mapping: fix false kernel-doc comment marker dma-mapping: Support batch mode for dma_direct_{map,unmap}_sg dma-mapping: Separate DMA sync issuing and completion waiting arm64: Provide dcache_inval_poc_nosync helper arm64: Provide dcache_clean_poc_nosync helper ...	2026-04-17 11:12:42 -07:00
Puranjay Mohan	2845989f2e	bpf: Validate node_id in arena_alloc_pages() arena_alloc_pages() accepts a plain int node_id and forwards it through the entire allocation chain without any bounds checking. Validate node_id before passing it down the allocation chain in arena_alloc_pages(). Fixes: `317460317a` ("bpf: Introduce bpf_arena.") Signed-off-by: Puranjay Mohan <puranjay@kernel.org> Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com> Link: https://lore.kernel.org/r/20260417152135.1383754-1-puranjay@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2026-04-17 10:12:55 -07:00
Linus Torvalds	0b6bc3dbe6	tracing latency updates for 7.1: - Add TIMERLAT_ALIGN osnoise option Add a timer alignment option for timerlat that makes it work like the cyclictest -A option. timelat creates threads to test the latency of the kernel. The alignment option will have these threads trigger at the alignment offsets from each other. Instead of having each thread wake up at the exact same time, if the alignment is set to "20" each thread will wake up at 20 microseconds from the previous one. -----BEGIN PGP SIGNATURE----- iIoEABYKADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCaeIBaBQccm9zdGVkdEBn b29kbWlzLm9yZwAKCRAp5XQQmuv6qqM8AQCyz4uL1lCLQtJb5thG8V9QFRkYh5F3 +DyuRNoht3ijyAD+K4nZPAu4F09feWuHkssONtSZECKEuN6EhiHE4XX6Pgc= =WYYW -----END PGP SIGNATURE----- Merge tag 'trace-latency-v7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull tracing latency update from Steven Rostedt: - Add TIMERLAT_ALIGN osnoise option Add a timer alignment option for timerlat that makes it work like the cyclictest -A option. timelat creates threads to test the latency of the kernel. The alignment option will have these threads trigger at the alignment offsets from each other. Instead of having each thread wake up at the exact same time, if the alignment is set to "20" each thread will wake up at 20 microseconds from the previous one. * tag 'trace-latency-v7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: tracing/osnoise: Add option to align tlat threads	2026-04-17 10:12:11 -07:00
Linus Torvalds	cb30bf881c	tracing updates for v7.1: - Fix printf format warning for bprintf sunrpc uses a trace_printk() that triggers a printf warning during the compile. Move the __printf() attribute around for when debugging is not enabled the warning will go away. - Remove redundant check for EVENT_FILE_FL_FREED in event_filter_write() The FREED flag is checked in the call to event_file_file() and then checked again right afterward, which is unneeded. - Clean up event_file_file() and event_file_data() helpers These helper functions played a different role in the past, but now with eventfs, the READ_ONCE() isn't needed. Simplify the code a bit and also add a warning to event_file_data() if the file or its data is not present. - Remove updating file->private_data in tracing open All access to the file private data is handled by the helper functions, which do not use file->private_data. Stop updating it on open. - Show ENUM names in function arguments via BTF in function tracing When showing the function arguments when func-args option is set for function tracing, if one of the arguments is found to be an enum, show the name of the enum instead of its number. - Add new trace_call__##name() API for tracepoints Tracepoints are enabled via static_branch() blocks, where when not enabled, there's only a nop that is in the code where the execution will just skip over it. When tracing is enabled, the nop is converted to a direct jump to the tracepoint code. Sometimes more calculations are required to be performed to update the parameters of the tracepoint. In this case, trace_##name##_enabled() is called which is a static_branch() that gets enabled only when the tracepoint is enabled. This allows the extra calculations to also be skipped by the nop: if (trace_foo_enabled()) { x = bar(); trace_foo(x); } Where the x=bar() is only performed when foo is enabled. The problem with this approach is that there's now two static_branch() calls. One for checking if the tracepoint is enabled, and then again to know if the tracepoint should be called. The second one is redundant. Introduce trace_call__foo() that will call the foo() tracepoint directly without doing a static_branch(): if (trace_foo_enabled()) { x = bar(); trace_call__foo(); } - Update various locations to use the new trace_call__##name() API - Move snapshot code out of trace.c Cleaning up trace.c to not be a "dump all", move the snapshot code out of it and into a new trace_snapshot.c file. - Clean up some "%.s" to "%s" - Allow boot kernel command line options to be called multiple times Have options like: ftrace_filter=foo ftrace_filter=bar ftrace_filter=zoo Equal to: ftrace_filter=foo,bar,zoo - Fix ipi_raise event CPU field to be a CPU field The ipi_raise target_cpus field is defined as a __bitmask(). There is now a __cpumask() field definition. Update the field to use that. - Have hist_field_name() use a snprintf() and not a series of strcat() It's safer to use snprintf() that a series of strcat(). - Fix tracepoint regfunc balancing A tracepoint can define a "reg" and "unreg" function that gets called before the tracepoint is enabled, and after it is disabled respectively. But on error, after the "reg" func is called and the tracepoint is not enabled, the "unreg" function is not called to tear down what the "reg" function performed. - Fix output that shows what histograms are enabled Event variables are displayed incorrectly in the histogram output. Instead of "sched.sched_wakeup.$var", it is showing "$sched.sched_wakeup.var" where the '$' is in the incorrect location. - Some other simple cleanups. -----BEGIN PGP SIGNATURE----- iIoEABYKADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCaeCpvxQccm9zdGVkdEBn b29kbWlzLm9yZwAKCRAp5XQQmuv6qt2WAP44m85BbAjBqJe4WR103eOXV+bREBta dRoReKJOMe519gEAp0rK/HoCvHgHhIGe3gaGdIsNhnaxoFyNWMG/wokoLAY= =Hg6+ -----END PGP SIGNATURE----- Merge tag 'trace-v7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull tracing updates from Steven Rostedt: - Fix printf format warning for bprintf sunrpc uses a trace_printk() that triggers a printf warning during the compile. Move the __printf() attribute around for when debugging is not enabled the warning will go away - Remove redundant check for EVENT_FILE_FL_FREED in event_filter_write() The FREED flag is checked in the call to event_file_file() and then checked again right afterward, which is unneeded - Clean up event_file_file() and event_file_data() helpers These helper functions played a different role in the past, but now with eventfs, the READ_ONCE() isn't needed. Simplify the code a bit and also add a warning to event_file_data() if the file or its data is not present - Remove updating file->private_data in tracing open All access to the file private data is handled by the helper functions, which do not use file->private_data. Stop updating it on open - Show ENUM names in function arguments via BTF in function tracing When showing the function arguments when func-args option is set for function tracing, if one of the arguments is found to be an enum, show the name of the enum instead of its number - Add new trace_call__##name() API for tracepoints Tracepoints are enabled via static_branch() blocks, where when not enabled, there's only a nop that is in the code where the execution will just skip over it. When tracing is enabled, the nop is converted to a direct jump to the tracepoint code. Sometimes more calculations are required to be performed to update the parameters of the tracepoint. In this case, trace_##name##_enabled() is called which is a static_branch() that gets enabled only when the tracepoint is enabled. This allows the extra calculations to also be skipped by the nop: if (trace_foo_enabled()) { x = bar(); trace_foo(x); } Where the x=bar() is only performed when foo is enabled. The problem with this approach is that there's now two static_branch() calls. One for checking if the tracepoint is enabled, and then again to know if the tracepoint should be called. The second one is redundant Introduce trace_call__foo() that will call the foo() tracepoint directly without doing a static_branch(): if (trace_foo_enabled()) { x = bar(); trace_call__foo(); } - Update various locations to use the new trace_call__##name() API - Move snapshot code out of trace.c Cleaning up trace.c to not be a "dump all", move the snapshot code out of it and into a new trace_snapshot.c file - Clean up some "%.s" to "%s" - Allow boot kernel command line options to be called multiple times Have options like: ftrace_filter=foo ftrace_filter=bar ftrace_filter=zoo Equal to: ftrace_filter=foo,bar,zoo - Fix ipi_raise event CPU field to be a CPU field The ipi_raise target_cpus field is defined as a __bitmask(). There is now a __cpumask() field definition. Update the field to use that - Have hist_field_name() use a snprintf() and not a series of strcat() It's safer to use snprintf() that a series of strcat() - Fix tracepoint regfunc balancing A tracepoint can define a "reg" and "unreg" function that gets called before the tracepoint is enabled, and after it is disabled respectively. But on error, after the "reg" func is called and the tracepoint is not enabled, the "unreg" function is not called to tear down what the "reg" function performed - Fix output that shows what histograms are enabled Event variables are displayed incorrectly in the histogram output Instead of "sched.sched_wakeup.$var", it is showing "$sched.sched_wakeup.var" where the '$' is in the incorrect location - Some other simple cleanups * tag 'trace-v7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: (24 commits) selftests/ftrace: Add test case for fully-qualified variable references tracing: Fix fully-qualified variable reference printing in histograms tracepoint: balance regfunc() on func_add() failure in tracepoint_add_func() tracing: Rebuild full_name on each hist_field_name() call tracing: Report ipi_raise target CPUs as cpumask tracing: Remove duplicate latency_fsnotify() stub tracing: Preserve repeated trace_trigger boot parameters tracing: Append repeated boot-time tracing parameters tracing: Remove spurious default precision from show_event_trigger/filter formats cpufreq: Use trace_call__##name() at guarded tracepoint call sites tracing: Remove tracing_alloc_snapshot() when snapshot isn't defined tracing: Move snapshot code out of trace.c and into trace_snapshot.c mm: damon: Use trace_call__##name() at guarded tracepoint call sites btrfs: Use trace_call__##name() at guarded tracepoint call sites spi: Use trace_call__##name() at guarded tracepoint call sites i2c: Use trace_call__##name() at guarded tracepoint call sites kernel: Use trace_call__##name() at guarded tracepoint call sites tracepoint: Add trace_call__##name() API tracing: trace_mmap.h: fix a kernel-doc warning tracing: Pretty-print enum parameters in function arguments ...	2026-04-17 09:43:12 -07:00
Linus Torvalds	c9e03d5948	Probes updates for v7.1 - fprobe: do not zero out unused fgraph_data. This removes unneeded memset of fgraph_data in fprobe entry handler. -----BEGIN PGP SIGNATURE----- iQFPBAABCgA5FiEEh7BulGwFlgAOi5DV2/sHvwUrPxsFAmnhhjEbHG1hc2FtaS5o aXJhbWF0c3VAZ21haWwuY29tAAoJENv7B78FKz8bfBMH/21EM4dL2oDrsR3Ud7G+ R1LYB959jkztJlIKQ9xGBdUpyMd97JYszknpTG4QHb7XW4FrcvZMiHeT9AW6ZBzw zJkWVbSCst7sNYpjmqFcf/p3uUJp3/jHNG6byQ2/oWW9bilWPitY1LDL8u/VwzWN ZN35B09bw2IgKXbxCNoNlASdV8w/nnVI+CEjKSbLJBVQffBzU9losLt5Qu1i5w/A Q5O5ZLkeawQhRzilxVAqae7U949MzRvn28EiLyMJ4rJA0vWH+ijg7vFYmC4hnFde GNzJ7q7S87BmApTsSVT+QBCpIGd2XjLYfiYZBBoZIZE2wXI4fxgF+cd05XE51hrG mTs= =ycSD -----END PGP SIGNATURE----- Merge tag 'probes-v7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull fprobe update from Masami Hiramatsu: - do not zero out unused fgraph_data. This removes unneeded memset of fgraph_data in fprobe entry handler. * tag 'probes-v7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: tracing: fprobe: do not zero out unused fgraph_data	2026-04-17 09:18:32 -07:00

1 2 3 4 5 ...

51718 Commits