linux

mirror of https://github.com/torvalds/linux.git synced 2026-06-07 22:14:04 +02:00

Author	SHA1	Message	Date
Marc Zyngier	2de32a25a3	Merge branch kvm-arm64/hyp-tracing into kvmarm-master/next * kvm-arm64/hyp-tracing: (40 commits) : . : EL2 tracing support, adding both 'remote' ring-buffer : infrastructure and the tracing itself, courtesy of : Vincent Donnefort. From the cover letter: : : "The growing set of features supported by the hypervisor in protected : mode necessitates debugging and profiling tools. Tracefs is the : ideal candidate for this task: : : * It is simple to use and to script. : : * It is supported by various tools, from the trace-cmd CLI to the : Android web-based perfetto. : : * The ring-buffer, where are stored trace events consists of linked : pages, making it an ideal structure for sharing between kernel and : hypervisor. : : This series first introduces a new generic way of creating remote events and : remote buffers. Then it adds support to the pKVM hypervisor." : . tracing: selftests: Extend hotplug testing for trace remotes tracing: Non-consuming read for trace remotes with an offline CPU tracing: Adjust cmd_check_undefined to show unexpected undefined symbols tracing: Restore accidentally removed SPDX tag KVM: arm64: avoid unused-variable warning tracing: Generate undef symbols allowlist for simple_ring_buffer KVM: arm64: tracing: add ftrace dependency tracing: add more symbols to whitelist tracing: Update undefined symbols allow list for simple_ring_buffer KVM: arm64: Fix out-of-tree build for nVHE/pKVM tracing tracing: selftests: Add hypervisor trace remote tests KVM: arm64: Add selftest event support to nVHE/pKVM hyp KVM: arm64: Add hyp_enter/hyp_exit events to nVHE/pKVM hyp KVM: arm64: Add event support to the nVHE/pKVM hyp and trace remote KVM: arm64: Add trace reset to the nVHE/pKVM hyp KVM: arm64: Sync boot clock with the nVHE/pKVM hyp KVM: arm64: Add trace remote for the nVHE/pKVM hyp KVM: arm64: Add tracing capability for the nVHE/pKVM hyp KVM: arm64: Support unaligned fixmap in the pKVM hyp KVM: arm64: Initialise hyp_nr_cpus for nVHE hyp ... Signed-off-by: Marc Zyngier <maz@kernel.org>	2026-04-08 12:21:51 +01:00
Vincent Donnefort	ce47b798ed	tracing: Non-consuming read for trace remotes with an offline CPU When a trace_buffer is created while a CPU is offline, this CPU is cleared from the trace_buffer CPU mask, preventing the creation of a non-consuming iterator (ring_buffer_iter). For trace remotes, it means the iterator fails to be allocated (-ENOMEM) even though there are available ring buffers in the trace_buffer. For non-consuming reads of trace remotes, skip missing ring_buffer_iter to allow reading the available ring buffers. Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Vincent Donnefort <vdonnefort@google.com> Link: https://patch.msgid.link/20260401045100.3394299-2-vdonnefort@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>	2026-04-02 14:16:09 +01:00
Nathan Chancellor	58b4bd1839	tracing: Adjust cmd_check_undefined to show unexpected undefined symbols When the check_undefined command in kernel/trace/Makefile fails, there is no output, making it hard to understand why the build failed. Capture the output of the $(NM) + grep command and print it when failing to make it clearer what the problem is. Fixes: `a717943d8e` ("tracing: Check for undefined symbols in simple_ring_buffer") Signed-off-by: Nathan Chancellor <nathan@kernel.org> Reviewed-by: Vincent Donnefort <vdonnefort@google.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Link: https://patch.msgid.link/20260320-cmd_check_undefined-verbose-v1-1-54fc5b061f94@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>	2026-03-23 09:22:22 +00:00
Marc Zyngier	d772964394	tracing: Restore accidentally removed SPDX tag Restore the SPDX tag that was accidentally dropped. Fixes: `7e4b6c9430` ("tracing: add more symbols to whitelist") Reported-by: Nathan Chancellor <nathan@kernel.org> Cc: Arnd Bergmann <arnd@kernel.org> Cc: Vincent Donnefort <vdonnefort@google.com> Cc: Steven Rostedt <rostedt@goodmis.org> Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org> Link: https://patch.msgid.link/20260317194252.1890568-1-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>	2026-03-18 07:05:11 +00:00
Vincent Donnefort	1211907ac0	tracing: Generate undef symbols allowlist for simple_ring_buffer Compiler and tooling-generated symbols are difficult to maintain across all supported architectures. Make the allowlist more robust by replacing the hardcoded list with a mechanism that automatically detects these symbols. This mechanism generates a C function designed to trigger common compiler-inserted symbols. Signed-off-by: Vincent Donnefort <vdonnefort@google.com> Reviewed-by: Nathan Chancellor <nathan@kernel.org> Tested-by: Nathan Chancellor <nathan@kernel.org> Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org> Tested-by: Arnd Bergmann <arnd@arndb.de> Link: https://patch.msgid.link/20260316092845.3367411-1-vdonnefort@google.com [maz: added __msan prefix to allowlist as pointed out by Arnd] Signed-off-by: Marc Zyngier <maz@kernel.org>	2026-03-17 09:01:19 +00:00
Linus Torvalds	d9bf296c39	Probes fixes for v7.0-rc3 - kprobes: avoid crash when rmmod/insmod after ftrace killed This fixes a kernel crash caused by kprobes on the symbol in a module which is unloaded after ftrace_kill() is called. - kprobes: Remove unneeded warnings from __arm_kprobe_ftrace() Remove unneeded WARN messages which can be kicked if the kprobe is using ftrace and it fails to enable the ftrace. Since kprobes correctly handle such failure, we don't need to warn it. Merge tag 'probes-fixes-v7.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull probes fixes from Masami Hiramatsu: - Avoid crash when rmmod/insmod after ftrace killed This fixes a kernel crash caused by kprobes on the symbol in a module which is unloaded after ftrace_kill() is called. - Remove unneeded warnings from __arm_kprobe_ftrace() Remove unneeded WARN messages which can be triggered if the kprobe is using ftrace and it fails to enable the ftrace. Since kprobes correctly handle such failure, we don't need to warn it. * tag 'probes-fixes-v7.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: kprobes: Remove unneeded warnings from __arm_kprobe_ftrace() kprobes: avoid crash when rmmod/insmod after ftrace killed	2026-03-15 13:08:05 -07:00
Linus Torvalds	164cb546e9	Fix function tracer recursion bug by marking jiffies_64_to_clock_t() notrace. Signed-off-by: Ingo Molnar <mingo@kernel.org> Merge tag 'timers-urgent-2026-03-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer fix from Ingo Molnar: "Fix function tracer recursion bug by marking jiffies_64_to_clock_t() notrace" * tag 'timers-urgent-2026-03-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: time/jiffies: Mark jiffies_64_to_clock_t() notrace	2026-03-15 11:14:09 -07:00
Linus Torvalds	63724e9519	More MM-CID fixes, mostly fixing hangs/races: - Fix CID hangs due to a race between concurrent forks - Fix vfork()/CLONE_VM MMCID bug causing hangs - Remove pointless preemption guard - Fix CID task list walk performance regression on large systems by removing the known-flaky and slow counting logic using for_each_process_thread() in mm_cid_fixup_tasks_to_cpus(), and implementing a simple sched_mm_cid::node list instead Signed-off-by: Ingo Molnar <mingo@kernel.org> Merge tag 'sched-urgent-2026-03-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fixes from Ingo Molnar: "More MM-CID fixes, mostly fixing hangs/races: - Fix CID hangs due to a race between concurrent forks - Fix vfork()/CLONE_VM MMCID bug causing hangs - Remove pointless preemption guard - Fix CID task list walk performance regression on large systems by removing the known-flaky and slow counting logic using for_each_process_thread() in mm_cid_fixup_tasks_to_cpus(), and implementing a simple sched_mm_cid::node list instead" * tag 'sched-urgent-2026-03-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/mmcid: Avoid full tasklist walks sched/mmcid: Remove pointless preempt guard sched/mmcid: Handle vfork()/CLONE_VM correctly sched/mmcid: Prevent CID stalls due to concurrent forks	2026-03-15 10:49:47 -07:00
Linus Torvalds	9abff5748e	workqueue: Fixes for v7.0-rc3 - Improve workqueue stall diagnostics: dump all busy workers (not just running ones), show wall-clock duration of in-flight work items, and add a sample module for reproducing stalls. - Fix POOL_BH vs WQ_BH flag namespace mismatch in pr_cont_worker_id(). - Rename pool->watchdog_ts to pool->last_progress_ts and related functions for clarity. Merge tag 'wq-for-7.0-rc3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq Pull workqueue fixes from Tejun Heo: - Improve workqueue stall diagnostics: dump all busy workers (not just running ones), show wall-clock duration of in-flight work items, and add a sample module for reproducing stalls - Fix POOL_BH vs WQ_BH flag namespace mismatch in pr_cont_worker_id() - Rename pool->watchdog_ts to pool->last_progress_ts and related functions for clarity * tag 'wq-for-7.0-rc3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: workqueue: Rename show_cpu_pool{s,}_hog{s,}() to reflect broadened scope workqueue: Add stall detector sample module workqueue: Show all busy workers in stall diagnostics workqueue: Show in-flight work item duration in stall diagnostics workqueue: Rename pool->watchdog_ts to pool->last_progress_ts workqueue: Use POOL_BH instead of WQ_BH when checking pool flags	2026-03-13 15:11:05 -07:00
Linus Torvalds	b073bcb8d4	cgroup: Fixes for v7.0-rc3 - Hide PF_EXITING tasks from cgroup.procs to avoid exposing dead tasks that haven't been removed yet, fixing a systemd timeout issue on PREEMPT_RT. - Call rebuild_sched_domains() directly in CPU hotplug instead of deferring to a workqueue, fixing a race where online/offline CPUs could briefly appear in stale sched domains. Merge tag 'cgroup-for-7.0-rc3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup fixes from Tejun Heo: - Hide PF_EXITING tasks from cgroup.procs to avoid exposing dead tasks that haven't been removed yet, fixing a systemd timeout issue on PREEMPT_RT - Call rebuild_sched_domains() directly in CPU hotplug instead of deferring to a workqueue, fixing a race where online/offline CPUs could briefly appear in stale sched domains * tag 'cgroup-for-7.0-rc3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: cgroup: Don't expose dead tasks in cgroup cgroup/cpuset: Call rebuild_sched_domains() directly in hotplug	2026-03-13 15:06:31 -07:00
Linus Torvalds	8369b2e97d	sched_ext: Fixes for v7.0-rc3 - Fix data races flagged by KCSAN: add missing READ_ONCE()/WRITE_ONCE() annotations for lock-free accesses to module parameters and dsq->seq. - Fix silent truncation of upper 32 enqueue flags (SCX_ENQ_PREEMPT and above) when passed through the int sched_class interface. - Documentation updates: scheduling class precedence, task ownership state machine, example scheduler descriptions, config list cleanup. - Selftest fix for format specifier and buffer length in file_write_long(). Merge tag 'sched_ext-for-7.0-rc3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext Pull sched_ext fixes from Tejun Heo: - Fix data races flagged by KCSAN: add missing READ_ONCE()/WRITE_ONCE() annotations for lock-free accesses to module parameters and dsq->seq - Fix silent truncation of upper 32 enqueue flags (SCX_ENQ_PREEMPT and above) when passed through the int sched_class interface - Documentation updates: scheduling class precedence, task ownership state machine, example scheduler descriptions, config list cleanup - Selftest fix for format specifier and buffer length in file_write_long() * tag 'sched_ext-for-7.0-rc3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext: sched_ext: Use WRITE_ONCE() for the write side of scx_enable helper pointer sched_ext: Fix enqueue_task_scx() truncation of upper enqueue flags sched_ext: Documentation: Update sched-ext.rst sched_ext: Use READ_ONCE() for scx_slice_bypass_us in scx_bypass() sched_ext: Documentation: Mention scheduling class precedence sched_ext: Document task ownership state machine sched_ext: Use READ_ONCE() for lock-free reads of module param variables sched_ext/selftests: Fix format specifier and buffer length in file_write_long() sched_ext: Use WRITE_ONCE() for the write side of dsq->seq update	2026-03-13 14:54:56 -07:00
Masami Hiramatsu (Google)	5ef268cb7a	kprobes: Remove unneeded warnings from __arm_kprobe_ftrace() Remove unneeded warnings for handled errors from __arm_kprobe_ftrace() because all caller handled the error correctly. Link: https://lore.kernel.org/all/177261531182.1312989.8737778408503961141.stgit@mhiramat.tok.corp.google.com/ Reported-by: Zw Tang <shicenci@gmail.com> Closes: https://lore.kernel.org/all/CAPHJ_V+J6YDb_wX2nhXU6kh466Dt_nyDSas-1i_Y8s7tqY-Mzw@mail.gmail.com/ Fixes: `9c89bb8e32` ("kprobes: treewide: Cleanup the error messages for kprobes") Cc: stable@vger.kernel.org Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>	2026-03-13 23:15:26 +09:00
Masami Hiramatsu (Google)	e113f0b46d	kprobes: avoid crash when rmmod/insmod after ftrace killed After we hit ftrace is killed by some errors, the kernel crash if we remove modules in which kprobe probes. BUG: unable to handle page fault for address: fffffbfff805000d PGD 817fcc067 P4D 817fcc067 PUD 817fc8067 PMD 101555067 PTE 0 Oops: Oops: 0000 [#1] SMP KASAN PTI CPU: 4 UID: 0 PID: 2012 Comm: rmmod Tainted: G W OE Tainted: [W]=WARN, [O]=OOT_MODULE, [E]=UNSIGNED_MODULE RIP: 0010:kprobes_module_callback+0x89/0x790 RSP: 0018:ffff88812e157d30 EFLAGS: 00010a02 RAX: 1ffffffff805000d RBX: dffffc0000000000 RCX: ffffffff86a8de90 RDX: ffffed1025c2af9b RSI: 0000000000000008 RDI: ffffffffc0280068 RBP: 0000000000000000 R08: 0000000000000001 R09: ffffed1025c2af9a R10: ffff88812e157cd7 R11: 205d323130325420 R12: 0000000000000002 R13: ffffffffc0290488 R14: 0000000000000002 R15: ffffffffc0280040 FS: 00007fbc450dd740(0000) GS:ffff888420331000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: fffffbfff805000d CR3: 000000010f624000 CR4: 00000000000006f0 Call Trace: <TASK> notifier_call_chain+0xc6/0x280 blocking_notifier_call_chain+0x60/0x90 __do_sys_delete_module.constprop.0+0x32a/0x4e0 do_syscall_64+0x5d/0xfa0 entry_SYSCALL_64_after_hwframe+0x76/0x7e This is because the kprobe on ftrace does not correctly handles the kprobe_ftrace_disabled flag set by ftrace_kill(). To prevent this error, check kprobe_ftrace_disabled in __disarm_kprobe_ftrace() and skip all ftrace related operations. Link: https://lore.kernel.org/all/176473947565.1727781.13110060700668331950.stgit@mhiramat.tok.corp.google.com/ Reported-by: Ye Bin <yebin10@huawei.com> Closes: https://lore.kernel.org/all/20251125020536.2484381-1-yebin@huaweicloud.com/ Fixes: `ae6aa16fdc` ("kprobes: introduce ftrace based optimization") Cc: stable@vger.kernel.org Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2026-03-13 23:14:14 +09:00
Arnd Bergmann	7e4b6c9430	tracing: add more symbols to whitelist Randconfig builds show a number of cryptic build errors from hitting undefined symbols in simple_ring_buffer.o: make[7]: *** [/home/arnd/arm-soc/kernel/trace/Makefile:147: kernel/trace/simple_ring_buffer.o.checked] Error 1 These happen with CONFIG_TRACE_BRANCH_PROFILING, CONFIG_KASAN_HW_TAGS, CONFIG_STACKPROTECTOR, CONFIG_DEBUG_IRQFLAGS and indirectly from WARN_ON(). Add exceptions for each one that I have hit so far on arm64, x86_64 and arm randconfig builds. Other architectures likely hit additional ones, so it would be nice to produce a little more verbose output that include the name of the missing symbols directly. Fixes: `a717943d8e` ("tracing: Check for undefined symbols in simple_ring_buffer") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Link: https://patch.msgid.link/20260312123601.625063-2-arnd@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>	2026-03-12 15:19:29 +00:00
Vincent Donnefort	5f2f830471	tracing: Update undefined symbols allow list for simple_ring_buffer Undefined symbols are not allowed for simple_ring_buffer.c. But some compiler emitted symbols are missing in the allowlist. Update it. Reported-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Vincent Donnefort <vdonnefort@google.com> Fixes: `a717943d8e` ("tracing: Check for undefined symbols in simple_ring_buffer") Closes: https://lore.kernel.org/all/20260311221816.GA316631@ax162/ Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org> Link: https://patch.msgid.link/20260312113535.2213350-1-vdonnefort@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>	2026-03-12 15:14:06 +00:00
Thomas Gleixner	192d852129	sched/mmcid: Avoid full tasklist walks Chasing vfork()'ed tasks on a CID ownership mode switch requires a full task list walk, which is obviously expensive on large systems. Avoid that by keeping a list of tasks using a mm MMCID entity in mm::mm_cid and walk this list instead. This removes the proven to be flaky counting logic and avoids a full task list walk in the case of vfork()'ed tasks. Fixes: `fbd0e71dc3` ("sched/mmcid: Provide CID ownership mode fixup functions") Signed-off-by: Thomas Gleixner <tglx@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20260310202526.183824481@kernel.org	2026-03-11 12:01:07 +01:00
Thomas Gleixner	7574ac6e49	sched/mmcid: Remove pointless preempt guard This is a leftover from the early versions of this function where it could be invoked without mm::mm_cid::lock held. Remove it and add lockdep asserts instead. Fixes: `653fda7ae7` ("sched/mmcid: Switch over to the new mechanism") Signed-off-by: Thomas Gleixner <tglx@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20260310202526.116363613@kernel.org	2026-03-11 12:01:06 +01:00
Thomas Gleixner	28b5a13950	sched/mmcid: Handle vfork()/CLONE_VM correctly Matthieu and Jiri reported stalls where a task endlessly loops in mm_get_cid() when scheduling in. It turned out that the logic which handles vfork()'ed tasks is broken. It is invoked when the number of tasks associated to a process is smaller than the number of MMCID users. It then walks the task list to find the vfork()'ed task, but accounts all the already processed tasks as well. If that double processing brings the number of to be handled tasks to 0, the walk stops and the vfork()'ed task's CID is not fixed up. As a consequence a subsequent schedule in fails to acquire a (transitional) CID and the machine stalls. Cure this by removing the accounting condition and make the fixup always walk the full task list if it could not find the exact number of users in the process' thread list. Fixes: `fbd0e71dc3` ("sched/mmcid: Provide CID ownership mode fixup functions") Closes: https://lore.kernel.org/b24ffcb3-09d5-4e48-9070-0b69bc654281@kernel.org Reported-by: Matthieu Baerts <matttbe@kernel.org> Reported-by: Jiri Slaby <jirislaby@kernel.org> Signed-off-by: Thomas Gleixner <tglx@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20260310202526.048657665@kernel.org	2026-03-11 12:01:06 +01:00
Thomas Gleixner	b2e48c429e	sched/mmcid: Prevent CID stalls due to concurrent forks A newly forked task is accounted as MMCID user before the task is visible in the process' thread list and the global task list. This creates the following problem: CPU1 CPU2 fork() sched_mm_cid_fork(tnew1) tnew1->mm.mm_cid_users++; tnew1->mm_cid.cid = getcid() -> preemption fork() sched_mm_cid_fork(tnew2) tnew2->mm.mm_cid_users++; // Reaches the per CPU threshold mm_cid_fixup_tasks_to_cpus() for_each_other(current, p) .... As tnew1 is not visible yet, this fails to fix up the already allocated CID of tnew1. As a consequence a subsequent schedule in might fail to acquire a (transitional) CID and the machine stalls. Move the invocation of sched_mm_cid_fork() after the new task becomes visible in the thread and the task list to prevent this. This also makes it symmetrical vs. exit() where the task is removed as CID user before the task is removed from the thread and task lists. Fixes: `fbd0e71dc3` ("sched/mmcid: Provide CID ownership mode fixup functions") Signed-off-by: Thomas Gleixner <tglx@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20260310202525.969061974@kernel.org	2026-03-11 12:01:06 +01:00
Steven Rostedt	755a648e78	time/jiffies: Mark jiffies_64_to_clock_t() notrace The trace_clock_jiffies() function that handles the "uptime" clock for tracing calls jiffies_64_to_clock_t(). This causes the function tracer to constantly recurse when the tracing clock is set to "uptime". Mark it notrace to prevent unnecessary recursion when using the "uptime" clock. Fixes: `58d4e21e50` ("tracing: Fix wraparound problems in "uptime" trace clock") Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Thomas Gleixner <tglx@kernel.org> Link: https://patch.msgid.link/20260306212403.72270bb2@robin	2026-03-11 10:33:12 +01:00
Rafael J. Wysocki	d557640e4c	sched: idle: Make skipping governor callbacks more consistent If the cpuidle governor .select() callback is skipped because there is only one idle state in the cpuidle driver, the .reflect() callback should be skipped as well, at least for consistency (if not for correctness), so do it. Fixes: `e5c9ffc6ae` ("cpuidle: Skip governor when only one idle state is available") Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Christian Loehle <christian.loehle@arm.com> Reviewed-by: Aboorva Devarajan <aboorvad@linux.ibm.com> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Link: https://patch.msgid.link/12857700.O9o76ZdvQC@rafael.j.wysocki	2026-03-10 16:03:02 +01:00
Vincent Donnefort	a717943d8e	tracing: Check for undefined symbols in simple_ring_buffer The simple_ring_buffer implementation must remain simple enough to be used by the pKVM hypervisor. Prevent the object build if unresolved symbols are found. Link: https://patch.msgid.link/20260309162516.2623589-19-vdonnefort@google.com Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Vincent Donnefort <vdonnefort@google.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2026-03-09 12:33:55 -04:00
Vincent Donnefort	635923081c	tracing: load/unload page callbacks for simple_ring_buffer Add load/unload callback used for each admitted page in the ring-buffer. This will be later useful for the pKVM hypervisor which uses a different VA space and need to dynamically map/unmap the ring-buffer pages. Link: https://patch.msgid.link/20260309162516.2623589-18-vdonnefort@google.com Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Vincent Donnefort <vdonnefort@google.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2026-03-09 12:33:55 -04:00
Vincent Donnefort	ea908a2b79	tracing: Add a trace remote module for testing Add a module to help testing the tracefs support for trace remotes. This module: * Use simple_ring_buffer to write into a ring-buffer. * Declare a single "selftest" event that can be triggered from user-space. * Register a "test" trace remote. This is intended to be used by trace remote selftests. Link: https://patch.msgid.link/20260309162516.2623589-15-vdonnefort@google.com Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Vincent Donnefort <vdonnefort@google.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2026-03-09 12:33:55 -04:00
Vincent Donnefort	34e5b958bd	tracing: Introduce simple_ring_buffer Add a simple implementation of the kernel ring-buffer. This intends to be used later by ring-buffer remotes such as the pKVM hypervisor, hence the need for a cut down version (write only) without any dependency. Link: https://patch.msgid.link/20260309162516.2623589-14-vdonnefort@google.com Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Vincent Donnefort <vdonnefort@google.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2026-03-09 12:33:55 -04:00
Vincent Donnefort	93ae1b76ff	ring-buffer: Export buffer_data_page and macros In preparation for allowing the writing of ring-buffer compliant pages outside of ring_buffer.c, move buffer_data_page and timestamps encoding macros into the publicly available ring_buffer_types.h. Link: https://patch.msgid.link/20260309162516.2623589-13-vdonnefort@google.com Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Vincent Donnefort <vdonnefort@google.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2026-03-09 12:33:55 -04:00
Vincent Donnefort	775cb093bc	tracing: Add events/ root files to trace remotes Just like for the kernel events directory, add 'enable', 'header_page' and 'header_event' at the root of the trace remote events/ directory. Link: https://patch.msgid.link/20260309162516.2623589-11-vdonnefort@google.com Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Vincent Donnefort <vdonnefort@google.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2026-03-09 12:33:54 -04:00
Vincent Donnefort	072529158e	tracing: Add events to trace remotes An event is a predefined point in the writer code that allows data to be logged. Following the same scheme as kernel events, add remote events, described to user-space within the events/ tracefs directory found in the corresponding trace remote. Remote events are expected to be described during the trace remote registration. Also add a .enable_event callback for trace_remote to toggle the event logging, if supported. Link: https://patch.msgid.link/20260309162516.2623589-10-vdonnefort@google.com Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Vincent Donnefort <vdonnefort@google.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2026-03-09 12:33:54 -04:00
Vincent Donnefort	bf2ba0f8ca	tracing: Add init callback to trace remotes Add a .init call back so the trace remote callers can add entries to the tracefs directory. Link: https://patch.msgid.link/20260309162516.2623589-9-vdonnefort@google.com Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Vincent Donnefort <vdonnefort@google.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2026-03-09 12:33:54 -04:00
Vincent Donnefort	330b0cceb3	tracing: Add non-consuming read to trace remotes Allow reading the trace file for trace remotes. This performs a non-consuming read of the trace buffer. Link: https://patch.msgid.link/20260309162516.2623589-8-vdonnefort@google.com Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Vincent Donnefort <vdonnefort@google.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2026-03-09 12:33:54 -04:00
Vincent Donnefort	9af4ab0e11	tracing: Add reset to trace remotes Allow to reset the trace remote buffer by writing to the Tracefs "trace" file. This is similar to the regular Tracefs interface. Link: https://patch.msgid.link/20260309162516.2623589-7-vdonnefort@google.com Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Vincent Donnefort <vdonnefort@google.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2026-03-09 12:33:54 -04:00
Vincent Donnefort	96e43537af	tracing: Introduce trace remotes A trace remote relies on ring-buffer remotes to read and control compatible tracing buffers, written by entity such as firmware or hypervisor. Add a Tracefs directory remotes/ that contains all instances of trace remotes. Each instance follows the same hierarchy as any other to ease the support by existing user-space tools. This currently does not provide any event support, which will come later. Link: https://patch.msgid.link/20260309162516.2623589-6-vdonnefort@google.com Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Vincent Donnefort <vdonnefort@google.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2026-03-09 12:33:53 -04:00
Vincent Donnefort	fbd1743ecb	ring-buffer: Add non-consuming read for ring-buffer remotes Hopefully, the remote will only swap pages on the kernel instruction (via the swap_reader_page() callback). This means we know at what point the ring-buffer geometry has changed. It is therefore possible to rearrange the kernel view of that ring-buffer to allow non-consuming read. Link: https://patch.msgid.link/20260309162516.2623589-5-vdonnefort@google.com Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Vincent Donnefort <vdonnefort@google.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2026-03-09 12:33:53 -04:00
Vincent Donnefort	2e67fabd8b	ring-buffer: Introduce ring-buffer remotes Add ring-buffer remotes to support entities outside of the kernel (such as firmware or a hypervisor) that write events into a ring-buffer using the tracefs format. Require a description of the ring-buffer pages (struct trace_buffer_desc) and callbacks (swap_reader_page and reset) to set up the ring-buffer on the kernel side. Expect the remote entity to maintain and update the meta-page. Link: https://patch.msgid.link/20260309162516.2623589-4-vdonnefort@google.com Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Vincent Donnefort <vdonnefort@google.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2026-03-09 12:33:53 -04:00
Vincent Donnefort	e682207bf7	ring-buffer: Store bpage pointers into subbuf_ids The subbuf_ids field allows pointing to a specific page from the ring-buffer based on its ID. In preparation for the upcoming ring-buffer remote support, point this array to the buffer_page instead of the buffer_data_page. Link: https://patch.msgid.link/20260309162516.2623589-3-vdonnefort@google.com Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Vincent Donnefort <vdonnefort@google.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2026-03-09 12:33:53 -04:00
Vincent Donnefort	7d776a3627	ring-buffer: Add page statistics to the meta-page Add two fields pages_touched and pages_lost to the ring-buffer meta-page. Those fields are useful to get the number of used pages in the ring-buffer. Link: https://patch.msgid.link/20260309162516.2623589-2-vdonnefort@google.com Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Vincent Donnefort <vdonnefort@google.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2026-03-09 12:33:53 -04:00
zhidao su	2fcfe5951e	sched_ext: Use WRITE_ONCE() for the write side of scx_enable helper pointer scx_enable() uses double-checked locking to lazily initialize a static kthread_worker pointer. The fast path reads helper locklessly: if (!READ_ONCE(helper)) { // lockless read -- no helper_mutex The write side initializes helper under helper_mutex, but previously used a plain assignment: helper = kthread_run_worker(0, "scx_enable_helper"); ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ plain write -- KCSAN data race with READ_ONCE() above Since READ_ONCE() on the fast path and the plain write on the initialization path access the same variable without a common lock, they constitute a data race. KCSAN requires that all sides of a lock-free access use READ_ONCE()/WRITE_ONCE() consistently. Use a temporary variable to stage the result of kthread_run_worker(), and only WRITE_ONCE() into helper after confirming the pointer is valid. This avoids a window where a concurrent caller on the fast path could observe an ERR pointer via READ_ONCE(helper) before the error check completes. Fixes: `b06ccbabe2` ("sched_ext: Fix starvation of scx_enable() under fair-class saturation") Signed-off-by: zhidao su <suzhidao@xiaomi.com> Acked-by: Andrea Righi <arighi@nvidia.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2026-03-09 06:08:26 -10:00
Linus Torvalds	6ff1020c2f	Merge tag 'timers-urgent-2026-03-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer fix from Ingo Molnar: "Make clock_adjtime() syscall timex validation slightly more permissive for auxiliary clocks, to not reject syscalls based on the status field that do not try to modify the status field. This makes the ABI behavior in clock_adjtime() consistent with CLOCK_REALTIME" * tag 'timers-urgent-2026-03-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: timekeeping: Fix timex status validation for auxiliary clocks	2026-03-07 17:09:15 -08:00
Linus Torvalds	b1b9a9d0b5	Merge tag 'sched-urgent-2026-03-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fix from Ingo Molnar: "Fix a DL scheduler bug that may corrupt internal metrics during PI and setscheduler() syscalls, resulting in kernel warnings and misbehavior. Found during stress-testing" * tag 'sched-urgent-2026-03-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/deadline: Fix missing ENQUEUE_REPLENISH during PI de-boosting	2026-03-07 17:07:13 -08:00
Linus Torvalds	8b7f4cd3ac	Merge tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf Pull bpf fixes from Alexei Starovoitov: - Fix u32/s32 bounds when ranges cross min/max boundary (Eduard Zingerman) - Fix precision backtracking with linked registers (Eduard Zingerman) - Fix linker flags detection for resolve_btfids (Ihor Solodrai) - Fix race in update_ftrace_direct_add/del (Jiri Olsa) - Fix UAF in bpf_trampoline_link_cgroup_shim (Lang Xu) * tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: resolve_btfids: Fix linker flags detection selftests/bpf: add reproducer for spurious precision propagation through calls bpf: collect only live registers in linked regs Revert "selftests/bpf: Update reg_bound range refinement logic" selftests/bpf: test refining u32/s32 bounds when ranges cross min/max boundary bpf: Fix u32/s32 bounds when ranges cross min/max boundary bpf: Fix a UAF issue in bpf_trampoline_link_cgroup_shim ftrace: Add missing ftrace_lock to update_ftrace_direct_add/del	2026-03-07 12:20:37 -08:00
Linus Torvalds	aed0af05a8	Merge tag 'trace-v7.0-rc2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull tracing fixes from Steven Rostedt: - Fix possible NULL pointer dereference in trace_data_alloc() On the trace_data_alloc() error path, it can call trigger_data_free() with a NULL pointer. This used to be a kfree() but was changed to trigger_data_free() to clean up any partial initialization. 
The issue is that trigger_data_free() does not expect a NULL pointer. Have trigger_data_free() return safely on NULL pointer. - Fix multiple events on the command line and bootconfig If multiple events are enabled on the command line separately and not grouped, only the last event gets enabled. That is: trace_event=sched_switch trace_event=sched_waking will only enable sched_waking whereas: trace_event=sched_switch,sched_waking will enable both. The bootconfig makes it even worse as the second way is the more common method. The issue is that a temporary buffer is used to store the events to enable later in boot. Each time the cmdline callback is called, it overwrites what was previously there. Have the callback append the next value (delimited by a comma) if the temporary buffer already has content. - Fix command line trace_buffer_size if >= 2G The logic to allocate the trace buffer uses "int" for the size parameter in the command line code causing overflow issues if more than 2G is specified. * tag 'trace-v7.0-rc2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: tracing: Fix trace_buf_size= cmdline parameter with sizes >= 2G tracing: Fix enabling multiple events on the kernel command line and bootconfig tracing: Add NULL pointer check to trigger_data_free()	2026-03-07 09:50:54 -08:00
Tejun Heo	57ccf5ccdc	sched_ext: Fix enqueue_task_scx() truncation of upper enqueue flags enqueue_task_scx() takes int enq_flags from the sched_class interface. SCX enqueue flags starting at bit 32 (SCX_ENQ_PREEMPT and above) are silently truncated when passed through activate_task(). extra_enq_flags was added as a workaround - storing high bits in rq->scx.extra_enq_flags and OR-ing them back in enqueue_task_scx(). However, the OR target is still the int parameter, so the high bits are lost anyway. The current impact is limited as the only affected flag is SCX_ENQ_PREEMPT which is informational to the BPF scheduler - its loss means the scheduler doesn't know about preemption but doesn't cause incorrect behavior. Fix by renaming the int parameter to core_enq_flags and introducing a u64 enq_flags local that merges both sources. All downstream functions already take u64 enq_flags. Fixes: `f0e1a0643a` ("sched_ext: Implement BPF extensible scheduler class") Cc: stable@vger.kernel.org # v6.12+ Acked-by: Andrea Righi <arighi@nvidia.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2026-03-07 04:53:32 -10:00
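The truncation described in the entry above can be reproduced outside the kernel. The following is a minimal sketch, not the kernel code: `DEMO_ENQ_PREEMPT`, `demo_merge_truncating()` and `demo_merge_widened()` are hypothetical names, and the only detail taken from the commit message is that the affected flags start at bit 32 while the interface parameter is an `int`.

```c
#include <stdint.h>

/* Hypothetical stand-in for a flag that lives at bit 32, like SCX_ENQ_PREEMPT. */
#define DEMO_ENQ_PREEMPT (1ULL << 32)

/* Buggy shape: the merge target is only 32 bits wide, so bit 32 and above vanish. */
static uint64_t demo_merge_truncating(int core_flags, uint64_t extra)
{
	int merged = (int)((uint32_t)core_flags | (uint32_t)extra);

	return (uint32_t)merged;
}

/* Fixed shape: merge both sources into a 64-bit local before using them. */
static uint64_t demo_merge_widened(int core_enq_flags, uint64_t extra)
{
	uint64_t enq_flags = (uint32_t)core_enq_flags | extra;

	return enq_flags;
}
```

With `core_flags = 1` and `extra = DEMO_ENQ_PREEMPT`, the first helper returns only the low bit, while the second keeps both.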
Eduard Zingerman	2658a1720a	bpf: collect only live registers in linked regs Fix an inconsistency between func_states_equal() and collect_linked_regs(): - regsafe() uses check_ids() to verify that cached and current states have identical register id mapping. - func_states_equal() calls regsafe() only for registers computed as live by compute_live_registers(). - clean_live_states() is supposed to remove dead registers from cached states, but it can skip states belonging to an iterator-based loop. - collect_linked_regs() collects all registers sharing the same id, ignoring the marks computed by compute_live_registers(). Linked registers are stored in the state's jump history. - backtrack_insn() marks all linked registers for an instruction as precise whenever one of the linked registers is precise. The above might lead to a scenario: - There is an instruction I with register rY known to be dead at I. - Instruction I is reached via two paths: first A, then B. - On path A: - There is an id link between registers rX and rY. - Checkpoint C is created at I. - Linked register set {rX, rY} is saved to the jump history. - rX is marked as precise at I, causing both rX and rY to be marked precise at C. - On path B: - There is no id link between registers rX and rY, otherwise register states are sub-states of those in C. - Because rY is dead at I, check_ids() returns true. - Current state is considered equal to checkpoint C, propagate_precision() propagates spurious precision mark for register rY along the path B. - Depending on a program, this might hit verifier_bug() in the backtrack_insn(), e.g. if rY ∈ [r1..r5] and backtrack_insn() spots a function call. The reproducer program is in the next patch. This was hit by sched_ext scx_lavd scheduler code. Changes in tests: - verifier_scalar_ids.c selftests need modification to preserve some registers as live for __msg() checks. 
- exceptions_assert.c adjusted to match changes in the verifier log, R0 is dead after conditional instruction and thus does not get range. - precise.c adjusted to match changes in the verifier log, register r9 is dead after comparison and its range is not important for the test. Reported-by: Emil Tsalapatis <emil@etsalapatis.com> Fixes: `0fb3cf6110` ("bpf: use register liveness information for func_states_equal") Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20260306-linked-regs-and-propagate-precision-v1-1-18e859be570d@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2026-03-06 21:49:40 -08:00
Calvin Owens	d008ba8be8	tracing: Fix trace_buf_size= cmdline parameter with sizes >= 2G Some of the sizing logic through tracer_alloc_buffers() uses int internally, causing unexpected behavior if the user passes a value that does not fit in an int (on my x86 machine, the result is uselessly tiny buffers). Fix by plumbing the parameter's real type (unsigned long) through to the ring buffer allocation functions, which already use unsigned long. It has always been possible to create larger ring buffers via the sysfs interface: this only affects the cmdline parameter. Cc: stable@vger.kernel.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://patch.msgid.link/bff42a4288aada08bdf74da3f5b67a2c28b761f8.1772852067.git.calvin@wbinvd.org Fixes: `73c5162aa3` ("tracing: keep ring buffer to minimum size till used") Signed-off-by: Calvin Owens <calvin@wbinvd.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2026-03-06 22:25:53 -05:00
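Why a size of 2G or more cannot survive an `int` is easy to check in userspace. This is a rough sketch under stated assumptions: `demo_parse_size()` is a hypothetical stand-in for the kernel's size parsing (a number with an optional K/M/G suffix), `demo_overflows_int()` is likewise invented for illustration, and a 32-bit `int` is assumed.

```c
#include <limits.h>
#include <stdlib.h>

/* Hypothetical parser: a number followed by an optional K, M or G suffix. */
static unsigned long long demo_parse_size(const char *s)
{
	char *end;
	unsigned long long v = strtoull(s, &end, 0);

	switch (*end) {
	case 'G': case 'g':
		v <<= 10;
		/* fall through */
	case 'M': case 'm':
		v <<= 10;
		/* fall through */
	case 'K': case 'k':
		v <<= 10;
		break;
	default:
		break;
	}
	return v;
}

/* Nonzero when the requested size cannot be represented in an int. */
static int demo_overflows_int(unsigned long long size)
{
	return size > (unsigned long long)INT_MAX;
}
```

Here `demo_parse_size("2G")` yields 2147483648, one more than `INT_MAX` on such a platform, which is why carrying the value in a wider unsigned type end to end avoids the problem.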
Eduard Zingerman	fbc7aef517	bpf: Fix u32/s32 bounds when ranges cross min/max boundary Same as in __reg64_deduce_bounds(), refine s32/u32 ranges in __reg32_deduce_bounds() in the following situations: - s32 range crosses U32_MAX/0 boundary, positive part of the s32 range overlaps with u32 range: 0 U32_MAX \| [xxxxxxxxxxxxxx u32 range xxxxxxxxxxxxxx] \| \|----------------------------\|----------------------------\| \|xxxxx s32 range xxxxxxxxx] [xxxxxxx\| 0 S32_MAX S32_MIN -1 - s32 range crosses U32_MAX/0 boundary, negative part of the s32 range overlaps with u32 range: 0 U32_MAX \| [xxxxxxxxxxxxxx u32 range xxxxxxxxxxxxxx] \| \|----------------------------\|----------------------------\| \|xxxxxxxxx] [xxxxxxxxxxxx s32 range \| 0 S32_MAX S32_MIN -1 - No refinement if ranges overlap in two intervals. This helps for e.g. consider the following program: call %[bpf_get_prandom_u32]; w0 &= 0xffffffff; if w0 < 0x3 goto 1f; // on fall-through u32 range [3..U32_MAX] if w0 s> 0x1 goto 1f; // on fall-through s32 range [S32_MIN..1] if w0 s< 0x0 goto 1f; // range can be narrowed to [S32_MIN..-1] r10 = 0; 1: ...; The reg_bounds.c selftest is updated to incorporate identical logic, refinement based on non-overflowing range halves: ((x ∩ [0, smax]) ∩ (y ∩ [0, smax])) ∪ ((x ∩ [smin,-1]) ∩ (y ∩ [smin,-1])) Reported-by: Andrea Righi <arighi@nvidia.com> Reported-by: Emil Tsalapatis <emil@etsalapatis.com> Closes: https://lore.kernel.org/bpf/aakqucg4vcujVwif@gpd4/T/ Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com> Acked-by: Shung-Hsi Yu <shung-hsi.yu@suse.com> Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20260306-bpf-32-bit-range-overflow-v3-1-f7f67e060a6b@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2026-03-06 18:16:06 -08:00
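The refinement rule in the entry above can be sketched as a small standalone function. This is an illustration of the idea only, not the verifier's `__reg32_deduce_bounds()`: the `demo_range32` struct and `demo_refine()` are hypothetical, and the casts from `uint32_t` to `int32_t` rely on the usual two's-complement wrapping that GCC and Clang provide.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical pair of 32-bit bounds on the same value. */
struct demo_range32 {
	uint32_t umin, umax;
	int32_t smin, smax;
};

/*
 * If the s32 range spans negative and non-negative values, it maps to two
 * u32 intervals: [0, smax] and [(u32)smin, U32_MAX]. Refine only when the
 * u32 range overlaps exactly one of them.
 */
static bool demo_refine(struct demo_range32 *r)
{
	if (!(r->smin < 0 && r->smax >= 0))
		return false;

	uint32_t pos_hi = (uint32_t)r->smax; /* non-negative half: [0, pos_hi] */
	uint32_t neg_lo = (uint32_t)r->smin; /* negative half: [neg_lo, U32_MAX] */
	bool hits_pos = r->umin <= pos_hi;
	bool hits_neg = r->umax >= neg_lo;

	if (hits_pos == hits_neg)
		return false; /* overlap in two intervals (or none): no refinement */

	if (hits_pos) {
		if (r->umax > pos_hi)
			r->umax = pos_hi;
	} else {
		if (r->umin < neg_lo)
			r->umin = neg_lo;
	}
	/* Both bounds now sit in one half, so the signed view follows directly. */
	r->smin = (int32_t)r->umin;
	r->smax = (int32_t)r->umax;
	return true;
}
```

Feeding in the ranges from the commit's example program (u32 range [3..U32_MAX], s32 range [S32_MIN..1]) narrows the signed range to [S32_MIN..-1].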
Sebastian Andrzej Siewior	a72f73c4dd	cgroup: Don't expose dead tasks in cgroup Once a task exits it has its state set to TASK_DEAD and then it is removed from the cgroup it belonged to. The last step happens once the task gets out of its last schedule() invocation and is delayed on PREEMPT_RT due to locking constraints. As a result it is possible to receive a pid via waitpid() of a task which is still listed in cgroup.procs for the cgroup it belonged to. This is something that systemd does not expect and as a result it waits for its exit until a timeout occurs. This can also be reproduced on a !PREEMPT_RT kernel with a significant delay in do_exit() after exit_notify(). Hide tasks that have PF_EXITING set from the output; the flag is set before the parent is notified. Keeping zombies with live threads shouldn't break anything (suggested by Tejun). Reported-by: Bert Karwatzki <spasswolf@web.de> Closes: https://lore.kernel.org/all/20260219164648.3014-1-spasswolf@web.de/ Tested-by: Bert Karwatzki <spasswolf@web.de> Fixes: `9311e6c29b` ("cgroup: Fix sleeping from invalid context warning on PREEMPT_RT") Cc: stable@vger.kernel.org # v6.19+ Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Tejun Heo <tj@kernel.org>	2026-03-06 12:43:25 -10:00
Andrei-Alexandru Tachici	3b1679e086	tracing: Fix enabling multiple events on the kernel command line and bootconfig Multiple events can be enabled on the kernel command line via a comma separator. But if they are specified one at a time, then only the last event is enabled. This is because the event names are saved in a temporary buffer, and each call by the init cmdline code will reset that buffer. This also affects names in the boot config file, as it may call the callback multiple times with an example of: kernel.trace_event = ":mod:rproc_qcom_common", ":mod:qrtr", ":mod:qcom_aoss" Change the cmdline callback function to append a comma and the next value if the temporary buffer already has content. Cc: stable@vger.kernel.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://patch.msgid.link/20260302-trace-events-allow-multiple-modules-v1-1-ce4436e37fb8@oss.qualcomm.com Signed-off-by: Andrei-Alexandru Tachici <andrei-alexandru.tachici@oss.qualcomm.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2026-03-06 16:54:34 -05:00
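The append-instead-of-overwrite behavior described in the entry above might look roughly like the following in plain C. This is a sketch, not the kernel's callback: `demo_event_buf`, `DEMO_BUF_SIZE` and `demo_setup_trace_event()` are hypothetical names, and the buffer size is arbitrary.

```c
#include <string.h>

#define DEMO_BUF_SIZE 256

/* Hypothetical temporary buffer that collects event names until later in boot. */
static char demo_event_buf[DEMO_BUF_SIZE];

/*
 * Append str to the buffer, inserting a comma first when the buffer already
 * has content. Returns 0 on success, -1 when the result would not fit.
 */
static int demo_setup_trace_event(const char *str)
{
	size_t used = strlen(demo_event_buf);
	size_t len = strlen(str);
	size_t need = len + (used ? 1 : 0); /* one extra byte for the comma */

	if (used + need >= sizeof(demo_event_buf))
		return -1;
	if (used)
		demo_event_buf[used++] = ',';
	memcpy(demo_event_buf + used, str, len + 1); /* copies the trailing NUL */
	return 0;
}
```

Calling it once with `sched_switch` and once with `sched_waking` leaves `sched_switch,sched_waking` in the buffer, which matches the grouped form the commit message describes as working.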
Guenter Roeck	457965c13f	tracing: Add NULL pointer check to trigger_data_free() If trigger_data_alloc() fails and returns NULL, event_hist_trigger_parse() jumps to the out_free error path. While kfree() safely handles a NULL pointer, trigger_data_free() does not. This causes a NULL pointer dereference in trigger_data_free() when evaluating data->cmd_ops->set_filter. Fix the problem by adding a NULL pointer check to trigger_data_free(). The problem was found by an experimental code review agent based on gemini-3.1-pro while reviewing backports into v6.18.y. Cc: Miaoqian Lin <linmq006@gmail.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Steven Rostedt (Google) <rostedt@goodmis.org> Link: https://patch.msgid.link/20260305193339.2810953-1-linux@roeck-us.net Fixes: `0550069cc2` ("tracing: Properly process error handling in event_hist_trigger_parse()") Assisted-by: Gemini:gemini-3.1-pro Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2026-03-06 13:04:30 -05:00
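The difference between `kfree()` tolerating NULL and a composite free routine dereferencing its argument can be shown with a small userspace analogue. The types and the function below are hypothetical; the real `trigger_data_free()` does more work, and the `int` return value exists here only so the behavior is observable.

```c
#include <stdlib.h>

/* Hypothetical nested object owned by the trigger data. */
struct demo_filter {
	int unused;
};

struct demo_trigger_data {
	struct demo_filter *filter;
};

/*
 * Free the object and what it owns. Without the early return, passing NULL
 * would dereference data->filter and crash. Returns 1 if something was
 * freed, 0 for a NULL argument.
 */
static int demo_trigger_data_free(struct demo_trigger_data *data)
{
	if (!data)
		return 0;

	free(data->filter); /* free(NULL) is itself a no-op */
	free(data);
	return 1;
}
```

An error path can then call `demo_trigger_data_free()` unconditionally, whether or not the allocation succeeded.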
Waiman Long	ca174c705d	cgroup/cpuset: Call rebuild_sched_domains() directly in hotplug Besides deferring the call to housekeeping_update(), commit `6df415aa46` ("cgroup/cpuset: Defer housekeeping_update() calls from CPU hotplug to workqueue") also defers the rebuild_sched_domains() call to the workqueue. So a new offline CPU may still be in a sched domain or new online CPU not showing up in the sched domains for a short transition period. That could be a problem in some corner cases and can be the cause of a reported test failure[1]. Fix it by calling rebuild_sched_domains_cpuslocked() directly in hotplug as before. If isolated partition invalidation or recreation is being done, the housekeeping_update() call to update the housekeeping cpumasks will still be deferred to a workqueue. In commit `3bfe479671` ("cgroup/cpuset: Move housekeeping_update()/rebuild_sched_domains() together"), housekeeping_update() is called before rebuild_sched_domains() because it needs to access the HK_TYPE_DOMAIN housekeeping cpumask. That is now changed to use the static HK_TYPE_DOMAIN_BOOT cpumask as HK_TYPE_DOMAIN cpumask is now changeable at run time. As a result, we can move the rebuild_sched_domains() call before housekeeping_update() with the slight advantage that it will be done in the same cpus_read_lock critical section without the possibility of interference by a concurrent cpu hot add/remove operation. As it doesn't make sense to acquire cpuset_mutex/cpuset_top_mutex after calling housekeeping_update() and immediately release them again, move the cpuset_full_unlock() operation inside update_hk_sched_domains() and rename it to cpuset_update_sd_hk_unlock() to signify that it will release the full set of locks. 
[1] https://lore.kernel.org/lkml/1a89aceb-48db-4edd-a730-b445e41221fe@nvidia.com Fixes: `6df415aa46` ("cgroup/cpuset: Defer housekeeping_update() calls from CPU hotplug to workqueue") Tested-by: Jon Hunter <jonathanh@nvidia.com> Reviewed-by: Chen Ridong <chenridong@huaweicloud.com> Signed-off-by: Waiman Long <longman@redhat.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2026-03-06 06:58:25 -10:00
David Carlier	1dde502587	sched_ext: Use READ_ONCE() for scx_slice_bypass_us in scx_bypass() Commit `0927780c90` ("sched_ext: Use READ_ONCE() for lock-free reads of module param variables") annotated the plain reads of scx_slice_bypass_us and scx_bypass_lb_intv_us in bypass_lb_cpu(), but missed a third site in scx_bypass(): WRITE_ONCE(scx_slice_dfl, scx_slice_bypass_us * NSEC_PER_USEC); scx_slice_bypass_us is a module parameter writable via sysfs in process context through set_slice_us() -> param_set_uint_minmax(), which performs a plain store without holding bypass_lock. scx_bypass() reads the variable under bypass_lock, but since the writer does not take that lock, the two accesses are concurrent. WRITE_ONCE() only applies volatile semantics to the store of scx_slice_dfl -- the val expression containing scx_slice_bypass_us is evaluated as a plain read, providing no protection against concurrent writes. Wrap the read with READ_ONCE() to complete the annotation started by commit `0927780c90` and make the access KCSAN-clean, consistent with the existing READ_ONCE(scx_slice_bypass_us) in bypass_lb_cpu(). Signed-off-by: David Carlier <devnexen@gmail.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2026-03-06 06:57:23 -10:00

51040 Commits