linux

mirror of https://github.com/torvalds/linux.git synced 2026-05-12 16:18:45 +02:00

Author	SHA1	Message	Date
Linus Torvalds	fdbfee9fc5	Runtime Verification updates for 7.1: - Refactor da_monitor header to share handlers across monitor types No functional changes, only less code duplication. - Add Hybrid Automata model class Add a new model class that extends deterministic automata by adding constraints on transitions and states. Those constraints can take into account wall-clock time and as such allow RV monitor to make assertions on real time. Add documentation and code generation scripts. - Add stall monitor as hybrid automaton example Add a monitor that triggers a violation when a task is stalling as an example of automaton working with real time variables. - Convert the opid monitor to a hybrid automaton The opid monitor can be heavily simplified if written as a hybrid automaton: instead of tracking preempt and interrupt enable/disable events, it can just run constraints on the preemption/interrupt states when events like wakeup and need_resched verify. - Add support for per-object monitors in DA/HA Allow writing deterministic and hybrid automata monitors for generic objects (e.g. any struct), by exploiting a hash table where objects are saved. This allows to track more than just tasks in RV. For instance it will be used to track deadline entities in deadline monitors. - Add deadline tracepoints and move some deadline utilities Prepare the ground for deadline monitors by defining events and exporting helpers. - Add nomiss deadline monitor Add first example of deadline monitor asserting all entities complete before their deadline. - Improve rvgen error handling Introduce AutomataError exception class and better handle expected exceptions while showing a backtrace for unexpected ones. - Improve python code quality in rvgen Refactor the rvgen generation scripts to align with python best practices: use f-strings instead of %, use len() instead of __len__(), remove semicolons, use context managers for file operations, fix whitespace violations, extract magic strings into constants, remove unused imports and methods. - Fix small bugs in rvgen The generator scripts presented some corner case bugs: logical error in validating what a correct dot file looks like, fix an isinstance() check, enforce a dot file has an initial state, fix type annotations and typos in comments. - rvgen refactoring Refactor automata.py to use iterator-based parsing and handle required arguments directly in argparse. - Allow epoll in rtapp-sleep monitor The epoll_wait call is now rt-friendly so it should be allowed in the sleep monitor as a valid sleep method. -----BEGIN PGP SIGNATURE----- iIoEABYKADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCad4kUxQccm9zdGVkdEBn b29kbWlzLm9yZwAKCRAp5XQQmuv6qiJgAPsGJ0TCTubdn2x6fpwLDKUcSsNsYrok m8gLHeK0rmkKNQD/ajUW+RBsOufXAsnSzUoUcl7sw1TSiQJ085U+0+ZeDgM= =WZqk -----END PGP SIGNATURE----- Merge tag 'trace-rv-v7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull runtime verification updates from Steven Rostedt: - Refactor da_monitor header to share handlers across monitor types No functional changes, only less code duplication. - Add Hybrid Automata model class Add a new model class that extends deterministic automata by adding constraints on transitions and states. Those constraints can take into account wall-clock time and as such allow RV monitor to make assertions on real time. Add documentation and code generation scripts. - Add stall monitor as hybrid automaton example Add a monitor that triggers a violation when a task is stalling as an example of automaton working with real time variables. - Convert the opid monitor to a hybrid automaton The opid monitor can be heavily simplified if written as a hybrid automaton: instead of tracking preempt and interrupt enable/disable events, it can just run constraints on the preemption/interrupt states when events like wakeup and need_resched verify. - Add support for per-object monitors in DA/HA Allow writing deterministic and hybrid automata monitors for generic objects (e.g. any struct), by exploiting a hash table where objects are saved. This allows to track more than just tasks in RV. For instance it will be used to track deadline entities in deadline monitors. - Add deadline tracepoints and move some deadline utilities Prepare the ground for deadline monitors by defining events and exporting helpers. - Add nomiss deadline monitor Add first example of deadline monitor asserting all entities complete before their deadline. - Improve rvgen error handling Introduce AutomataError exception class and better handle expected exceptions while showing a backtrace for unexpected ones. - Improve python code quality in rvgen Refactor the rvgen generation scripts to align with python best practices: use f-strings instead of %, use len() instead of __len__(), remove semicolons, use context managers for file operations, fix whitespace violations, extract magic strings into constants, remove unused imports and methods. - Fix small bugs in rvgen The generator scripts presented some corner case bugs: logical error in validating what a correct dot file looks like, fix an isinstance() check, enforce a dot file has an initial state, fix type annotations and typos in comments. - rvgen refactoring Refactor automata.py to use iterator-based parsing and handle required arguments directly in argparse. - Allow epoll in rtapp-sleep monitor The epoll_wait call is now rt-friendly so it should be allowed in the sleep monitor as a valid sleep method. * tag 'trace-rv-v7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: (32 commits) rv: Allow epoll in rtapp-sleep monitor rv/rvgen: fix _fill_states() return type annotation rv/rvgen: fix unbound loop variable warning rv/rvgen: enforce presence of initial state rv/rvgen: extract node marker string to class constant rv/rvgen: fix isinstance check in Variable.expand() rv/rvgen: make monitor arguments required in rvgen rv/rvgen: remove unused __get_main_name method rv/rvgen: remove unused sys import from dot2c rv/rvgen: refactor automata.py to use iterator-based parsing rv/rvgen: use class constant for init marker rv/rvgen: fix DOT file validation logic error rv/rvgen: fix PEP 8 whitespace violations rv/rvgen: fix typos in automata and generator docstring and comments rv/rvgen: use context managers for file operations rv/rvgen: remove unnecessary semicolons rv/rvgen: replace __len__() calls with len() rv/rvgen: replace % string formatting with f-strings rv/rvgen: remove bare except clauses in generator rv/rvgen: introduce AutomataError exception class ...	2026-04-15 17:15:18 -07:00
Linus Torvalds	e4bf304f00	ring-buffer updates for 7.1: - Add remote buffers for pKVM pKVM has a hypervisor component that is used to protect the guest from the host kernel. This hypervisor is a black box to the kernel as the kernel is to user space. The remote buffers are used to have a memory mapping between the hypervisor and the kernel where kernel may send commands to enable tracing within the hypervisor. Then the kernel will read this memory mapping just like user space can read the memory mapped ring buffer of the kernel tracing system. Since the hypervisor only has a single context, it doesn't need to worry about races between normal context, interrupt context and NMIs like the kernel does. The ring buffer it uses doesn't need to be as complex. The remote buffers are a simple version of the ring buffer that works in a single context. They are still per-CPU and use sub buffers. The data layout is the same as the kernel's ring buffer to share the same parsing. Currently, only ARM64 implements pKVM, but there's work to implement it also in x86. The remote buffer code is separated out from the ARM implementation so that it can be used in the future by x86. The ARM64 updates for pKVM is in the ARM/KVM tree and it merged in the remote buffers of this tree. - Merge commit `f35dbac694` ("ring-buffer: Fix to update per-subbuf entries of persistent ring buffer")` A fix was merged upstream that some new changes depended on. The upstream commit was merged into the ring buffer branch to fulfil the dependency. - Make the backup instance non reusable The backup instance is a copy of the persistent ring buffer so that the persistent ring buffer could start recording again without using the data from the previous boot. The backup isn't for normal tracing. It is made read-only, and after it is consumed, it is automatically removed. - Have backup copy persistent instance before it starts recording To allow the persistent ring buffer to start recording from the kernel command line commands, move the copy of the backup instance to before the the command line options start recording. - Report header_page overwrite field as "char" and not "int' The rust parser of the header_page file was triggering a warning when it defined the overwrite variable as "int" but it was only a single byte in size. - Fix memory barriers for the trace_buffer CPU mask When a CPU comes online, the bit is set to allow readers to know that the CPU buffer is allocated. The bit is set after the allocation is done, and a smp_wmb() is performed after the allocation and before the setting of the bit. But instead of adding a smp_rmb() to all readers, since once a buffer is created for a CPU it is not deleted if that CPU goes offline, so this allocation is almost always done at boot up before any readers exist. If for the unlikely case where a CPU comes online for the first time after the system boot has finished, send an IPI to all CPUs to force the smp_rmb() for each CPU. - Show clock function being used in debugging ring buffer data When the ring buffer checks are enabled and the ring buffer detects an inconsistency in the times of the invents, print out the clock being used when the error occurred. There was a very hard to hit bug that would happen every so often and it ended up being only triggered when the jiffies clock was being used. If the bug showed the clock being used, it would have been much easier to find the problem (which was an internal function was being traced which caused the clock accounting to go off). -----BEGIN PGP SIGNATURE----- iIoEABYKADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCad9S9xQccm9zdGVkdEBn b29kbWlzLm9yZwAKCRAp5XQQmuv6qpO8AQDwq2GKeups4aMfOjsUAXX9pIGWI9O1 eerhmazgi1LJLQEAtKRNSddOj/7nOJ5hLillJH4uAQOnJBkAWtjlTSUnLg8= =HXkO -----END PGP SIGNATURE----- Merge tag 'trace-ringbuffer-v7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull ring-buffer updates from Steven Rostedt: - Add remote buffers for pKVM pKVM has a hypervisor component that is used to protect the guest from the host kernel. This hypervisor is a black box to the kernel as the kernel is to user space. The remote buffers are used to have a memory mapping between the hypervisor and the kernel where kernel may send commands to enable tracing within the hypervisor. Then the kernel will read this memory mapping just like user space can read the memory mapped ring buffer of the kernel tracing system. Since the hypervisor only has a single context, it doesn't need to worry about races between normal context, interrupt context and NMIs like the kernel does. The ring buffer it uses doesn't need to be as complex. The remote buffers are a simple version of the ring buffer that works in a single context. They are still per-CPU and use sub buffers. The data layout is the same as the kernel's ring buffer to share the same parsing. Currently, only ARM64 implements pKVM, but there's work to implement it also in x86. The remote buffer code is separated out from the ARM implementation so that it can be used in the future by x86. The ARM64 updates for pKVM is in the ARM/KVM tree and it merged in the remote buffers of this tree. - Make the backup instance non reusable The backup instance is a copy of the persistent ring buffer so that the persistent ring buffer could start recording again without using the data from the previous boot. The backup isn't for normal tracing. It is made read-only, and after it is consumed, it is automatically removed. - Have backup copy persistent instance before it starts recording To allow the persistent ring buffer to start recording from the kernel command line commands, move the copy of the backup instance to before the the command line options start recording. - Report header_page overwrite field as "char" and not "int' The rust parser of the header_page file was triggering a warning when it defined the overwrite variable as "int" but it was only a single byte in size. - Fix memory barriers for the trace_buffer CPU mask When a CPU comes online, the bit is set to allow readers to know that the CPU buffer is allocated. The bit is set after the allocation is done, and a smp_wmb() is performed after the allocation and before the setting of the bit. But instead of adding a smp_rmb() to all readers, since once a buffer is created for a CPU it is not deleted if that CPU goes offline, so this allocation is almost always done at boot up before any readers exist. If for the unlikely case where a CPU comes online for the first time after the system boot has finished, send an IPI to all CPUs to force the smp_rmb() for each CPU. - Show clock function being used in debugging ring buffer data When the ring buffer checks are enabled and the ring buffer detects an inconsistency in the times of the invents, print out the clock being used when the error occurred. There was a very hard to hit bug that would happen every so often and it ended up being only triggered when the jiffies clock was being used. If the bug showed the clock being used, it would have been much easier to find the problem (which was an internal function was being traced which caused the clock accounting to go off). * tag 'trace-ringbuffer-v7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: (26 commits) ring-buffer: Prevent off-by-one array access in ring_buffer_desc_page() ring-buffer: Report header_page overwrite as char tracing: Allow backup to save persistent ring buffer before it starts tracing/Documentation: Add a section about backup instance tracing: Remove the backup instance automatically after read tracing: Make the backup instance non-reusable ring-buffer: Enforce read ordering of trace_buffer cpumask and buffers ring-buffer: Show what clock function is used on timestamp errors tracing: Check for undefined symbols in simple_ring_buffer tracing: load/unload page callbacks for simple_ring_buffer Documentation: tracing: Add tracing remotes tracing: selftests: Add trace remote tests tracing: Add a trace remote module for testing tracing: Introduce simple_ring_buffer ring-buffer: Export buffer_data_page and macros tracing: Add helpers to create trace remote events tracing: Add events/ root files to trace remotes tracing: Add events to trace remotes tracing: Add init callback to trace remotes tracing: Add non-consuming read to trace remotes ...	2026-04-15 15:59:46 -07:00
Linus Torvalds	1521829632	ftrace updates for 7.1: - Speed up ftrace_lookup_symbols() for single lookups The kallsyms lookup in ftrace_lookup_symbols() does a linear search over each symbol. This is fine when it must match multiple strings, but when there's only a single string being searched for, using a binary search is much more efficient. When a single string is passed in to search, use the binary search method. -----BEGIN PGP SIGNATURE----- iIoEABYKADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCad4RehQccm9zdGVkdEBn b29kbWlzLm9yZwAKCRAp5XQQmuv6quZeAQD0rsWOGF0D970asst2WtCOoQF5G2ao l0ED7QZFoB1qcAEA7ZAuV9+zJwFOvhAekifuDEArPwpadlcUC4dU7tJU1w0= =cMjY -----END PGP SIGNATURE----- Merge tag 'ftrace-v7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull ftrace update from Steven Rostedt: - Speed up ftrace_lookup_symbols() for single lookups The kallsyms lookup in ftrace_lookup_symbols() does a linear search over each symbol. This is fine when it must match multiple strings, but when there's only a single string being searched for, using a binary search is much more efficient. When a single string is passed in to search, use the binary search method. * tag 'ftrace-v7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: ftrace: Use kallsyms binary search for single-symbol lookup	2026-04-15 15:57:11 -07:00
Linus Torvalds	f5ad410100	bpf-next-7.1 -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEE+soXsSLHKoYyzcli6rmadz2vbToFAmndDWsACgkQ6rmadz2v bTr/jw//WQ+IowvstytntSbZFhSSKjwUP1J0oz/wAyKxvly+sBQADBQkljqNaEju Kq48CPWftJXG45x3O5P4GSYOuBnd9nwDS/hM6jA9f3Ok4IEOHAHCxLot0uq52iJa ieGeJTUEGKFUUEiTuImt/0+Y3aeRQFV0f484+WcmCpdm+cqIXxRnxsMMFuovM4Uj VUgYaooZteaOcnhZpaX/4bWiXM7x7FibLu9gPu9fyyHJIiVrJD+sMhb/UZtsODZO gywy9GNs93Xm9ZoRSTpWA4pAvRajqa8DEtLlV8fx4LpvYdHIjdByiTR9CeKHYxrB vcV1Ty6dGTd6ifFtW6ul1qaF9KeZXQBHxCTmhj4ITek1TMNDfJJD+Iwgc1ll9RL4 RoZ8DJC8Qp2RDH+3b/ptBgfROw1nrwQLuw5cG7mj5mhQdu/z9AMI2ifPk9wv56Zj OV6wRnDcwFu5SLBUNCMd/ypnigKdWcSHCNvWo2HTtcy771b/fqz60K8dMcIWKH5B 3qvXEBHbSdf48D6t64nOyVuo8RKSIizER5Mj/baabcJqZKoAtVUo2l2vd63hX/OD v/y51NvI0lH6cOMLka3LHVIVJInOFSKgOUa1aaKQ0KDjQDRRmmy8yY9h6RZ+aHWb 78K7oCNRx/SCLdslYFGSTQdbiI4/JVoDc6cWtHy413m5+L1447A= =k6te -----END PGP SIGNATURE----- Merge tag 'bpf-next-7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Pull bpf updates from Alexei Starovoitov: - Welcome new BPF maintainers: Kumar Kartikeya Dwivedi, Eduard Zingerman while Martin KaFai Lau reduced his load to Reviwer. - Lots of fixes everywhere from many first time contributors. Thank you All. - Diff stat is dominated by mechanical split of verifier.c into multiple components: - backtrack.c: backtracking logic and jump history - states.c: state equivalence - cfg.c: control flow graph, postorder, strongly connected components - liveness.c: register and stack liveness - fixups.c: post-verification passes: instruction patching, dead code removal, bpf_loop inlining, finalize fastcall 8k line were moved. verifier.c still stands at 20k lines. Further refactoring is planned for the next release. - Replace dynamic stack liveness with static stack liveness based on data flow analysis. This improved the verification time by 2x for some programs and equally reduced memory consumption. New logic is in liveness.c and supported by constant folding in const_fold.c (Eduard Zingerman, Alexei Starovoitov) - Introduce BTF layout to ease addition of new BTF kinds (Alan Maguire) - Use kmalloc_nolock() universally in BPF local storage (Amery Hung) - Fix several bugs in linked registers delta tracking (Daniel Borkmann) - Improve verifier support of arena pointers (Emil Tsalapatis) - Improve verifier tracking of register bounds in min/max and tnum domains (Harishankar Vishwanathan, Paul Chaignon, Hao Sun) - Further extend support for implicit arguments in the verifier (Ihor Solodrai) - Add support for nop,nop5 instruction combo for USDT probes in libbpf (Jiri Olsa) - Support merging multiple module BTFs (Josef Bacik) - Extend applicability of bpf_kptr_xchg (Kaitao Cheng) - Retire rcu_trace_implies_rcu_gp() (Kumar Kartikeya Dwivedi) - Support variable offset context access for 'syscall' programs (Kumar Kartikeya Dwivedi) - Migrate bpf_task_work and dynptr to kmalloc_nolock() (Mykyta Yatsenko) - Fix UAF in in open-coded task_vma iterator (Puranjay Mohan) * tag 'bpf-next-7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (241 commits) selftests/bpf: cover short IPv4/IPv6 inputs with adjust_room bpf: reject short IPv4/IPv6 inputs in bpf_prog_test_run_skb selftests/bpf: Use memfd_create instead of shm_open in cgroup_iter_memcg selftests/bpf: Add test for cgroup storage OOB read bpf: Fix OOB in pcpu_init_value selftests/bpf: Fix reg_bounds to match new tnum-based refinement selftests/bpf: Add tests for non-arena/arena operations bpf: Allow instructions with arena source and non-arena dest registers bpftool: add missing fsession to the usage and docs of bpftool docs/bpf: add missing fsession attach type to docs bpf: add missing fsession to the verifier log bpf: Move BTF checking logic into check_btf.c bpf: Move backtracking logic to backtrack.c bpf: Move state equivalence logic to states.c bpf: Move check_cfg() into cfg.c bpf: Move compute_insn_live_regs() into liveness.c bpf: Move fixup/post-processing logic from verifier.c into fixups.c bpf: Simplify do_check_insn() bpf: Move checks for reserved fields out of the main pass bpf: Delete unused variable ...	2026-04-14 18:04:04 -07:00
Linus Torvalds	c1fe867b5b	Updates for the timer/timekeeping core: - A rework of the hrtimer subsystem to reduce the overhead for frequently armed timers, especially the hrtick scheduler timer. - Better timer locality decision - Simplification of the evaluation of the first expiry time by keeping track of the neighbor timers in the RB-tree by providing a RB-tree variant with neighbor links. That avoids walking the RB-tree on removal to find the next expiry time, but even more important allows to quickly evaluate whether a timer which is rearmed changes the position in the RB-tree with the modified expiry time or not. If not, the dequeue/enqueue sequence which both can end up in rebalancing can be completely avoided. - Deferred reprogramming of the underlying clock event device. This optimizes for the situation where a hrtimer callback sets the need resched bit. In that case the code attempts to defer the re-programming of the clock event device up to the point where the scheduler has picked the next task and has the next hrtick timer armed. In case that there is no immediate reschedule or soft interrupts have to be handled before reaching the reschedule point in the interrupt entry code the clock event is reprogrammed in one of those code paths to prevent that the timer becomes stale. - Support for clocksource coupled clockevents The TSC deadline timer is coupled to the TSC. The next event is programmed in TSC time. Currently this is done by converting the CLOCK_MONOTONIC based expiry value into a relative timeout, converting it into TSC ticks, reading the TSC adding the delta ticks and writing the deadline MSR. As the timekeeping core has the conversion factors for the TSC already, the whole back and forth conversion can be completely avoided. The timekeeping core calculates the reverse conversion factors from nanoseconds to TSC ticks and utilizes the base timestamps of TSC and CLOCK_MONOTONIC which are updated once per tick. This allows a direct conversion into the TSC deadline value without reading the time and as a bonus keeps the deadline conversion in sync with the TSC conversion factors, which are updated by adjtimex() on systems with NTP/PTP enabled. - Allow inlining of the clocksource read and clockevent write functions when they are tiny enough, e.g. on x86 RDTSC and WRMSR. With all those enhancements in place a hrtick enabled scheduler provides the same performance as without hrtick. But also other hrtimer users obviously benefit from these optimizations. - Robustness improvements and cleanups of historical sins in the hrtimer and timekeeping code. - Rewrite of the clocksource watchdog. The clocksource watchdog code has over time reached the state of an impenetrable maze of duct tape and staples. The original design, which was made in the context of systems far smaller than today, is based on the assumption that the to be monitored clocksource (TSC) can be trivially compared against a known to be stable clocksource (HPET/ACPI-PM timer). Over the years this rather naive approach turned out to have major flaws. Long delays between the watchdog invocations can cause wrap arounds of the reference clocksource. The access to the reference clocksource degrades on large multi-sockets systems dure to interconnect congestion. This has been addressed with various heuristics which degraded the accuracy of the watchdog to the point that it fails to detect actual TSC problems on older hardware which exposes slow inter CPU drifts due to firmware manipulating the TSC to hide SMI time. The rewrite addresses this by: - Restricting the validation against the reference clocksource to the boot CPU which is usually closest to the legacy block which contains the reference clocksource (HPET/ACPI-PM). - Do a round robin validation betwen the boot CPU and the other CPUs based only on the TSC with an algorithm similar to the TSC synchronization code during CPU hotplug. - Being more leniant versus remote timeouts - The usual tiny fixes, cleanups and enhancements all over the place -----BEGIN PGP SIGNATURE----- iQJEBAABCgAuFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmnbxdMQHHRnbHhAa2Vy bmVsLm9yZwAKCRCmGPVMDXSYoeq6EAC4h9wuBr5yCkxmog1Bhlk9cnK0oX1THb7V Q4z6DrYAiDXP6z4IDwqSW+3vvmNw1QXOeqpyMTiiIcQ5mNSs1IDnCt5HOEwY+ICm fiSUMYkXkH6xdFWspYWFkD7aExHJRT3hd/bo+WnXGHhHclPj5NHZssLMIDboHrzX jLV1hljmthfwg/uOXDGmQUPRFjqr2ZQjo7zGA5SwfVg8Krz7g/qRVy2wUns9TdW/ NYwihDm1YV7qkK/+f1GnMdd70toqb1OZo/fS9FPbBrPLdyi8V+UbnFSUeZu8kCwV KubAzjLZR4xYCnrlaHhoi208GMd0TOvHMOrdAA0zkQHfhmszGl4N0pbF/EI29Ft2 tQG/FUTG+nzgNOrMCPN2nr3u/UOXLP+gO+2hkyyQjqUP35IyaTYQn10JgBmPTdJL Ab6E8WL9gTMCd+t/bVjdU/B8W9ruMihKBtWkTfMBCcQ9uNJFCEGzrcMF8hzFYRTs /4rMDr3NlGYydAnbKPj6bkC5gtjBvh/L08kOdUFyXCMSqIzvJkZJ4241ogl1Awi6 VfdwjF5KZCQo3M1ujpep+1L010wC/yulqLt2brQMO9Nt05dRhgwM3lxy7cnlMNm3 NdfMgi+OG0CzQ+ZUpvo20hCgTDUVgWN9g5R3rar8FJX+Ym3T+ZoEKlShZF+fSRjf YAUIbUyi7A== =2qc8 -----END PGP SIGNATURE----- Merge tag 'timers-core-2026-04-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer core updates from Thomas Gleixner: - A rework of the hrtimer subsystem to reduce the overhead for frequently armed timers, especially the hrtick scheduler timer: - Better timer locality decision - Simplification of the evaluation of the first expiry time by keeping track of the neighbor timers in the RB-tree by providing a RB-tree variant with neighbor links. That avoids walking the RB-tree on removal to find the next expiry time, but even more important allows to quickly evaluate whether a timer which is rearmed changes the position in the RB-tree with the modified expiry time or not. If not, the dequeue/enqueue sequence which both can end up in rebalancing can be completely avoided. - Deferred reprogramming of the underlying clock event device. This optimizes for the situation where a hrtimer callback sets the need resched bit. In that case the code attempts to defer the re-programming of the clock event device up to the point where the scheduler has picked the next task and has the next hrtick timer armed. In case that there is no immediate reschedule or soft interrupts have to be handled before reaching the reschedule point in the interrupt entry code the clock event is reprogrammed in one of those code paths to prevent that the timer becomes stale. - Support for clocksource coupled clockevents The TSC deadline timer is coupled to the TSC. The next event is programmed in TSC time. Currently this is done by converting the CLOCK_MONOTONIC based expiry value into a relative timeout, converting it into TSC ticks, reading the TSC adding the delta ticks and writing the deadline MSR. As the timekeeping core has the conversion factors for the TSC already, the whole back and forth conversion can be completely avoided. The timekeeping core calculates the reverse conversion factors from nanoseconds to TSC ticks and utilizes the base timestamps of TSC and CLOCK_MONOTONIC which are updated once per tick. This allows a direct conversion into the TSC deadline value without reading the time and as a bonus keeps the deadline conversion in sync with the TSC conversion factors, which are updated by adjtimex() on systems with NTP/PTP enabled. - Allow inlining of the clocksource read and clockevent write functions when they are tiny enough, e.g. on x86 RDTSC and WRMSR. With all those enhancements in place a hrtick enabled scheduler provides the same performance as without hrtick. But also other hrtimer users obviously benefit from these optimizations. - Robustness improvements and cleanups of historical sins in the hrtimer and timekeeping code. - Rewrite of the clocksource watchdog. The clocksource watchdog code has over time reached the state of an impenetrable maze of duct tape and staples. The original design, which was made in the context of systems far smaller than today, is based on the assumption that the to be monitored clocksource (TSC) can be trivially compared against a known to be stable clocksource (HPET/ACPI-PM timer). Over the years this rather naive approach turned out to have major flaws. Long delays between the watchdog invocations can cause wrap arounds of the reference clocksource. The access to the reference clocksource degrades on large multi-sockets systems dure to interconnect congestion. This has been addressed with various heuristics which degraded the accuracy of the watchdog to the point that it fails to detect actual TSC problems on older hardware which exposes slow inter CPU drifts due to firmware manipulating the TSC to hide SMI time. The rewrite addresses this by: - Restricting the validation against the reference clocksource to the boot CPU which is usually closest to the legacy block which contains the reference clocksource (HPET/ACPI-PM). - Do a round robin validation betwen the boot CPU and the other CPUs based only on the TSC with an algorithm similar to the TSC synchronization code during CPU hotplug. - Being more leniant versus remote timeouts - The usual tiny fixes, cleanups and enhancements all over the place * tag 'timers-core-2026-04-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (75 commits) alarmtimer: Access timerqueue node under lock in suspend hrtimer: Fix incorrect #endif comment for BITS_PER_LONG check posix-timers: Fix stale function name in comment timers: Get this_cpu once while clearing the idle state clocksource: Rewrite watchdog code completely clocksource: Don't use non-continuous clocksources as watchdog x86/tsc: Handle CLOCK_SOURCE_VALID_FOR_HRES correctly MIPS: Don't select CLOCKSOURCE_WATCHDOG parisc: Remove unused clocksource flags hrtimer: Add a helper to retrieve a hrtimer from its timerqueue node hrtimer: Remove trailing comma after HRTIMER_MAX_CLOCK_BASES hrtimer: Mark index and clockid of clock base as const hrtimer: Drop unnecessary pointer indirection in hrtimer_expire_entry event hrtimer: Drop spurious space in 'enum hrtimer_base_type' hrtimer: Don't zero-initialize ret in hrtimer_nanosleep() hrtimer: Remove hrtimer_get_expires_ns() timekeeping: Mark offsets array as const timekeeping/auxclock: Consistently use raw timekeeper for tk_setup_internals() timer_list: Print offset as signed integer tracing: Use explicit array size instead of sentinel elements in symbol printing ...	2026-04-14 10:27:07 -07:00
Vincent Donnefort	6170922f13	ring-buffer: Prevent off-by-one array access in ring_buffer_desc_page() As pointed out by Smatch, the ring-buffer descriptor array page_va is counted by nr_page_va, but the accessor ring_buffer_desc_page() allows access off by one. Currently, this does not cause problems, as the page ID always comes from a trusted source. Nonetheless, ensure robustness and fix the accessor. While at it, make the page_id unsigned. Link: https://patch.msgid.link/20260410124527.3563970-1-vdonnefort@google.com Reported-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Vincent Donnefort <vdonnefort@google.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2026-04-14 05:13:09 -04:00
Cao Ruichuang	1111e9bd83	ring-buffer: Report header_page overwrite as char The header_page tracefs metadata currently reports overwrite as an int field with size 1. That makes parsers warn about a type and size mismatch even though the field is only used as a one-byte flag within commit. Keep the shared offset with commit as-is, but report overwrite as char so the declared type matches the hardcoded size. The signedness is already carried separately by the emitted signed field. Link: https://patch.msgid.link/20260406165333.46052-1-create0818@163.com Link: https://bugzilla.kernel.org/show_bug.cgi?id=216999 Signed-off-by: Cao Ruichuang <create0818@163.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2026-04-14 04:29:55 -04:00
Thomas Gleixner	ff1c0c5d07	Merge branch 'timers/urgent' into timers/core to resolve the conflict with urgent fixes.	2026-04-11 07:58:33 +02:00
Andrey Grodzovsky	1870ddcd94	bpf: Prefer vmlinux symbols over module symbols for unqualified kprobes When an unqualified kprobe target exists in both vmlinux and a loaded module, number_of_same_symbols() returns a count greater than 1, causing kprobe attachment to fail with -EADDRNOTAVAIL even though the vmlinux symbol is unambiguous. When no module qualifier is given and the symbol is found in vmlinux, return the vmlinux-only count without scanning loaded modules. This preserves the existing behavior for all other cases: - Symbol only in a module: vmlinux count is 0, falls through to module scan as before. - Symbol qualified with MOD:SYM: mod != NULL, unchanged path. - Symbol ambiguous within vmlinux itself: count > 1 is returned as-is. Fixes: `926fe783c8` ("tracing/kprobes: Fix symbol counting logic by looking at modules as well") Fixes: `9d8616034f` ("tracing/kprobes: Add symbol counting check when module loads") Suggested-by: Ihor Solodrai <ihor.solodrai@linux.dev> Acked-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Ihor Solodrai <ihor.solodrai@linux.dev> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@crowdstrike.com> Link: https://lore.kernel.org/r/20260407203912.1787502-2-andrey.grodzovsky@crowdstrike.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2026-04-07 16:27:52 -07:00
Pengpeng Hou	4346be6577	tracing/probe: reject non-closed empty immediate strings parse_probe_arg() accepts quoted immediate strings and passes the body after the opening quote to __parse_imm_string(). That helper currently computes strlen(str) and immediately dereferences str[len - 1], which underflows when the body is empty and not closed with double-quotation. Reject empty non-closed immediate strings before checking for the closing quote. Link: https://lore.kernel.org/all/20260401160315.88518-1-pengpeng@iscas.ac.cn/ Fixes: `a42e3c4de9` ("tracing/probe: Add immediate string parameter support") Signed-off-by: Pengpeng Hou <pengpeng@iscas.ac.cn> Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>	2026-04-06 09:22:42 +09:00
Steven Rostedt	3515572dd0	tracing: Allow backup to save persistent ring buffer before it starts When the persistent ring buffer was first introduced, it did not make sense to start tracing for it on the kernel command line. That's because if there was a crash, the start of events would invalidate the events from the previous boot that had the crash. But now that there's a "backup" instance that can take a snapshot of the persistent ring buffer when boot starts, it is possible to have the persistent ring buffer start events at boot up and not lose the old events. Update the code where the boot events start after all boot time instances are created. This will allow the backup instance to copy the persistent ring buffer from the previous boot, and allow the persistent ring buffer to start tracing new events for the current boot. reserve_mem=100M:12M:trace trace_instance=boot_mapped^@trace,sched trace_instance=backup=boot_mapped The above will create a boot_mapped persistent ring buffer and enabled the scheduler events. If there's a crash, a "backup" instance will be created holding the events of the persistent ring buffer from the previous boot, while the persistent ring buffer will once again start tracing scheduler events of the current boot. Now the user doesn't have to remember to start the persistent ring buffer. It will always have the events started at each boot. Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: John Stultz <jstultz@google.com> Link: https://patch.msgid.link/20260331163924.6ccb3896@gandalf.local.home Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2026-04-02 13:29:08 -04:00
Masami Hiramatsu (Google)	eca33fdab4	tracing: Remove the backup instance automatically after read Since the backup instance is readonly, after reading all data via pipe, no data is left on the instance. Thus it can be removed safely after closing all files. This also removes it if user resets the ring buffer manually via 'trace' file. Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://patch.msgid.link/177502547711.1311542.12572973358010839400.stgit@mhiramat.tok.corp.google.com Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2026-04-02 13:22:30 -04:00
Masami Hiramatsu (Google)	2c79da099a	tracing: Make the backup instance non-reusable Since there is no reason to reuse the backup instance, make it readonly (but erasable). Note that only backup instances are readonly, because other trace instances will be empty unless it is writable. Only backup instances have copy entries from the original. With this change, most of the trace control files are removed from the backup instance, including eventfs enable/filter etc. # find /sys/kernel/tracing/instances/backup/events/ \| wc -l 4093 # find /sys/kernel/tracing/instances/boot_map/events/ \| wc -l 9573 Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://patch.msgid.link/177502546939.1311542.1826814401724828930.stgit@mhiramat.tok.corp.google.com Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2026-04-02 13:20:38 -04:00
Vincent Donnefort	20ad8b0888	ring-buffer: Enforce read ordering of trace_buffer cpumask and buffers On CPU hotplug, if it is the first time a trace_buffer sees a CPU, a ring_buffer_per_cpu will be allocated and its corresponding bit toggled in the cpumask. Many readers check this cpumask to know if they can safely read the ring_buffer_per_cpu but they are doing so without memory ordering and may observe the cpumask bit set while having NULL buffer pointer. Enforce the memory read ordering by sending an IPI to all online CPUs. The hotplug path is a slow-path anyway and it saves us from adding read barriers in numerous call sites. Link: https://patch.msgid.link/20260401053659.3458961-1-vdonnefort@google.com Signed-off-by: Vincent Donnefort <vdonnefort@google.com> Suggested-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2026-04-02 13:19:09 -04:00
Varun R Mallya	eb7024bfcc	bpf: Reject sleepable kprobe_multi programs at attach time kprobe.multi programs run in atomic/RCU context and cannot sleep. However, bpf_kprobe_multi_link_attach() did not validate whether the program being attached had the sleepable flag set, allowing sleepable helpers such as bpf_copy_from_user() to be invoked from a non-sleepable context. This causes a "sleeping function called from invalid context" splat: BUG: sleeping function called from invalid context at ./include/linux/uaccess.h:169 in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 1787, name: sudo preempt_count: 1, expected: 0 RCU nest depth: 2, expected: 0 Fix this by rejecting sleepable programs early in bpf_kprobe_multi_link_attach(), before any further processing. Fixes: `0dcac27254` ("bpf: Add multi kprobe link") Signed-off-by: Varun R Mallya <varunrmallya@gmail.com> Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Acked-by: Leon Hwang <leon.hwang@linux.dev> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/r/20260401191126.440683-1-varunrmallya@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2026-04-02 09:48:46 -07:00
Andrey Grodzovsky	93e8fd1a56	ftrace: Use kallsyms binary search for single-symbol lookup When ftrace_lookup_symbols() is called with a single symbol (cnt == 1), use kallsyms_lookup_name() for O(log N) binary search instead of the full linear scan via kallsyms_on_each_symbol(). ftrace_lookup_symbols() was designed for batch resolution of many symbols in a single pass. For large cnt this is efficient: a single O(N) walk over all symbols with O(log cnt) binary search into the sorted input array. But for cnt == 1 it still decompresses all ~200K kernel symbols only to match one. kallsyms_lookup_name() uses the sorted kallsyms index and needs only ~17 decompressions for a single lookup. This is the common path for kprobe.session with exact function names, where libbpf sends one symbol per BPF_LINK_CREATE syscall. If binary lookup fails (duplicate symbol names where the first match is not ftrace-instrumented), the function falls through to the existing linear scan path. Before (cnt=1, 50 kprobe.session programs): Attach: 858 ms (kallsyms_expand_symbol 25% of CPU) After: Attach: 52 ms (16x faster) Cc: <bpf@vger.kernel.org> Link: https://patch.msgid.link/20260302200837.317907-3-andrey.grodzovsky@crowdstrike.com Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@crowdstrike.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2026-04-01 16:58:36 -04:00
Nam Cao	00f0dadde8	rv: Allow epoll in rtapp-sleep monitor Since commit `0c43094f8c` ("eventpoll: Replace rwlock with spinlock"), epoll_wait is real-time-safe syscall for sleeping. Add epoll_wait to the list of rt-safe sleeping APIs. Signed-off-by: Nam Cao <namcao@linutronix.de> Reviewed-by: Gabriele Monaco <gmonaco@redhat.com> Link: https://lore.kernel.org/r/20260401130828.3115428-1-namcao@linutronix.de Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>	2026-04-01 15:18:30 +02:00
Gabriele Monaco	b133207deb	rv: Add nomiss deadline monitor Add the deadline monitors collection to validate the deadline scheduler, both for deadline tasks and servers. The currently implemented monitors are: * nomiss: validate dl entities run to completion before their deadiline Reviewed-by: Nam Cao <namcao@linutronix.de> Reviewed-by: Juri Lelli <juri.lelli@redhat.com> Link: https://lore.kernel.org/r/20260330111010.153663-13-gmonaco@redhat.com Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>	2026-03-31 16:47:18 +02:00
Gabriele Monaco	2b406fdb33	rv: Convert the opid monitor to a hybrid automaton The opid monitor validates that wakeup and need_resched events only occur with interrupts and preemption disabled by following the preemptirq tracepoints. As reported in [1], those tracepoints might be inaccurate in some situations (e.g. NMIs). Since the monitor doesn't validate other ordering properties, remove the dependency on preemptirq tracepoints and convert the monitor to a hybrid automaton to validate the constraint during event handling. This makes the monitor more robust by also removing the workaround for interrupts missing the preemption tracepoints, which was working on PREEMPT_RT only and allows the monitor to be built on kernels without the preemptirqs tracepoints. [1] - https://lore.kernel.org/lkml/20250625120823.60600-1-gmonaco@redhat.com Reviewed-by: Nam Cao <namcao@linutronix.de> Link: https://lore.kernel.org/r/20260330111010.153663-8-gmonaco@redhat.com Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>	2026-03-31 16:47:17 +02:00
Gabriele Monaco	13578a0871	rv: Add sample hybrid monitor stall Add a sample monitor to showcase hybrid/timed automata. The stall monitor identifies tasks stalled for longer than a threshold and reacts when that happens. Reviewed-by: Nam Cao <namcao@linutronix.de> Link: https://lore.kernel.org/r/20260330111010.153663-7-gmonaco@redhat.com Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>	2026-03-31 16:47:17 +02:00
Gabriele Monaco	f5587d1b6e	rv: Add Hybrid Automata monitor type Deterministic automata define which events are allowed in every state, but cannot define more sophisticated constraint taking into account the system's environment (e.g. time or other states not producing events). Add the Hybrid Automata monitor type as an extension of Deterministic automata where each state transition is validating a constraint on a finite number of environment variables. Hybrid automata can be used to implement timed automata, where the environment variables are clocks. Also implement the necessary functionality to handle clock constraints (ns or jiffy granularity) on state and events. Reviewed-by: Nam Cao <namcao@linutronix.de> Link: https://lore.kernel.org/r/20260330111010.153663-3-gmonaco@redhat.com Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>	2026-03-31 16:47:16 +02:00
Wesley Atwell	250ab25391	tracing: Drain deferred trigger frees if kthread creation fails Boot-time trigger registration can fail before the trigger-data cleanup kthread exists. Deferring those frees until late init is fine, but the post-boot fallback must still drain the deferred list if kthread creation never succeeds. Otherwise, boot-deferred nodes can accumulate on trigger_data_free_list, later frees fall back to synchronously freeing only the current object, and the older queued entries are leaked forever. To trigger this, add the following to the kernel command line: trace_event=sched_switch trace_trigger=sched_switch.traceon,sched_switch.traceon The second traceon trigger will fail and be freed. This triggers a NULL pointer dereference and crashes the kernel. Keep the deferred boot-time behavior, but when kthread creation fails, drain the whole queued list synchronously. Do the same in the late-init drain path so queued entries are not stranded there either. Cc: stable@vger.kernel.org Link: https://patch.msgid.link/20260324221326.1395799-3-atwellwea@gmail.com Fixes: `61d445af0a` ("tracing: Add bulk garbage collection of freeing event_trigger_data") Signed-off-by: Wesley Atwell <atwellwea@gmail.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2026-03-28 08:32:44 -04:00
Luo Haiyang	1f98857322	tracing: Fix potential deadlock in cpu hotplug with osnoise The following sequence may leads deadlock in cpu hotplug: task1 task2 task3 ----- ----- ----- mutex_lock(&interface_lock) [CPU GOING OFFLINE] cpus_write_lock(); osnoise_cpu_die(); kthread_stop(task3); wait_for_completion(); osnoise_sleep(); mutex_lock(&interface_lock); cpus_read_lock(); [DEAD LOCK] Fix by swap the order of cpus_read_lock() and mutex_lock(&interface_lock). Cc: stable@vger.kernel.org Cc: <mathieu.desnoyers@efficios.com> Cc: <zhang.run@zte.com.cn> Cc: <yang.tao172@zte.com.cn> Cc: <ran.xiaokai@zte.com.cn> Fixes: `bce29ac9ce` ("trace: Add osnoise tracer") Link: https://patch.msgid.link/20260326141953414bVSj33dAYktqp9Oiyizq8@zte.com.cn Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Luo Haiyang <luo.haiyang@zte.com.cn> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2026-03-27 15:18:06 -04:00
Steven Rostedt	23d1cfc021	ring-buffer: Show what clock function is used on timestamp errors The testing for tracing was triggering a timestamp count issue that was always off by one. This has been happening for some time but has never been reported by anyone else. It was finally discovered to be an issue with the "uptime" (jiffies) clock that happened to be traced and the internal recursion caused the discrepancy. This would have been much easier to solve if the clock function being used was displayed when the error was detected. Add the clock function to the error output. Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://patch.msgid.link/20260323202212.479bb288@gandalf.local.home Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2026-03-24 22:21:15 -04:00
Steven Rostedt	dc1d9408c9	Merge commit 'f35dbac6942171dc4ce9398d1d216a59224590a9' into trace/ring-buffer/core The commit `f35dbac694` ("ring-buffer: Fix to update per-subbuf entries of persistent ring buffer") was a fix and merged upstream. It is needed for some other work in the ring buffer. The current branch has the remote buffer code that is shared with the Arm64 subsystem and can't be rebased. Merge in the upstream commit to allow continuing of the ring buffer work. Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2026-03-24 22:19:21 -04:00
Jiri Olsa	50b35c9e50	ftrace: Use hash argument for tmp_ops in update_ftrace_direct_mod The modify logic registers temporary ftrace_ops object (tmp_ops) to trigger the slow path for all direct callers to be able to safely modify attached addresses. At the moment we use ops->func_hash for tmp_ops filter, which represents all the systems attachments. It's faster to use just the passed hash filter, which contains only the modified sites and is always a subset of the ops->func_hash. Cc: Alexei Starovoitov <ast@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Menglong Dong <menglong8.dong@gmail.com> Cc: Song Liu <song@kernel.org> Link: https://patch.msgid.link/20260312123738.129926-1-jolsa@kernel.org Fixes: `e93672f770` ("ftrace: Add update_ftrace_direct_mod function") Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2026-03-21 16:51:04 -04:00
Masami Hiramatsu (Google)	f35dbac694	ring-buffer: Fix to update per-subbuf entries of persistent ring buffer Since the validation loop in rb_meta_validate_events() updates the same cpu_buffer->head_page->entries, the other subbuf entries are not updated. Fix to use head_page to update the entries field, since it is the cursor in this loop. Cc: stable@vger.kernel.org Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Ian Rogers <irogers@google.com> Fixes: `5f3b6e839f` ("ring-buffer: Validate boot range memory events") Link: https://patch.msgid.link/177391153882.193994.17158784065013676533.stgit@mhiramat.tok.corp.google.com Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2026-03-21 16:47:28 -04:00
Steven Rostedt	07183aac4a	tracing: Fix trace_marker copy link list updates When the "copy_trace_marker" option is enabled for an instance, anything written into /sys/kernel/tracing/trace_marker is also copied into that instances buffer. When the option is set, that instance's trace_array descriptor is added to the marker_copies link list. This list is protected by RCU, as all iterations uses an RCU protected list traversal. When the instance is deleted, all the flags that were enabled are cleared. This also clears the copy_trace_marker flag and removes the trace_array descriptor from the list. The issue is after the flags are called, a direct call to update_marker_trace() is performed to clear the flag. This function returns true if the state of the flag changed and false otherwise. If it returns true here, synchronize_rcu() is called to make sure all readers see that its removed from the list. But since the flag was already cleared, the state does not change and the synchronization is never called, leaving a possible UAF bug. Move the clearing of all flags below the updating of the copy_trace_marker option which then makes sure the synchronization is performed. Also use the flag for checking the state in update_marker_trace() instead of looking at if the list is empty. Cc: stable@vger.kernel.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://patch.msgid.link/20260318185512.1b6c7db4@gandalf.local.home Fixes: `7b382efd5e` ("tracing: Allow the top level trace_marker to write into another instances") Reported-by: Sasha Levin <sashal@kernel.org> Closes: https://lore.kernel.org/all/20260225133122.237275-1-sashal@kernel.org/ Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2026-03-21 16:43:53 -04:00
Steven Rostedt	edca33a562	tracing: Fix failure to read user space from system call trace events The system call trace events call trace_user_fault_read() to read the user space part of some system calls. This is done by grabbing a per-cpu buffer, disabling migration, enabling preemption, calling copy_from_user(), disabling preemption, enabling migration and checking if the task was preempted while preemption was enabled. If it was, the buffer is considered corrupted and it tries again. There's a safety mechanism that will fail out of this loop if it fails 100 times (with a warning). That warning message was triggered in some pi_futex stress tests. Enabling the sched_switch trace event and traceoff_on_warning, showed the problem: pi_mutex_hammer-1375 [006] d..21 138.981648: sched_switch: prev_comm=pi_mutex_hammer prev_pid=1375 prev_prio=95 prev_state=R+ ==> next_comm=migration/6 next_pid=47 next_prio=0 migration/6-47 [006] d..2. 138.981651: sched_switch: prev_comm=migration/6 prev_pid=47 prev_prio=0 prev_state=S ==> next_comm=pi_mutex_hammer next_pid=1375 next_prio=95 pi_mutex_hammer-1375 [006] d..21 138.981656: sched_switch: prev_comm=pi_mutex_hammer prev_pid=1375 prev_prio=95 prev_state=R+ ==> next_comm=migration/6 next_pid=47 next_prio=0 migration/6-47 [006] d..2. 138.981659: sched_switch: prev_comm=migration/6 prev_pid=47 prev_prio=0 prev_state=S ==> next_comm=pi_mutex_hammer next_pid=1375 next_prio=95 pi_mutex_hammer-1375 [006] d..21 138.981664: sched_switch: prev_comm=pi_mutex_hammer prev_pid=1375 prev_prio=95 prev_state=R+ ==> next_comm=migration/6 next_pid=47 next_prio=0 migration/6-47 [006] d..2. 138.981667: sched_switch: prev_comm=migration/6 prev_pid=47 prev_prio=0 prev_state=S ==> next_comm=pi_mutex_hammer next_pid=1375 next_prio=95 pi_mutex_hammer-1375 [006] d..21 138.981671: sched_switch: prev_comm=pi_mutex_hammer prev_pid=1375 prev_prio=95 prev_state=R+ ==> next_comm=migration/6 next_pid=47 next_prio=0 migration/6-47 [006] d..2. 138.981675: sched_switch: prev_comm=migration/6 prev_pid=47 prev_prio=0 prev_state=S ==> next_comm=pi_mutex_hammer next_pid=1375 next_prio=95 pi_mutex_hammer-1375 [006] d..21 138.981679: sched_switch: prev_comm=pi_mutex_hammer prev_pid=1375 prev_prio=95 prev_state=R+ ==> next_comm=migration/6 next_pid=47 next_prio=0 migration/6-47 [006] d..2. 138.981682: sched_switch: prev_comm=migration/6 prev_pid=47 prev_prio=0 prev_state=S ==> next_comm=pi_mutex_hammer next_pid=1375 next_prio=95 pi_mutex_hammer-1375 [006] d..21 138.981687: sched_switch: prev_comm=pi_mutex_hammer prev_pid=1375 prev_prio=95 prev_state=R+ ==> next_comm=migration/6 next_pid=47 next_prio=0 migration/6-47 [006] d..2. 138.981690: sched_switch: prev_comm=migration/6 prev_pid=47 prev_prio=0 prev_state=S ==> next_comm=pi_mutex_hammer next_pid=1375 next_prio=95 pi_mutex_hammer-1375 [006] d..21 138.981695: sched_switch: prev_comm=pi_mutex_hammer prev_pid=1375 prev_prio=95 prev_state=R+ ==> next_comm=migration/6 next_pid=47 next_prio=0 migration/6-47 [006] d..2. 138.981698: sched_switch: prev_comm=migration/6 prev_pid=47 prev_prio=0 prev_state=S ==> next_comm=pi_mutex_hammer next_pid=1375 next_prio=95 pi_mutex_hammer-1375 [006] d..21 138.981703: sched_switch: prev_comm=pi_mutex_hammer prev_pid=1375 prev_prio=95 prev_state=R+ ==> next_comm=migration/6 next_pid=47 next_prio=0 migration/6-47 [006] d..2. 138.981706: sched_switch: prev_comm=migration/6 prev_pid=47 prev_prio=0 prev_state=S ==> next_comm=pi_mutex_hammer next_pid=1375 next_prio=95 pi_mutex_hammer-1375 [006] d..21 138.981711: sched_switch: prev_comm=pi_mutex_hammer prev_pid=1375 prev_prio=95 prev_state=R+ ==> next_comm=migration/6 next_pid=47 next_prio=0 migration/6-47 [006] d..2. 138.981714: sched_switch: prev_comm=migration/6 prev_pid=47 prev_prio=0 prev_state=S ==> next_comm=pi_mutex_hammer next_pid=1375 next_prio=95 pi_mutex_hammer-1375 [006] d..21 138.981719: sched_switch: prev_comm=pi_mutex_hammer prev_pid=1375 prev_prio=95 prev_state=R+ ==> next_comm=migration/6 next_pid=47 next_prio=0 migration/6-47 [006] d..2. 138.981722: sched_switch: prev_comm=migration/6 prev_pid=47 prev_prio=0 prev_state=S ==> next_comm=pi_mutex_hammer next_pid=1375 next_prio=95 pi_mutex_hammer-1375 [006] d..21 138.981727: sched_switch: prev_comm=pi_mutex_hammer prev_pid=1375 prev_prio=95 prev_state=R+ ==> next_comm=migration/6 next_pid=47 next_prio=0 migration/6-47 [006] d..2. 138.981730: sched_switch: prev_comm=migration/6 prev_pid=47 prev_prio=0 prev_state=S ==> next_comm=pi_mutex_hammer next_pid=1375 next_prio=95 pi_mutex_hammer-1375 [006] d..21 138.981735: sched_switch: prev_comm=pi_mutex_hammer prev_pid=1375 prev_prio=95 prev_state=R+ ==> next_comm=migration/6 next_pid=47 next_prio=0 migration/6-47 [006] d..2. 138.981738: sched_switch: prev_comm=migration/6 prev_pid=47 prev_prio=0 prev_state=S ==> next_comm=pi_mutex_hammer next_pid=1375 next_prio=95 What happened was the task 1375 was flagged to be migrated. When preemption was enabled, the migration thread woke up to migrate that task, but failed because migration for that task was disabled. This caused the loop to fail to exit because the task scheduled out while trying to read user space. Every time the task enabled preemption the migration thread would schedule in, try to migrate the task, fail and let the task continue. But because the loop would only enable preemption with migration disabled, it would always fail because each time it enabled preemption to read user space, the migration thread would try to migrate it. To solve this, when the loop fails to read user space without being scheduled out, enabled and disable preemption with migration enabled. This will allow the migration task to successfully migrate the task and the next loop should succeed to read user space without being scheduled out. Cc: stable@vger.kernel.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://patch.msgid.link/20260316130734.1858a998@gandalf.local.home Fixes: `64cf7d058a` ("tracing: Have trace_marker use per-cpu data to read user space") Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2026-03-21 16:42:36 -04:00
Ingo Molnar	f6472b1793	Linux 7.0-rc4 -----BEGIN PGP SIGNATURE----- iQFSBAABCgA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmm3G/UeHHRvcnZhbGRz QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGZJUH/R0vQ3Vha48QDEic 1NREwaHxAoTFi0i3y7OPPklqrP2V09D1qg4Q6fExYQVTQgV6F2DRjVbyPKrmr4ay BA6aHrUdnFngYHpDlI1b1r7rJiAIN4WFHl7StO70bS+EB+UPsP9cfP3CKXUfKfqT kyHXzUrd5QnjYmlb9rQw1E6rzsRamNtGUtZf7TwDidJYjtm3sPeDHUkjyRy4xkYd UouIu6W7UXoicl38bJAgaWBY5BiYtjN6ktnY4/gcqDeqYd7mTM3Eb1B+OSXgFfip F0OYfJhfWn+63WnPA+1I5jXWC1UrdVXTMK/NTYjhmGlfdmkLcWDlNGtu+qKZbpwj fmF3Kyo= =6nX1 -----END PGP SIGNATURE----- Merge tag 'v7.0-rc4' into timers/core, to resolve conflict Resolve conflict between this change in the upstream kernel: `4c652a4772` ("rseq: Mark rseq_arm_slice_extension_timer() __always_inline") ... and this pending change in timers/core: `0e98eb1481` ("entry: Prepare for deferred hrtimer rearming") Signed-off-by: Ingo Molnar <mingo@kernel.org>	2026-03-21 08:02:36 +01:00
Thomas Weißschuh (Schneider Electric)	754e38d2d1	tracing: Use explicit array size instead of sentinel elements in symbol printing The sentinel value added by the wrapper macros __print_symbolic() et al prevents the callers from adding their own trailing comma. This makes constructing symbol list dynamically based on kconfig values tedious. Drop the sentinel elements, so callers can either specify the trailing comma or not, just like in regular array initializers. Signed-off-by: Thomas Weißschuh (Schneider Electric) <thomas.weissschuh@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@kernel.org> Link: https://patch.msgid.link/20260311-hrtimer-cleanups-v1-2-095357392669@linutronix.de	2026-03-12 12:15:53 +01:00
Vincent Donnefort	a717943d8e	tracing: Check for undefined symbols in simple_ring_buffer The simple_ring_buffer implementation must remain simple enough to be used by the pKVM hypervisor. Prevent the object build if unresolved symbols are found. Link: https://patch.msgid.link/20260309162516.2623589-19-vdonnefort@google.com Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Vincent Donnefort <vdonnefort@google.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2026-03-09 12:33:55 -04:00
Vincent Donnefort	635923081c	tracing: load/unload page callbacks for simple_ring_buffer Add load/unload callback used for each admitted page in the ring-buffer. This will be later useful for the pKVM hypervisor which uses a different VA space and need to dynamically map/unmap the ring-buffer pages. Link: https://patch.msgid.link/20260309162516.2623589-18-vdonnefort@google.com Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Vincent Donnefort <vdonnefort@google.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2026-03-09 12:33:55 -04:00
Vincent Donnefort	ea908a2b79	tracing: Add a trace remote module for testing Add a module to help testing the tracefs support for trace remotes. This module: * Use simple_ring_buffer to write into a ring-buffer. * Declare a single "selftest" event that can be triggered from user-space. * Register a "test" trace remote. This is intended to be used by trace remote selftests. Link: https://patch.msgid.link/20260309162516.2623589-15-vdonnefort@google.com Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Vincent Donnefort <vdonnefort@google.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2026-03-09 12:33:55 -04:00
Vincent Donnefort	34e5b958bd	tracing: Introduce simple_ring_buffer Add a simple implementation of the kernel ring-buffer. This intends to be used later by ring-buffer remotes such as the pKVM hypervisor, hence the need for a cut down version (write only) without any dependency. Link: https://patch.msgid.link/20260309162516.2623589-14-vdonnefort@google.com Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Vincent Donnefort <vdonnefort@google.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2026-03-09 12:33:55 -04:00
Vincent Donnefort	93ae1b76ff	ring-buffer: Export buffer_data_page and macros In preparation for allowing the writing of ring-buffer compliant pages outside of ring_buffer.c, move buffer_data_page and timestamps encoding macros into the publicly available ring_buffer_types.h. Link: https://patch.msgid.link/20260309162516.2623589-13-vdonnefort@google.com Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Vincent Donnefort <vdonnefort@google.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2026-03-09 12:33:55 -04:00
Vincent Donnefort	775cb093bc	tracing: Add events/ root files to trace remotes Just like for the kernel events directory, add 'enable', 'header_page' and 'header_event' at the root of the trace remote events/ directory. Link: https://patch.msgid.link/20260309162516.2623589-11-vdonnefort@google.com Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Vincent Donnefort <vdonnefort@google.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2026-03-09 12:33:54 -04:00
Vincent Donnefort	072529158e	tracing: Add events to trace remotes An event is predefined point in the writer code that allows to log data. Following the same scheme as kernel events, add remote events, described to user-space within the events/ tracefs directory found in the corresponding trace remote. Remote events are expected to be described during the trace remote registration. Add also a .enable_event callback for trace_remote to toggle the event logging, if supported. Link: https://patch.msgid.link/20260309162516.2623589-10-vdonnefort@google.com Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Vincent Donnefort <vdonnefort@google.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2026-03-09 12:33:54 -04:00
Vincent Donnefort	bf2ba0f8ca	tracing: Add init callback to trace remotes Add a .init call back so the trace remote callers can add entries to the tracefs directory. Link: https://patch.msgid.link/20260309162516.2623589-9-vdonnefort@google.com Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Vincent Donnefort <vdonnefort@google.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2026-03-09 12:33:54 -04:00
Vincent Donnefort	330b0cceb3	tracing: Add non-consuming read to trace remotes Allow reading the trace file for trace remotes. This performs a non-consuming read of the trace buffer. Link: https://patch.msgid.link/20260309162516.2623589-8-vdonnefort@google.com Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Vincent Donnefort <vdonnefort@google.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2026-03-09 12:33:54 -04:00
Vincent Donnefort	9af4ab0e11	tracing: Add reset to trace remotes Allow to reset the trace remote buffer by writing to the Tracefs "trace" file. This is similar to the regular Tracefs interface. Link: https://patch.msgid.link/20260309162516.2623589-7-vdonnefort@google.com Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Vincent Donnefort <vdonnefort@google.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2026-03-09 12:33:54 -04:00
Vincent Donnefort	96e43537af	tracing: Introduce trace remotes A trace remote relies on ring-buffer remotes to read and control compatible tracing buffers, written by entity such as firmware or hypervisor. Add a Tracefs directory remotes/ that contains all instances of trace remotes. Each instance follows the same hierarchy as any other to ease the support by existing user-space tools. This currently does not provide any event support, which will come later. Link: https://patch.msgid.link/20260309162516.2623589-6-vdonnefort@google.com Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Vincent Donnefort <vdonnefort@google.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2026-03-09 12:33:53 -04:00
Vincent Donnefort	fbd1743ecb	ring-buffer: Add non-consuming read for ring-buffer remotes Hopefully, the remote will only swap pages on the kernel instruction (via the swap_reader_page() callback). This means we know at what point the ring-buffer geometry has changed. It is therefore possible to rearrange the kernel view of that ring-buffer to allow non-consuming read. Link: https://patch.msgid.link/20260309162516.2623589-5-vdonnefort@google.com Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Vincent Donnefort <vdonnefort@google.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2026-03-09 12:33:53 -04:00
Vincent Donnefort	2e67fabd8b	ring-buffer: Introduce ring-buffer remotes Add ring-buffer remotes to support entities outside of the kernel (such as firmware or a hypervisor) that writes events into a ring-buffer using the tracefs format Require a description of the ring-buffer pages (struct trace_buffer_desc) and callbacks (swap_reader_page and reset) to set up the ring-buffer on the kernel side. Expect the remote entity to maintain and update the meta-page. Link: https://patch.msgid.link/20260309162516.2623589-4-vdonnefort@google.com Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Vincent Donnefort <vdonnefort@google.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2026-03-09 12:33:53 -04:00
Vincent Donnefort	e682207bf7	ring-buffer: Store bpage pointers into subbuf_ids The subbuf_ids field allows to point to a specific page from the ring-buffer based on its ID. As a preparation or the upcoming ring-buffer remote support, point this array to the buffer_page instead of the buffer_data_page. Link: https://patch.msgid.link/20260309162516.2623589-3-vdonnefort@google.com Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Vincent Donnefort <vdonnefort@google.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2026-03-09 12:33:53 -04:00
Vincent Donnefort	7d776a3627	ring-buffer: Add page statistics to the meta-page Add two fields pages_touched and pages_lost to the ring-buffer meta-page. Those fields are useful to get the number of used pages in the ring-buffer. Link: https://patch.msgid.link/20260309162516.2623589-2-vdonnefort@google.com Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Vincent Donnefort <vdonnefort@google.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2026-03-09 12:33:53 -04:00
Linus Torvalds	8b7f4cd3ac	bpf-fixes -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEE+soXsSLHKoYyzcli6rmadz2vbToFAmmsahoACgkQ6rmadz2v bToD4Q/9Hr2lAELsTRZIENBTYYTfk48Rzrr+rJbZWLfX4nOu9RwhVboTw0gvJr8r QEteYO8+gP5hrIuifcC+vdzW12CpgziBK7/Qj2NB7MGSX0ZCLK0ShQK6TAbNE8uZ xA4qMcJp0BWJpgfU51z4Ur5nMzG0Th2XlY+SGwHYZ/53qRMmP7Tr8R9wPltKuEll J+tS6BE6bFKMQe1CuIYlYfYj9kO3j4yx9S48j1e8x9LXc2/2arfsOIZQ6L11rnxZ YTa9jbcRL0YdLy7d4hMLyPQquF9hiHDrJLJih+YNmbyOCPGD3HP0ib2c2g0ip+qk WlNFLc2K9w0eS02uuAytaxwArdOpDUHG+skWe+dGt95Bk3Xld0hW/JSI5hrohM1E 0YKjuZeGGvTkI5FJ5kk+BWoApy+FveiN1w7VsHZgr1tix5NRZRgJP/5swM+0eSp/ neTzp08fC10gi+GOnUpdF/lI4wEayoGMpo68BTGiM/hP6Qh268wNWgTHaau3Sk/z wfJb90VTcDHk+N8mnbi4970RZ6OdqX2n7aBy8D7hBW31kHhP27zofzYX51CFD8Ln 5NQDU3wX/hUw9tNwoghCNe12//gfDBi6rO/GoEztNFfVVef4QuzsfwKKdxzUMc9a jZvyDyGh87rGLauDWylq4s23vgxgIG/p114uv+eFp3AMkOfT/Dk= =knWl -----END PGP SIGNATURE----- Merge tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf Pull bpf fixes from Alexei Starovoitov: - Fix u32/s32 bounds when ranges cross min/max boundary (Eduard Zingerman) - Fix precision backtracking with linked registers (Eduard Zingerman) - Fix linker flags detection for resolve_btfids (Ihor Solodrai) - Fix race in update_ftrace_direct_add/del (Jiri Olsa) - Fix UAF in bpf_trampoline_link_cgroup_shim (Lang Xu) * tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: resolve_btfids: Fix linker flags detection selftests/bpf: add reproducer for spurious precision propagation through calls bpf: collect only live registers in linked regs Revert "selftests/bpf: Update reg_bound range refinement logic" selftests/bpf: test refining u32/s32 bounds when ranges cross min/max boundary bpf: Fix u32/s32 bounds when ranges cross min/max boundary bpf: Fix a UAF issue in bpf_trampoline_link_cgroup_shim ftrace: Add missing ftrace_lock to update_ftrace_direct_add/del	2026-03-07 12:20:37 -08:00
Linus Torvalds	aed0af05a8	tracing fixes for 7.0: - Fix possible NULL pointer dereference in trace_data_alloc() On the error path in trace_data_alloc(), it can call trigger_data_free() with a NULL pointer. This use to be a kfree() but was changed to trigger_data_free() to clean up any partial initialization. The issue is that trigger_data_free() does not expect a NULL pointer. Have trigger_data_free() return safely on NULL pointer. - Fix multiple events on the command line and bootconfig If multiple events are enabled on the command line separately and not grouped, only the last event gets enabled. That is: trace_event=sched_switch trace_event=sched_waking Will only enable sched_waking where as: trace_event=sched_switch,sched_waking Will enable both. The bootconfig makes it even worse as the second way is the more common method. The issue is that a temporary buffer is used to store the events to enable later in boot. Each time the cmdline callback is called, it overwrites what was previously there. Have the callback append the next value (delimited by a comma) if the temporary buffer already has content. - Fix command line trace_buffer_size if >= 2G The logic to allocate the trace buffer uses "int" for the size parameter in the command line code causing overflow issues if more that 2G is specified. -----BEGIN PGP SIGNATURE----- iIoEABYKADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCaaxEIRQccm9zdGVkdEBn b29kbWlzLm9yZwAKCRAp5XQQmuv6qn+QAQCM6aJm0ZqDD2dM262M1mQpkU7sW3Dz hZfBpo3YlH55fQEAklsaD96+yKN7PLl1Vh4c0zCelMHZA7kgck/3GqaFAgA= =rn/Z -----END PGP SIGNATURE----- Merge tag 'trace-v7.0-rc2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull tracing fixes from Steven Rostedt: - Fix possible NULL pointer dereference in trace_data_alloc() On the trace_data_alloc() error path, it can call trigger_data_free() with a NULL pointer. This used to be a kfree() but was changed to trigger_data_free() to clean up any partial initialization. The issue is that trigger_data_free() does not expect a NULL pointer. Have trigger_data_free() return safely on NULL pointer. - Fix multiple events on the command line and bootconfig If multiple events are enabled on the command line separately and not grouped, only the last event gets enabled. That is: trace_event=sched_switch trace_event=sched_waking will only enable sched_waking whereas: trace_event=sched_switch,sched_waking will enable both. The bootconfig makes it even worse as the second way is the more common method. The issue is that a temporary buffer is used to store the events to enable later in boot. Each time the cmdline callback is called, it overwrites what was previously there. Have the callback append the next value (delimited by a comma) if the temporary buffer already has content. - Fix command line trace_buffer_size if >= 2G The logic to allocate the trace buffer uses "int" for the size parameter in the command line code causing overflow issues if more that 2G is specified. * tag 'trace-v7.0-rc2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: tracing: Fix trace_buf_size= cmdline parameter with sizes >= 2G tracing: Fix enabling multiple events on the kernel command line and bootconfig tracing: Add NULL pointer check to trigger_data_free()	2026-03-07 09:50:54 -08:00
Calvin Owens	d008ba8be8	tracing: Fix trace_buf_size= cmdline parameter with sizes >= 2G Some of the sizing logic through tracer_alloc_buffers() uses int internally, causing unexpected behavior if the user passes a value that does not fit in an int (on my x86 machine, the result is uselessly tiny buffers). Fix by plumbing the parameter's real type (unsigned long) through to the ring buffer allocation functions, which already use unsigned long. It has always been possible to create larger ring buffers via the sysfs interface: this only affects the cmdline parameter. Cc: stable@vger.kernel.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://patch.msgid.link/bff42a4288aada08bdf74da3f5b67a2c28b761f8.1772852067.git.calvin@wbinvd.org Fixes: `73c5162aa3` ("tracing: keep ring buffer to minimum size till used") Signed-off-by: Calvin Owens <calvin@wbinvd.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2026-03-06 22:25:53 -05:00
Andrei-Alexandru Tachici	3b1679e086	tracing: Fix enabling multiple events on the kernel command line and bootconfig Multiple events can be enabled on the kernel command line via a comma separator. But if the are specified one at a time, then only the last event is enabled. This is because the event names are saved in a temporary buffer, and each call by the init cmdline code will reset that buffer. This also affects names in the boot config file, as it may call the callback multiple times with an example of: kernel.trace_event = ":mod:rproc_qcom_common", ":mod:qrtr", ":mod:qcom_aoss" Change the cmdline callback function to append a comma and the next value if the temporary buffer already has content. Cc: stable@vger.kernel.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://patch.msgid.link/20260302-trace-events-allow-multiple-modules-v1-1-ce4436e37fb8@oss.qualcomm.com Signed-off-by: Andrei-Alexandru Tachici <andrei-alexandru.tachici@oss.qualcomm.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2026-03-06 16:54:34 -05:00

1 2 3 4 5 ...

7111 Commits