linux

mirror of https://github.com/torvalds/linux.git synced 2026-07-27 17:47:41 +02:00

Author	SHA1	Message	Date
Thomas Falcon	1221e50b4a	perf tools: Document recent additions to the perf.data file header Add documentation for recently added HEADER_E_MACHINE and HEADER_CLN_SIZE data to the perf.data file. Also fix a typo at the end of the header section. Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Thomas Falcon <thomas.falcon@intel.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Dapeng Mi <dapeng1.mi@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2026-06-15 16:54:39 -03:00
James Clark	2540d48595	perf test: Add named_threads workload Add a workload that runs X threads that run a unique function named "named_threads_thread[x]" which performs a multiplication in a loop for Y loops. Each thread sets its name to "thread[x]". This can be used to test that processor trace decoding handles concurrent threads correctly and the correct symbols and thread names are assigned to samples. Signed-off-by: James Clark <james.clark@linaro.org> Cc: Amir Ayupov <aaupov@meta.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Leo Yan <leo.yan@arm.com> Cc: Mike Leach <mike.leach@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paschalis Mpeis <Paschalis.Mpeis@arm.com> Cc: Shuah Khan <skhan@linuxfoundation.org> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2026-06-10 18:55:49 -03:00
James Clark	3fdf30607e	perf test: Add deterministic workload Add a workload that does the same thing every time for testing CPU trace decoding. Reviewed-by: Leo Yan <leo.yan@arm.com> Signed-off-by: James Clark <james.clark@linaro.org> Cc: Amir Ayupov <aaupov@meta.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Mike Leach <mike.leach@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paschalis Mpeis <Paschalis.Mpeis@arm.com> Cc: Shuah Khan <skhan@linuxfoundation.org> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2026-06-10 18:55:48 -03:00
James Clark	4cb5dd0379	perf test: Add a workload that forces context switches This workload launches two processes that block when reading and writing to each other forcing the other process to be scheduled for each read/write pair. Signed-off-by: James Clark <james.clark@linaro.org> Cc: Amir Ayupov <aaupov@meta.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Leo Yan <leo.yan@arm.com> Cc: Mike Leach <mike.leach@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paschalis Mpeis <Paschalis.Mpeis@arm.com> Cc: Shuah Khan <skhan@linuxfoundation.org> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2026-06-10 18:55:48 -03:00
James Clark	d281fb9dfd	perf test: Add workload-ctl option Add a --workload-ctl=fifo:ctl-fifo[,ack-fifo] option for 'perf test -w'. When set, run_workload() opens the named FIFO, writes enable before invoking the builtin workload, writes disable before returning, and waits for ack responses when an ack FIFO is provided to ensure that the workload doesn't run until the events are enabled. This can be used to limit the scope of the recording to only the workload execution and avoid recording Perf setup and teardown code if Perf record is started with events disabled (-D 1). Assisted-by: Codex:GPT-5.5 Signed-off-by: James Clark <james.clark@linaro.org> Cc: Amir Ayupov <aaupov@meta.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Leo Yan <leo.yan@arm.com> Cc: Mike Leach <mike.leach@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paschalis Mpeis <Paschalis.Mpeis@arm.com> Cc: Shuah Khan <skhan@linuxfoundation.org> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2026-06-10 18:55:48 -03:00
Ravi Bangoria	a31423e67c	perf doc: Document new IBS capabilities in man page Include examples of: o Privilege filter with Fetch and Op PMUs, including swfilt approach on Zen5 and older platforms and hardware assisted filter on Zen6 and newer platforms o Streaming store filter with Op PMU o Fetch latency filter with Fetch PMU Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Ananth Narayan <ananth.narayan@amd.com> Cc: Dapeng Mi <dapeng1.mi@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Manali Shukla <manali.shukla@amd.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Santosh Shukla <santosh.shukla@amd.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2026-05-22 21:32:28 -03:00
Aaron Tomlin	e445b78ffb	perf trace: Introduce --show-cpu option to display cpu id When tracing system-wide workloads or specific events, it is highly valuable to know exactly which CPU executed a specific event. Currently, perf trace output defaults to omitting CPU information. Introduce a new "--show-cpu" command-line option. When provided, this flag extracts the CPU from the perf sample and prints it in a "[000]" format immediately following the timestamp. This mirrors the behaviour of other tracing tools like ftrace and perf script. For example: # perf trace -e sched:sched_switch --max-events 5 --show-cpu 0.000 [002] :0/0 sched:sched_switch(prev_comm: "swapper/2", prev_prio: 120, next_comm: "rcu_preempt", next_pid: 16 (rcu_preempt), next_prio: 120) 0.009 [002] rcu_preempt/16 sched:sched_switch(prev_comm: "rcu_preempt", prev_pid: 16 (rcu_preempt), prev_prio: 120, prev_state: 128, next_comm: "swapper/2", next_prio: 120) 0.033 [002] :0/0 sched:sched_switch(prev_comm: "swapper/2", prev_prio: 120, next_comm: "kworker/u32:48", next_pid: 35840 (kworker/u32:48-), next_prio: 120) 0.041 [002] kworker/u32:48/35840 sched:sched_switch(prev_comm: "kworker/u32:48", prev_pid: 35840 (kworker/u32:48-), prev_prio: 120, prev_state: 128, next_comm: "swapper/2", next_prio: 120) 0.045 [002] :0/0 sched:sched_switch(prev_comm: "swapper/2", prev_prio: 120, next_comm: "kworker/u32:48", next_pid: 35840 (kworker/u32:48-), next_prio: 120) The feature is implemented strictly as an opt-in toggle to prevent cluttering the standard output and to preserve backwards compatibility for scripts parsing the default output format. Signed-off-by: Aaron Tomlin <atomlin@atomlin.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Daniel Vacek <neelx@suse.com> Cc: Howard Chu <howardchu95@gmail.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sean Ashe <sean@ashe.io> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2026-05-16 14:38:22 -03:00
Leo Yan	54940f1526	perf report: Update document for SIMD flags Update SIMD architecture and predicate flags. Reviewed-by: James Clark <james.clark@linaro.org> Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Leo Yan <leo.yan@arm.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2026-04-10 09:52:06 -07:00
Thomas Richter	59f6de4e8f	perf config: Make symbol_conf::addr2line_disable_warn configurable Make symbol_conf::addr2line_disable_warn configurable by reading the perfconfig file. Use section core and addr2line-disable-warn = value. Update documentation. Example: # perf config -l core.addr2line-timeout=5000 core.addr2line-disable-warn=1 # Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Reviewed-by: Ian Rogers <irogers@google.com> Suggested-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2026-04-08 10:28:49 -07:00
Qinxin Xia	74e2dbe7be	perf tools: Add --pmu-filter option for filtering PMUs This patch adds a new --pmu-filter option to perf-stat command to allow filtering events on specific PMUs. This is useful when there are multiple PMUs with same type (e.g. hisi_sicl2_cpa0 and hisi_sicl0_cpa0). [root@localhost tmp]# perf stat -M cpa_p0_avg_bw Performance counter stats for 'system wide': 19,417,779,115 hisi_sicl0_cpa0/cpa_cycles/ # 0.00 cpa_p0_avg_bw 0 hisi_sicl0_cpa0/cpa_p0_wr_dat/ 0 hisi_sicl0_cpa0/cpa_p0_rd_dat_64b/ 0 hisi_sicl0_cpa0/cpa_p0_rd_dat_32b/ 19,417,751,103 hisi_sicl10_cpa0/cpa_cycles/ # 0.00 cpa_p0_avg_bw 0 hisi_sicl10_cpa0/cpa_p0_wr_dat/ 0 hisi_sicl10_cpa0/cpa_p0_rd_dat_64b/ 0 hisi_sicl10_cpa0/cpa_p0_rd_dat_32b/ 19,417,730,679 hisi_sicl2_cpa0/cpa_cycles/ # 0.31 cpa_p0_avg_bw 75,635,749 hisi_sicl2_cpa0/cpa_p0_wr_dat/ 18,520,640 hisi_sicl2_cpa0/cpa_p0_rd_dat_64b/ 0 hisi_sicl2_cpa0/cpa_p0_rd_dat_32b/ 19,417,674,227 hisi_sicl8_cpa0/cpa_cycles/ # 0.00 cpa_p0_avg_bw 0 hisi_sicl8_cpa0/cpa_p0_wr_dat/ 0 hisi_sicl8_cpa0/cpa_p0_rd_dat_64b/ 0 hisi_sicl8_cpa0/cpa_p0_rd_dat_32b/ 19.417734480 seconds time elapsed [root@localhost tmp]# perf stat --pmu-filter hisi_sicl2_cpa0 -M cpa_p0_avg_bw Performance counter stats for 'system wide': 6,234,093,559 cpa_cycles # 0.60 cpa_p0_avg_bw 50,548,465 cpa_p0_wr_dat 7,552,182 cpa_p0_rd_dat_64b 0 cpa_p0_rd_dat_32b 6.234139320 seconds time elapsed Signed-off-by: Qinxin Xia <xiaqinxin@huawei.com> Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2026-03-27 13:58:01 -07:00
Stephen Brennan	e397dd81bc	perf report: Add comm_nodigit sort key The "comm" column allows grouping events by the process command. It is intended to group like programs, despite having different PIDs. But some workloads may adjust their own command, so that a unique identifier (e.g. a PID or some other numeric value) is part of the command name. This destroys the utility of "comm", forcing perf to place each unique process name into its own bucket, which can contribute to a combinatorial explosion of memory use in perf report. Create a less strict version of this column, which ignores digits when comparing command names. Commands whose names are the same (ignoring digits) are sorted into the same histogram buckets, and displayed with the placeholder value "<N>" in the place of digits. For example, hypothetical command names "kworker/1" "kworker/2" "kworker/3" would sort into the same bucket and be represented as "kworker/<N>". Committer testing: $ perf report -s comm,comm_nodigit \| grep -F "<N>" 0.01% CPU 6/TCG CPU <N>/TCG 0.01% kworker/53:2-mm kworker/<N>:<N>-mm 0.01% migration/24 migration/<N> 0.01% kworker/24:1-ev kworker/<N>:<N>-ev 0.01% llvmpipe-8 llvmpipe-<N> Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2026-03-26 15:22:18 -07:00
Changbin Du	f182573e06	perf tools: Add layout support for --symfs option Add support for parsing an optional layout parameter in the --symfs command line option. The format is: --symfs <directory[,layout]> Where layout can be: - 'hierarchy': matches full path (default) - 'flat': only matches base name When debugging symbol files from a copy of the filesystem (e.g., from a container or remote machine), the debug files are often stored in a flat directory structure with only filenames, not the full original paths. In this case, using 'flat' layout allows perf to find debug symbols by matching only the filename rather than the full path. For example, given a binary path like: /build/output/lib/foo.so With 'perf report --symfs /debug/files,flat', perf will look for: /debug/files/foo.so Instead of: /debug/files/build/output/lib/foo.so This is particularly useful when: - Extracting debug files from containers with different directory layouts - Working with build systems that flatten directory structures Signed-off-by: Changbin Du <changbin.du@huawei.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2026-03-10 23:13:30 -07:00
Namhyung Kim	c1f70c83be	perf bench: Add -t/--threads option to perf bench mem mmap So that it can measure overhead of mmap_lock and/or per-VMA lock contention. $ perf bench mem mmap -f demand -l 1000 -t 1 # Running 'mem/mmap' benchmark: # function 'demand' (Demand loaded mmap()) # Copying 1MB bytes ... 2.786858 GB/sec $ perf bench mem mmap -f demand -l 1000 -t 2 # Running 'mem/mmap' benchmark: # function 'demand' (Demand loaded mmap()) # Copying 1MB bytes ... 1.624468 GB/sec/thread ( +- 0.30% ) $ perf bench mem mmap -f demand -l 1000 -t 3 # Running 'mem/mmap' benchmark: # function 'demand' (Demand loaded mmap()) # Copying 1MB bytes ... 1.493068 GB/sec/thread ( +- 0.15% ) $ perf bench mem mmap -f demand -l 1000 -t 4 # Running 'mem/mmap' benchmark: # function 'demand' (Demand loaded mmap()) # Copying 1MB bytes ... 1.006087 GB/sec/thread ( +- 0.41% ) Reviewed-by: Ankur Arora <ankur.a.arora@oracle.com> Reviewed-by: James Clark <james.clark@linaro.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2026-02-26 10:54:12 -08:00
Linus Torvalds	c7decec2f2	perf tools changes for v7.0: - Introduce 'perf sched stats' tool with record/report/diff workflows using schedstat counters. - Add a faster libdw based addr2line implementation and allow selecting it or its alternatives via 'perf config addr2line.style='. - Data-type profiling fixes and improvements including the ability to select fields using 'perf report''s -F/-fields, e.g.: 'perf report --fields overhead,type' - Add 'perf test' regression tests for Data-type profiling with C and Rust workloads. - Fix srcline printing with inlines in callchains, make sure this has coverage in 'perf test'. - Fix printing of leaf IP in LBR callchains. - Fix display of metrics without sufficient permission in 'perf stat'. - Print all machines in 'perf kvm report -vvv', not just the host. - Switch from SHA-1 to BLAKE2s for build ID generation, remove SHA-1 code. - Fix 'perf report's histogram entry collapsing with '-F' option. - Use system's cacheline size instead of a hardcoded value in 'perf report'. - Allow filtering conversion by time range in 'perf data'. - Cover conversion to CTF using 'perf data' in 'perf test'. - Address newer glibc const-correctness (-Werror=discarded-qualifiers) issues. - Fixes and improvements for ARM's CoreSight support, simplify ARM SPE event config in 'perf mem', update docs for 'perf c2c' including the ARM events it can be used with. - Build support for generating metrics from arch specific python script, add extra AMD, Intel, ARM64 metrics using it. - Add AMD Zen 6 events and metrics. - Add JSON file with OpenHW Risc-V CVA6 hardware counters. - Add 'perf kvm' stats live testing. - Add more 'perf stat' tests to 'perf test'. - Fix segfault in `perf lock contention -b/--use-bpf` - Fix various 'perf test' cases for s390. - Build system cleanups, bump minimum shellcheck version to 0.7.2 - Support building the capstone based annotation routines as a plugin. - Allow passing extra Clang flags via EXTRA_BPF_FLAGS. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQR2GiIUctdOfX2qHhGyPKLppCJ+JwUCaZn25QAKCRCyPKLppCJ+ J9MbAQCTKChBwDsqVh5iPr0UAc+mez9LOPJFa580SYH9nmd1jgD+Oqip7xCf514G ZtYPNh+Mz0se0Mcb++syLUEjxvbrQQY= =y2VY -----END PGP SIGNATURE----- Merge tag 'perf-tools-for-v7.0-1-2026-02-21' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools Pull perf tools updates from Arnaldo Carvalho de Melo: - Introduce 'perf sched stats' tool with record/report/diff workflows using schedstat counters - Add a faster libdw based addr2line implementation and allow selecting it or its alternatives via 'perf config addr2line.style=' - Data-type profiling fixes and improvements including the ability to select fields using 'perf report''s -F/-fields, e.g.: 'perf report --fields overhead,type' - Add 'perf test' regression tests for Data-type profiling with C and Rust workloads - Fix srcline printing with inlines in callchains, make sure this has coverage in 'perf test' - Fix printing of leaf IP in LBR callchains - Fix display of metrics without sufficient permission in 'perf stat' - Print all machines in 'perf kvm report -vvv', not just the host - Switch from SHA-1 to BLAKE2s for build ID generation, remove SHA-1 code - Fix 'perf report's histogram entry collapsing with '-F' option - Use system's cacheline size instead of a hardcoded value in 'perf report' - Allow filtering conversion by time range in 'perf data' - Cover conversion to CTF using 'perf data' in 'perf test' - Address newer glibc const-correctness (-Werror=discarded-qualifiers) issues - Fixes and improvements for ARM's CoreSight support, simplify ARM SPE event config in 'perf mem', update docs for 'perf c2c' including the ARM events it can be used with - Build support for generating metrics from arch specific python script, add extra AMD, Intel, ARM64 metrics using it - Add AMD Zen 6 events and metrics - Add JSON file with OpenHW Risc-V CVA6 hardware counters - Add 'perf kvm' stats live testing - Add more 'perf stat' tests to 'perf test' - Fix segfault in `perf lock contention -b/--use-bpf` - Fix various 'perf test' cases for s390 - Build system cleanups, bump minimum shellcheck version to 0.7.2 - Support building the capstone based annotation routines as a plugin - Allow passing extra Clang flags via EXTRA_BPF_FLAGS * tag 'perf-tools-for-v7.0-1-2026-02-21' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools: (255 commits) perf test script: Add python script testing support perf test script: Add perl script testing support perf script: Allow the generated script to be a path perf test: perf data --to-ctf testing perf test: Test pipe mode with data conversion --to-json perf json: Pipe mode --to-ctf support perf json: Pipe mode --to-json support perf check: Add libbabeltrace to the listed features perf build: Allow passing extra Clang flags via EXTRA_BPF_FLAGS perf test data_type_profiling.sh: Skip just the Rust tests if code_with_type workload is missing tools build: Fix feature test for rust compiler perf libunwind: Fix calls to thread__e_machine() perf stat: Add no-affinity flag perf evlist: Reduce affinity use and move into iterator, fix no affinity perf evlist: Missing TPEBS close in evlist__close() perf evlist: Special map propagation for tool events that read on 1 CPU perf stat-shadow: In prepare_metric fix guard on reading NULL perf_stat_evsel Revert "perf tool_pmu: More accurately set the cpus for tool events" tools build: Emit dependencies file for test-rust.bin tools build: Make test-rust.bin be removed by the 'clean' target ...	2026-02-21 10:51:08 -08:00
Ian Rogers	22ca2f7f32	perf script: Allow the generated script to be a path Allow the script generated by "perf script -g <language>" to be a file path and the language determined by the file extension. This is useful in testing so that the generated script file can be written to a test directory. Committer testing: $ perf record ls a.a ls: cannot access 'a.a': No such file or directory [ perf record: Woken up 2 times to write data ] [ perf record: Captured and wrote 0.003 MB perf.data (7 samples) ] $ perf script -g python generated Python script: perf-script.py $ perf script -g myscript.py generated Python script: myscript.py $ diff -u perf-script.py myscript.py $ tail myscript.py def trace_unhandled(event_name, context, event_fields_dict, perf_sample_dict): print(get_dict_as_string(event_fields_dict)) print('Sample: {'+get_dict_as_string(perf_sample_dict['sample'], ', ')+'}') def print_header(event_name, cpu, secs, nsecs, pid, comm): print("%-20s %5u %05u.%09u %8u %-20s " % \ (event_name, cpu, secs, nsecs, pid, comm), end="") def get_dict_as_string(a_dict, delimiter=' '): return delimiter.join(['%s=%s'%(k,str(v))for k,v in sorted(a_dict.items())]) $ Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Leo Yan <leo.yan@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Yujie Liu <yujie.liu@intel.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2026-02-12 17:45:22 -03:00
Ian Rogers	5d1ab659fb	perf stat: Add no-affinity flag Add flag that disables affinity behavior. Using sched_setaffinity() to place a perf thread on a CPU can avoid certain interprocessor interrupts but may introduce a delay due to the scheduling, particularly on loaded machines. Add a command line option to disable the behavior. This behavior is less present in other tools like `perf record`, as it uses a ring buffer and doesn't make repeated system calls. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Andres Freund <andres@anarazel.de> Cc: Dapeng Mi <dapeng1.mi@linux.intel.com> Cc: Dr. David Alan Gilbert <linux@treblig.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Falcon <thomas.falcon@intel.com> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Yang Li <yang.lee@linux.alibaba.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2026-02-10 09:35:28 -03:00
Marc Zyngier	d65bf6e317	KVM: arm64: Remove all traces of FEAT_TME FEAT_TME has been dropped from the architecture. Retrospectively. I'm sure someone is crying somewhere, but most of us won't. Clean-up time. Reviewed-by: Fuad Tabba <tabba@google.com> Tested-by: Fuad Tabba <tabba@google.com> Link: https://patch.msgid.link/20260202184329.2724080-18-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>	2026-02-05 09:02:13 +00:00
Swapnil Sapkal	34b0a58eef	perf sched stats: Fixes in man page Fix the incorrect description of the schedstats report. Also fix the spelling errors in man page. Fixes: `800af362d6` ("perf sched stats: Add details in man page") Reviewed-by: Shrikanth Hegde <sshegde@linux.ibm.com> Reported-by: Shrikanth Hegde <sshegde@linux.ibm.com> Signed-off-by: Swapnil Sapkal <swapnil.sapkal@amd.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Anubhav Shelat <ashelat@redhat.com> Cc: Chen Yu <yu.c.chen@intel.com> Cc: Gautham Shenoy <gautham.shenoy@amd.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Thomas Falcon <thomas.falcon@intel.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2026-01-28 15:18:41 -03:00
Swapnil Sapkal	800af362d6	perf sched stats: Add details in man page Document 'perf sched stats' purpose, usage examples and guide on how to interpret the report data in the perf-sched man page. Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com> Signed-off-by: Swapnil Sapkal <swapnil.sapkal@amd.com> Tested-by: Chen Yu <yu.c.chen@intel.com> Acked-by: Ian Rogers <irogers@google.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Anubhav Shelat <ashelat@redhat.com> Cc: Ben Gainey <ben.gainey@arm.com> Cc: Blake Jones <blakejones@google.com> Cc: Chun-Tse Shao <ctshao@google.com> Cc: David Vernet <void@manifault.com> Cc: Dmitriy Vyukov <dvyukov@google.com> Cc: Dr. David Alan Gilbert <linux@treblig.org> Cc: Gautham Shenoy <gautham.shenoy@amd.com> Cc: Graham Woodward <graham.woodward@arm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@arm.com> Cc: Madadi Vineeth Reddy <vineethr@linux.ibm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Santosh Shukla <santosh.shukla@amd.com> Cc: Shrikanth Hegde <sshegde@linux.ibm.com> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Falcon <thomas.falcon@intel.com> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Vincent Guittot <vincent.guittot@linaro.org> Cc: Yang Jihong <yangjihong@bytedance.com> Cc: Yujie Liu <yujie.liu@intel.com> Cc: Zhongqiu Han <quic_zhonhan@quicinc.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2026-01-22 12:46:05 -03:00
Swapnil Sapkal	d40c68a49f	perf header: Support CPU DOMAIN relation info The '/proc/schedstat' file gives info about load balancing statistics within a given domain. It also contains the cpu_mask giving information about the sibling cpus and domain names after schedstat version 17. Storing this information in perf header will help tools like `perf sched stats` for better analysis. Signed-off-by: Swapnil Sapkal <swapnil.sapkal@amd.com> Tested-by: Chen Yu <yu.c.chen@intel.com> Acked-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Anubhav Shelat <ashelat@redhat.com> Cc: Ben Gainey <ben.gainey@arm.com> Cc: Blake Jones <blakejones@google.com> Cc: Chun-Tse Shao <ctshao@google.com> Cc: David Vernet <void@manifault.com> Cc: Dmitriy Vyukov <dvyukov@google.com> Cc: Dr. David Alan Gilbert <linux@treblig.org> Cc: Gautham Shenoy <gautham.shenoy@amd.com> Cc: Graham Woodward <graham.woodward@arm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@arm.com> Cc: Madadi Vineeth Reddy <vineethr@linux.ibm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Santosh Shukla <santosh.shukla@amd.com> Cc: Shrikanth Hegde <sshegde@linux.ibm.com> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Falcon <thomas.falcon@intel.com> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Vincent Guittot <vincent.guittot@linaro.org> Cc: Yang Jihong <yangjihong@bytedance.com> Cc: Yujie Liu <yujie.liu@intel.com> Cc: Zhongqiu Han <quic_zhonhan@quicinc.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2026-01-21 21:55:09 -03:00
Leo Yan	dc7fb075f7	perf c2c: Update documentation for adding memory event table Users may occasionally need to see which options are applied to memory events. This helps to understand the behavior of "perf c2c" and "perf mem", and provides guidance for configuring memory event options directly. Add a table to track memory events and their corresponding options, and include the Arm SPE events in it. Suggested-by: Al Grant <al.grant@arm.com> Reviewed-by: James Clark <james.clark@linaro.org> Signed-off-by: Leo Yan <leo.yan@arm.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Will Deacon <will@kernel.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2026-01-20 17:31:28 -03:00
Namhyung Kim	92ea788d2a	perf inject: Add --convert-callchain option There are applications not built with frame pointers, so DWARF is needed to get the stack traces. `perf record --call-graph dwarf` saves the stack and register data for each sample to get the stacktrace offline. But sometimes this data may have sensitive information and we don't want to keep them in the file. This new 'perf inject --convert-callchain' option creates the callchains and discards the stack and register after that. This saves storage space and processing time for the new data file. Of course, users should remove the original data file to not keep sensitive data around. :) The down side is that it cannot handle inlined callchain entries as they all have the same IPs. Maybe we can add an option to 'perf report' to look up inlined functions using DWARF - IIUC it doesn't require stack and register data. This is an example. $ perf record --call-graph dwarf -- perf test -w noploop $ perf report --stdio --no-children --percent-limit=0 > output-prev $ perf inject -i perf.data --convert-callchain -o perf.data.out $ perf report --stdio --no-children --percent-limit=0 -i perf.data.out > output-next $ diff -u output-prev output-next ... 0.23% perf ld-linux-x86-64.so.2 [.] _dl_relocate_object_no_relro \| - ---elf_dynamic_do_Rela (inlined) - _dl_relocate_object_no_relro + ---_dl_relocate_object_no_relro _dl_relocate_object dl_main _dl_sysdep_start - _dl_start_final (inlined) _dl_start _start Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2026-01-20 17:18:24 -03:00
Dapeng Mi	d1f9dc6723	perf Documentation: Correct branch stack sampling call-stack option The correct call-stack option for branch stack sampling should be "stack" instead of "call_stack". Correct it. $perf record -e instructions -j call_stack -- sleep 1 unknown branch filter call_stack, check man page Usage: perf record [<options>] [<command>] or: perf record [<options>] -- <command> [<options>] -j, --branch-filter <branch filter mask> branch stack filter modes Fixes: `955f6def55` ("perf record: Add remaining branch filters: "no_cycles", "no_flags" & "hw_index"") Reviewed-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Falcon <thomas.falcon@intel.com> Cc: Xudong Hao <xudong.hao@intel.com> Cc: Zide Chen <zide.chen@intel.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2026-01-13 17:29:57 -03:00
Derek Foreman	8e746e95c3	perf data: Allow filtering conversion by time range This adds a feature to allow restricting the range of converted samples with a range string like perf-script and perf-report --time. Committer testing: Put a probe on the ICMP receive path handling broadcast packets: # perf probe icmp_rcv:64 Added new event: probe:icmp_rcv_L64 (on icmp_rcv:64) You can now use it in all perf tools, such as: perf record -e probe:icmp_rcv_L64 -aR sleep 1 # perf record -e probe:icmp_rcv_L64 ping -c 10 -b 127.255.255.255 WARNING: pinging broadcast address PING 127.255.255.255 (127.255.255.255) 56(84) bytes of data. ^C --- 127.255.255.255 ping statistics --- 10 packets transmitted, 0 received, 100% packet loss, time 9217ms [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.034 MB perf.data (10 samples) ] # perf script ping 52785 [009] 5847.300394: probe:icmp_rcv_L64: (ffffffffaadb337e) ping 52785 [009] 5848.325018: probe:icmp_rcv_L64: (ffffffffaadb337e) ping 52785 [009] 5849.349007: probe:icmp_rcv_L64: (ffffffffaadb337e) ping 52785 [009] 5850.372979: probe:icmp_rcv_L64: (ffffffffaadb337e) ping 52785 [009] 5851.396988: probe:icmp_rcv_L64: (ffffffffaadb337e) ping 52785 [009] 5852.420954: probe:icmp_rcv_L64: (ffffffffaadb337e) ping 52785 [009] 5853.444934: probe:icmp_rcv_L64: (ffffffffaadb337e) ping 52785 [009] 5854.468926: probe:icmp_rcv_L64: (ffffffffaadb337e) ping 52785 [009] 5855.492914: probe:icmp_rcv_L64: (ffffffffaadb337e) ping 52785 [009] 5856.516883: probe:icmp_rcv_L64: (ffffffffaadb337e) # Now get some slices using perf script: # perf script --time 40% ping 52785 [009] 5847.300394: probe:icmp_rcv_L64: (ffffffffaadb337e) ping 52785 [009] 5848.325018: probe:icmp_rcv_L64: (ffffffffaadb337e) ping 52785 [009] 5849.349007: probe:icmp_rcv_L64: (ffffffffaadb337e) ping 52785 [009] 5850.372979: probe:icmp_rcv_L64: (ffffffffaadb337e) # perf script --time 40%-60% ping 52785 [009] 5851.396988: probe:icmp_rcv_L64: (ffffffffaadb337e) ping 52785 [009] 5852.420954: probe:icmp_rcv_L64: (ffffffffaadb337e) # And finally use this new feature: # perf data convert --to-json out.json --time 0%-10% [ perf data convert: Converted 'perf.data' into JSON data 'out.json' ] [ perf data convert: Converted and wrote 0.001 MB (1 samples) ] [ perf data convert: Skipped 9 samples ] # cat out.json { "linux-perf-json-version": 1, "headers": { "header-version": 1, "captured-on": "2026-01-06T22:26:40Z", "data-offset": 520, "data-size": 34648, "feat-offset": 35168, "hostname": "number", "os-release": "6.17.12-300.fc43.x86_64", "arch": "x86_64", "cpu-desc": "AMD Ryzen 9 9950X3D 16-Core Processor", "cpuid": "AuthenticAMD,26,68,0", "nrcpus-online": 32, "nrcpus-avail": 32, "perf-version": "6.19.rc4.gf4c270685d3d", "cmdline": [ "/home/acme/bin/perf" ] }, "samples": [ { "timestamp": 5847300394661, "pid": 52785, "tid": 52785, "cpu": 9, "comm": "ping", "callchain": [ { "ip": "0xffffffffaadb337f", "symbol": "icmp_rcv", "dso": "[kernel.kallsyms]" } ], "__probe_ip": "ffffffffaadb337e" } ] } # Signed-off-by: Derek Foreman <derek.foreman@collabora.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2026-01-06 19:20:02 -03:00
Namhyung Kim	bdd051e249	perf record: Split --data-mmap option Currently -d/--data option controls both PERF_SAMPLE_ADDR bit and perf_event_attr.mmap_data flag. Separate them using new --data-mmap option to support recording only one of them. For data-type profiling, data MMAP is unnecessary but it wastes a lot of space in the ring buffer and data file. Committer testing: On an idle system: root@x1:~# perf record -d -a sleep 1 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 5.672 MB perf.data (1075 samples) ] root@x1:~# ls -la perf.data -rw-------. 1 root root 5982480 Dec 16 15:34 perf.data root@x1:~# perf evlist -v cpu_atom/cycles/P: type: 0 (PERF_TYPE_HARDWARE), size: 144, config: 0xa00000000 (cpu_atom/PERF_COUNT_HW_CPU_CYCLES/), { sample_period, sample_freq }: 4000, sample_type: IP\|TID\|TIME\|ADDR\|CPU\|PERIOD\|IDENTIFIER\|DATA_SRC, read_format: ID\|LOST, disabled: 1, freq: 1, precise_ip: 3, sample_id_all: 1 cpu_core/cycles/P: type: 0 (PERF_TYPE_HARDWARE), size: 144, config: 0x400000000 (cpu_core/PERF_COUNT_HW_CPU_CYCLES/), { sample_period, sample_freq }: 4000, sample_type: IP\|TID\|TIME\|ADDR\|CPU\|PERIOD\|IDENTIFIER\|DATA_SRC, read_format: ID\|LOST, disabled: 1, freq: 1, precise_ip: 3, sample_id_all: 1 dummy:u: type: 1 (PERF_TYPE_SOFTWARE), size: 144, config: 0x9 (PERF_COUNT_SW_DUMMY), { sample_period, sample_freq }: 1, sample_type: IP\|TID\|TIME\|ADDR\|CPU\|IDENTIFIER\|DATA_SRC, read_format: ID\|LOST, exclude_kernel: 1, exclude_hv: 1, mmap: 1, comm: 1, task: 1, mmap_data: 1, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1, build_id: 1 root@x1:~# Now with just --data-mmap we will not save that much, as only DATA_SRC will not be enabled in sample_type: root@x1:~# perf record --data-mmap -a sleep 1 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 5.576 MB perf.data (716 samples) ] root@x1:~# ls -la perf.data -rw-------. 1 root root 5880112 Dec 16 15:37 perf.data root@x1:~# perf evlist -v cpu_atom/cycles/P: type: 0 (PERF_TYPE_HARDWARE), size: 144, config: 0xa00000000 (cpu_atom/PERF_COUNT_HW_CPU_CYCLES/), { sample_period, sample_freq }: 4000, sample_type: IP\|TID\|TIME\|CPU\|PERIOD\|IDENTIFIER, read_format: ID\|LOST, disabled: 1, freq: 1, precise_ip: 3, sample_id_all: 1 cpu_core/cycles/P: type: 0 (PERF_TYPE_HARDWARE), size: 144, config: 0x400000000 (cpu_core/PERF_COUNT_HW_CPU_CYCLES/), { sample_period, sample_freq }: 4000, sample_type: IP\|TID\|TIME\|CPU\|PERIOD\|IDENTIFIER, read_format: ID\|LOST, disabled: 1, freq: 1, precise_ip: 3, sample_id_all: 1 dummy:u: type: 1 (PERF_TYPE_SOFTWARE), size: 144, config: 0x9 (PERF_COUNT_SW_DUMMY), { sample_period, sample_freq }: 1, sample_type: IP\|TID\|TIME\|CPU\|IDENTIFIER, read_format: ID\|LOST, exclude_kernel: 1, exclude_hv: 1, mmap: 1, comm: 1, task: 1, mmap_data: 1, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1, build_id: 1 root@x1:~# To complete, just with DATA_SRC, no mmap_data: root@x1:~# perf record --sample-mem-info -a sleep 1 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 1.407 MB perf.data (1311 samples) ] root@x1:~# ls -la perf.data -rw-------. 1 root root 1509224 Dec 16 15:40 perf.data root@x1:~# perf evlist -v cpu_atom/cycles/P: type: 0 (PERF_TYPE_HARDWARE), size: 144, config: 0xa00000000 (cpu_atom/PERF_COUNT_HW_CPU_CYCLES/), { sample_period, sample_freq }: 4000, sample_type: IP\|TID\|TIME\|CPU\|PERIOD\|IDENTIFIER\|DATA_SRC, read_format: ID\|LOST, disabled: 1, freq: 1, precise_ip: 3, sample_id_all: 1 cpu_core/cycles/P: type: 0 (PERF_TYPE_HARDWARE), size: 144, config: 0x400000000 (cpu_core/PERF_COUNT_HW_CPU_CYCLES/), { sample_period, sample_freq }: 4000, sample_type: IP\|TID\|TIME\|CPU\|PERIOD\|IDENTIFIER\|DATA_SRC, read_format: ID\|LOST, disabled: 1, freq: 1, precise_ip: 3, sample_id_all: 1 dummy:u: type: 1 (PERF_TYPE_SOFTWARE), size: 144, config: 0x9 (PERF_COUNT_SW_DUMMY), { sample_period, sample_freq }: 1, sample_type: IP\|TID\|TIME\|CPU\|IDENTIFIER\|DATA_SRC, read_format: ID\|LOST, exclude_kernel: 1, exclude_hv: 1, mmap: 1, comm: 1, task: 1, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1, build_id: 1 root@x1:~# Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2025-12-17 09:30:37 -03:00
Ian Rogers	830f1854c4	perf timechart: Add record support for output perf.data path The '-o' option exists for the SVG creation but not for `perf timechart record`. Add to better allow testing. Signed-off-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-12-03 11:07:23 -08:00
Namhyung Kim	9b4525fd08	perf tools: Merge deferred user callchains Save samples with deferred callchains in a separate list and deliver them after merging the user callchains. If users don't want to merge they can set tool->merge_deferred_callchains to false to prevent the behavior. With previous result, now perf script will show the merged callchains. $ perf script ... pwd 2312 121.163435: 249113 cpu/cycles/P: ffffffff845b78d8 __build_id_parse.isra.0+0x218 ([kernel.kallsyms]) ffffffff83bb5bf6 perf_event_mmap+0x2e6 ([kernel.kallsyms]) ffffffff83c31959 mprotect_fixup+0x1e9 ([kernel.kallsyms]) ffffffff83c31dc5 do_mprotect_pkey+0x2b5 ([kernel.kallsyms]) ffffffff83c3206f __x64_sys_mprotect+0x1f ([kernel.kallsyms]) ffffffff845e6692 do_syscall_64+0x62 ([kernel.kallsyms]) ffffffff8360012f entry_SYSCALL_64_after_hwframe+0x76 ([kernel.kallsyms]) 7f18fe337fa7 mprotect+0x7 (/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2) 7f18fe330e0f _dl_sysdep_start+0x7f (/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2) 7f18fe331448 _dl_start_user+0x0 (/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2) ... The old output can be get using --no-merge-callchain option. Also perf report can get the user callchain entry at the end. $ perf report --no-children --stdio -q -S __build_id_parse.isra.0 # symbol: __build_id_parse.isra.0 8.40% pwd [kernel.kallsyms] \| ---__build_id_parse.isra.0 perf_event_mmap mprotect_fixup do_mprotect_pkey __x64_sys_mprotect do_syscall_64 entry_SYSCALL_64_after_hwframe mprotect _dl_sysdep_start _dl_start_user Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-12-02 21:59:14 -08:00
Namhyung Kim	27ddc1d7a6	perf record: Add --call-graph fp,defer option for deferred callchains Add a new callchain record mode option for deferred callchains. For now it only works with FP (frame-pointer) mode. And add the missing feature detection logic to clear the flag on old kernels. $ perf record --call-graph fp,defer -vv true ... ------------------------------------------------------------ perf_event_attr: type 0 (PERF_TYPE_HARDWARE) size 136 config 0 (PERF_COUNT_HW_CPU_CYCLES) { sample_period, sample_freq } 4000 sample_type IP\|TID\|TIME\|CALLCHAIN\|PERIOD read_format ID\|LOST disabled 1 inherit 1 mmap 1 comm 1 freq 1 enable_on_exec 1 task 1 sample_id_all 1 mmap2 1 comm_exec 1 ksymbol 1 bpf_event 1 defer_callchain 1 defer_output 1 ------------------------------------------------------------ sys_perf_event_open: pid 162755 cpu 0 group_fd -1 flags 0x8 sys_perf_event_open failed, error -22 switching off deferred callchain support Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-12-02 21:59:13 -08:00
James Clark	5accdaec52	perf docs: arm-spe: Document new SPE filtering features FEAT_SPE_EFT and FEAT_SPE_FDS etc have new user facing format attributes so document them. Also document existing 'event_filter' bits that were missing from the doc and the fact that latency values are stored in the weight field. Reviewed-by: Leo Yan <leo.yan@arm.com> Tested-by: Leo Yan <leo.yan@arm.com> Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: James Clark <james.clark@linaro.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-11-24 12:20:07 -08:00
Ian Rogers	754187ad73	perf build: Remove NO_AUXTRACE build option The NO_AUXTRACE build option was used when the __get_cpuid feature test failed or if it was provided on the command line. The option no longer avoids a dependency on a library and so having the option is just adding complexity to the code base. Remove the option CONFIG_AUXTRACE from Build files and HAVE_AUXTRACE_SUPPORT by assuming it is always defined. Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: James Clark <james.clark@linaro.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-11-13 23:03:11 -08:00
Tianyou Li	cd3466cd26	perf c2c: Add annotation support to perf c2c report Perf c2c report currently specified the code address and source:line information in the cacheline browser, while it is lack of annotation support like perf report to directly show the disassembly code for the particular symbol shared that same cacheline. This patches add a key 'a' binding to the cacheline browser which reuse the annotation browser to show the disassembly view for easier analysis of cacheline contentions. Signed-off-by: Tianyou Li <tianyou.li@intel.com> Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Reviewed-by: Thomas Falcon <thomas.falcon@intel.com> Reviewed-by: Jiebin Sun <jiebin.sun@intel.com> Reviewed-by: Pan Deng <pan.deng@intel.com> Reviewed-by: Zhiguo Zhou <zhiguo.zhou@intel.com> Reviewed-by: Wangyang Guo <wangyang.guo@intel.com> Tested-by: Ravi Bangoria <ravi.bangoria@amd.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-10-19 12:49:59 +09:00
Leo Yan	0a75ba3e84	perf docs: Document building with Clang Add example commands for building perf with Clang. Since recent Android NDK releases use Clang as the default compiler, a separate Android specific document is no longer needed; point to the general build documentation instead. Signed-off-by: Leo Yan <leo.yan@arm.com> Reviewed-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20251006-perf_build_android_ndk-v3-9-4305590795b2@arm.com Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Alexandre Ghiti <alex@ghiti.fr> Cc: Nick Desaulniers <nick.desaulniers+lkml@gmail.com> Cc: Justin Stitt <justinstitt@google.com> Cc: Bill Wendling <morbo@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Nathan Chancellor <nathan@kernel.org> Cc: James Clark <james.clark@linaro.org> Cc: linux-riscv@lists.infradead.org Cc: llvm@lists.linux.dev Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: linux-kernel@vger.kernel.org Cc: linux-perf-users@vger.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2025-10-06 16:49:25 -03:00
Ian Rogers	e444c2d4a2	perf check: Add libLLVM feature Advertise when perf is built with the HAVE_LIBLLVM_SUPPORT option. Committer testing: $ perf -vv \| grep LLVM libLLVM: [ on ] # HAVE_LIBLLVM_SUPPORT $ And the form to use in scripts, notably the tools/perf/tests/shell/ 'perf test' ones: $ perf check feature libllvm libLLVM: [ on ] # HAVE_LIBLLVM_SUPPORT $ perf check -q feature libllvm && echo LLVM is present LLVM is present $ perf check -q feature liballvm && echo ALLVM is present $ Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexandre Ghiti <alexghiti@rivosinc.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.ibm.com> Cc: Bill Wendling <morbo@google.com> Cc: Charlie Jenkins <charlie@rivosinc.com> Cc: Collin Funk <collin.funk1@gmail.com> Cc: Dmitriy Vyukov <dvyukov@google.com> Cc: Dr. David Alan Gilbert <linux@treblig.org> Cc: Eric Biggers <ebiggers@kernel.org> Cc: Haibo Xu <haibo1.xu@intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Justin Stitt <justinstitt@google.com> Cc: Li Huafei <lihuafei1@huawei.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Nick Desaulniers <nick.desaulniers+lkml@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <song@kernel.org> Cc: Stephen Brennan <stephen.s.brennan@oracle.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2025-10-06 15:24:13 -03:00
Thomas Falcon	6b9c0261b3	perf record: Add ratio-to-prev term Provide ratio-to-prev term which allows the user to set the event sample period of two events corresponding to a desired ratio. If using on an Intel x86 platform with Auto Counter Reload support, also set corresponding event's config2 attribute with a bitmask which counters to reset and which counters to sample if the desired ratio is met or exceeded. On other platforms, only the sample period is affected by the ratio-to-prev term. Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Thomas Falcon <thomas.falcon@intel.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Dapeng Mi <dapeng1.mi@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2025-10-03 16:49:51 -03:00
Markus Heidelberg	a93b9ccb03	perf tools: Fix duplicated words in documentation and comments - "the the" - "in in" - "a a" Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Markus Heidelberg <m.heidelberg@cab.de> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2025-10-01 09:44:02 -03:00
Ankur Arora	a8f0992998	perf bench mem: Add mmap() workloads Add two mmap() workloads: one that eagerly populates a region and another that demand faults it in. The intent is to probe the memory subsytem performance incurred by mmap(). $ perf bench mem mmap -s 4gb -p 4kb -l 10 -f populate # Running 'mem/mmap' benchmark: # function 'populate' (Eagerly populated map()) # Copying 4gb bytes ... 1.811691 GB/sec $ perf bench mem mmap -s 4gb -p 2mb -l 10 -f populate # Running 'mem/mmap' benchmark: # function 'populate' (Eagerly populated mmap()) # Copying 4gb bytes ... 12.272017 GB/sec $ perf bench mem mmap -s 4gb -p 1gb -l 10 -f populate # Running 'mem/mmap' benchmark: # function 'populate' (Eagerly populated mmap()) # Copying 4gb bytes ... 17.085927 GB/sec Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Hildenbrand <david@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Mateusz Guzik <mjguzik@gmail.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Raghavendra K T <raghavendra.kt@amd.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2025-09-19 12:43:59 -03:00
Ankur Arora	fd1d882c4c	perf bench mem: Allow chunking on a memory region There can be a significant gap in memset/memcpy performance depending on the size of the region being operated on. With chunk-size=4kb: $ echo madvise > /sys/kernel/mm/transparent_hugepage/enabled $ perf bench mem memset -p 4kb -k 4kb -s 4gb -l 10 -f x86-64-stosq # Running 'mem/memset' benchmark: # function 'x86-64-stosq' (movsq-based memset() in arch/x86/lib/memset_64.S) # Copying 4gb bytes ... 13.011655 GB/sec With chunk-size=1gb: $ echo madvise > /sys/kernel/mm/transparent_hugepage/enabled $ perf bench mem memset -p 4kb -k 1gb -s 4gb -l 10 -f x86-64-stosq # Running 'mem/memset' benchmark: # function 'x86-64-stosq' (movsq-based memset() in arch/x86/lib/memset_64.S) # Copying 4gb bytes ... 21.936355 GB/sec So, allow the user to specify the chunk-size. The default value is identical to the total size of the region, which preserves current behaviour. Reviewed-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Hildenbrand <david@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Mateusz Guzik <mjguzik@gmail.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Raghavendra K T <raghavendra.kt@amd.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2025-09-19 12:43:38 -03:00
Ankur Arora	7b6837e63a	perf bench mem: Allow mapping of hugepages Page sizes that can be selected: 4KB, 2MB, 1GB. Both the reservation and node from which hugepages are allocated from are expected to be addressed by the user. An example of page-size selection: $ perf bench mem memset -s 4gb -p 2mb # Running 'mem/memset' benchmark: # function 'default' (Default memset() provided by glibc) # Copying 4gb bytes ... 14.919194 GB/sec # function 'x86-64-unrolled' (unrolled memset() in arch/x86/lib/memset_64.S) # Copying 4gb bytes ... 11.514503 GB/sec # function 'x86-64-stosq' (movsq-based memset() in arch/x86/lib/memset_64.S) # Copying 4gb bytes ... 12.600568 GB/sec Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Hildenbrand <david@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Mateusz Guzik <mjguzik@gmail.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Raghavendra K T <raghavendra.kt@amd.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2025-09-19 12:43:26 -03:00
Thomas Richter	817af72c05	perf tools: Update header documentation on BPF_PROG_INFO Update the perf.data file format description on header section HEADER_BPF_PROG_INFO. The information is taken from process_bpf_prog_info() and write_bpf_prog_info() from file util/header.c. Reviewed-by: Jan Polensky <japo@linux.ibm.com> Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Sumanth Korikkar <sumanthk@linux.ibm.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2025-09-19 12:14:30 -03:00
Namhyung Kim	ece3c7754f	perf trace: Add --max-summary option The --max-summary option is to limit the number of output lines for syscall summary stats. The max applies to each entries like thread and cgroups. For total summary, it will just print up to the given number. For example, $ sudo perf trace -as --max-summary 3 sleep 0.1 ThreadPoolServi (1011651), 114 events, 14.8% syscall calls errors total min avg max stddev (msec) (msec) (msec) (msec) (%) --------------- -------- ------ -------- --------- --------- --------- ------ epoll_wait 38 0 95.589 0.000 2.515 11.153 28.98% futex 9 0 0.040 0.002 0.004 0.014 28.63% read 10 0 0.037 0.003 0.004 0.005 4.67% sleep (1050529), 250 events, 32.4% syscall calls errors total min avg max stddev (msec) (msec) (msec) (msec) (%) --------------- -------- ------ -------- --------- --------- --------- ------ clock_nanosleep 1 0 100.156 100.156 100.156 100.156 0.00% execve 4 3 1.020 0.005 0.255 0.989 95.93% openat 36 17 0.416 0.003 0.012 0.029 10.58% ... And this is for per-cgroup summary using BPF. $ sudo perf trace -as --max-summary 3 --summary-mode=cgroup --bpf-summary sleep 0.1 cgroup /user.slice/user-657345.slice/user@657345.service/session.slice/org.gnome.Shell@x11.service, 12 events syscall calls errors total min avg max stddev (msec) (msec) (msec) (msec) (%) --------------- -------- ------ -------- --------- --------- --------- ------ recvmsg 8 7 0.016 0.001 0.002 0.006 39.73% ppoll 1 0 0.014 0.014 0.014 0.014 0.00% write 2 0 0.010 0.002 0.005 0.008 61.02% cgroup /user.slice/user-657345.slice/session-4.scope, 73 events syscall calls errors total min avg max stddev (msec) (msec) (msec) (msec) (%) --------------- -------- ------ -------- --------- --------- --------- ------ epoll_wait 8 0 13.461 0.010 1.683 12.235 89.66% ioctl 20 0 0.204 0.001 0.010 0.113 54.01% writev 11 0 0.164 0.004 0.015 0.042 20.34% Reviewed-by: Howard Chu <howardchu95@gmail.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2025-09-19 12:14:29 -03:00
Ian Rogers	035c178930	perf parse-events: Add 'X' modifier to exclude an event from being regrouped The function parse_events__sort_events_and_fix_groups is needed to fix uncore events like: ``` $ perf stat -e '{data_read,data_write}' ... ``` so that the multiple uncore PMUs have a group each of data_read and data_write events. The same function will perform architecture sorting and group fixing, in particular for Intel topdown/perf-metric events. Grouping multiple perf metric events together causes perf_event_open to fail as the group can only support one. This means command lines like: ``` $ perf stat -e 'slots,slots' ... ``` fail as the slots events are forced into a group together to try to satisfy the perf-metric event constraints. As the user may know better than parse_events__sort_events_and_fix_groups add a 'X' modifier to skip its regrouping behavior. This allows the following to succeed rather than fail on the second slots event being opened: ``` $ perf stat -e 'slots,slots:X' -a sleep 1 Performance counter stats for 'system wide': 6,834,154,071 cpu_core/slots/ (50.13%) 5,548,629,453 cpu_core/slots/X (49.87%) 1.002634606 seconds time elapsed ``` Closes: https://lore.kernel.org/lkml/20250822082233.1850417-1-dapeng1.mi@linux.intel.com/ Reported-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Reported-by: Xudong Hao <xudong.hao@intel.com> Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Howard Chu <howardchu95@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Falcon <thomas.falcon@intel.com> Cc: Yoshihiro Furudera <fj5100bi@fujitsu.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2025-09-12 15:53:32 -03:00
James Clark	edf93f2a24	perf docs: Update SPE doc to include default instructions group The instructions group is now generated by default so update the doc to reflect this. Also explain the period/downsampling mechanism in more detail. Signed-off-by: James Clark <james.clark@linaro.org> Tested-by: Leo Yan <leo.yan@arm.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ben Gainey <Ben.Gainey@arm.com> Cc: George Wort <George.Wort@arm.com> Cc: Graham Woodward <Graham.Woodward@arm.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Michael Williams <Michael.Williams@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will@kernel.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2025-09-09 14:51:33 -03:00
Namhyung Kim	7dbe89ca3d	perf annotate: Add --code-with-type support for TUI Until now, the --code-with-type option is available only on stdio. But it was an artifical limitation because of an implemention issue. Implement the same logic in annotation_line__write() for stdio2/TUI and remove the limitation and update the man page. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20250816031635.25318-8-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2025-08-28 12:33:08 -03:00
Ian Rogers	53b00ff358	perf record: Make --buildid-mmap the default Support for build IDs in mmap2 perf events has been present since Linux v5.12: https://lore.kernel.org/lkml/20210219194619.1780437-1-acme@kernel.org/ Build ID mmap events don't avoid the need to inject build IDs for DSO touched by samples as the build ID cache is populated by perf record. They can avoid some cases of symbol mis-resolution caused by the file system changing from when a sample occurred and when the DSO is sought. Unlike the --buildid-mmap option, this chnage doesn't disable the build ID cache but it does disable the processing of samples looking for DSOs to inject build IDs for. To disable the build ID cache the -B (--no-buildid) option should be used. Making this option the default was raised on the list in: https://lore.kernel.org/linux-perf-users/CAP-5=fXP7jN_QrGUcd55_QH5J-Y-FCaJ6=NaHVtyx0oyNh8_-Q@mail.gmail.com/ Signed-off-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250724163302.596743-9-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-07-25 10:37:56 -07:00
Ian Rogers	bd741d80dc	perf parse-events: Allow the cpu term to be a PMU or CPU range On hybrid systems, events like msr/tsc/ will aggregate counts across all CPUs. Often metrics only want a value like msr/tsc/ for the cores on which the metric is being computed. Listing each CPU with terms cpu=0,cpu=1.. is laborious and would need to be encoded for all variations of a CPU model. Allow the cpumask from a PMU to be an argument to the cpu term. For example in the following the cpumask of the cstate_pkg PMU selects the CPUs to count msr/tsc/ counter upon: ``` $ cat /sys/bus/event_source/devices/cstate_pkg/cpumask 0 $ perf stat -A -e 'msr/tsc,cpu=cstate_pkg/' -a sleep 0.1 Performance counter stats for 'system wide': CPU0 252,621,253 msr/tsc,cpu=cstate_pkg/ 0.101184092 seconds time elapsed ``` As the cpu term is now also allowed to be a string, allow it to encode a range of CPUs (a list can't be supported as ',' is already a special token). The "event qualifiers" section of the `perf list` man page is updated to detail the additional behavior. The man page formatting is tidied up in this section, as it was incorrectly appearing within the "parameterized events" section. Reviewed-by: Thomas Falcon <thomas.falcon@intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250719030517.1990983-5-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-07-24 13:41:35 -07:00
Changbin Du	129f70bd60	perf: ftrace: add graph tracer options args/retval/retval-hex/retaddr This change adds support for new funcgraph tracer options funcgraph-args, funcgraph-retval, funcgraph-retval-hex and funcgraph-retaddr. The new added options are: - args : Show function arguments. - retval : Show function return value. - retval-hex : Show function return value in hexadecimal format. - retaddr : Show function return address. # ./perf ftrace -G vfs_write --graph-opts retval,retaddr # tracer: function_graph # # CPU DURATION FUNCTION CALLS # \| \| \| \| \| \| \| 5) \| mutex_unlock() { /* <-rb_simple_write+0xda/0x150 / 5) 0.188 us \| local_clock(); / <-lock_release+0x2ad/0x440 ret=0x3bf2a3cf90e / 5) \| rt_mutex_slowunlock() { / <-rb_simple_write+0xda/0x150 / 5) \| _raw_spin_lock_irqsave() { / <-rt_mutex_slowunlock+0x4f/0x200 / 5) 0.123 us \| preempt_count_add(); / <-_raw_spin_lock_irqsave+0x23/0x90 ret=0x0 / 5) 0.128 us \| local_clock(); / <-__lock_acquire.isra.0+0x17a/0x740 ret=0x3bf2a3cfc8b / 5) 0.086 us \| do_raw_spin_trylock(); / <-_raw_spin_lock_irqsave+0x4a/0x90 ret=0x1 / 5) 0.845 us \| } / _raw_spin_lock_irqsave ret=0x292 / 5) \| _raw_spin_unlock_irqrestore() { / <-rt_mutex_slowunlock+0x191/0x200 / 5) 0.097 us \| local_clock(); / <-lock_release+0x2ad/0x440 ret=0x3bf2a3cff1f / 5) 0.086 us \| do_raw_spin_unlock(); / <-_raw_spin_unlock_irqrestore+0x23/0x60 ret=0x1 / 5) 0.104 us \| preempt_count_sub(); / <-_raw_spin_unlock_irqrestore+0x35/0x60 ret=0x0 / 5) 0.726 us \| } / _raw_spin_unlock_irqrestore ret=0x80000000 / 5) 1.881 us \| } / rt_mutex_slowunlock ret=0x0 / 5) 2.931 us \| } / mutex_unlock ret=0x0 */ Signed-off-by: Changbin Du <changbin.du@huawei.com> Reviewed-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250613114048.132336-1-changbin.du@huawei.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-07-22 17:47:22 -07:00
Namhyung Kim	8db1d77248	perf ftrace latency: Add -e option to measure time between two events In addition to the function latency, it can measure events latencies. Some kernel tracepoints are paired and it's menningful to measure how long it takes between the two events. The latency is tracked for the same thread. Currently it only uses BPF to do the work but it can be lifted later. Instead of having separate a BPF program for each tracepoint, it only uses generic 'event_begin' and 'event_end' programs to attach to any (raw) tracepoints. $ sudo perf ftrace latency -a -b --hide-empty \ -e i915_request_wait_begin,i915_request_wait_end -- sleep 1 # DURATION \| COUNT \| GRAPH \| 256 - 512 us \| 4 \| ###### \| 2 - 4 ms \| 2 \| ### \| 4 - 8 ms \| 12 \| ################### \| 8 - 16 ms \| 10 \| ################ \| # statistics (in usec) total time: 194915 avg time: 6961 max time: 12855 min time: 373 count: 28 Reviewed-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250714052143.342851-1-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-07-14 22:51:58 -07:00
Chun-Tse Shao	aa497357c1	perf stat: Fix uncore aggregation number Follow up: lore.kernel.org/CAP-5=fVDF4-qYL1Lm7efgiHk7X=_nw_nEFMBZFMcsnOOJgX4Kg@mail.gmail.com/ The patch adds unit aggregation during evsel merge the aggregated uncore counters. Change the name of the column to `ctrs` and `counters` for json mode. Tested on a 2-socket machine with SNC3, uncore_imc_[0-11] and cpumask="0,120" Before: perf stat -e clockticks -I 1000 --per-socket # time socket cpus counts unit events 1.001085024 S0 1 9615386315 clockticks 1.001085024 S1 1 9614287448 clockticks perf stat -e clockticks -I 1000 --per-node # time node cpus counts unit events 1.001029867 N0 1 3205726984 clockticks 1.001029867 N1 1 3205444421 clockticks 1.001029867 N2 1 3205234018 clockticks 1.001029867 N3 1 3205224660 clockticks 1.001029867 N4 1 3205207213 clockticks 1.001029867 N5 1 3205528246 clockticks After: perf stat -e clockticks -I 1000 --per-socket # time socket ctrs counts unit events 1.001026071 S0 12 9619677996 clockticks 1.001026071 S1 12 9618612614 clockticks perf stat -e clockticks -I 1000 --per-node # time node ctrs counts unit events 1.001027449 N0 4 3207251859 clockticks 1.001027449 N1 4 3207315930 clockticks 1.001027449 N2 4 3206981828 clockticks 1.001027449 N3 4 3206566126 clockticks 1.001027449 N4 4 3206032609 clockticks 1.001027449 N5 4 3205651355 clockticks Tested with JSON output linter: perf test "perf stat JSON output linter" 94: perf stat JSON output linter : Ok Suggested-by: Ian Rogers <irogers@google.com> Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Chun-Tse Shao <ctshao@google.com> Link: https://lore.kernel.org/r/20250627201818.479421-1-ctshao@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-06-27 16:14:10 -07:00
Yuzhuo Jing	8e63fd1e00	tools: Remove libcrypto dependency Remove all occurrence of libcrypto in the build system. Signed-off-by: Yuzhuo Jing <yuzhuo@google.com> Signed-off-by: Eric Biggers <ebiggers@kernel.org> Reviewed-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250625202311.23244-5-ebiggers@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-06-26 10:51:41 -07:00
Namhyung Kim	c833e8cc4d	Linux 6.16-rc3 -----BEGIN PGP SIGNATURE----- iQFSBAABCgA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmhYZ9AeHHRvcnZhbGRz QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGPdcIAIR01BnZbU7knvkI QSMlJR4zl8IDIu+9P35v5jmoklJFqYtIGQUnNTxrWK0i/NGj6+FAmA8xgAzeJZ7d Kv0Qs0fkgwQWxFbbM9Xg9Vob8gxSZ6Qyo5ETz8UDupCNWyUPpMaRDLv8IRtF/19I YAgXJqCXC4LuJJNROSkk3wiLqg+CerzJ/m7GJtBQdCmbv6h87HETaeQpKVgEwTkR uEn99CDnOryovIYWoDaBbqLRF5AQr2JA6lqmQAr57wyjzuj1Ul2BNgLC/dlnuSQl G8kv0+Kf+l1X3UJy1w8V4lkSLyjpTko7QIgv3AVDLDvOobCtZglWD4vPJ1OGvw4m cHDrPg4= =thw2 -----END PGP SIGNATURE----- Merge tag 'v6.16-rc3' into perf-tools-next To get the fixes in libbpf and perf tools. Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-06-22 21:54:03 -07:00

1 2 3 4 5 ...

1318 Commits