mirror of https://github.com/torvalds/linux.git (synced 2026-05-12 16:18:45 +02:00)
Add support for parsing an optional layout parameter in the --symfs command line option. The format is:

  --symfs <directory[,layout]>

Where layout can be:
- 'hierarchy': matches full path (default)
- 'flat': only matches base name

When debugging symbol files from a copy of the filesystem (e.g., from a container or remote machine), the debug files are often stored in a flat directory structure with only filenames, not the full original paths. In this case, using 'flat' layout allows perf to find debug symbols by matching only the filename rather than the full path.

For example, given a binary path like:

  /build/output/lib/foo.so

With 'perf report --symfs /debug/files,flat', perf will look for:

  /debug/files/foo.so

Instead of:

  /debug/files/build/output/lib/foo.so

This is particularly useful when:
- Extracting debug files from containers with different directory layouts
- Working with build systems that flatten directory structures

Signed-off-by: Changbin Du <changbin.du@huawei.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
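The lookup rule described in the commit message can be modeled in a few lines. This is only an illustrative Python sketch, not perf's implementation (perf is written in C): the names `parse_symfs` and `symfs_path` are hypothetical, splitting on the first comma is an assumption, and rejecting unknown layout values is an assumption about error handling.

```python
import posixpath


def parse_symfs(arg):
    """Split a '--symfs <directory[,layout]>' argument into (directory, layout)."""
    # Assumption: the first comma separates the directory from the layout.
    directory, sep, layout = arg.partition(",")
    if not sep:
        layout = "hierarchy"  # default layout per the commit message
    if layout not in ("hierarchy", "flat"):
        # Assumption: unknown layouts are rejected; perf's behavior may differ.
        raise ValueError(f"unknown symfs layout: {layout!r}")
    return directory, layout


def symfs_path(symfs_arg, binary_path):
    """Return the path where a symbol file for binary_path would be looked up."""
    directory, layout = parse_symfs(symfs_arg)
    if layout == "flat":
        # 'flat': only the base name is matched.
        return posixpath.join(directory, posixpath.basename(binary_path))
    # 'hierarchy': the full original path is kept under the symfs directory.
    return posixpath.join(directory, binary_path.lstrip("/"))


print(symfs_path("/debug/files,flat", "/build/output/lib/foo.so"))
print(symfs_path("/debug/files", "/build/output/lib/foo.so"))
```

With the example from the commit message, the first call yields /debug/files/foo.so and the second yields /debug/files/build/output/lib/foo.so.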
69 lines
5.4 KiB
Plaintext
For a higher level overview, try: perf report --sort comm,dso
Sample related events with: perf record -e '{cycles,instructions}:S'
Compare performance results with: perf diff [<old file> <new file>]
Boolean options have negative forms, e.g.: perf report --no-children
To not accumulate CPU time of children symbols add --no-children
Customize output of perf script with: perf script -F event,ip,sym
Generate a script for your data: perf script -g <lang>
Save output of perf stat using: perf stat record <target workload>
Create an archive with symtabs to analyse on another machine: perf archive
Search options using a keyword: perf report -h <keyword>
Use parent filter to see specific call path: perf report -p <regex>
List events using substring match: perf list <keyword>
To see list of saved events and attributes: perf evlist -v
Use --symfs <dir>[,layout] if your symbol files are in non-standard locations.
To see callchains in a more compact form: perf report -g folded
To see call chains by final symbol taking CPU time (bottom up) use perf report -G
Show individual samples with: perf script
Limit to show entries above 5% only: perf report --percent-limit 5
Profile branch (mis)predictions with: perf record -b / perf report
To show assembler sample context control flow use perf record -b / perf report --samples 10 and then browse context
To adjust path to source files to local file system use perf report --prefix=... --prefix-strip=...
Treat branches as callchains: perf record -b ... ; perf report --branch-history
To show estimated cycles per function and IPC in annotate use perf record -b ... ; perf report --total-cycles
To count events every 1000 msec: perf stat -I 1000
Print event counts in machine readable CSV format with: perf stat -x\;
If you have debuginfo enabled, try: perf report -s sym,srcline
For memory address profiling, try: perf mem record / perf mem report
For tracepoint events, try: perf report -s trace_fields
To record callchains for each sample: perf record -g
If call chains don't work try perf record --call-graph dwarf or --call-graph lbr
To record every process run by a user: perf record -u <user>
To show inline functions in call traces add --inline to perf report
To not record events from perf itself add --exclude-perf
Skip collecting build-id when recording: perf record -B
To change sampling frequency to 100 Hz: perf record -F 100
To show information about the system the samples were collected on use perf report --header
To only collect call graph on one event use perf record -e cpu/cpu-cycles,callgraph=1/,branches ; perf report --show-ref-call-graph
To set sampling period of individual events use perf record -e cpu/cpu-cycles,period=100001/,cpu/branches,period=10001/ ...
To group events which need to be collected together for accuracy use {}: perf record -e '{cycles,branches}' ...
To compute metrics for samples use perf record -e '{cycles,instructions}' ... ; perf script -F +metric
See assembly instructions with percentage: perf annotate <symbol>
If you prefer Intel style assembly, try: perf annotate -M intel
When collecting LBR backtraces use --stitch-lbr to handle more than 32 deep entries: perf record --call-graph lbr ; perf report --stitch-lbr
For hierarchical output, try: perf report --hierarchy
Order by the overhead of source file name and line number: perf report -s srcline
System-wide collection from all CPUs: perf record -a
Show current config key-value pairs: perf config --list
To collect Processor Trace with samples use perf record -e '{intel_pt//,cycles}' ; perf script --call-trace or --insn-trace --xed -F +ipc (remove --xed if no xed)
To trace calls using Processor Trace use perf record -e intel_pt// ... ; perf script --call-trace. Then use perf script --time A-B --insn-trace to look at region of interest.
To measure approximate function latency with Processor Trace use perf record -e intel_pt// ... ; perf script --call-ret-trace
To trace only a single function with Processor Trace use perf record --filter 'filter func @ program' -e intel_pt//u ./program ; perf script --insn-trace
Show user configuration overrides: perf config --user --list
To add Node.js USDT (User-Level Statically Defined Tracing): perf buildid-cache --add `which node`
To analyze cache line scalability issues use perf c2c record ... ; perf c2c report
To browse sample contexts use perf report --samples 10 and select in context menu
To separate samples by time use perf report --sort time,overhead,sym
To filter a subset of samples with report or script add --time X-Y or --cpu A,B,C or --socket-filter ...
To set sample time separation other than 100ms with --sort time use --time-quantum
Add -I to perf record to sample register values, which will be visible in perf report sample context.
To show IPC for sampling periods use perf record -e '{cycles,instructions}:S' and then browse context
To show context switches in perf report sample context add --switch-events to perf record.
To show time in nanoseconds in record/report add --ns
To compare hot regions in two workloads use perf record -b -o file ... ; perf diff --stream file1 file2
To compare scalability of two workload samples use perf diff -c ratio file1 file2
For latency profiling, try: perf record/report --latency
For parallelism histogram, try: perf report --hierarchy --sort latency,parallelism,comm,symbol
To analyze particular parallelism levels, try: perf report --latency --parallelism=32-64
To see how parallelism changes over time, try: perf report -F time,latency,parallelism --time-quantum=1s