linux/tools
Jin Yao 3fa7ae3f37 perf report: Fix wrong LBR block sorting
[ Upstream commit f2013278ae ]

When '--total-cycles' is specified, it supports sorting for all blocks
by 'Sampled Cycles%'. This is useful to concentrate on the globally
hottest blocks.

'Sampled Cycles%' - block sampled cycles aggregation / total sampled cycles

But in current code, it doesn't use the cycles aggregation. Part of
'cycles' counting is possibly dropped for some overlap jumps. But for
identifying the hot block, we always need the full cycles.

  # perf record -b ./triad_loop
  # perf report --total-cycles --stdio

Before:

  #
  # Sampled Cycles%  Sampled Cycles  Avg Cycles%  Avg Cycles                                          [Program Block Range]      Shared Object
  # ...............  ..............  ...........  ..........  .............................................................  .................
  #
              0.81%             793        4.32%         793                           [setup-vdso.h:34 -> setup-vdso.h:40]         ld-2.27.so
              0.49%             480        0.87%         160                    [native_write_msr+0 -> native_write_msr+16]  [kernel.kallsyms]
              0.48%             476        0.52%          95                      [native_read_msr+0 -> native_read_msr+29]  [kernel.kallsyms]
              0.31%             303        1.65%         303                              [nmi_restore+0 -> nmi_restore+37]  [kernel.kallsyms]
              0.26%             255        1.39%         255      [nohz_balance_exit_idle+75 -> nohz_balance_exit_idle+162]  [kernel.kallsyms]
              0.24%             234        1.28%         234                       [end_repeat_nmi+67 -> end_repeat_nmi+83]  [kernel.kallsyms]
              0.23%             227        1.24%         227            [__irqentry_text_end+96 -> __irqentry_text_end+126]  [kernel.kallsyms]
              0.20%             194        1.06%         194             [native_set_debugreg+52 -> native_set_debugreg+56]  [kernel.kallsyms]
              0.11%             106        0.14%          26                [native_sched_clock+0 -> native_sched_clock+98]  [kernel.kallsyms]
              0.10%              97        0.53%          97            [trigger_load_balance+0 -> trigger_load_balance+67]  [kernel.kallsyms]
              0.09%              85        0.46%          85             [get-dynamic-info.h:102 -> get-dynamic-info.h:111]         ld-2.27.so
  ...
              0.00%           92.7K        0.02%           4                           [triad_loop.c:64 -> triad_loop.c:65]         triad_loop

The hottest block '[triad_loop.c:64 -> triad_loop.c:65]' is not at
the top of output.

After:

  # Sampled Cycles%  Sampled Cycles  Avg Cycles%  Avg Cycles                                           [Program Block Range]      Shared Object
  # ...............  ..............  ...........  ..........  ..............................................................  .................
  #
             94.35%           92.7K        0.02%           4                            [triad_loop.c:64 -> triad_loop.c:65]         triad_loop
              0.81%             793        4.32%         793                            [setup-vdso.h:34 -> setup-vdso.h:40]         ld-2.27.so
              0.49%             480        0.87%         160                     [native_write_msr+0 -> native_write_msr+16]  [kernel.kallsyms]
              0.48%             476        0.52%          95                       [native_read_msr+0 -> native_read_msr+29]  [kernel.kallsyms]
              0.31%             303        1.65%         303                               [nmi_restore+0 -> nmi_restore+37]  [kernel.kallsyms]
              0.26%             255        1.39%         255       [nohz_balance_exit_idle+75 -> nohz_balance_exit_idle+162]  [kernel.kallsyms]
              0.24%             234        1.28%         234                        [end_repeat_nmi+67 -> end_repeat_nmi+83]  [kernel.kallsyms]
              0.23%             227        1.24%         227             [__irqentry_text_end+96 -> __irqentry_text_end+126]  [kernel.kallsyms]
              0.20%             194        1.06%         194              [native_set_debugreg+52 -> native_set_debugreg+56]  [kernel.kallsyms]
              0.11%             106        0.14%          26                 [native_sched_clock+0 -> native_sched_clock+98]  [kernel.kallsyms]
              0.10%              97        0.53%          97             [trigger_load_balance+0 -> trigger_load_balance+67]  [kernel.kallsyms]
              0.09%              85        0.46%          85              [get-dynamic-info.h:102 -> get-dynamic-info.h:111]         ld-2.27.so
              0.08%              82        0.06%          11  [intel_pmu_drain_pebs_nhm+580 -> intel_pmu_drain_pebs_nhm+627]  [kernel.kallsyms]
              0.08%              77        0.42%          77                  [lru_add_drain_cpu+0 -> lru_add_drain_cpu+133]  [kernel.kallsyms]
              0.08%              74        0.10%          18                [handle_pmi_common+271 -> handle_pmi_common+310]  [kernel.kallsyms]
              0.08%              74        0.40%          74              [get-dynamic-info.h:131 -> get-dynamic-info.h:157]         ld-2.27.so
              0.07%              69        0.09%          17  [intel_pmu_drain_pebs_nhm+432 -> intel_pmu_drain_pebs_nhm+468]  [kernel.kallsyms]

Now the hottest block is reported at the top of output.

Fixes: b65a7d372b ("perf hist: Support block formats with compare/sort/display")
Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jin Yao <yao.jin@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20210407024452.29988-1-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-04-14 08:42:11 +02:00
..
accounting
arch x86/uprobes: Do not use prefixes.nbytes when looping over prefixes.bytes 2020-12-06 09:58:13 +01:00
bootconfig tools/bootconfig: Add tracing_on support to helper scripts 2021-01-19 18:27:19 +01:00
bpf tools/resolve_btfids: Add /libbpf to .gitignore 2021-04-10 13:36:10 +02:00
build tools: Factor HOSTCC, HOSTLD, HOSTAR definitions 2021-01-30 13:55:19 +01:00
cgroup blk-iocost: update iocost_monitor.py 2020-09-01 19:38:33 -06:00
debugging
edid
firewire
firmware
gpio tools: gpio: fix %llu warning in gpio-watch.c 2021-01-27 11:55:20 +01:00
hv tools: hv: change http to https in hv_kvp_daemon.c 2020-07-06 10:46:23 +00:00
iio iio: add IIO_MOD_O2 modifier 2020-08-22 10:53:12 +01:00
include static_call: Allow module use without exposing static_call_key 2021-03-30 14:31:53 +02:00
io_uring tools/io_uring: fix compile breakage 2020-09-21 07:50:58 -06:00
kvm/kvm_stat tools/kvm_stat: Exempt time-based counters 2020-12-11 19:18:51 -05:00
laptop
leds
lib libbpf: Only create rx and tx XDP rings when necessary 2021-04-14 08:42:01 +02:00
memory-model tools/memory-model: Expand the cheatsheet.txt notion of relaxed 2020-09-04 11:58:15 -07:00
objtool static_call: Allow module use without exposing static_call_key 2021-03-30 14:31:53 +02:00
pci
pcmcia
perf perf report: Fix wrong LBR block sorting 2021-04-14 08:42:11 +02:00
power tools/power/x86/intel-speed-select: Set higher of cpuinfo_max_freq or base_frequency 2021-02-07 15:37:13 +01:00
scripts tools: Factor HOSTCC, HOSTLD, HOSTAR definitions 2021-01-30 13:55:19 +01:00
spi
testing selftests/vm: fix out-of-tree build 2021-04-10 13:36:09 +02:00
thermal/tmon
time
usb tools: usb: move to tools buildsystem 2020-08-19 14:11:44 +02:00
virtio virtio: fixes, features 2020-08-11 14:34:17 -07:00
vm mm: Add PG_arch_2 page flag 2020-09-04 12:46:06 +01:00
wmi
Makefile bpf: Compile resolve_btfids tool at kernel compilation start 2020-07-13 10:42:02 -07:00