perf docs: arm_spe: Document new discard mode

Document the flag along with PMU events to hint what it's used for and
give an example with other useful options to get minimal output.

Reviewed-by: Yeoreum Yun <yeoreum.yun@arm.com>
Signed-off-by: James Clark <james.clark@linaro.org>
Link: https://lore.kernel.org/r/20250108142904.401139-3-james.clark@linaro.org
Signed-off-by: Will Deacon <will@kernel.org>
James Clark 2025-01-08 14:28:57 +00:00 committed by Will Deacon
parent d28d95bc63
commit ba113ecad8

@@ -150,6 +150,7 @@ arm_spe/load_filter=1,min_latency=10/'
pct_enable=1 - collect physical timestamp instead of virtual timestamp (PMSCR.PCT) - requires privilege
store_filter=1 - collect stores only (PMSFCR.ST)
ts_enable=1 - enable timestamping with value of generic timer (PMSCR.TS)
discard=1 - enable SPE PMU events but don't collect sample data - see 'Discard mode' (PMBLIMITR.FM = DISCARD)
+++*+++ Latency is the total latency from the point at which sampling started on that instruction, rather
than only the execution latency.
@@ -220,6 +221,31 @@ Common errors
Increase sampling interval (see above)
PMU events
~~~~~~~~~~
SPE has events that can be counted on core PMUs. These are prefixed with
SAMPLE_, for example SAMPLE_POP, SAMPLE_FEED, SAMPLE_COLLISION and
SAMPLE_FEED_BR.
These events will only count when an SPE event is running on the same core that
the PMU event is opened on, otherwise they read as 0. There are various ways to
ensure that the PMU event and SPE event are scheduled together depending on the
way the event is opened. For example, both events can be opened as per-process
events on the same process, although it's not guaranteed that the PMU event is
enabled first when context switching. For that reason it may be better to open
the PMU event as a systemwide event and then open SPE on the process of interest.
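The systemwide-PMU-plus-per-process-SPE arrangement described above could be sketched roughly as follows. This is an illustration rather than part of the documented text: `./workload` and `spe.data` are placeholder names, the sysfs check assumes SPE PMUs appear as `arm_spe_<n>`, and whether `SAMPLE_FEED` and `SAMPLE_COLLISION` are available by name depends on the core's PMU event list.

```shell
#!/bin/sh
# Sketch only: "./workload" and "spe.data" are placeholder names (assumptions).
WORKLOAD="${WORKLOAD:-./workload}"

# Assumption: SPE PMUs are exposed in sysfs as arm_spe_<n>.
if command -v perf >/dev/null 2>&1 &&
   ls -d /sys/bus/event_source/devices/arm_spe_* >/dev/null 2>&1; then
    # perf stat -a opens the core PMU events systemwide for the lifetime of
    # the wrapped command; the inner perf record opens SPE on the workload.
    perf stat -a -e SAMPLE_FEED,SAMPLE_COLLISION -- \
        perf record -e arm_spe// -o spe.data -- "$WORKLOAD"
else
    echo "perf or an arm_spe PMU was not found; skipping"
fi
```

Because the PMU events are already counting systemwide when SPE is scheduled in with the workload, the ordering concern on context switch should not arise. Both commands typically need elevated privileges.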
Discard mode
~~~~~~~~~~~~
SPE related (SAMPLE_* etc) core PMU events can be used without the overhead of
collecting sample data if discard mode is supported (optional from Armv8.6).
First run a system wide SPE session (or on the core of interest) using options
to minimize output. Then run perf stat:
  perf record -e arm_spe/discard/ -a -N -B --no-bpf-event -o - > /dev/null &
  perf stat -e SAMPLE_FEED_LD
SEE ALSO
--------