Add the evsel from evsel__parse_sample into the struct
perf_sample. Sometimes we want to alter the evsel associated with a
sample, such as with off-cpu bpf-output events. In general the evsel
and perf_sample are passed as a pair, but this makes an altered evsel
something of a chore to keep checking for and setting up. Later
patches will remove passing an evsel with the perf_sample and switch
to just using the perf_sample's value.
Signed-off-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
The deferred stack trace code wasn't using perf_sample__init/exit. Add
the deferred stack trace clean up to perf_sample__exit which requires
proper NULL initialization in perf_sample__init. Make the
perf_sample__exit robust to being called more than once by using
zfree. Make the error paths in evsel__parse_sample exit the
sample. Add a merged_callchain boolean to capture that callchain is
allocated, deferred_callchain doen't suffice for this. Pack the struct
variables to avoid padding bytes for this.
Similiarly powerpc_vpadtl_sample wasn't using perf_sample__init/exit,
use it for consistency and potential issues with uninitialized
variables.
Similarly guest_session__inject_events in builtin-inject wasn't using
perf_sample_init/exit. The lifetime management for fetched events is
somewhat complex there, but when an event is fetched the sample should
be initialized and needs exiting on error. The sample may be left in
place so that future injects have access to it.
Signed-off-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
In situations like the perf data converter the evsel__name will be
used to create babeltrace events. If the events have the same name
then creation can fail. Avoid these failures by including more
information into the unknown event names.
Signed-off-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Frame pointer callchains are not supported on s390 and dwarf
callchains are only supported on software events.
Switch the default event from the hardware 'cycles' event to the
software 'cpu-clock' or 'task-clock' on s390 if callchains are
enabled. Move some of the target initialization earlier in builtin-top
and builtin-record, so it is ready for use by evlist__new_default.
If frame pointer callchains are requested on s390 show a
warning. Modify the '-g' option of `perf top` and `perf record` to
default to dwarf callchains on s390.
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Thomas Richter <tmricht@linux.ibm.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
The options are used to configure the evsel but are not themselves
configured. Make the arguments const to better capture this.
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Thomas Richter <tmricht@linux.ibm.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Switch to using evsel__match rather than comparing perf_event_attr
values, this is robust on hybrid architectures.
Ensure evsel->pmu matches the evsel->core.attr.
Remove exclude bits that get set in other fallback attempts when
switching the event.
Log the event name with modifiers when switching the event on fallback.
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Thomas Richter <tmricht@linux.ibm.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Currently, some architecture-specific perf-regs functions, such as
arch__intr_reg_mask() and arch__user_reg_mask(), are defined with the
__weak attribute.
This approach ensures that only functions matching the architecture of
the build/run host are compiled and executed, reducing build time and
binary size.
However, this __weak attribute restricts these functions to be called
only on the same architecture, preventing cross-architecture
functionality.
For example, a perf.data file captured on x86 cannot be parsed on an ARM
platform.
To address this limitation, this patch removes the __weak attribute from
these perf-regs functions.
The architecture-specific code is moved from the arch/ directory to the
util/perf-regs-arch/ directory.
The appropriate architectural functions are then called based on the
EM_HOST.
No functional changes are intended.
Suggested-by: Ian Rogers <irogers@google.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexandre Ghiti <alex@ghiti.fr>
Cc: Guo Ren <guoren@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <pjw@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Falcon <thomas.falcon@intel.com>
Cc: Will Deacon <will@kernel.org>
Cc: Xudong Hao <xudong.hao@intel.com>
Cc: Zide Chen <zide.chen@intel.com>
[ Fixed up somme fuzz with s390 and riscv Build files wrt removing perf_regs.o ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Allow e_flags as well as e_machine to be computed using the e_machine
helper.
This isn't currently used, the argument is always NULL, but it will be
used for a new header feature.
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Aditya Bodkhe <aditya.b1@linux.ibm.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexandre Ghiti <alex@ghiti.fr>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andrew Jones <ajones@ventanamicro.com>
Cc: Anubhav Shelat <ashelat@redhat.com>
Cc: Anup Patel <anup@brainfault.org>
Cc: Athira Rajeev <atrajeev@linux.ibm.com>
Cc: Blake Jones <blakejones@google.com>
Cc: Chun-Tse Shao <ctshao@google.com>
Cc: Dapeng Mi <dapeng1.mi@linux.intel.com>
Cc: Dmitriy Vyukov <dvyukov@google.com>
Cc: Howard Chu <howardchu95@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <pjw@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quan Zhou <zhouquan@iscas.ac.cn>
Cc: Shimin Guo <shimin.guo@skydio.com>
Cc: Swapnil Sapkal <swapnil.sapkal@amd.com>
Cc: Thomas Falcon <thomas.falcon@intel.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yunseong Kim <ysk@kzalloc.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Pass the e_machine to the kvm functions so that they aren't just wired
to EM_HOST.
In the case of a session move some setup until the session
is created.
As the session isn't fully running the default EM_HOST is returned as no
e_machine can be found in a running machine.
This is, however, some marginal progress to cross platform support.
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Aditya Bodkhe <aditya.b1@linux.ibm.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexandre Ghiti <alex@ghiti.fr>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andrew Jones <ajones@ventanamicro.com>
Cc: Anubhav Shelat <ashelat@redhat.com>
Cc: Anup Patel <anup@brainfault.org>
Cc: Athira Rajeev <atrajeev@linux.ibm.com>
Cc: Blake Jones <blakejones@google.com>
Cc: Chun-Tse Shao <ctshao@google.com>
Cc: Dapeng Mi <dapeng1.mi@linux.intel.com>
Cc: Dmitriy Vyukov <dvyukov@google.com>
Cc: Howard Chu <howardchu95@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <pjw@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quan Zhou <zhouquan@iscas.ac.cn>
Cc: Shimin Guo <shimin.guo@skydio.com>
Cc: Swapnil Sapkal <swapnil.sapkal@amd.com>
Cc: Thomas Falcon <thomas.falcon@intel.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yunseong Kim <ysk@kzalloc.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The arch string requires multiple strcmp to identify things like the
IP and SP.
Switch to passing in an e_machine that in the bulk of cases is computed
using a current thread load.
The e_machine also allows identification of 32-bit vs 64-bit processes.
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Aditya Bodkhe <aditya.b1@linux.ibm.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexandre Ghiti <alex@ghiti.fr>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.ibm.com>
Cc: Chun-Tse Shao <ctshao@google.com>
Cc: Dmitriy Vyukov <dvyukov@google.com>
Cc: Dr. David Alan Gilbert <linux@treblig.org>
Cc: Guo Ren <guoren@kernel.org>
Cc: Haibo Xu <haibo1.xu@intel.com>
Cc: Howard Chu <howardchu95@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Krzysztof Łopatowski <krzysztof.m.lopatowski@gmail.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Mark Wielaard <mark@klomp.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <pjw@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sergei Trofimovich <slyich@gmail.com>
Cc: Shimin Guo <shimin.guo@skydio.com>
Cc: Stephen Brennan <stephen.s.brennan@oracle.com>
Cc: Thomas Falcon <thomas.falcon@intel.com>
Cc: Will Deacon <will@kernel.org>
[ Include dwarf-regs.h to get conditional defines for EM_CSKY and EM_LOONGARCH, not available in old distros ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
This will be used by aux PMUs to read an already written value for
configuring their events and for also testing.
Its helper perf_pmu__format_unpack() does the opposite of the existing
pmu_format_value() so rename that one to perf_pmu__format_pack() so it's
clear how they are related.
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: James Clark <james.clark@linaro.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Misleadingly, evsel__set_config_if_unset() only works with the config
field and not config1, config2, etc. This is fine at the moment because
all users of it happen to operate on bits that are in that config field.
Fix it before there are any new users of the function which operate on
bits in different config fields.
In theory it's also possible for a driver to move an existing bit to
another config field and this fixes that scenario too, although this
hasn't happened yet either.
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: James Clark <james.clark@linaro.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Currently we only track which bits were set by the user in attr->config.
But all configN fields should be treated equally as they can all have
default and user overridden values.
Track them all by making get_config_chgs() generic and calling it once
for each config value.
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: James Clark <james.clark@linaro.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Sparse config fields are technically supported although currently
unused. field_prep() only works for contiguous bitfields so replace it
with pmu_format_value().
pmu_format_value() also takes a bitmap rather than a u64 so replace
'u64 bits' with format->bits.
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: James Clark <james.clark@linaro.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
At least one of these were put here to avoid a Python binding linking
issue which is no longer present. Put them back in their correct
location to avoid confusion about which file to add a new evsel__*
function to later.
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: James Clark <james.clark@linaro.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/all/ZEbAS2yx2fguW60w@kernel.org/
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
strerror() has thread safety issues, strerror_r() requires stack
allocated buffers.
Code in perf has already been using the "%m" formatting flag that is a
widely support glibc extension to print the current errno's description.
Expand the usage of this formatting flag and remove usage of
strerror()/strerror_r().
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexandre Ghiti <alexghiti@rivosinc.com>
Cc: Blake Jones <blakejones@google.com>
Cc: Chun-Tse Shao <ctshao@google.com>
Cc: Dmitriy Vyukov <dvyukov@google.com>
Cc: Dr. David Alan Gilbert <linux@treblig.org>
Cc: Haibo Xu <haibo1.xu@intel.com>
Cc: Howard Chu <howardchu95@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Leo Yan <leo.yan@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephen Brennan <stephen.s.brennan@oracle.com>
Cc: Thomas Falcon <thomas.falcon@intel.com>
Cc: Yunseong Kim <ysk@kzalloc.com>
Cc: Zhongqiu Han <quic_zhonhan@quicinc.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Add a new callchain record mode option for deferred callchains. For now
it only works with FP (frame-pointer) mode.
And add the missing feature detection logic to clear the flag on old
kernels.
$ perf record --call-graph fp,defer -vv true
...
------------------------------------------------------------
perf_event_attr:
type 0 (PERF_TYPE_HARDWARE)
size 136
config 0 (PERF_COUNT_HW_CPU_CYCLES)
{ sample_period, sample_freq } 4000
sample_type IP|TID|TIME|CALLCHAIN|PERIOD
read_format ID|LOST
disabled 1
inherit 1
mmap 1
comm 1
freq 1
enable_on_exec 1
task 1
sample_id_all 1
mmap2 1
comm_exec 1
ksymbol 1
bpf_event 1
defer_callchain 1
defer_output 1
------------------------------------------------------------
sys_perf_event_open: pid 162755 cpu 0 group_fd -1 flags 0x8
sys_perf_event_open failed, error -22
switching off deferred callchain support
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Add a new event type for deferred callchains and a new callback for the
struct perf_tool. For now it doesn't actually handle the deferred
callchains but it just marks the sample if it has the PERF_CONTEXT_
USER_DEFFERED in the callchain array.
At least, perf report can dump the raw data with this change. Actually
this requires the next commit to enable attr.defer_callchain, but if you
already have a data file, it'll show the following result.
$ perf report -D
...
0x2158@perf.data [0x40]: event: 22
.
. ... raw event: size 64 bytes
. 0000: 16 00 00 00 02 00 40 00 06 00 00 00 0b 00 00 00 ......@.........
. 0010: 03 00 00 00 00 00 00 00 a7 7f 33 fe 18 7f 00 00 ..........3.....
. 0020: 0f 0e 33 fe 18 7f 00 00 48 14 33 fe 18 7f 00 00 ..3.....H.3.....
. 0030: 08 09 00 00 08 09 00 00 e6 7a e7 35 1c 00 00 00 .........z.5....
121163447014 0x2158 [0x40]: PERF_RECORD_CALLCHAIN_DEFERRED(IP, 0x2): 2312/2312: 0xb00000006
... FP chain: nr:3
..... 0: 00007f18fe337fa7
..... 1: 00007f18fe330e0f
..... 2: 00007f18fe331448
: unhandled!
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Ensure the metric_leader is copied and set up correctly. In
compute_metric determine the correct metric_leader event to match the
requested CPU. Fixes the handling of metrics particularly on hybrid
machines.
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Thomas Falcon <thomas.falcon@intel.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
It should also have PERF_SAMPLE_TID to enable inherit and PERF_SAMPLE_READ
on recent kernels. Not having _TID makes the feature check wrongly detect
the inherit and _READ support.
It was reported that the following command failed due to the error in
the missing feature check on Intel SPR machines.
$ perf record -e '{cpu/mem-loads-aux/S,cpu/mem-loads,ldlat=3/PS}' -- ls
Error:
Failure to open event 'cpu/mem-loads,ldlat=3/PS' on PMU 'cpu' which will be removed.
Invalid event (cpu/mem-loads,ldlat=3/PS) in per-thread mode, enable system wide with '-a'.
Reviewed-by: Ian Rogers <irogers@google.com>
Fixes: 3b193a57ba ("perf tools: Detect missing kernel features properly")
Reported-and-tested-by: Chen, Zide <zide.chen@intel.com>
Closes: https://lore.kernel.org/lkml/20251022220802.1335131-1-zide.chen@intel.com/
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
The metric_events exist in the metric_expr list and so this variable
has been unused for a while.
Signed-off-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Ensure both the perf_event_attr and alternate_hw_config are checked in
the match. Don't mask the config if the perf_event_attr isn't a
HARDWARE or HW_CACHE event. Add common early exit cases.
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: James Clark <james.clark@linaro.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Provide ratio-to-prev term which allows the user to
set the event sample period of two events corresponding
to a desired ratio.
If using on an Intel x86 platform with Auto Counter Reload support, also
set corresponding event's config2 attribute with a bitmask which
counters to reset and which counters to sample if the desired ratio is
met or exceeded.
On other platforms, only the sample period is affected by the
ratio-to-prev term.
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Thomas Falcon <thomas.falcon@intel.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Dapeng Mi <dapeng1.mi@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
For the sake of Intel topdown events commit 9eac5612da ("perf
stat: Don't skip failing group events") changed 'perf stat' error
handling making it so that more errors were fatal and didn't report
"<not supported>" events. The change outside of topdown events was
unintentional.
The notion of "fatal" error handling was introduced in commit
e0e6a6ca3a ("perf stat: Factor out open error handling") and
refined in commits like commit cb5ef60067 ("perf stat: Error out
unsupported group leader immediately") to be an approach for avoiding
later assertion failures in the code base.
This change fixes those issues and removes the notion of a fatal error
on an event. If all events fail to open then a fatal error occurs with
the previous fatal error message. This seems to best match the notion of
supported events and allowing some errors not to stop 'perf stat', while
allowing the truly fatal no event case to terminate the tool early.
The evsel->errored flag is only used in the stat code but always just
meaning !evsel->supported although there is a comment about it being
sticky. Force all evsels to be supported in evsel__init and then clear
this when evsel__open fails. When an event is tried the supported is
set to true again. This simplifies the notion of whether an evsel is
broken.
In the get_group_fd code, fail to get a group fd when the evsel isn't
supported. If the leader isn't supported then it is also expected that
there is no group_fd as the leader will have been skipped. Therefore
change the BUG_ON test to be on supported rather than skippable. This
corrects the assertion errors that were the reason for the previous
fatal error handling.
Fixes: 9eac5612da ("perf stat: Don't skip failing group events")
Reviewed-by: James Clark <james.clark@linaro.org>
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Chun-Tse Shao <ctshao@google.com>
Cc: Dapeng Mi <dapeng1.mi@linux.intel.com>
Cc: Howard Chu <howardchu95@gmail.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Falcon <thomas.falcon@intel.com>
Link: https://lore.kernel.org/r/20251002220727.1889799-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The fallback message is unconditionally printed in places like
record__open().
If no fallback is attempted this can lead to printing uninitialized
data, crashes, etc.
Fixes: c0a54341c0 ("perf evsel: Introduce event fallback method")
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Howard Chu <howardchu95@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The PMU name is appearing twice in:
```
$ perf stat -e uncore_imc_free_running/data_total/ -A true
Performance counter stats for 'system wide':
CPU0 1.57 MiB uncore_imc_free_running_0/uncore_imc_free_running,data_total/
CPU0 1.58 MiB uncore_imc_free_running_1/uncore_imc_free_running,data_total/
0.000892376 seconds time elapsed
```
Use the pmu_name_len_no_suffix to avoid this problem.
Committer testing:
After this patch:
root@x1:~# perf stat -e uncore_imc_free_running/data_total/ -A true
Performance counter stats for 'system wide':
CPU0 1.69 MiB uncore_imc_free_running_0/data_total/
CPU0 1.68 MiB uncore_imc_free_running_1/data_total/
0.002141605 seconds time elapsed
root@x1:~#
Fixes: 7d45f402d3 ("perf evlist: Make uniquifying counter names consistent")
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.ibm.com>
Cc: Chun-Tse Shao <ctshao@google.com>
Cc: Howard Chu <howardchu95@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Extend arch_evsel__open_strerror() from just AMD IBS events to Intel
core PMU events, to give a message when a slots event isn't a group
leader or when a perf metric event is duplicated within an event
group.
As generating the warning happens after non-arch specific warnings are
generated, disable the missing system wide (-a) flag warning for the
core PMU.
This assumes core PMU events should support per-thread/process and
system-wide.
Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Howard Chu <howardchu95@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Falcon <thomas.falcon@intel.com>
Cc: Yoshihiro Furudera <fj5100bi@fujitsu.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
An evsel should typically have a leader of itself, however, in tests
like 'Sample parsing' a NULL leader may occur and the container_of
will return a corrupt pointer.
Avoid this with an explicit NULL test.
Fixes: fba7c86601 ("libperf: Move 'leader' from tools/perf to perf_evsel::leader")
Reviewed-by: James Clark <james.clark@linaro.org>
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.ibm.com>
Cc: Blake Jones <blakejones@google.com>
Cc: Chun-Tse Shao <ctshao@google.com>
Cc: Collin Funk <collin.funk1@gmail.com>
Cc: Howard Chu <howardchu95@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jan Polensky <japo@linux.ibm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Li Huafei <lihuafei1@huawei.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Nam Cao <namcao@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steinar H. Gunderson <sesse@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20250821163820.1132977-4-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Start the creation of a tracepoint PMU abstraction. Tracepoint events
don't follow the regular sysfs perf conventions. Eventually the new
PMU abstraction will bridge the gap so tracepoint events look more
like regular perf ones.
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Link: https://lore.kernel.org/r/20250725185202.68671-5-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
By definition arch sample parsing and synthesis will inhibit certain
kinds of cross-platform record then analysis (report, script,
etc.). Remove arch_perf_parse_sample_weight and
arch_perf_synthesize_sample_weight replacing with a common
implementation. Combine perf_sample p_stage_cyc and retire_lat as
weight3 to capture the differing uses regardless of compiled for
architecture.
Signed-off-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20250724163302.596743-21-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
The global perf_env was used for the host, but if a perf_env wasn't
easy to come by it was used in a lot of places where potentially
recorded and host data could be confused. Remove the global variable
as now the majority of accesses retrieve the perf_env for the host
from the session.
Signed-off-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20250724163302.596743-20-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
The session holds a perf_env pointer env. In UI code container_of is
used to turn the env to a session, but this assumes the session
header's env is in use. Rather than a dubious container_of, hold the
session in the evlist and derive the env from the session with
evsel__env, perf_session__env, etc.
Signed-off-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20250724163302.596743-11-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Add evsel__open_per_cpu_and_thread that combines the operation of
evsel__open_per_cpu and evsel__open_per_thread so that an event
without the "any" cpumask can be opened with its cpumask and with
threads it specifies. Change the implementation of evsel__open_per_cpu
and evsel__open_per_thread to use evsel__open_per_cpu_and_thread to
make the implementation of those functions clearer.
Reviewed-by: Thomas Falcon <thomas.falcon@intel.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: James Clark <james.clark@linaro.org>
Link: https://lore.kernel.org/r/20250719030517.1990983-12-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Avoid the duplicated code and better enable perf_evsel to change.
Reviewed-by: Thomas Falcon <thomas.falcon@intel.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: James Clark <james.clark@linaro.org>
Link: https://lore.kernel.org/r/20250719030517.1990983-9-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
own_cpus is generally the cpumask from the PMU. Rename to pmu_cpus to
try to make this clearer. Variable rename with no other changes.
Reviewed-by: Thomas Falcon <thomas.falcon@intel.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: James Clark <james.clark@linaro.org>
Link: https://lore.kernel.org/r/20250719030517.1990983-7-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Factor requested CPU warning out of evlist and into evsel. At the end
of adding an event, perform the warning check. To avoid repeatedly
testing if the cpu_list is empty, add a local variable.
```
$ perf stat -e cpu_atom/cycles,cpu=1/ -a true
WARNING: A requested CPU in '1' is not supported by PMU 'cpu_atom' (CPUs 16-27) for event 'cpu_atom/cycles/'
Performance counter stats for 'system wide':
<not supported> cpu_atom/cycles/
0.000781511 seconds time elapsed
```
Reviewed-by: Thomas Falcon <thomas.falcon@intel.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: James Clark <james.clark@linaro.org>
Link: https://lore.kernel.org/r/20250719030517.1990983-2-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
It uses evsel->priv to save per-cpu timing information. It should be
freed when the evsel is released.
Add the priv destructor for evsel same as thread to handle that.
Fixes: 49394a2a24 ("perf sched timehist: Introduce timehist command")
Reviewed-by: Ian Rogers <irogers@google.com>
Tested-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20250703014942.1369397-6-namhyung@kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
DRM clients expose information through usage stats as documented in
Documentation/gpu/drm-usage-stats.rst (available online at
https://docs.kernel.org/gpu/drm-usage-stats.html). Add a tool like
PMU, similar to the hwmon PMU, that exposes DRM information. For
example on a tigerlake laptop:
```
$ perf list drm
List of pre-defined events (to be used in -e or -M):
drm:
drm-active-stolen-system0
[Total memory active in one or more engines. Unit: drm_i915]
drm-active-system0
[Total memory active in one or more engines. Unit: drm_i915]
drm-engine-capacity-video
[Engine capacity. Unit: drm_i915]
drm-engine-copy
[Utilization in ns. Unit: drm_i915]
drm-engine-render
[Utilization in ns. Unit: drm_i915]
drm-engine-video
[Utilization in ns. Unit: drm_i915]
drm-engine-video-enhance
[Utilization in ns. Unit: drm_i915]
drm-purgeable-stolen-system0
[Size of resident and purgeable memory bufers. Unit: drm_i915]
drm-purgeable-system0
[Size of resident and purgeable memory bufers. Unit: drm_i915]
drm-resident-stolen-system0
[Size of resident memory bufers. Unit: drm_i915]
drm-resident-system0
[Size of resident memory bufers. Unit: drm_i915]
drm-shared-stolen-system0
[Size of shared memory bufers. Unit: drm_i915]
drm-shared-system0
[Size of shared memory bufers. Unit: drm_i915]
drm-total-stolen-system0
[Size of shared and private memory. Unit: drm_i915]
drm-total-system0
[Size of shared and private memory. Unit: drm_i915]
```
System wide data can be gathered:
```
$ perf stat -x, -I 1000 -e drm-active-stolen-system0,drm-active-system0,drm-engine-capacity-video,drm-engine-copy,drm-engine-render,drm-engine-video,drm-engine-video-enhance,drm-purgeable-stolen-system0,drm-purgeable-system0,drm-resident-stolen-system0,drm-resident-system0,drm-shared-stolen-system0,drm-shared-system0,drm-total-stolen-system0,drm-total-system0
1.000904910,0,bytes,drm-active-stolen-system0,1,100.00,,
1.000904910,0,bytes,drm-active-system0,1,100.00,,
1.000904910,36,capacity,drm-engine-capacity-video,1,100.00,,
1.000904910,0,ns,drm-engine-copy,1,100.00,,
1.000904910,1472970566175,ns,drm-engine-render,1,100.00,,
1.000904910,0,ns,drm-engine-video,1,100.00,,
1.000904910,0,ns,drm-engine-video-enhance,1,100.00,,
1.000904910,0,bytes,drm-purgeable-stolen-system0,1,100.00,,
1.000904910,38199296,bytes,drm-purgeable-system0,1,100.00,,
1.000904910,0,bytes,drm-resident-stolen-system0,1,100.00,,
1.000904910,4643196928,bytes,drm-resident-system0,1,100.00,,
1.000904910,0,bytes,drm-shared-stolen-system0,1,100.00,,
1.000904910,1886871552,bytes,drm-shared-system0,1,100.00,,
1.000904910,0,bytes,drm-total-stolen-system0,1,100.00,,
1.000904910,4643196928,bytes,drm-total-system0,1,100.00,,
2.264426839,0,bytes,drm-active-stolen-system0,1,100.00,,
```
Or for a particular process:
```
$ perf stat -x, -I 1000 -e drm-active-stolen-system0,drm-active-system0,drm-engine-capacity-video,drm-engine-copy,drm-engine-render,drm-engine-video,drm-engine-video-enhance,drm-purgeable-stolen-system0,drm-purgeable-system0,drm-resident-stolen-system0,drm-resident-system0,drm-shared-stolen-system0,drm-shared-system0,drm-total-stolen-system0,drm-total-system0 -p 200027
1.001040274,0,bytes,drm-active-stolen-system0,6,100.00,,
1.001040274,0,bytes,drm-active-system0,6,100.00,,
1.001040274,12,capacity,drm-engine-capacity-video,6,100.00,,
1.001040274,0,ns,drm-engine-copy,6,100.00,,
1.001040274,1542300,ns,drm-engine-render,6,100.00,,
1.001040274,0,ns,drm-engine-video,6,100.00,,
1.001040274,0,ns,drm-engine-video-enhance,6,100.00,,
1.001040274,0,bytes,drm-purgeable-stolen-system0,6,100.00,,
1.001040274,13516800,bytes,drm-purgeable-system0,6,100.00,,
1.001040274,0,bytes,drm-resident-stolen-system0,6,100.00,,
1.001040274,27746304,bytes,drm-resident-system0,6,100.00,,
1.001040274,0,bytes,drm-shared-stolen-system0,6,100.00,,
1.001040274,0,bytes,drm-shared-system0,6,100.00,,
1.001040274,0,bytes,drm-total-stolen-system0,6,100.00,,
1.001040274,27746304,bytes,drm-total-system0,6,100.00,,
2.016629075,0,bytes,drm-active-stolen-system0,6,100.00,,
```
As with the hwmon PMU, high numbered PMU types are used to encode
multiple possible "DRM" PMUs. The appropriate fdinfo is found by
scanning /proc and filtering which fdinfos to read with stat. To avoid
some unneeding scanning, events not starting with "drm-" are
ignored. The patch builds on commit 57e13264dc ("perf pmus:
Restructure pmu_read_sysfs to scan fewer PMUs") and later so that only
if full wild carding is being done, the PMU starts with "drm_" or the
event starts with "drm-" will /proc be scanned. That is there should
be little to no cost in this PMU unless DRM events are requested.
Signed-off-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20250624231837.179536-3-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
The merge stat code fails for uncore events if they are repeated twice,
for example `perf stat -e clockticks,clockticks -I 1000` as the counts
of the second set of uncore events will be merged into the first
counter.
Reimplement the logic to have a first_wildcard_match so that merged
later events correctly merge into the first wildcard event that they
will be aggregated into.
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Chun-Tse Shao <ctshao@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Dr. David Alan Gilbert <linux@treblig.org>
Cc: Howard Chu <howardchu95@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Levi Yun <yeoreum.yun@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Weilin Wang <weilin.wang@intel.com>
Link: https://lore.kernel.org/r/20250513215401.2315949-3-ctshao@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
'perf stat' has different uniquification logic to 'perf record' and perf
top. In the case of perf record and 'perf top' all hybrid event names
are uniquified.
'perf stat' is more disciplined respecting name config terms, libpfm4
events, etc.
'perf stat' will uniquify hybrid events and the non-core PMU cases
shouldn't apply to perf record or 'perf top'.
For consistency, remove the uniquification for 'perf record' and 'perf
top' and reuse the 'perf stat' uniquification, making the code more
globally visible for this.
Fix the detection of cross-PMU for disabling uniquify by correctly
setting last_pmu.
When setting uniquify on an evsel, make sure the PMUs between the 2
considered events differ otherwise the uniquify isn't adding value.
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Chun-Tse Shao <ctshao@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Dr. David Alan Gilbert <linux@treblig.org>
Cc: Howard Chu <howardchu95@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Levi Yun <yeoreum.yun@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Weilin Wang <weilin.wang@intel.com>
Link: https://lore.kernel.org/r/20250513215401.2315949-2-ctshao@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The mrvl_ddr_pmu will return EOPNOTSUPP if opened in per-thread
mode. Give a warning for this similar to EINVAL.
Doing this better supports metric testing with limited permissions when
the mrvl_ddr_pmu is present, as the failure to open causes the test to
skip and not fail.
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Weilin Wang <weilin.wang@intel.com>
Link: https://lore.kernel.org/r/20250412004704.2297939-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Parse the off-cpu event using parse_event(), as bpf-output.
Call evlist__enable_evsel() on off-cpu event. This fixes the inability
to collect direct off-cpu samples on a workload, as reported by Arnaldo
Carvalho de Melo <acme@redhat.com>.
The reason being, workload sets enable_on_exec instead of calling
evlist__enable(), but off-cpu event does not attach to an executable and
execve won't be called, so the fds from perf_event_open() are not
enabled.
no-inherit should be set to 1, here's the reason:
We update the BPF perf_event map for direct off-cpu sample dumping (in
following patches), it executes as follows:
bpf_map_update_value()
bpf_fd_array_map_update_elem()
perf_event_fd_array_get_ptr()
perf_event_read_local()
In perf_event_read_local(), there is:
int perf_event_read_local(struct perf_event *event, u64 *value,
u64 *enabled, u64 *running)
{
...
/*
* It must not be an event with inherit set, we cannot read
* all child counters from atomic context.
*/
if (event->attr.inherit) {
ret = -EOPNOTSUPP;
goto out;
}
Which means no-inherit has to be true for updating the BPF perf_event
map.
Moreover, for bpf-output events, we primarily want a system-wide event
instead of a per-task event.
The reason is that in BPF's bpf_perf_event_output(), BPF uses the CPU
index to retrieve the perf_event file descriptor it outputs to.
Making a bpf-output event system-wide naturally satisfies this
requirement by mapping CPU appropriately.
Suggested-by: Namhyung Kim <namhyung@kernel.org>
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Howard Chu <howardchu95@gmail.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Tested-by: Gautam Menghani <gautam@linux.ibm.com>
Tested-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20241108204137.2444151-4-howardchu95@gmail.com
Link: https://lore.kernel.org/r/20250501022809.449767-3-howardchu95@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Expose evsel__is_offcpu_event() so it can be used in off_cpu_config(),
evsel__parse_sample() and 'perf script'.
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Howard Chu <howardchu95@gmail.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Tested-by: Gautam Menghani <gautam@linux.ibm.com>
Tested-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20241108204137.2444151-3-howardchu95@gmail.com
Link: https://lore.kernel.org/r/20250501022809.449767-2-howardchu95@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
'perf record' will fail with retirement latency events as the open
doesn't do a perf_event_open system call.
Use evsel__config() to set up such events for recording by removing the
flag and enabling sample weights - the sample weights containing the
retirement latency.
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Weilin Wang <weilin.wang@intel.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexandre Torgue <alexandre.torgue@foss.st.com>
Cc: Andreas Färber <afaerber@suse.de>
Cc: Caleb Biggers <caleb.biggers@intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com>
Cc: Perry Taylor <perry.taylor@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Falcon <thomas.falcon@intel.com>
Link: https://lore.kernel.org/r/20250414174134.3095492-17-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Factor sending record control fd code into its own function.
Rather than killing the record process send it a ping when reading.
Timeouts were witnessed if done too frequently, so only ping for the
first tpebs events.
Don't kill the record command send it a stop command.
As close isn't reliably called also close on evsel__exit.
Add extra checks on the record being terminated to avoid warnings.
Adjust the locking as needed and incorporate extra -Wthread-safety
checks.
Check to do six 500ms poll timeouts when sending commands, rather than
the larger 3000ms, to allow the record process terminating to be better
witnessed.
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Weilin Wang <weilin.wang@intel.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexandre Torgue <alexandre.torgue@foss.st.com>
Cc: Andreas Färber <afaerber@suse.de>
Cc: Caleb Biggers <caleb.biggers@intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com>
Cc: Perry Taylor <perry.taylor@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Falcon <thomas.falcon@intel.com>
Link: https://lore.kernel.org/r/20250414174134.3095492-13-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Rename to reflect evsel argument and for consistency with other tpebs
functions.
Update count from prev_raw_counts when available.
Eventually this will allow inteval mode support.
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Weilin Wang <weilin.wang@intel.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexandre Torgue <alexandre.torgue@foss.st.com>
Cc: Andreas Färber <afaerber@suse.de>
Cc: Caleb Biggers <caleb.biggers@intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com>
Cc: Perry Taylor <perry.taylor@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Falcon <thomas.falcon@intel.com>
Link: https://lore.kernel.org/r/20250414174134.3095492-11-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>