Commit Graph

418 Commits

Author SHA1 Message Date
Linus Torvalds
0939bd2fcf perf tools improvements and fixes for Linux v6.16:
perf report/top/annotate TUI:
 
 - Accept the left arrow key as a Zoom out if done on the first column.
 
 - Show if source code toggle status in title, to help spotting bugs with
   the various disassemblers (capstone, llvm, objdump).
 
 - Provide feedback on unhandled hotkeys.
 
 Build:
 
 - Better inform when certain features are not available with warnings in the
   build process and in 'perf version --build-options' or 'perf -vv'.
 
 perf record:
 
 - Improve the --off-cpu code by synthesizing events for switch-out -> switch-in
   intervals using a BPF program. This can be fine tuned using a --off-cpu-thresh
   knob.
 
 perf report:
 
 - Add 'tgid' sort key.
 
 perf mem/c2c:
 
 - Add 'op', 'cache', 'snoop', 'dtlb' output fields.
 
 - Add support for 'ldlat' on AMD IBS (Instruction Based Sampling).
 
 perf ftrace:
 
 - Use process/session specific trace settings instead of messing with
   the global ftrace knobs.
 
 perf trace:
 
 - Implement syscall summary in BPF.
 
 - Support --summary-mode=cgroup.
 
 - Always print return value for syscalls returning a pid.
 
 - The rseq and set_robust_list don't return a pid, just -errno.
 
 perf lock contention:
 
 -  Symbolize zone->lock using BTF.
 
 - Add -J/--inject-delay option to estimate impact on application performance by
   optimization of kernel locking behavior.
 
 perf stat:
 
 - Improve hybrid support for the NMI watchdog warning.
 
 Symbol resolution:
 
 - Handle 'u' and 'l' symbols in /proc/kallsyms, resolving some Rust symbols.
 
 - Improve Rust demangler.
 
 Hardware tracing:
 
 Intel PT:
 
 - Fix PEBS-via-PT data_src.
 
 - Do not default to recording all switch events.
 
 - Fix pattern matching with python3 on the SQL viewer script.
 
 arm64:
 
 - Fixups for the hip08 hha PMU.
 
 Vendor events:
 
 - Update Intel events/metrics files for alderlake, alderlaken, arrowlake,
   bonnell, broadwell, broadwellde, broadwellx, cascadelakex, clearwaterforest,
   elkhartlake, emeraldrapids, grandridge, graniterapids, haswell, haswellx,
   icelake, icelakex, ivybridge, ivytown, jaketown, lunarlake, meteorlake,
   nehalemep, nehalemex, rocketlake, sandybridge, sapphirerapids, sierraforest,
   skylake, skylakex, snowridgex, tigerlake, westmereep-dp, westmereep-sp,
   westmereep-sx.
 
 python support:
 
 - Add support for event counts in the python binding, add a counting.py example.
 
 perf list:
 
 - Display the PMU name associated with a perf metric in JSON.
 
 perf test:
 
 - Hybrid improvements for metric value validation test.
 
 - Fix LBR test by ignoring idle task.
 
 - Add AMD IBS sw filter ana d'ldlat' tests.
 
 - Add 'perf trace --summary-mode=cgroup' test.
 
 - Add tests for the various language symbol demanglers.
 
 Miscellaneous.
 
 - Allow specifying the cpu an event will be tied using '-e event/cpu=N/'.
 
 - Sync various headers with the kernel sources.
 
 - Add annotations to use clang's -Wthread-safety and fix some problems
   it detected.
 
 - Make dump_stack() use perf's symbol resolution to provide better backtraces.
 
 - Intel TPEBS support cleanups and fixes. TPEBS stands for Timed PEBS
   (Precision Event-Based Sampling), that adds timing info, the retirement
   latency of instructions.
 
 - Various memory allocation (some detected by ASAN) and reference counting
   fixes.
 
 - Add a 8-byte aligned PERF_RECORD_COMPRESSED2 to replace PERF_RECORD_COMPRESSED.
 
 - Skip unsupported event types in perf.data files, don't stop when finding one.
 
 - Improve lookups using hashmaps and binary searches.
 
 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQR2GiIUctdOfX2qHhGyPKLppCJ+JwUCaD9ViwAKCRCyPKLppCJ+
 JzOfAQDXlukhPQyuJ4j1ie0x1QO4jalloMbG1Bkp3hn6yjxafAD9Ha5wr+dwnAj4
 FfxOVqua29r8Htn4aGahXZ0nnlVp9Ac=
 =bwgD
 -----END PGP SIGNATURE-----

Merge tag 'perf-tools-for-v6.16-1-2025-06-03' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools

Pull perf tools updates from Arnaldo Carvalho de Melo:
 "perf report/top/annotate TUI:

   - Accept the left arrow key as a Zoom out if done on the first column

   - Show if source code toggle status in title, to help spotting bugs
     with the various disassemblers (capstone, llvm, objdump)

   - Provide feedback on unhandled hotkeys

  Build:

   - Better inform when certain features are not available with warnings
     in the build process and in 'perf version --build-options' or 'perf -vv'

  perf record:

   - Improve the --off-cpu code by synthesizing events for switch-out ->
     switch-in intervals using a BPF program. This can be fine tuned
     using a --off-cpu-thresh knob

  perf report:

   - Add 'tgid' sort key

  perf mem/c2c:

   - Add 'op', 'cache', 'snoop', 'dtlb' output fields

   - Add support for 'ldlat' on AMD IBS (Instruction Based Sampling)

  perf ftrace:

   - Use process/session specific trace settings instead of messing with
     the global ftrace knobs

  perf trace:

   - Implement syscall summary in BPF

   - Support --summary-mode=cgroup

   - Always print return value for syscalls returning a pid

   - The rseq and set_robust_list don't return a pid, just -errno

  perf lock contention:

   - Symbolize zone->lock using BTF

   - Add -J/--inject-delay option to estimate impact on application
     performance by optimization of kernel locking behavior

  perf stat:

   - Improve hybrid support for the NMI watchdog warning

  Symbol resolution:

   - Handle 'u' and 'l' symbols in /proc/kallsyms, resolving some Rust
     symbols

   - Improve Rust demangler

  Hardware tracing:

  Intel PT:

   - Fix PEBS-via-PT data_src

   - Do not default to recording all switch events

   - Fix pattern matching with python3 on the SQL viewer script

  arm64:

   - Fixups for the hip08 hha PMU

  Vendor events:

   - Update Intel events/metrics files for alderlake, alderlaken,
     arrowlake, bonnell, broadwell, broadwellde, broadwellx,
     cascadelakex, clearwaterforest, elkhartlake, emeraldrapids,
     grandridge, graniterapids, haswell, haswellx, icelake, icelakex,
     ivybridge, ivytown, jaketown, lunarlake, meteorlake, nehalemep,
     nehalemex, rocketlake, sandybridge, sapphirerapids, sierraforest,
     skylake, skylakex, snowridgex, tigerlake, westmereep-dp,
     westmereep-sp, westmereep-sx

  python support:

   - Add support for event counts in the python binding, add a
     counting.py example

  perf list:

   - Display the PMU name associated with a perf metric in JSON

  perf test:

   - Hybrid improvements for metric value validation test

   - Fix LBR test by ignoring idle task

   - Add AMD IBS sw filter ana d'ldlat' tests

   - Add 'perf trace --summary-mode=cgroup' test

   - Add tests for the various language symbol demanglers

  Miscellaneous:

   - Allow specifying the cpu an event will be tied using '-e
     event/cpu=N/'

   - Sync various headers with the kernel sources

   - Add annotations to use clang's -Wthread-safety and fix some
     problems it detected

   - Make dump_stack() use perf's symbol resolution to provide better
     backtraces

   - Intel TPEBS support cleanups and fixes. TPEBS stands for Timed PEBS
     (Precision Event-Based Sampling), that adds timing info, the
     retirement latency of instructions

   - Various memory allocation (some detected by ASAN) and reference
     counting fixes

   - Add a 8-byte aligned PERF_RECORD_COMPRESSED2 to replace
     PERF_RECORD_COMPRESSED

   - Skip unsupported event types in perf.data files, don't stop when
     finding one

   - Improve lookups using hashmaps and binary searches"

* tag 'perf-tools-for-v6.16-1-2025-06-03' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools: (206 commits)
  perf callchain: Always populate the addr_location map when adding IP
  perf lock contention: Reject more than 10ms delays for safety
  perf trace: Set errpid to false for rseq and set_robust_list
  perf symbol: Move demangling code out of symbol-elf.c
  perf trace: Always print return value for syscalls returning a pid
  perf script: Print PERF_AUX_FLAG_COLLISION flag
  perf mem: Show absolute percent in mem_stat output
  perf mem: Display sort order only if it's available
  perf mem: Describe overhead calculation in brief
  perf record: Fix incorrect --user-regs comments
  Revert "perf thread: Ensure comm_lock held for comm_list"
  perf test trace_summary: Skip --bpf-summary tests if no libbpf
  perf test intel-pt: Skip jitdump test if no libelf
  perf intel-tpebs: Avoid race when evlist is being deleted
  perf test demangle-java: Don't segv if demangling fails
  perf symbol: Fix use-after-free in filename__read_build_id
  perf pmu: Avoid segv for missing name/alias_name in wildcarding
  perf machine: Factor creating a "live" machine out of dwarf-unwind
  perf test: Add AMD IBS sw filter test
  perf mem: Count L2 HITM for c2c statistic
  ...
2025-06-03 15:11:44 -07:00
Linus Torvalds
7f9039c524 Generic:
* Clean up locking of all vCPUs for a VM by using the *_nest_lock()
   family of functions, and move duplicated code to virt/kvm/.
   kernel/ patches acked by Peter Zijlstra.
 
 * Add MGLRU support to the access tracking perf test.
 
 ARM fixes:
 
 * Make the irqbypass hooks resilient to changes in the GSI<->MSI
   routing, avoiding behind stale vLPI mappings being left behind. The
   fix is to resolve the VGIC IRQ using the host IRQ (which is stable)
   and nuking the vLPI mapping upon a routing change.
 
 * Close another VGIC race where vCPU creation races with VGIC
   creation, leading to in-flight vCPUs entering the kernel w/o private
   IRQs allocated.
 
 * Fix a build issue triggered by the recently added workaround for
   Ampere's AC04_CPU_23 erratum.
 
 * Correctly sign-extend the VA when emulating a TLBI instruction
   potentially targeting a VNCR mapping.
 
 * Avoid dereferencing a NULL pointer in the VGIC debug code, which can
   happen if the device doesn't have any mapping yet.
 
 s390:
 
 * Fix interaction between some filesystems and Secure Execution
 
 * Some cleanups and refactorings, preparing for an upcoming big series
 
 x86:
 
 * Wait for target vCPU to acknowledge KVM_REQ_UPDATE_PROTECTED_GUEST_STATE to
   fix a race between AP destroy and VMRUN.
 
 * Decrypt and dump the VMSA in dump_vmcb() if debugging enabled for the VM.
 
 * Refine and harden handling of spurious faults.
 
 * Add support for ALLOWED_SEV_FEATURES.
 
 * Add #VMGEXIT to the set of handlers special cased for CONFIG_RETPOLINE=y.
 
 * Treat DEBUGCTL[5:2] as reserved to pave the way for virtualizing features
   that utilize those bits.
 
 * Don't account temporary allocations in sev_send_update_data().
 
 * Add support for KVM_CAP_X86_BUS_LOCK_EXIT on SVM, via Bus Lock Threshold.
 
 * Unify virtualization of IBRS on nested VM-Exit, and cross-vCPU IBPB, between
   SVM and VMX.
 
 * Advertise support to userspace for WRMSRNS and PREFETCHI.
 
 * Rescan I/O APIC routes after handling EOI that needed to be intercepted due
   to the old/previous routing, but not the new/current routing.
 
 * Add a module param to control and enumerate support for device posted
   interrupts.
 
 * Fix a potential overflow with nested virt on Intel systems running 32-bit kernels.
 
 * Flush shadow VMCSes on emergency reboot.
 
 * Add support for SNP to the various SEV selftests.
 
 * Add a selftest to verify fastops instructions via forced emulation.
 
 * Refine and optimize KVM's software processing of the posted interrupt bitmap, and share
   the harvesting code between KVM and the kernel's Posted MSI handler
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmg9TjwUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroOUxQf7B7nnWqIKd7jSkGzSD6YsSX9TXktr
 2tJIOfWM3zNYg5GRCidg+m4Y5+DqQWd3Hi5hH2P9wUw7RNuOjOFsDe+y0VBr8ysE
 ve39t/yp+mYalNmHVFl8s3dBDgrIeGKiz+Wgw3zCQIBZ18rJE1dREhv37RlYZ3a2
 wSvuObe8sVpCTyKIowDs1xUi7qJUBoopMSuqfleSHawRrcgCpV99U8/KNFF5plLH
 7fXOBAHHniVCVc+mqQN2wxtVJDhST+U3TaU4GwlKy9Yevr+iibdOXffveeIgNEU4
 D6q1F2zKp6UdV3+p8hxyaTTbiCVDqsp9WOgY/0I/f+CddYn0WVZgOlR+ow==
 =mYFL
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull more kvm updates from Paolo Bonzini:
  Generic:

   - Clean up locking of all vCPUs for a VM by using the *_nest_lock()
     family of functions, and move duplicated code to virt/kvm/. kernel/
     patches acked by Peter Zijlstra

   - Add MGLRU support to the access tracking perf test

  ARM fixes:

   - Make the irqbypass hooks resilient to changes in the GSI<->MSI
     routing, avoiding behind stale vLPI mappings being left behind. The
     fix is to resolve the VGIC IRQ using the host IRQ (which is stable)
     and nuking the vLPI mapping upon a routing change

   - Close another VGIC race where vCPU creation races with VGIC
     creation, leading to in-flight vCPUs entering the kernel w/o
     private IRQs allocated

   - Fix a build issue triggered by the recently added workaround for
     Ampere's AC04_CPU_23 erratum

   - Correctly sign-extend the VA when emulating a TLBI instruction
     potentially targeting a VNCR mapping

   - Avoid dereferencing a NULL pointer in the VGIC debug code, which
     can happen if the device doesn't have any mapping yet

  s390:

   - Fix interaction between some filesystems and Secure Execution

   - Some cleanups and refactorings, preparing for an upcoming big
     series

  x86:

   - Wait for target vCPU to ack KVM_REQ_UPDATE_PROTECTED_GUEST_STATE
     to fix a race between AP destroy and VMRUN

   - Decrypt and dump the VMSA in dump_vmcb() if debugging enabled for
     the VM

   - Refine and harden handling of spurious faults

   - Add support for ALLOWED_SEV_FEATURES

   - Add #VMGEXIT to the set of handlers special cased for
     CONFIG_RETPOLINE=y

   - Treat DEBUGCTL[5:2] as reserved to pave the way for virtualizing
     features that utilize those bits

   - Don't account temporary allocations in sev_send_update_data()

   - Add support for KVM_CAP_X86_BUS_LOCK_EXIT on SVM, via Bus Lock
     Threshold

   - Unify virtualization of IBRS on nested VM-Exit, and cross-vCPU
     IBPB, between SVM and VMX

   - Advertise support to userspace for WRMSRNS and PREFETCHI

   - Rescan I/O APIC routes after handling EOI that needed to be
     intercepted due to the old/previous routing, but not the
     new/current routing

   - Add a module param to control and enumerate support for device
     posted interrupts

   - Fix a potential overflow with nested virt on Intel systems running
     32-bit kernels

   - Flush shadow VMCSes on emergency reboot

   - Add support for SNP to the various SEV selftests

   - Add a selftest to verify fastops instructions via forced emulation

   - Refine and optimize KVM's software processing of the posted
     interrupt bitmap, and share the harvesting code between KVM and the
     kernel's Posted MSI handler"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (93 commits)
  rtmutex_api: provide correct extern functions
  KVM: arm64: vgic-debug: Avoid dereferencing NULL ITE pointer
  KVM: arm64: vgic-init: Plug vCPU vs. VGIC creation race
  KVM: arm64: Unmap vLPIs affected by changes to GSI routing information
  KVM: arm64: Resolve vLPI by host IRQ in vgic_v4_unset_forwarding()
  KVM: arm64: Protect vLPI translation with vgic_irq::irq_lock
  KVM: arm64: Use lock guard in vgic_v4_set_forwarding()
  KVM: arm64: Mask out non-VA bits from TLBI VA* on VNCR invalidation
  arm64: sysreg: Drag linux/kconfig.h to work around vdso build issue
  KVM: s390: Simplify and move pv code
  KVM: s390: Refactor and split some gmap helpers
  KVM: s390: Remove unneeded srcu lock
  s390: Remove unneeded includes
  s390/uv: Improve splitting of large folios that cannot be split while dirty
  s390/uv: Always return 0 from s390_wiggle_split_folio() if successful
  s390/uv: Don't return 0 from make_hva_secure() if the operation was not successful
  rust: add helper for mutex_trylock
  RISC-V: KVM: use kvm_trylock_all_vcpus when locking all vCPUs
  KVM: arm64: use kvm_trylock_all_vcpus when locking all vCPUs
  x86: KVM: SVM: use kvm_lock_all_vcpus instead of a custom implementation
  ...
2025-06-02 12:24:58 -07:00
Paolo Bonzini
3e0797f6dd KVM selftests changes for 6.16:
- Add support for SNP to the various SEV selftests.
 
  - Add a selftest to verify fastops instructions via forced emulation.
 
  - Add MGLRU support to the access tracking perf test.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEKTobbabEP7vbhhN9OlYIJqCjN/0FAmgwmd0ACgkQOlYIJqCj
 N/3WCBAAsQ5zS8e+B1+7xopSz41eCou8L7KBDZZSe4B9TAuT+hMslXBEculyOJqh
 tIlBFlvrQA/hC2tNYla58jIeA6/f08Jq4/sV1URMNvORFKMcIvgnKpxmMJfKujve
 L2iHvJigJs4hoBCXYHCZHTkd5VAtB6j++7y9rqZS+RznM6z6/NI9SalX7pHr7Sri
 DQeaMc71UYJfllvLyLmI+MbQccdLfQ1v4dmkt6pz29K5s0pX9PQYp54+Hu1Z73Te
 aFdrG+CuDchra1jxLFoell5P9bD6nq9SsNBfdf+6VjYk/1MMHP4yX/dAFEtEqMbm
 RJNX95bewY4mms3fj6e9j8jVXDLBiXR2an8yJI8k6CP6VPsIXQn+RG2pUQMcOUj0
 zcWikbfXvfn+ReIoaeReWPyZ7tPMW33mhnHDPy/saWHdZ9sycI4w2DstKgc2pe9E
 e6jI9H5JiH49CoMnue38kwnACNUIIvolJDpWeU6K0vQz4p5k6eUNTMSTEEVZbwiV
 Y8MVqMIf+Cu+y6UY1co5OhH387kFuLgYMC/LIFz/4nOrlopRCAzMvYcFEqo9gIOO
 0+Ls/lkPc/hU5D2f3/20UjAGKVY/GfTwKJDRFptzaYMfmiMWW0pl2zlHagYp1huM
 7k8p0vVh5rFOLUJxiftXC8+jBVyJKXLGgwPxBdLVapFMs9DU9gI=
 =v9bp
 -----END PGP SIGNATURE-----

Merge tag 'kvm-x86-selftests-6.16' of https://github.com/kvm-x86/linux into HEAD

KVM selftests changes for 6.16:

 - Add support for SNP to the various SEV selftests.

 - Add a selftest to verify fastops instructions via forced emulation.

 - Add MGLRU support to the access tracking perf test.
2025-05-27 12:15:26 -04:00
Paolo Bonzini
ebd38b26ec KVM x86 misc changes for 6.16:
- Unify virtualization of IBRS on nested VM-Exit, and cross-vCPU IBPB, between
    SVM and VMX.
 
  - Advertise support to userspace for WRMSRNS and PREFETCHI.
 
  - Rescan I/O APIC routes after handling EOI that needed to be intercepted due
    to the old/previous routing, but not the new/current routing.
 
  - Add a module param to control and enumerate support for device posted
    interrupts.
 
  - Misc cleanups.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEKTobbabEP7vbhhN9OlYIJqCjN/0FAmgwmHsACgkQOlYIJqCj
 N/2fYhAAiwKkqQpOWLcGjjezDTnpMqDDbHCffroq0Ttmqfg/cuul1oyZax+9fxBO
 203HUi5VKcG7uAGSpLcMFkPUs9hKnaln2lsDaQD+AnGucdj+JKF5p3INCsSYCo9N
 LVRjRZWtZocxJwHSHX9gU8om0pJ5fBCBG2+7+7XhWRaIqCpJe5k944JotiiOkgZ4
 5sXeITkN2kouFVMI8eD4wQGNXxRxs837SYUlwCnoD3VuuBesOZuEhz/CEL9l8vNY
 keXBLPg7bSW53clKfquNKwXDQRephnZaYoexDebUd+OlZphGhTIPh+C75xPQLWSi
 aYg6W9XDu3TChf4LPxHnJLwLg/rjeKNQARcxrnb3XLpPAtx3i2cKU8pDPhnd4qn0
 +YV5H0dato8bbe+oClGv+oIolM01qfI9SJVoaEhTPu3Rdw9cCQSVFn5t32vG3Vab
 FVxX+seV3+XTmVveD4cjiiMbqtNADwZ/PmHNAi9QCl46DgHR++MLfRtjYuGo1koL
 QOmCg2fWOFYtQT6XPJqZxp1SHYxuawrB4qcO9FNyxTuMChoslYoLAr3mBUj0DvwL
 fdXNof74Ccj8OK9o3uCPOXS92pZz92rz/edy/XmYiCmO+VEwBJFR6IdginZMPX11
 UAc4mAC+KkDTvOcKPPIWEXArOsfQMFKefi+bPOeWUx9/nqpcVws=
 =Vs9P
 -----END PGP SIGNATURE-----

Merge tag 'kvm-x86-misc-6.16' of https://github.com/kvm-x86/linux into HEAD

KVM x86 misc changes for 6.16:

 - Unify virtualization of IBRS on nested VM-Exit, and cross-vCPU IBPB, between
   SVM and VMX.

 - Advertise support to userspace for WRMSRNS and PREFETCHI.

 - Rescan I/O APIC routes after handling EOI that needed to be intercepted due
   to the old/previous routing, but not the new/current routing.

 - Add a module param to control and enumerate support for device posted
   interrupts.

 - Misc cleanups.
2025-05-27 12:14:36 -04:00
Arnaldo Carvalho de Melo
444f03645f tools headers x86 cpufeatures: Sync with the kernel sources to pick ZEN6 and Indirect Target Selection (ITS) bits
To pick the changes from:

  24ee8d9432 ("x86/CPU/AMD: Add X86_FEATURE_ZEN6")
  2665281a07 ("x86/its: Add "vmexit" option to skip mitigation on some CPUs")
  8754e67ad4 ("x86/its: Add support for ITS-safe indirect thunk")
  159013a7ca ("x86/its: Enumerate Indirect Target Selection (ITS) bug")

This causes these perf files to be rebuilt and brings some X86_FEATURE
that will be used when updating the copies of
tools/arch/x86/lib/mem{cpy,set}_64.S with the kernel sources:

      CC       /tmp/build/perf/bench/mem-memcpy-x86-64-asm.o
      CC       /tmp/build/perf/bench/mem-memset-x86-64-asm.o

And addresses this perf build warning:

  Warning: Kernel ABI header differences:
    diff -u tools/arch/x86/include/asm/cpufeatures.h arch/x86/include/asm/cpufeatures.h

Please see tools/include/uapi/README for further details.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Cc: Yazen Ghannam <yazen.ghannam@amd.com>
Link: https://lore.kernel.org/r/20250519214126.1652491-4-acme@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-05-20 12:57:18 -03:00
Arnaldo Carvalho de Melo
57cdcab466 tools arch x86: Sync the msr-index.h copy with the kernel sources
To pick up the changes from these csets:

  159013a7ca ("x86/its: Enumerate Indirect Target Selection (ITS) bug")

That cause no changes to tooling as it doesn't include a new MSR to be
captured by the tools/perf/trace/beauty/tracepoints/x86_msr.sh script,
for instance:

  $ tools/perf/trace/beauty/tracepoints/x86_msr.sh | head
  static const char * const x86_MSRs[] = {
  	[0x00000000] = "IA32_P5_MC_ADDR",
  	[0x00000001] = "IA32_P5_MC_TYPE",
  	[0x00000010] = "IA32_TSC",
  	[0x00000017] = "IA32_PLATFORM_ID",
  	[0x0000001b] = "IA32_APICBASE",
  	[0x00000020] = "KNC_PERFCTR0",
  	[0x00000021] = "KNC_PERFCTR1",
  	[0x00000028] = "KNC_EVNTSEL0",
  	[0x00000029] = "KNC_EVNTSEL1",
  $

Just silences this perf build warning:

  Warning: Kernel ABI header differences:
    diff -u tools/arch/x86/include/asm/msr-index.h arch/x86/include/asm/msr-index.h

Please see tools/include/uapi/README for further details.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Link: https://lore.kernel.org/r/20250519214126.1652491-3-acme@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-05-20 12:57:18 -03:00
Ingo Molnar
2fb8414e64 Merge branch 'x86/cpu' into x86/core, to resolve conflicts
Conflicts:
	arch/x86/kernel/cpu/bugs.c

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2025-05-13 10:37:01 +02:00
Ingo Molnar
c1ab4ce3cb tools/arch/x86: Move the <asm/amd-ibs.h> header to <asm/amd/ibs.h>
Synchronize with what we did with the kernel side header in:

  3846389c03 ("x86/platform/amd: Move the <asm/amd-ibs.h> header to <asm/amd/ibs.h>")

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: linux-kernel@vger.kernel.org
2025-05-06 13:45:13 +02:00
Masami Hiramatsu (Google)
4b626015e1 x86/insn: Stop decoding i64 instructions in x86-64 mode at opcode
In commit 2e044911be ("x86/traps: Decode 0xEA instructions as #UD")
FineIBT starts using 0xEA as an invalid instruction like UD2. But
insn decoder always returns the length of "0xea" instruction as 7
because it does not check the (i64) superscript.

The x86 instruction decoder should also decode 0xEA on x86-64 as
a one-byte invalid instruction by decoding the "(i64)" superscript tag.

This stops decoding instruction which has (i64) but does not have (o64)
superscript in 64-bit mode at opcode and skips other fields.

With this change, insn_decoder_test says 0xea is 1 byte length if
x86-64 (-y option means 64-bit):

   $ printf "0:\tea\t\n" | insn_decoder_test -y -v
   insn_decoder_test: success: Decoded and checked 1 instructions

Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/174580490000.388420.5225447607417115496.stgit@devnote2
2025-05-06 12:03:16 +02:00
Ingo Molnar
24035886d7 Linux 6.15-rc5
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCgA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmgX1CgeHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGxiIH/A7LHlVatGEQgRFi
 0JALDgcuGTMtMU1qD43rv8Z1GXqTpCAlaBt9D1C9cUH/86MGyBTVRWgVy0wkaU2U
 8QSfFWQIbrdaIzelHtzmAv5IDtb+KrcX1iYGLcMb6ZYaWkv8/CMzMX1nkgxEr1QT
 37Xo3/F17yJumAdNQxdRhVLGy2d3X5rScecpufwh97sMwoddllMCDs2LIoeSAYpG
 376/wzni09G2fADa8MEKqcaMue4qcf0FOo/gOkT8YwFGSZLKa6uumlBLg04QoCt0
 foK2vfcci1q4H4ZbCu3uQESYGLQHY0f2ICDCwC3m25VF9a81TmlbC3MLum3vhmKe
 RtLDcXg=
 =xyaI
 -----END PGP SIGNATURE-----

Merge tag 'v6.15-rc5' into x86/cpu, to resolve conflicts

 Conflicts:
	tools/arch/x86/include/asm/cpufeatures.h

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2025-05-06 10:00:58 +02:00
Pratik R. Sampat
3bf3e0a521 KVM: selftests: Add library support for interacting with SNP
Extend the SEV library to include support for SNP ioctl() wrappers,
which aid in launching and interacting with a SEV-SNP guest.

Signed-off-by: Pratik R. Sampat <prsampat@amd.com>
Link: https://lore.kernel.org/r/20250305230000.231025-8-prsampat@amd.com
[sean: use BIT()]
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-05-02 12:32:33 -07:00
Yosry Ahmed
9a7cb00a8f x86/cpufeatures: Define X86_FEATURE_AMD_IBRS_SAME_MODE
Per the APM [1]:

  Some processors, identified by CPUID Fn8000_0008_EBX[IbrsSameMode]
  (bit 19) = 1, provide additional speculation limits. For these
  processors, when IBRS is set, indirect branch predictions are not
  influenced by any prior indirect branches, regardless of mode (CPL
  and guest/host) and regardless of whether the prior indirect branches
  occurred before or after the setting of IBRS. This is referred to as
  Same Mode IBRS.

Define this feature bit, which will be used by KVM to determine if an
IBPB is required on nested VM-exits in SVM.

[1] AMD64 Architecture Programmer's Manual Pub. 40332, Rev 4.08 - April
    2024, Volume 2, 3.2.9 Speculation Control MSRs

Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
Reviewed-by: Jim Mattson <jmattson@google.com>
Link: https://lore.kernel.org/r/20250221163352.3818347-2-yosry.ahmed@linux.dev
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-04-24 11:18:30 -07:00
Xin Li (Intel)
3aba0b40ca x86/cpufeatures: Shorten X86_FEATURE_AMD_HETEROGENEOUS_CORES
Shorten X86_FEATURE_AMD_HETEROGENEOUS_CORES to X86_FEATURE_AMD_HTR_CORES
to make the last column aligned consistently in the whole file.

No functional changes.

Suggested-by: Borislav Petkov (AMD) <bp@alien8.de>
Signed-off-by: Xin Li (Intel) <xin@zytor.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/20250415175410.2944032-4-xin@zytor.com
2025-04-15 22:09:20 +02:00
Xin Li (Intel)
13327fada7 x86/cpufeatures: Shorten X86_FEATURE_CLEAR_BHB_LOOP_ON_VMEXIT
Shorten X86_FEATURE_CLEAR_BHB_LOOP_ON_VMEXIT to
X86_FEATURE_CLEAR_BHB_VMEXIT to make the last column aligned
consistently in the whole file.

There's no need to explain in the name what the mitigation does.

No functional changes.

Suggested-by: Borislav Petkov (AMD) <bp@alien8.de>
Signed-off-by: Xin Li (Intel) <xin@zytor.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/20250415175410.2944032-3-xin@zytor.com
2025-04-15 22:09:16 +02:00
Borislav Petkov (AMD)
282cc5b676 x86/cpufeatures: Clean up formatting
It is a special file with special formatting so remove one whitespace
damage and format newer defines like the rest.

No functional changes.

 [ Xin: Do the same to tools/arch/x86/include/asm/cpufeatures.h. ]

Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Signed-off-by: Xin Li (Intel) <xin@zytor.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/20250415175410.2944032-2-xin@zytor.com
2025-04-15 21:44:27 +02:00
Borislav Petkov (AMD)
dd86a1d013 x86/bugs: Remove X86_BUG_MMIO_UNKNOWN
Whack this thing because:

- the "unknown" handling is done only for this vuln and not for the
  others

- it doesn't do anything besides reporting things differently. It
  doesn't apply any mitigations - it is simply causing unnecessary
  complications to the code which don't bring anything besides
  maintenance overhead to what is already a very nasty spaghetti pile

- all the currently unaffected CPUs can also be in "unknown" status so
  there's no need for special handling here

so get rid of it.

Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: David Kaplan <david.kaplan@amd.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Link: https://lore.kernel.org/r/20250414150951.5345-1-bp@kernel.org
2025-04-14 17:15:27 +02:00
Namhyung Kim
847f1403d3 tools headers: Update the x86 headers with the kernel sources
To pick up the changes in:

  841326332b x86/cpufeatures: Generate the <asm/cpufeaturemasks.h> header based on build config
  440a65b7d2 x86/mm: Enable AMD translation cache extensions
  767ae437a3 x86/mm: Add INVLPGB feature and Kconfig entry
  b4cc466b97 cpufreq/amd-pstate: Replace all AMD_CPPC_* macros with masks
  98c7a713db x86/bugs: Add X86_BUG_SPECTRE_V2_USER
  8f64eee70c x86/bugs: Remove X86_FEATURE_USE_IBPB
  8442df2b49 x86/bugs: KVM: Add support for SRSO_MSR_FIX
  70792aed14 x86/cpufeatures: Add CPUID feature bit for Idle HLT intercept
  968e9bc4ce x86: move ZMM exclusion list into CPU feature flag
  c631a2de7a perf/x86/intel: Ensure LBRs are disabled when a CPU is starting
  38cc6495cd x86/sev: Prevent GUEST_TSC_FREQ MSR interception for Secure TSC enabled guests
  288bba2f4c x86/cpufeatures: Remove "AMD" from the comments to the AMD-specific leaf
  877818802c x86/bugs: Add SRSO_USER_KERNEL_NO support
  8ae3291f77 x86/sev: Add full support for a segmented RMP table
  0cbc025841 x86/sev: Add support for the RMPREAD instruction
  7a470e826d x86/cpufeatures: Free up unused feature bits

Addressing this perf tools build warning:

  Warning: Kernel ABI header differences:
    diff -u tools/arch/x86/include/asm/cpufeatures.h arch/x86/include/asm/cpufeatures.h
    diff -u tools/arch/x86/include/asm/msr-index.h arch/x86/include/asm/msr-index.h

Please see tools/include/uapi/README for further details.

Acked-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
Cc: x86@kernel.org
Link: https://lore.kernel.org/r/20250410001125.391820-10-namhyung@kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-04-10 09:28:25 -07:00
Namhyung Kim
ddc592972f tools headers: Update the KVM headers with the kernel sources
To pick up the changes in:

  af5366bea2 KVM: x86: Drop the now unused KVM_X86_DISABLE_VALID_EXITS
  915d2f0718 KVM: Move KVM_REG_SIZE() definition to common uAPI header
  5c17848134 KVM: x86/xen: Restrict hypercall MSR to unofficial synthetic range
  9364789567 KVM: x86: Add a VM type define for TDX
  fa662c9080 KVM: SVM: Add Idle HLT intercept support
  3adaee7830 KVM: arm64: Allow userspace to change the implementation ID registers
  faf7714a47 KVM: arm64: nv: Allow userland to set VGIC maintenance IRQ
  c0000e58c7 KVM: arm64: Introduce KVM_REG_ARM_VENDOR_HYP_BMAP_2
  f83c41fb3d KVM: arm64: Allow userspace to limit NV support to nVHE

Addressing this perf tools build warning:

  Warning: Kernel ABI header differences:
    diff -u tools/include/uapi/linux/kvm.h include/uapi/linux/kvm.h
    diff -u tools/arch/x86/include/uapi/asm/kvm.h arch/x86/include/uapi/asm/kvm.h
    diff -u tools/arch/x86/include/uapi/asm/svm.h arch/x86/include/uapi/asm/svm.h
    diff -u tools/arch/arm64/include/uapi/asm/kvm.h arch/arm64/include/uapi/asm/kvm.h

Please see tools/include/uapi/README for further details.

Acked-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
Cc: kvm@vger.kernel.org
Link: https://lore.kernel.org/r/20250410001125.391820-2-namhyung@kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-04-10 09:28:24 -07:00
Linus Torvalds
906174776c - Some preparatory work to convert the mitigations machinery to mitigating
attack vectors instead of single vulnerabilities
 
 - Untangle and remove a now unneeded X86_FEATURE_USE_IBPB flag
 
 - Add support for a Zen5-specific SRSO mitigation
 
 - Cleanups and minor improvements
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmfixS0ACgkQEsHwGGHe
 VUpi1xAAgvH2u8Eo8ibT5dABQpD65w3oQiykO+9aDpObG9w9beDVGlld8DJE61Rz
 6tcE0Clp2H/tMcCbn8zXIJ92TQ3wIX/85uZwLi1VEM1Tx7A6VtAbPv8WKfZE3FCX
 9v92HRKnK3ql+A2ZR+oyy+/8RedUmia7y7/bXH1H7Zf2uozoKkmq5cQnwfq5iU4A
 qNiKuvSlQwjZ8Zz6Ax1ugHUkE4R7mlKh8rccLXl4+mVr63/lkPHSY3OFTjcYf4HW
 Ir92N86Spfo0/l0vsOOsWoYKmoaiVP7ouJh7YbKR3B0BGN0pt2MT476mehkEs427
 m4J6XhRKhIrsYmzEkLvvpsg12zO4/PKk8BEYNS7YPYlRaOwjV4ivyFS2aY6e55rh
 yUHyo9s+16f/Mp+/fNFXll3mdMxYBioPWh3M191nJkdfyKMrtf0MdKPRibaJB8wH
 yMF4D1gMx+hFbs0/VOS6dtqD9DKW7VgPg0LW+RysfhnLTuFFb5iBcH6Of7l7Z/Ca
 vVK+JxrhB1EDVI1+MKnESKPF9c6j3DRa2xrQHi/XYje1TGqnQ1v4CmsEObYBuJDN
 9M9t4QLzNuA/DA5tS7cxxtQ3YUthuJjPLcO4EVHOCvnqCAxkzp0i3dVMUr+YISl+
 2yFqaZdTt8s8FjTI21LOyuloCo30ZLlzaorFa0lp2cIyYup+1vg=
 =btX/
 -----END PGP SIGNATURE-----

Merge tag 'x86_bugs_for_v6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 speculation mitigation updates from Borislav Petkov:

 - Some preparatory work to convert the mitigations machinery to
   mitigating attack vectors instead of single vulnerabilities

 - Untangle and remove a now unneeded X86_FEATURE_USE_IBPB flag

 - Add support for a Zen5-specific SRSO mitigation

 - Cleanups and minor improvements

* tag 'x86_bugs_for_v6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/bugs: Make spectre user default depend on MITIGATION_SPECTRE_V2
  x86/bugs: Use the cpu_smt_possible() helper instead of open-coded code
  x86/bugs: Add AUTO mitigations for mds/taa/mmio/rfds
  x86/bugs: Relocate mds/taa/mmio/rfds defines
  x86/bugs: Add X86_BUG_SPECTRE_V2_USER
  x86/bugs: Remove X86_FEATURE_USE_IBPB
  KVM: nVMX: Always use IBPB to properly virtualize IBRS
  x86/bugs: Use a static branch to guard IBPB on vCPU switch
  x86/bugs: Remove the X86_FEATURE_USE_IBPB check in ib_prctl_set()
  x86/mm: Remove X86_FEATURE_USE_IBPB checks in cond_mitigation()
  x86/bugs: Move the X86_FEATURE_USE_IBPB check into callers
  x86/bugs: KVM: Add support for SRSO_MSR_FIX
2025-03-25 13:30:18 -07:00
Linus Torvalds
e34c38057a [ Merge note: this pull request depends on you having merged
two locking commits in the locking tree,
 	      part of the locking-core-2025-03-22 pull request. ]
 
 x86 CPU features support:
   - Generate the <asm/cpufeaturemasks.h> header based on build config
     (H. Peter Anvin, Xin Li)
   - x86 CPUID parsing updates and fixes (Ahmed S. Darwish)
   - Introduce the 'setcpuid=' boot parameter (Brendan Jackman)
   - Enable modifying CPU bug flags with '{clear,set}puid='
     (Brendan Jackman)
   - Utilize CPU-type for CPU matching (Pawan Gupta)
   - Warn about unmet CPU feature dependencies (Sohil Mehta)
   - Prepare for new Intel Family numbers (Sohil Mehta)
 
 Percpu code:
   - Standardize & reorganize the x86 percpu layout and
     related cleanups (Brian Gerst)
   - Convert the stackprotector canary to a regular percpu
     variable (Brian Gerst)
   - Add a percpu subsection for cache hot data (Brian Gerst)
   - Unify __pcpu_op{1,2}_N() macros to __pcpu_op_N() (Uros Bizjak)
   - Construct __percpu_seg_override from __percpu_seg (Uros Bizjak)
 
 MM:
   - Add support for broadcast TLB invalidation using AMD's INVLPGB instruction
     (Rik van Riel)
   - Rework ROX cache to avoid writable copy (Mike Rapoport)
   - PAT: restore large ROX pages after fragmentation
     (Kirill A. Shutemov, Mike Rapoport)
   - Make memremap(MEMREMAP_WB) map memory as encrypted by default
     (Kirill A. Shutemov)
   - Robustify page table initialization (Kirill A. Shutemov)
   - Fix flush_tlb_range() when used for zapping normal PMDs (Jann Horn)
   - Clear _PAGE_DIRTY for kernel mappings when we clear _PAGE_RW
     (Matthew Wilcox)
 
 KASLR:
   - x86/kaslr: Reduce KASLR entropy on most x86 systems,
     to support PCI BAR space beyond the 10TiB region
     (CONFIG_PCI_P2PDMA=y) (Balbir Singh)
 
 CPU bugs:
   - Implement FineIBT-BHI mitigation (Peter Zijlstra)
   - speculation: Simplify and make CALL_NOSPEC consistent (Pawan Gupta)
   - speculation: Add a conditional CS prefix to CALL_NOSPEC (Pawan Gupta)
   - RFDS: Exclude P-only parts from the RFDS affected list (Pawan Gupta)
 
 System calls:
   - Break up entry/common.c (Brian Gerst)
   - Move sysctls into arch/x86 (Joel Granados)
 
 Intel LAM support updates: (Maciej Wieczor-Retman)
   - selftests/lam: Move cpu_has_la57() to use cpuinfo flag
   - selftests/lam: Skip test if LAM is disabled
   - selftests/lam: Test get_user() LAM pointer handling
 
 AMD SMN access updates:
   - Add SMN offsets to exclusive region access (Mario Limonciello)
   - Add support for debugfs access to SMN registers (Mario Limonciello)
   - Have HSMP use SMN through AMD_NODE (Yazen Ghannam)
 
 Power management updates: (Patryk Wlazlyn)
   - Allow calling mwait_play_dead with an arbitrary hint
   - ACPI/processor_idle: Add FFH state handling
   - intel_idle: Provide the default enter_dead() handler
   - Eliminate mwait_play_dead_cpuid_hint()
 
 Bootup:
 
 Build system:
   - Raise the minimum GCC version to 8.1 (Brian Gerst)
   - Raise the minimum LLVM version to 15.0.0
     (Nathan Chancellor)
 
 Kconfig: (Arnd Bergmann)
   - Add cmpxchg8b support back to Geode CPUs
   - Drop 32-bit "bigsmp" machine support
   - Rework CONFIG_GENERIC_CPU compiler flags
   - Drop configuration options for early 64-bit CPUs
   - Remove CONFIG_HIGHMEM64G support
   - Drop CONFIG_SWIOTLB for PAE
   - Drop support for CONFIG_HIGHPTE
   - Document CONFIG_X86_INTEL_MID as 64-bit-only
   - Remove old STA2x11 support
   - Only allow CONFIG_EISA for 32-bit
 
 Headers:
   - Replace __ASSEMBLY__ with __ASSEMBLER__ in UAPI and non-UAPI headers
     (Thomas Huth)
 
 Assembly code & machine code patching:
   - x86/alternatives: Simplify alternative_call() interface (Josh Poimboeuf)
   - x86/alternatives: Simplify callthunk patching (Peter Zijlstra)
   - KVM: VMX: Use named operands in inline asm (Josh Poimboeuf)
   - x86/hyperv: Use named operands in inline asm (Josh Poimboeuf)
   - x86/traps: Cleanup and robustify decode_bug() (Peter Zijlstra)
   - x86/kexec: Merge x86_32 and x86_64 code using macros from <asm/asm.h>
     (Uros Bizjak)
   - Use named operands in inline asm (Uros Bizjak)
   - Improve performance by using asm_inline() for atomic locking instructions
     (Uros Bizjak)
 
 Earlyprintk:
   - Harden early_serial (Peter Zijlstra)
 
 NMI handler:
   - Add an emergency handler in nmi_desc & use it in nmi_shootdown_cpus()
     (Waiman Long)
 
 Miscellaneous fixes and cleanups:
 
   - by Ahmed S. Darwish, Andy Shevchenko, Ard Biesheuvel,
     Artem Bityutskiy, Borislav Petkov, Brendan Jackman, Brian Gerst,
     Dan Carpenter, Dr. David Alan Gilbert, H. Peter Anvin,
     Ingo Molnar, Josh Poimboeuf, Kevin Brodsky, Mike Rapoport,
     Lukas Bulwahn, Maciej Wieczor-Retman, Max Grobecker,
     Patryk Wlazlyn, Pawan Gupta, Peter Zijlstra,
     Philip Redkin, Qasim Ijaz, Rik van Riel, Thomas Gleixner,
     Thorsten Blum, Tom Lendacky, Tony Luck, Uros Bizjak,
     Vitaly Kuznetsov, Xin Li, liuye.
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmfenkQRHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1g1FRAAi6OFTSn/5aeLMI0IMNBxJ6ddQiFc3imd
 7+C/vU5nul4CyDs8mKyj/+f/DDrbkG9lKz3VG631Yl237lXHjD8XWcVMeC/1z/q0
 3zInDIloE9/nBHRPkF6F7fARBLBZ0LFgaBsGrCo7mwpGybiQdqGcqcxllvTbtXaw
 OHta4q6ok+lBDNlfc0v6H4cRnzhmmlKu6Ng0j6UI3V7uFhi3vtxas32ltDQtzorq
 2+jbV6/+kbrrv+xPC+jlzOFhTEKRupNPQXmvyQteoQg6G3kqAKMDvBthGXd1rHuX
 Qa+BoDIifE/2NiVeRwNrhoqYH/pHCzUzDREW5IW8+ca+4XNKuzAC6EuC8CeCzyK1
 q8ZjZjooQW4zEeVFeJYllHONzJYfxfSH5CLsnbcuhq99yfGlrQhF1qL72/Omn1w/
 DfPJM8Zt5zyKvLqUg3Md+fkVCO2wyDNhB61QPzRgHF+yD+rvuDpoqvUWir+w7cSn
 fwEDVZGXlFx6dumtSrqRaTd1nvFt80s8yP2ll09DMvGQ8D/yruS7hndGAmmJVCSW
 NAfd8pSjq5v2+ux2UR92/Cc3VF3SjaUqHBOp/Nq9rESya18ZVa3cJpHhVYYtPIVf
 THW0h07RIkGVKs1uq+5ekLCr/8uAZg58UPIqmhTuW0ttymRHCNfohR45FQZzy+0M
 tJj1oc2TIZw=
 =Dcb3
 -----END PGP SIGNATURE-----

Merge tag 'x86-core-2025-03-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull core x86 updates from Ingo Molnar:
 "x86 CPU features support:
   - Generate the <asm/cpufeaturemasks.h> header based on build config
     (H. Peter Anvin, Xin Li)
   - x86 CPUID parsing updates and fixes (Ahmed S. Darwish)
   - Introduce the 'setcpuid=' boot parameter (Brendan Jackman)
   - Enable modifying CPU bug flags with '{clear,set}puid=' (Brendan
     Jackman)
   - Utilize CPU-type for CPU matching (Pawan Gupta)
   - Warn about unmet CPU feature dependencies (Sohil Mehta)
   - Prepare for new Intel Family numbers (Sohil Mehta)

  Percpu code:
   - Standardize & reorganize the x86 percpu layout and related cleanups
     (Brian Gerst)
   - Convert the stackprotector canary to a regular percpu variable
     (Brian Gerst)
   - Add a percpu subsection for cache hot data (Brian Gerst)
   - Unify __pcpu_op{1,2}_N() macros to __pcpu_op_N() (Uros Bizjak)
   - Construct __percpu_seg_override from __percpu_seg (Uros Bizjak)

  MM:
   - Add support for broadcast TLB invalidation using AMD's INVLPGB
     instruction (Rik van Riel)
   - Rework ROX cache to avoid writable copy (Mike Rapoport)
   - PAT: restore large ROX pages after fragmentation (Kirill A.
     Shutemov, Mike Rapoport)
   - Make memremap(MEMREMAP_WB) map memory as encrypted by default
     (Kirill A. Shutemov)
   - Robustify page table initialization (Kirill A. Shutemov)
   - Fix flush_tlb_range() when used for zapping normal PMDs (Jann Horn)
   - Clear _PAGE_DIRTY for kernel mappings when we clear _PAGE_RW
     (Matthew Wilcox)

  KASLR:
   - x86/kaslr: Reduce KASLR entropy on most x86 systems, to support PCI
     BAR space beyond the 10TiB region (CONFIG_PCI_P2PDMA=y) (Balbir
     Singh)

  CPU bugs:
   - Implement FineIBT-BHI mitigation (Peter Zijlstra)
   - speculation: Simplify and make CALL_NOSPEC consistent (Pawan Gupta)
   - speculation: Add a conditional CS prefix to CALL_NOSPEC (Pawan
     Gupta)
   - RFDS: Exclude P-only parts from the RFDS affected list (Pawan
     Gupta)

  System calls:
   - Break up entry/common.c (Brian Gerst)
   - Move sysctls into arch/x86 (Joel Granados)

  Intel LAM support updates: (Maciej Wieczor-Retman)
   - selftests/lam: Move cpu_has_la57() to use cpuinfo flag
   - selftests/lam: Skip test if LAM is disabled
   - selftests/lam: Test get_user() LAM pointer handling

  AMD SMN access updates:
   - Add SMN offsets to exclusive region access (Mario Limonciello)
   - Add support for debugfs access to SMN registers (Mario Limonciello)
   - Have HSMP use SMN through AMD_NODE (Yazen Ghannam)

  Power management updates: (Patryk Wlazlyn)
   - Allow calling mwait_play_dead with an arbitrary hint
   - ACPI/processor_idle: Add FFH state handling
   - intel_idle: Provide the default enter_dead() handler
   - Eliminate mwait_play_dead_cpuid_hint()

  Build system:
   - Raise the minimum GCC version to 8.1 (Brian Gerst)
   - Raise the minimum LLVM version to 15.0.0 (Nathan Chancellor)

  Kconfig: (Arnd Bergmann)
   - Add cmpxchg8b support back to Geode CPUs
   - Drop 32-bit "bigsmp" machine support
   - Rework CONFIG_GENERIC_CPU compiler flags
   - Drop configuration options for early 64-bit CPUs
   - Remove CONFIG_HIGHMEM64G support
   - Drop CONFIG_SWIOTLB for PAE
   - Drop support for CONFIG_HIGHPTE
   - Document CONFIG_X86_INTEL_MID as 64-bit-only
   - Remove old STA2x11 support
   - Only allow CONFIG_EISA for 32-bit

  Headers:
   - Replace __ASSEMBLY__ with __ASSEMBLER__ in UAPI and non-UAPI
     headers (Thomas Huth)

  Assembly code & machine code patching:
   - x86/alternatives: Simplify alternative_call() interface (Josh
     Poimboeuf)
   - x86/alternatives: Simplify callthunk patching (Peter Zijlstra)
   - KVM: VMX: Use named operands in inline asm (Josh Poimboeuf)
   - x86/hyperv: Use named operands in inline asm (Josh Poimboeuf)
   - x86/traps: Cleanup and robustify decode_bug() (Peter Zijlstra)
   - x86/kexec: Merge x86_32 and x86_64 code using macros from
     <asm/asm.h> (Uros Bizjak)
   - Use named operands in inline asm (Uros Bizjak)
   - Improve performance by using asm_inline() for atomic locking
     instructions (Uros Bizjak)

  Earlyprintk:
   - Harden early_serial (Peter Zijlstra)

  NMI handler:
   - Add an emergency handler in nmi_desc & use it in
     nmi_shootdown_cpus() (Waiman Long)

  Miscellaneous fixes and cleanups:
   - by Ahmed S. Darwish, Andy Shevchenko, Ard Biesheuvel, Artem
     Bityutskiy, Borislav Petkov, Brendan Jackman, Brian Gerst, Dan
     Carpenter, Dr. David Alan Gilbert, H. Peter Anvin, Ingo Molnar,
     Josh Poimboeuf, Kevin Brodsky, Mike Rapoport, Lukas Bulwahn, Maciej
     Wieczor-Retman, Max Grobecker, Patryk Wlazlyn, Pawan Gupta, Peter
     Zijlstra, Philip Redkin, Qasim Ijaz, Rik van Riel, Thomas Gleixner,
     Thorsten Blum, Tom Lendacky, Tony Luck, Uros Bizjak, Vitaly
     Kuznetsov, Xin Li, liuye"

* tag 'x86-core-2025-03-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (211 commits)
  zstd: Increase DYNAMIC_BMI2 GCC version cutoff from 4.8 to 11.0 to work around compiler segfault
  x86/asm: Make asm export of __ref_stack_chk_guard unconditional
  x86/mm: Only do broadcast flush from reclaim if pages were unmapped
  perf/x86/intel, x86/cpu: Replace Pentium 4 model checks with VFM ones
  perf/x86/intel, x86/cpu: Simplify Intel PMU initialization
  x86/headers: Replace __ASSEMBLY__ with __ASSEMBLER__ in non-UAPI headers
  x86/headers: Replace __ASSEMBLY__ with __ASSEMBLER__ in UAPI headers
  x86/locking/atomic: Improve performance by using asm_inline() for atomic locking instructions
  x86/asm: Use asm_inline() instead of asm() in clwb()
  x86/asm: Use CLFLUSHOPT and CLWB mnemonics in <asm/special_insns.h>
  x86/hweight: Use asm_inline() instead of asm()
  x86/hweight: Use ASM_CALL_CONSTRAINT in inline asm()
  x86/hweight: Use named operands in inline asm()
  x86/stackprotector/64: Only export __ref_stack_chk_guard on CONFIG_SMP
  x86/head/64: Avoid Clang < 17 stack protector in startup code
  x86/kexec: Merge x86_32 and x86_64 code using macros from <asm/asm.h>
  x86/runtime-const: Add the RUNTIME_CONST_PTR assembly macro
  x86/cpu/intel: Limit the non-architectural constant_tsc model checks
  x86/mm/pat: Replace Intel x86_model checks with VFM ones
  x86/cpu/intel: Fix fast string initialization for extended Families
  ...
2025-03-24 22:06:11 -07:00
Thomas Huth
24a295e4ef x86/headers: Replace __ASSEMBLY__ with __ASSEMBLER__ in non-UAPI headers
While the GCC and Clang compilers already define __ASSEMBLER__
automatically when compiling assembly code, __ASSEMBLY__ is a
macro that only gets defined by the Makefiles in the kernel.

This can be very confusing when switching between userspace
and kernelspace coding, or when dealing with UAPI headers that
rather should use __ASSEMBLER__ instead. So let's standardize on
the __ASSEMBLER__ macro that is provided by the compilers now.

This is mostly a mechanical patch (done with a simple "sed -i"
statement), with some manual tweaks in <asm/frame.h>, <asm/hw_irq.h>
and <asm/setup.h> that mentioned this macro in comments with some
missing underscores.

Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20250314071013.1575167-38-thuth@redhat.com
2025-03-19 11:47:30 +01:00
Xin Li (Intel)
8f97566c8a x86/cpufeatures: Remove {disabled,required}-features.h
The functionalities of {disabled,required}-features.h have been replaced with
the auto-generated generated/<asm/cpufeaturemasks.h> header.

Thus they are no longer needed and can be removed.

None of the macros defined in {disabled,required}-features.h is used in tools,
delete them too.

Signed-off-by: Xin Li (Intel) <xin@zytor.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20250305184725.3341760-4-xin@zytor.com
2025-03-19 11:15:12 +01:00
Rik van Riel
440a65b7d2 x86/mm: Enable AMD translation cache extensions
With AMD TCE (translation cache extensions) only the intermediate mappings
that cover the address range zapped by INVLPG / INVLPGB get invalidated,
rather than all intermediate mappings getting zapped at every TLB invalidation.

This can help reduce the TLB miss rate, by keeping more intermediate mappings
in the cache.

From the AMD manual:

Translation Cache Extension (TCE) Bit. Bit 15, read/write. Setting this bit to
1 changes how the INVLPG, INVLPGB, and INVPCID instructions operate on TLB
entries. When this bit is 0, these instructions remove the target PTE from the
TLB as well as all upper-level table entries that are cached in the TLB,
whether or not they are associated with the target PTE.  When this bit is set,
these instructions will remove the target PTE and only those upper-level
entries that lead to the target PTE in the page table hierarchy, leaving
unrelated upper-level entries intact.

  [ bp: use cpu_has()... I know, it is a mess. ]

Signed-off-by: Rik van Riel <riel@surriel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20250226030129.530345-13-riel@surriel.com
2025-03-19 11:12:29 +01:00
H. Peter Anvin (Intel)
909639aa58 x86/cpufeatures: Rename X86_CMPXCHG64 to X86_CX8
Replace X86_CMPXCHG64 with X86_CX8, as CX8 is the name of the CPUID
flag, thus to make it consistent with X86_FEATURE_CX8 defined in
<asm/cpufeatures.h>.

No functional change intended.

Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com>
Signed-off-by: Xin Li (Intel) <xin@zytor.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20250228082338.73859-2-xin@zytor.com
2025-02-28 11:42:34 +01:00
Yosry Ahmed
8f64eee70c x86/bugs: Remove X86_FEATURE_USE_IBPB
X86_FEATURE_USE_IBPB was introduced in:

  2961298efe ("x86/cpufeatures: Clean up Spectre v2 related CPUID flags")

to have separate flags for when the CPU supports IBPB (i.e. X86_FEATURE_IBPB)
and when an IBPB is actually used to mitigate Spectre v2.

Ever since then, the uses of IBPB expanded. The name became confusing
because it does not control all IBPB executions in the kernel.
Furthermore, because its name is generic and it's buried within
indirect_branch_prediction_barrier(), it's easy to use it not knowing
that it is specific to Spectre v2.

X86_FEATURE_USE_IBPB is no longer needed because all the IBPB executions
it used to control are now controlled through other means (e.g.
switch_mm_*_ibpb static branches).

Remove the unused feature bit.

Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Josh Poimboeuf <jpoimboe@kernel.org>
Link: https://lore.kernel.org/r/20250227012712.3193063-7-yosry.ahmed@linux.dev
2025-02-27 10:57:21 +01:00
Ravi Bangoria
3201bfa368 perf amd ibs: Sync arch/x86/include/asm/amd-ibs.h header with the kernel
Sync load latency related bit fields into the tool's header copy

Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20250205060547.1337-4-ravi.bangoria@amd.com
2025-02-17 15:20:05 +01:00
Namhyung Kim
e120829dbf tools headers: Sync x86 kvm and cpufeature headers with the kernel
To pick up the changes in this cset:

  a0423af92c ("x86: KVM: Advertise CPUIDs for new instructions in Clearwater Forest")
  0c487010cb ("x86/cpufeatures: Add X86_FEATURE_AMD_WORKLOAD_CLASS feature bit")
  1ad4667066 ("x86/cpufeatures: Add X86_FEATURE_AMD_HETEROGENEOUS_CORES")
  104edc6efc ("x86/cpufeatures: Rename X86_FEATURE_FAST_CPPC to have AMD prefix")
  3ea87dfa31 ("x86/cpufeatures: Add a IBPB_NO_RET BUG flag")
  ff898623af ("x86/cpufeatures: Define X86_FEATURE_AMD_IBPB_RET")
  dcb988cdac ("KVM: x86: Quirk initialization of feature MSRs to KVM's max configuration")

This addresses these perf build warnings:

  Warning: Kernel ABI header differences:
    diff -u tools/arch/x86/include/asm/cpufeatures.h arch/x86/include/asm/cpufeatures.h
    diff -u tools/arch/x86/include/uapi/asm/kvm.h arch/x86/include/uapi/asm/kvm.h

Please see tools/include/uapi/README for further details.

Reviewed-by: James Clark <james.clark@linaro.org>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: x86@kernel.org
Cc: kvm@vger.kernel.org
Link: https://lore.kernel.org/r/20241203035349.1901262-5-namhyung@kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2024-12-04 14:34:49 -08:00
Linus Torvalds
d8d78a90e7 - Add a feature flag which denotes AMD CPUs supporting workload classification
with the purpose of using such hints when making scheduling decisions
 
 - Determine the boost enumerator for each AMD core based on its type: efficiency
   or performance, in the cppc driver
 
 - Add the type of a CPU to the topology CPU descriptor with the goal of
   supporting and making decisions based on the type of the respective core
 
 - Add a feature flag to denote AMD cores which have heterogeneous topology and
   enable SD_ASYM_PACKING for those
 
 - Check microcode revisions before disabling PCID on Intel
 
 - Cleanups and fixlets
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmc7q0UACgkQEsHwGGHe
 VUq27Q//TADIn/rZj95OuWLYFXduOpzdyfF6BAOabRjUpIWTGJ5YdKjj1TCA2wUE
 6SiHZWQxQropB3NgeICcDT+3OGdGzE2qywzpXspUDsBPraWx+9CA56qREYafpRps
 88ZQZJWHla2/0kHN5oM4fYe05mWMLAFgIhG4tPH/7sj54Zqar40nhVksz3WjKAid
 yEfzbdVeRI5sNoujyHzGANXI0Fo98nAyi5Qj9kXL9W/UV1JmoQ78Rq7V9IIgOBsc
 l6Gv/h0CNtH9voqfrfUb07VHk8ZqSJ37xUnrnKdidncWGCWEAoZRr7wU+I9CHKIs
 tzdx+zq6JC3YN0IwsZCjk4me+BqVLJxW2oDgW7esPifye6ElyEo4T9UO9LEpE1qm
 ReAByoIMdSXWwXuITwy4NxLPKPCpU7RyJCiqFzpJp0g4qUq2cmlyERDirf6eknXL
 s+dmRaglEdcQT/EL+Y+vfFdQtLdwJmOu+nPPjjFxeRcIDB+u1sXJMEFbyvkLL6FE
 HOdNxL+5n/3M8Lbh77KIS5uCcjXL2VCkZK2/hyoifUb+JZR/ENoqYjElkMXOplyV
 KQIfcTzVCLRVvZApf/MMkTO86cpxMDs7YLYkgFxDsBjRdoq/Mzub8yzWn6kLZtmP
 ANNH4uYVtjrHE1nxJSA0JgYQlJKYeNU5yhLiTLKhHL5BwDYfiz8=
 =420r
 -----END PGP SIGNATURE-----

Merge tag 'x86_cpu_for_v6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 cpuid updates from Borislav Petkov:

 - Add a feature flag which denotes AMD CPUs supporting workload
   classification with the purpose of using such hints when making
   scheduling decisions

 - Determine the boost enumerator for each AMD core based on its type:
   efficiency or performance, in the cppc driver

 - Add the type of a CPU to the topology CPU descriptor with the goal of
   supporting and making decisions based on the type of the respective
   core

 - Add a feature flag to denote AMD cores which have heterogeneous
   topology and enable SD_ASYM_PACKING for those

 - Check microcode revisions before disabling PCID on Intel

 - Cleanups and fixlets

* tag 'x86_cpu_for_v6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/cpu: Remove redundant CONFIG_NUMA guard around numa_add_cpu()
  x86/cpu: Fix FAM5_QUARK_X1000 to use X86_MATCH_VFM()
  x86/cpu: Fix formatting of cpuid_bits[] in scattered.c
  x86/cpufeatures: Add X86_FEATURE_AMD_WORKLOAD_CLASS feature bit
  x86/amd: Use heterogeneous core topology for identifying boost numerator
  x86/cpu: Add CPU type to struct cpuinfo_topology
  x86/cpu: Enable SD_ASYM_PACKING for PKG domain on AMD
  x86/cpufeatures: Add X86_FEATURE_AMD_HETEROGENEOUS_CORES
  x86/cpufeatures: Rename X86_FEATURE_FAST_CPPC to have AMD prefix
  x86/mm: Don't disable PCID when INVLPG has been fixed by microcode
2024-11-19 12:27:19 -08:00
Ian Rogers
a5384c4267 perf cap: Add __NR_capget to arch/x86 unistd
As there are duplicated kernel headers in tools/include libc can pick
up the wrong definitions. This was causing the wrong system call for
capget in perf.

Reported-by: Adrian Hunter <adrian.hunter@intel.com>
Fixes: e25ebda78e ("perf cap: Tidy up and improve capability testing")
Closes: https://lore.kernel.org/lkml/cc7d6bdf-1aeb-4179-9029-4baf50b59342@intel.com/
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20241026055448.312247-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-10-28 13:04:52 -03:00
Mario Limonciello
104edc6efc x86/cpufeatures: Rename X86_FEATURE_FAST_CPPC to have AMD prefix
This feature is an AMD unique feature of some processors, so put
AMD into the name.

Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20241025171459.1093-2-mario.limonciello@amd.com
2024-10-25 20:09:16 +02:00
Arnaldo Carvalho de Melo
08a7d25255 tools arch x86: Sync the msr-index.h copy with the kernel sources
To pick up the changes from these csets:

  dc1e67f70f ("KVM VMX: Move MSR_IA32_VMX_MISC bit defines to asm/vmx.h")
  d7bfc9ffd5 ("KVM: VMX: Move MSR_IA32_VMX_BASIC bit defines to asm/vmx.h")
  beb2e44604 ("x86/cpu: KVM: Move macro to encode PAT value to common header")
  e7e80b66fb ("x86/cpu: KVM: Add common defines for architectural memory types (PAT, MTRRs, etc.)")

That cause no changes to tooling:

  $ tools/perf/trace/beauty/tracepoints/x86_msr.sh > before
  $ cp arch/x86/include/asm/msr-index.h tools/arch/x86/include/asm/msr-index.h
  $ tools/perf/trace/beauty/tracepoints/x86_msr.sh > after
  $ diff -u before after
  $

To see how this works take a look at this previous update:

  https://git.kernel.org/torvalds/c/174372668933ede5

  1743726689 ("tools arch x86: Sync the msr-index.h copy with the kernel sources to pick IA32_MKTME_KEYID_PARTITIONING")

Just silences this perf build warning:

  Warning: Kernel ABI header differences:
    diff -u tools/arch/x86/include/asm/msr-index.h arch/x86/include/asm/msr-index.h

Please see tools/include/uapi/README for further details.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Xin Li <xin3.li@intel.com>
Link: https://lore.kernel.org/lkml/ZxpLSBzGin3vjs3b@x1
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-10-24 10:27:59 -03:00
Arnaldo Carvalho de Melo
d822ca29a4 tools headers UAPI: Sync kvm headers with the kernel sources
To pick the changes in:

  aa8d1f48d3 ("KVM: x86/mmu: Introduce a quirk to control memslot zap behavior")

That don't change functionality in tools/perf, as no new ioctl is added
for the 'perf trace' scripts to harvest.

This addresses these perf build warnings:

  Warning: Kernel ABI header differences:
    diff -u tools/arch/x86/include/uapi/asm/kvm.h arch/x86/include/uapi/asm/kvm.h

Please see tools/include/uapi/README for further details.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Yan Zhao <yan.y.zhao@intel.com>
Link: https://lore.kernel.org/lkml/ZxgN0O02YrAJ2qIC@x1
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-10-23 11:34:56 -03:00
Arnaldo Carvalho de Melo
744a6a1f2a tools arch x86: Sync the msr-index.h copy with the kernel sources
To pick up the changes from these csets:

  0a3e4e94d1 ("platform/x86/intel/ifs: Add SBAF test image loading support")

That cause no changes to tooling:

  $ tools/perf/trace/beauty/tracepoints/x86_msr.sh > before
  $ cp arch/x86/include/asm/msr-index.h tools/arch/x86/include/asm/msr-index.h
  $ tools/perf/trace/beauty/tracepoints/x86_msr.sh > after
  $ diff -u before after
  $

Just silences this perf build warning:

  Warning: Kernel ABI header differences:
    diff -u tools/arch/x86/include/asm/msr-index.h arch/x86/include/asm/msr-index.h

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Hans de Goede <hdegoede@redhat.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Jithu Joseph <jithu.joseph@intel.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/lkml/ZvrJY68Btx3a_yV4@x1
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-30 17:23:38 -03:00
Namhyung Kim
f6d9883f8e tools/include: Sync x86 headers with the kernel sources
To pick up changes from:

  149fd4712b perf/x86/intel: Support Perfmon MSRs aliasing
  21b362cc76 x86/resctrl: Enable shared RMID mode on Sub-NUMA Cluster (SNC) systems
  4f460bff7b cpufreq: acpi: move MSR_K7_HWCR_CPB_DIS_BIT into msr-index.h
  7ea81936b8 x86/cpufeatures: Add HWP highest perf change feature flag
  78ce84b9e0 x86/cpufeatures: Flip the /proc/cpuinfo appearance logic
  1beb348d5c x86/sev: Provide SVSM discovery support

This should be used to beautify x86 syscall arguments and it addresses
these tools/perf build warnings:

  Warning: Kernel ABI header differences:
  diff -u tools/arch/x86/include/asm/cpufeatures.h arch/x86/include/asm/cpufeatures.h
  diff -u tools/arch/x86/include/asm/msr-index.h arch/x86/include/asm/msr-index.h

Please see tools/include/uapi/README for details (it's in the first patch
of this series).

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2024-08-07 10:59:07 -07:00
Namhyung Kim
a625df3995 tools/include: Sync uapi/linux/kvm.h with the kernel sources
And other arch-specific UAPI headers to pick up changes from:

  4b23e0c199 KVM: Ensure new code that references immediate_exit gets extra scrutiny
  85542adb65 KVM: x86: Add KVM_RUN_X86_GUEST_MODE kvm_run flag
  6fef518594 KVM: x86: Add a capability to configure bus frequency for APIC timer
  34ff659017 x86/sev: Use kernel provided SVSM Calling Areas
  5dcc1e7614 Merge tag 'kvm-x86-misc-6.11' of https://github.com/kvm-x86/linux into HEAD
  9a0d2f4995 KVM: PPC: Book3S HV: Add one-reg interface for HASHPKEYR register
  e9eb790b25 KVM: PPC: Book3S HV: Add one-reg interface for HASHKEYR register
  1a1e6865f5 KVM: PPC: Book3S HV: Add one-reg interface for DEXCR register

This should be used to beautify KVM syscall arguments and it addresses
these tools/perf build warnings:

  Warning: Kernel ABI header differences:
  diff -u tools/include/uapi/linux/kvm.h include/uapi/linux/kvm.h
  diff -u tools/arch/x86/include/uapi/asm/kvm.h arch/x86/include/uapi/asm/kvm.h
  diff -u tools/arch/x86/include/uapi/asm/svm.h arch/x86/include/uapi/asm/svm.h
  diff -u tools/arch/powerpc/include/uapi/asm/kvm.h arch/powerpc/include/uapi/asm/kvm.h

Please see tools/include/uapi/README for details (it's in the first patch
of this series).

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: kvm@vger.kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2024-08-06 12:30:32 -07:00
Arnaldo Carvalho de Melo
88e520512a tools headers UAPI: Sync kvm headers with the kernel sources
To pick the changes in:

  4af663c2f6 ("KVM: SEV: Allow per-guest configuration of GHCB protocol version")
  4f5defae70 ("KVM: SEV: introduce KVM_SEV_INIT2 operation")
  26c44aa9e0 ("KVM: SEV: define VM types for SEV and SEV-ES")
  ac5c48027b ("KVM: SEV: publish supported VMSA features")
  651d61bc8b ("KVM: PPC: Fix documentation for ppc mmu caps")

That don't change functionality in tools/perf, as no new ioctl is added
for the 'perf trace' scripts to harvest.

This addresses these perf build warnings:

  Warning: Kernel ABI header differences:
    diff -u tools/include/uapi/linux/kvm.h include/uapi/linux/kvm.h
    diff -u tools/arch/x86/include/uapi/asm/kvm.h arch/x86/include/uapi/asm/kvm.h

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Joel Stanley <joel@jms.id.au>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michael Roth <michael.roth@amd.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Link: https://lore.kernel.org/lkml/ZlYxAdHjyAkvGtMW@x1
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-05-28 16:49:36 -03:00
Arnaldo Carvalho de Melo
ac4b069035 tools arch x86: Sync the msr-index.h copy with the kernel sources
To pick up the changes from these csets:

  53bc516ade ("x86/msr: Move ARCH_CAP_XAPIC_DISABLE bit definition to its rightful place")

That patch just move definitions around, so this just silences this perf
build warning:

  Warning: Kernel ABI header differences:
    diff -u tools/arch/x86/include/asm/msr-index.h arch/x86/include/asm/msr-index.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Borislav Petkov (AMD) <bp@alien8.de>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Link: https://lore.kernel.org/lkml/ZlYe8jOzd1_DyA7X@x1
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-05-28 15:14:32 -03:00
Linus Torvalds
29c73fc794 perf tools fixes and improvements for v6.10:
- Add Kan Liang to MAINTAINERS as a perf tools reviewer.
 
 - Add support for using the 'capstone' disassembler library in various tools,
   such as 'perf script' and 'perf annotate'. This is an alternative for the
   use of the 'xed' and 'objdump' disassemblers.
 
 - Data-type profiling improvements:
 
   Resolve types for a->b->c by backtracking the assignments until it finds
   DWARF info for one of those members
 
   Support for global variables, keeping a cache to speed up lookups.
 
   Handle the 'call' instruction, dealing with effects on registers and handling
   its return when tracking register data types.
 
   Handle x86's segment based addressing like %gs:0x28, to support things like
   per CPU variables, the stack canary, etc.
 
   Data-type profiling got big speedups when using capstone for disassembling.
   The objdump outoput parsing method is left as a fallback when capstone fails or
   isn't available. There are patches posted for 6.11 that to use a LLVM
   disassembler.
 
   Support event group display in the TUI when annotating types with --data-type,
   for instance to show memory load and store events for the data type fields.
 
   Optimize the 'perf annotate' data structures, reducing memory usage.
 
   Add a initial 'perf test' for 'perf annotate', checking that a target symbol
   appears on the output, specifying objdump via the command line, etc.
 
 - Integrate the shellcheck utility with the build of perf to allow catching
   shell problems early in areas such as 'perf test', 'perf trace' scrape
   scripts, etc.
 
 - Add 'uretprobe' variant in the 'perf bench uprobe' tool.
 
 - Add script to run instances of 'perf script' in parallel.
 
 - Allow parsing tracepoint names that start with digits, such as
   9p/9p_client_req, etc. Make sure 'perf test' tests it even on systems
   where those tracepoints aren't available.
 
 Vendor Events:
 
 - Update Intel JSON files for Cascade Lake X, Emerald Rapids, Grand Ridge, Ice
   Lake X, Lunar Lake, Meteor Lake, Sapphire Rapids, Sierra Forest, Sky Lake X,
   Sky Lake and Snow Ridge X.  Remove info metrics erroneously in TopdownL1.
 
 - Add AMD's Zen 5 core and uncore events and metrics. Those come from the
   "Performance Monitor Counters for AMD Family 1Ah Model 00h- 0Fh Processors"
   document, with events that capture information on op dispatch, execution and
   retirement, branch prediction, L1 and L2 cache activity, TLB activity, etc.
 
 - Mark L1D_CACHE_INVAL impacted by errata for ARM64's AmpereOne/AmpereOneX.
 
 Miscellaneous:
 
 - Sync header copies with the kernel sources.
 
 - Move some header copies used only for generating translation string tables
   for ioctl cmds and other syscall integer arguments to a new directory under
   tools/perf/beauty/, to separate from copies in tools/include/ that are used
   to build the tools.
 
 - Introduce scrape script for several syscall 'flags'/'mask' arguments.
 
 - Improve cpumap utilization, fixing up pairing of refcounts, using the right
   iterators (perf_cpu_map__for_each_cpu), etc.
 
 - Give more details about raw event encodings in 'perf list', show tracepoint
   encoding in the detailed output.
 
 - Refactor the DSOs handling code, reducing memory usage.
 
 - Document the BPF event modifier and add a 'perf test' for it.
 
 - Improve the event parser, better error messages and add further 'perf test's
   for it.
 
 - Add reference count checking to 'struct comm_str' and 'struct mem_info'.
 
 - Make ARM64's 'perf test' entries for the Neoverse N1 more robust.
 
 - Tweak the ARM64's Coresight 'perf test's.
 
 - Improve ARM64's CoreSight ETM version detection and error reporting.
 
 - Fix handling of symbols when using kcore.
 
 - Fix PAI (Processor Activity Instrumentation) counter names for s390 virtual
   machines in 'perf report'.
 
 - Fix -g/--call-graph option failure in 'perf sched timehist'.
 
 - Add LIBTRACEEVENT_DIR build option to allow building with libtraceevent
   installed in non-standard directories, such as when doing cross builds.
 
 - Various 'perf test' and 'perf bench' fixes.
 
 - Improve 'perf probe' error message for long C++ probe names.
 
 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQR2GiIUctdOfX2qHhGyPKLppCJ+JwUCZkzdjgAKCRCyPKLppCJ+
 J8ZYAP46rcbOuDWol6tjD9FDXd+spkWc40bnqeSnOR+TWlmJXwEA87XU4+3LAh6p
 HQxKXehJRh90I90yn954mK2NuN+58Q0=
 =sie+
 -----END PGP SIGNATURE-----

Merge tag 'perf-tools-for-v6.10-1-2024-05-21' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools

Pull perf tools updates from Arnaldo Carvalho de Melo:
 "General:

   - Integrate the shellcheck utility with the build of perf to allow
     catching shell problems early in areas such as 'perf test', 'perf
     trace' scrape scripts, etc

   - Add 'uretprobe' variant in the 'perf bench uprobe' tool

   - Add script to run instances of 'perf script' in parallel

   - Allow parsing tracepoint names that start with digits, such as
     9p/9p_client_req, etc. Make sure 'perf test' tests it even on
     systems where those tracepoints aren't available

   - Add Kan Liang to MAINTAINERS as a perf tools reviewer

   - Add support for using the 'capstone' disassembler library in
     various tools, such as 'perf script' and 'perf annotate'. This is
     an alternative for the use of the 'xed' and 'objdump' disassemblers

  Data-type profiling improvements:

   - Resolve types for a->b->c by backtracking the assignments until it
     finds DWARF info for one of those members

   - Support for global variables, keeping a cache to speed up lookups

   - Handle the 'call' instruction, dealing with effects on registers
     and handling its return when tracking register data types

   - Handle x86's segment based addressing like %gs:0x28, to support
     things like per CPU variables, the stack canary, etc

   - Data-type profiling got big speedups when using capstone for
     disassembling. The objdump outoput parsing method is left as a
     fallback when capstone fails or isn't available. There are patches
     posted for 6.11 that to use a LLVM disassembler

   - Support event group display in the TUI when annotating types with
     --data-type, for instance to show memory load and store events for
     the data type fields

   - Optimize the 'perf annotate' data structures, reducing memory usage

   - Add a initial 'perf test' for 'perf annotate', checking that a
     target symbol appears on the output, specifying objdump via the
     command line, etc

  Vendor Events:

   - Update Intel JSON files for Cascade Lake X, Emerald Rapids, Grand
     Ridge, Ice Lake X, Lunar Lake, Meteor Lake, Sapphire Rapids, Sierra
     Forest, Sky Lake X, Sky Lake and Snow Ridge X. Remove info metrics
     erroneously in TopdownL1

   - Add AMD's Zen 5 core and uncore events and metrics. Those come from
     the "Performance Monitor Counters for AMD Family 1Ah Model 00h- 0Fh
     Processors" document, with events that capture information on op
     dispatch, execution and retirement, branch prediction, L1 and L2
     cache activity, TLB activity, etc

   - Mark L1D_CACHE_INVAL impacted by errata for ARM64's AmpereOne/
     AmpereOneX

  Miscellaneous:

   - Sync header copies with the kernel sources

   - Move some header copies used only for generating translation string
     tables for ioctl cmds and other syscall integer arguments to a new
     directory under tools/perf/beauty/, to separate from copies in
     tools/include/ that are used to build the tools

   - Introduce scrape script for several syscall 'flags'/'mask'
     arguments

   - Improve cpumap utilization, fixing up pairing of refcounts, using
     the right iterators (perf_cpu_map__for_each_cpu), etc

   - Give more details about raw event encodings in 'perf list', show
     tracepoint encoding in the detailed output

   - Refactor the DSOs handling code, reducing memory usage

   - Document the BPF event modifier and add a 'perf test' for it

   - Improve the event parser, better error messages and add further
     'perf test's for it

   - Add reference count checking to 'struct comm_str' and 'struct
     mem_info'

   - Make ARM64's 'perf test' entries for the Neoverse N1 more robust

   - Tweak the ARM64's Coresight 'perf test's

   - Improve ARM64's CoreSight ETM version detection and error reporting

   - Fix handling of symbols when using kcore

   - Fix PAI (Processor Activity Instrumentation) counter names for s390
     virtual machines in 'perf report'

   - Fix -g/--call-graph option failure in 'perf sched timehist'

   - Add LIBTRACEEVENT_DIR build option to allow building with
     libtraceevent installed in non-standard directories, such as when
     doing cross builds

   - Various 'perf test' and 'perf bench' fixes

   - Improve 'perf probe' error message for long C++ probe names"

* tag 'perf-tools-for-v6.10-1-2024-05-21' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools: (260 commits)
  tools lib subcmd: Show parent options in help
  perf pmu: Count sys and cpuid JSON events separately
  perf stat: Don't display metric header for non-leader uncore events
  perf annotate-data: Ensure the number of type histograms
  perf annotate: Fix segfault on sample histogram
  perf daemon: Fix file leak in daemon_session__control
  libsubcmd: Fix parse-options memory leak
  perf lock: Avoid memory leaks from strdup()
  perf sched: Rename 'switches' column header to 'count' and add usage description, options for latency
  perf tools: Ignore deleted cgroups
  perf parse: Allow tracepoint names to start with digits
  perf parse-events: Add new 'fake_tp' parameter for tests
  perf parse-events: pass parse_state to add_tracepoint
  perf symbols: Fix ownership of string in dso__load_vmlinux()
  perf symbols: Update kcore map before merging in remaining symbols
  perf maps: Re-use __maps__free_maps_by_name()
  perf symbols: Remove map from list before updating addresses
  perf tracepoint: Don't scan all tracepoints to test if one exists
  perf dwarf-aux: Fix build with HAVE_DWARF_CFI_SUPPORT
  perf thread: Fixes to thread__new() related to initializing comm
  ...
2024-05-21 15:45:14 -07:00
Adrian Hunter
87bbaf1a4b x86/insn: Add support for APX EVEX to the instruction decoder logic
Intel Advanced Performance Extensions (APX) extends the EVEX prefix to
support:

 - extended general purpose registers (EGPRs) i.e. r16 to r31
 - Push-Pop Acceleration (PPX) hints
 - new data destination (NDD) register
 - suppress status flags writes (NF) of common instructions
 - new instructions

Refer to the Intel Advanced Performance Extensions (Intel APX) Architecture
Specification for details.

The extended EVEX prefix does not need amended instruction decoder logic,
except in one area. Some instructions are defined as SCALABLE which means
the EVEX.W bit and EVEX.pp bits are used to determine operand size.
Specifically, if an instruction is SCALABLE and EVEX.W is zero, then
EVEX.pp value 0 (representing no prefix NP) means default operand size,
whereas EVEX.pp value 1 (representing 66 prefix) means operand size
override i.e. 16 bits

Add an attribute (INAT_EVEX_SCALABLE) to identify such instructions, and
amend the logic appropriately.

Amend the awk script that generates the attribute tables from the opcode
map, to recognise "(es)" as attribute INAT_EVEX_SCALABLE.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20240502105853.5338-8-adrian.hunter@intel.com
2024-05-02 13:13:45 +02:00
Adrian Hunter
eada38d575 x86/insn: Add support for REX2 prefix to the instruction decoder logic
Intel Advanced Performance Extensions (APX) uses a new 2-byte prefix named
REX2 to select extended general purpose registers (EGPRs) i.e. r16 to r31.

The REX2 prefix is effectively an extended version of the REX prefix.

REX2 and EVEX are also used with PUSH/POP instructions to provide a
Push-Pop Acceleration (PPX) hint. With PPX hints, a CPU will attempt to
fast-forward register data between matching PUSH and POP instructions.

REX2 is valid only with opcodes in maps 0 and 1. Similar extension for
other maps is provided by the EVEX prefix, covered in a separate patch.

Some opcodes in maps 0 and 1 are reserved under REX2. One of these is used
for a new 64-bit absolute direct jump instruction JMPABS.

Refer to the Intel Advanced Performance Extensions (Intel APX) Architecture
Specification for details.

Define a code value for the REX2 prefix (INAT_PFX_REX2), and add attribute
flags for opcodes reserved under REX2 (INAT_NO_REX2) and to identify
opcodes (only JMPABS) that require a mandatory REX2 prefix
(INAT_REX2_VARIANT).

Amend logic to read the REX2 prefix and get the opcode attribute for the
map number (0 or 1) encoded in the REX2 prefix.

Amend the awk script that generates the attribute tables from the opcode
map, to recognise "REX2" as attribute INAT_PFX_REX2, and "(!REX2)"
as attribute INAT_NO_REX2, and "(REX2)" as attribute INAT_REX2_VARIANT.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20240502105853.5338-6-adrian.hunter@intel.com
2024-05-02 13:13:44 +02:00
Arnaldo Carvalho de Melo
8f21164321 tools headers x86 cpufeatures: Sync with the kernel sources to pick BHI mitigation changes
To pick the changes from:

  95a6ccbdc7 ("x86/bhi: Mitigate KVM by default")
  ec9404e40e ("x86/bhi: Add BHI mitigation knob")
  be482ff950 ("x86/bhi: Enumerate Branch History Injection (BHI) bug")
  0f4a837615 ("x86/bhi: Define SPEC_CTRL_BHI_DIS_S")
  7390db8aea ("x86/bhi: Add support for clearing branch history at syscall entry")

This causes these perf files to be rebuilt and brings some X86_FEATURE
that will be used when updating the copies of
tools/arch/x86/lib/mem{cpy,set}_64.S with the kernel sources:

      CC       /tmp/build/perf/bench/mem-memcpy-x86-64-asm.o
      CC       /tmp/build/perf/bench/mem-memset-x86-64-asm.o

And addresses this perf build warning:

  Warning: Kernel ABI header differences:
    diff -u tools/arch/x86/include/asm/cpufeatures.h arch/x86/include/asm/cpufeatures.h

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Daniel Sneddon <daniel.sneddon@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/lkml/ZirIx4kPtJwGFZS0@x1
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-04-26 22:13:10 -03:00
Arnaldo Carvalho de Melo
b29781afae tools arch x86: Sync the msr-index.h copy with the kernel sources
To pick up the changes from these csets:

  be482ff950 ("x86/bhi: Enumerate Branch History Injection (BHI) bug")
  0f4a837615 ("x86/bhi: Define SPEC_CTRL_BHI_DIS_S")

That cause no changes to tooling:

  $ tools/perf/trace/beauty/tracepoints/x86_msr.sh > x86_msr.before
  $ objdump -dS /tmp/build/perf-tools-next/util/amd-sample-raw.o > amd-sample-raw.o.before
  $ cp arch/x86/include/asm/msr-index.h tools/arch/x86/include/asm/msr-index.h
  $ make -C tools/perf O=/tmp/build/perf-tools-next
  <SNIP>
  CC      /tmp/build/perf-tools-next/trace/beauty/tracepoints/x86_msr.o
  <SNIP>
  CC      /tmp/build/perf-tools-next/util/amd-sample-raw.o
  <SNIP>
  $ objdump -dS /tmp/build/perf-tools-next/util/amd-sample-raw.o > amd-sample-raw.o.after
  $ tools/perf/trace/beauty/tracepoints/x86_msr.sh > x86_msr.after
  $ diff -u x86_msr.before x86_msr.after
  $ diff -u amd-sample-raw.o.before amd-sample-raw.o.after

Just silences this perf build warning:

  Warning: Kernel ABI header differences:
    diff -u tools/arch/x86/include/asm/msr-index.h arch/x86/include/asm/msr-index.h

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Daniel Sneddon <daniel.sneddon@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/lkml/ZifCnEZFx5MZQuIW@x1
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-04-23 14:16:08 -03:00
Arnaldo Carvalho de Melo
173b0b5b0e Merge remote-tracking branch 'torvalds/master' into perf-tools-next
To pick up fixes sent via perf-tools, by Namhyung Kim.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-04-22 13:35:18 -03:00
Namhyung Kim
c781a72f9d tools/include: Sync x86 asm/msr-index.h with the kernel sources
To pick up the changes from:

  8076fcde01 ("x86/rfds: Mitigate Register File Data Sampling (RFDS)")
  d7b69b590b ("x86/sev: Dump SEV_STATUS")
  cd6df3f378 ("x86/cpu: Add MSR numbers for FRED configuration")
  216d106c7f ("x86/sev: Add SEV-SNP host initialization support")

This should address these tools/perf build warnings:

  Warning: Kernel ABI header differences:
    diff -u tools/arch/x86/include/asm/msr-index.h arch/x86/include/asm/msr-index.h

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240408185520.1550865-8-namhyung@kernel.org
2024-04-11 10:38:29 -07:00
Namhyung Kim
978f2a60dd tools/include: Sync x86 asm/irq_vectors.h with the kernel sources
To pick up the changes from:

  0cbca1bf44 ("x86: irq: unconditionally define KVM interrupt vectors")

This should address these tools/perf build warnings:

  Warning: Kernel ABI header differences:
    diff -u tools/arch/x86/include/asm/irq_vectors.h arch/x86/include/asm/irq_vectors.h

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240408185520.1550865-7-namhyung@kernel.org
2024-04-11 10:38:29 -07:00
Namhyung Kim
58e1b92df4 tools/include: Sync x86 CPU feature headers with the kernel sources
To pick up the changes from:

  598c2fafc0 ("perf/x86/amd/lbr: Use freeze based on availability")
  7f274e609f ("x86/cpufeatures: Add new word for scattered features")

This should address these tools/perf build warnings:

  Warning: Kernel ABI header differences:
    diff -u tools/arch/x86/include/asm/disabled-features.h arch/x86/include/asm/disabled-features.h
    diff -u tools/arch/x86/include/asm/required-features.h arch/x86/include/asm/required-features.h
    diff -u tools/arch/x86/include/asm/cpufeatures.h arch/x86/include/asm/cpufeatures.h

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240408185520.1550865-6-namhyung@kernel.org
2024-04-11 10:38:29 -07:00
Namhyung Kim
bee3b820c6 tools/include: Sync uapi/linux/kvm.h and asm/kvm.h with the kernel sources
To pick up the changes from:

  6bda055d62 ("KVM: define __KVM_HAVE_GUEST_DEBUG unconditionally")
  5d9cb71642 ("KVM: arm64: move ARM-specific defines to uapi/asm/kvm.h")
  71cd774ad2 ("KVM: s390: move s390-specific structs to uapi/asm/kvm.h")
  d750951c9e ("KVM: powerpc: move powerpc-specific structs to uapi/asm/kvm.h")
  bcac047727 ("KVM: x86: move x86-specific structs to uapi/asm/kvm.h")
  c0a411904e ("KVM: remove more traces of device assignment UAPI")
  f3c80061c0 ("KVM: SEV: fix compat ABI for KVM_MEMORY_ENCRYPT_OP")

That should be used to beautify the KVM arguments and it addresses these
tools/perf build warnings:

  Warning: Kernel ABI header differences:
    diff -u tools/include/uapi/linux/kvm.h include/uapi/linux/kvm.h
    diff -u tools/arch/x86/include/uapi/asm/kvm.h arch/x86/include/uapi/asm/kvm.h
    diff -u tools/arch/powerpc/include/uapi/asm/kvm.h arch/powerpc/include/uapi/asm/kvm.h
    diff -u tools/arch/s390/include/uapi/asm/kvm.h arch/s390/include/uapi/asm/kvm.h
    diff -u tools/arch/arm64/include/uapi/asm/kvm.h arch/arm64/include/uapi/asm/kvm.h

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: kvm@vger.kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240408185520.1550865-4-namhyung@kernel.org
2024-04-11 10:38:28 -07:00
Arnaldo Carvalho de Melo
eb01fe7abb perf beauty: Move prctl.h files (uapi/linux and x86's) copy out of the directory used to build perf
It is used only to generate string tables, not to build perf, so move it
to the tools/perf/trace/beauty/{include,arch}/ hierarchies, that is used
just for scraping.

This is a something that should've have happened, as happened with the
linux/socket.h scrapper, do it now as Ian suggested while doing an
audit/refactor session in the headers used by perf.

No other tools/ living code uses it, just <linux/usbdevice_fs.h> coming
from either 'make install_headers' or from the system /usr/include/
directory.

Suggested-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/lkml/20240315204835.748716-3-acme@kernel.org
Link: https://lore.kernel.org/lkml/CAP-5=fWZVrpRufO4w-S4EcSi9STXcTAN2ERLwTSN7yrSSA-otQ@mail.gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-03-21 10:41:27 -03:00
Arnaldo Carvalho de Melo
c8bfe3fad4 perf beauty: Move arch/x86/include/asm/irq_vectors.h copy out of the directory used to build perf
It is used only to generate string tables, not to build perf, so move it
to the tools/perf/trace/beauty/include/ hierarchy, that is used just for
scraping.

This is a something that should've have happened, as happened with the
linux/socket.h scrapper, do it now as Ian suggested while doing an
audit/refactor session in the headers used by perf.

No other tools/ living code uses it.

Suggested-by: Ian Rogers <irogers@google.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/lkml/CAP-5=fWZVrpRufO4w-S4EcSi9STXcTAN2ERLwTSN7yrSSA-otQ@mail.gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-03-21 10:41:27 -03:00
Linus Torvalds
4f712ee0cb S390:
* Changes to FPU handling came in via the main s390 pull request
 
 * Only deliver to the guest the SCLP events that userspace has
   requested.
 
 * More virtual vs physical address fixes (only a cleanup since
   virtual and physical address spaces are currently the same).
 
 * Fix selftests undefined behavior.
 
 x86:
 
 * Fix a restriction that the guest can't program a PMU event whose
   encoding matches an architectural event that isn't included in the
   guest CPUID.  The enumeration of an architectural event only says
   that if a CPU supports an architectural event, then the event can be
   programmed *using the architectural encoding*.  The enumeration does
   NOT say anything about the encoding when the CPU doesn't report support
   the event *in general*.  It might support it, and it might support it
   using the same encoding that made it into the architectural PMU spec.
 
 * Fix a variety of bugs in KVM's emulation of RDPMC (more details on
   individual commits) and add a selftest to verify KVM correctly emulates
   RDMPC, counter availability, and a variety of other PMC-related
   behaviors that depend on guest CPUID and therefore are easier to
   validate with selftests than with custom guests (aka kvm-unit-tests).
 
 * Zero out PMU state on AMD if the virtual PMU is disabled, it does not
   cause any bug but it wastes time in various cases where KVM would check
   if a PMC event needs to be synthesized.
 
 * Optimize triggering of emulated events, with a nice ~10% performance
   improvement in VM-Exit microbenchmarks when a vPMU is exposed to the
   guest.
 
 * Tighten the check for "PMI in guest" to reduce false positives if an NMI
   arrives in the host while KVM is handling an IRQ VM-Exit.
 
 * Fix a bug where KVM would report stale/bogus exit qualification information
   when exiting to userspace with an internal error exit code.
 
 * Add a VMX flag in /proc/cpuinfo to report 5-level EPT support.
 
 * Rework TDP MMU root unload, free, and alloc to run with mmu_lock held for
   read, e.g. to avoid serializing vCPUs when userspace deletes a memslot.
 
 * Tear down TDP MMU page tables at 4KiB granularity (used to be 1GiB).  KVM
   doesn't support yielding in the middle of processing a zap, and 1GiB
   granularity resulted in multi-millisecond lags that are quite impolite
   for CONFIG_PREEMPT kernels.
 
 * Allocate write-tracking metadata on-demand to avoid the memory overhead when
   a kernel is built with i915 virtualization support but the workloads use
   neither shadow paging nor i915 virtualization.
 
 * Explicitly initialize a variety of on-stack variables in the emulator that
   triggered KMSAN false positives.
 
 * Fix the debugregs ABI for 32-bit KVM.
 
 * Rework the "force immediate exit" code so that vendor code ultimately decides
   how and when to force the exit, which allowed some optimization for both
   Intel and AMD.
 
 * Fix a long-standing bug where kvm_has_noapic_vcpu could be left elevated if
   vCPU creation ultimately failed, causing extra unnecessary work.
 
 * Cleanup the logic for checking if the currently loaded vCPU is in-kernel.
 
 * Harden against underflowing the active mmu_notifier invalidation
   count, so that "bad" invalidations (usually due to bugs elsehwere in the
   kernel) are detected earlier and are less likely to hang the kernel.
 
 x86 Xen emulation:
 
 * Overlay pages can now be cached based on host virtual address,
   instead of guest physical addresses.  This removes the need to
   reconfigure and invalidate the cache if the guest changes the
   gpa but the underlying host virtual address remains the same.
 
 * When possible, use a single host TSC value when computing the deadline for
   Xen timers in order to improve the accuracy of the timer emulation.
 
 * Inject pending upcall events when the vCPU software-enables its APIC to fix
   a bug where an upcall can be lost (and to follow Xen's behavior).
 
 * Fall back to the slow path instead of warning if "fast" IRQ delivery of Xen
   events fails, e.g. if the guest has aliased xAPIC IDs.
 
 RISC-V:
 
 * Support exception and interrupt handling in selftests
 
 * New self test for RISC-V architectural timer (Sstc extension)
 
 * New extension support (Ztso, Zacas)
 
 * Support userspace emulation of random number seed CSRs.
 
 ARM:
 
 * Infrastructure for building KVM's trap configuration based on the
   architectural features (or lack thereof) advertised in the VM's ID
   registers
 
 * Support for mapping vfio-pci BARs as Normal-NC (vaguely similar to
   x86's WC) at stage-2, improving the performance of interacting with
   assigned devices that can tolerate it
 
 * Conversion of KVM's representation of LPIs to an xarray, utilized to
   address serialization some of the serialization on the LPI injection
   path
 
 * Support for _architectural_ VHE-only systems, advertised through the
   absence of FEAT_E2H0 in the CPU's ID register
 
 * Miscellaneous cleanups, fixes, and spelling corrections to KVM and
   selftests
 
 LoongArch:
 
 * Set reserved bits as zero in CPUCFG.
 
 * Start SW timer only when vcpu is blocking.
 
 * Do not restart SW timer when it is expired.
 
 * Remove unnecessary CSR register saving during enter guest.
 
 * Misc cleanups and fixes as usual.
 
 Generic:
 
 * cleanup Kconfig by removing CONFIG_HAVE_KVM, which was basically always
   true on all architectures except MIPS (where Kconfig determines the
   available depending on CPU capabilities).  It is replaced either by
   an architecture-dependent symbol for MIPS, and IS_ENABLED(CONFIG_KVM)
   everywhere else.
 
 * Factor common "select" statements in common code instead of requiring
   each architecture to specify it
 
 * Remove thoroughly obsolete APIs from the uapi headers.
 
 * Move architecture-dependent stuff to uapi/asm/kvm.h
 
 * Always flush the async page fault workqueue when a work item is being
   removed, especially during vCPU destruction, to ensure that there are no
   workers running in KVM code when all references to KVM-the-module are gone,
   i.e. to prevent a very unlikely use-after-free if kvm.ko is unloaded.
 
 * Grab a reference to the VM's mm_struct in the async #PF worker itself instead
   of gifting the worker a reference, so that there's no need to remember
   to *conditionally* clean up after the worker.
 
 Selftests:
 
 * Reduce boilerplate especially when utilize selftest TAP infrastructure.
 
 * Add basic smoke tests for SEV and SEV-ES, along with a pile of library
   support for handling private/encrypted/protected memory.
 
 * Fix benign bugs where tests neglect to close() guest_memfd files.
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmX0iP8UHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroND7wf+JZoNvwZ+bmwWe/4jn/YwNoYi/C5z
 eypn8M1gsWEccpCpqPBwznVm9T29rF4uOlcMvqLEkHfTpaL1EKUUjP1lXPz/ileP
 6a2RdOGxAhyTiFC9fjy+wkkjtLbn1kZf6YsS0hjphP9+w0chNbdn0w81dFVnXryd
 j7XYI8R/bFAthNsJOuZXSEjCfIHxvTTG74OrTf1B1FEBB+arPmrgUeJftMVhffQK
 Sowgg8L/Ii/x6fgV5NZQVSIyVf1rp8z7c6UaHT4Fwb0+RAMW8p9pYv9Qp1YkKp8y
 5j0V9UzOHP7FRaYimZ5BtwQoqiZXYylQ+VuU/Y2f4X85cvlLzSqxaEMAPA==
 =mqOV
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm updates from Paolo Bonzini:
 "S390:

   - Changes to FPU handling came in via the main s390 pull request

   - Only deliver to the guest the SCLP events that userspace has
     requested

   - More virtual vs physical address fixes (only a cleanup since
     virtual and physical address spaces are currently the same)

   - Fix selftests undefined behavior

  x86:

   - Fix a restriction that the guest can't program a PMU event whose
     encoding matches an architectural event that isn't included in the
     guest CPUID. The enumeration of an architectural event only says
     that if a CPU supports an architectural event, then the event can
     be programmed *using the architectural encoding*. The enumeration
     does NOT say anything about the encoding when the CPU doesn't
     report support the event *in general*. It might support it, and it
     might support it using the same encoding that made it into the
     architectural PMU spec

   - Fix a variety of bugs in KVM's emulation of RDPMC (more details on
     individual commits) and add a selftest to verify KVM correctly
     emulates RDMPC, counter availability, and a variety of other
     PMC-related behaviors that depend on guest CPUID and therefore are
     easier to validate with selftests than with custom guests (aka
     kvm-unit-tests)

   - Zero out PMU state on AMD if the virtual PMU is disabled, it does
     not cause any bug but it wastes time in various cases where KVM
     would check if a PMC event needs to be synthesized

   - Optimize triggering of emulated events, with a nice ~10%
     performance improvement in VM-Exit microbenchmarks when a vPMU is
     exposed to the guest

   - Tighten the check for "PMI in guest" to reduce false positives if
     an NMI arrives in the host while KVM is handling an IRQ VM-Exit

   - Fix a bug where KVM would report stale/bogus exit qualification
     information when exiting to userspace with an internal error exit
     code

   - Add a VMX flag in /proc/cpuinfo to report 5-level EPT support

   - Rework TDP MMU root unload, free, and alloc to run with mmu_lock
     held for read, e.g. to avoid serializing vCPUs when userspace
     deletes a memslot

   - Tear down TDP MMU page tables at 4KiB granularity (used to be
     1GiB). KVM doesn't support yielding in the middle of processing a
     zap, and 1GiB granularity resulted in multi-millisecond lags that
     are quite impolite for CONFIG_PREEMPT kernels

   - Allocate write-tracking metadata on-demand to avoid the memory
     overhead when a kernel is built with i915 virtualization support
     but the workloads use neither shadow paging nor i915 virtualization

   - Explicitly initialize a variety of on-stack variables in the
     emulator that triggered KMSAN false positives

   - Fix the debugregs ABI for 32-bit KVM

   - Rework the "force immediate exit" code so that vendor code
     ultimately decides how and when to force the exit, which allowed
     some optimization for both Intel and AMD

   - Fix a long-standing bug where kvm_has_noapic_vcpu could be left
     elevated if vCPU creation ultimately failed, causing extra
     unnecessary work

   - Cleanup the logic for checking if the currently loaded vCPU is
     in-kernel

   - Harden against underflowing the active mmu_notifier invalidation
     count, so that "bad" invalidations (usually due to bugs elsehwere
     in the kernel) are detected earlier and are less likely to hang the
     kernel

  x86 Xen emulation:

   - Overlay pages can now be cached based on host virtual address,
     instead of guest physical addresses. This removes the need to
     reconfigure and invalidate the cache if the guest changes the gpa
     but the underlying host virtual address remains the same

   - When possible, use a single host TSC value when computing the
     deadline for Xen timers in order to improve the accuracy of the
     timer emulation

   - Inject pending upcall events when the vCPU software-enables its
     APIC to fix a bug where an upcall can be lost (and to follow Xen's
     behavior)

   - Fall back to the slow path instead of warning if "fast" IRQ
     delivery of Xen events fails, e.g. if the guest has aliased xAPIC
     IDs

  RISC-V:

   - Support exception and interrupt handling in selftests

   - New self test for RISC-V architectural timer (Sstc extension)

   - New extension support (Ztso, Zacas)

   - Support userspace emulation of random number seed CSRs

  ARM:

   - Infrastructure for building KVM's trap configuration based on the
     architectural features (or lack thereof) advertised in the VM's ID
     registers

   - Support for mapping vfio-pci BARs as Normal-NC (vaguely similar to
     x86's WC) at stage-2, improving the performance of interacting with
     assigned devices that can tolerate it

   - Conversion of KVM's representation of LPIs to an xarray, utilized
     to address serialization some of the serialization on the LPI
     injection path

   - Support for _architectural_ VHE-only systems, advertised through
     the absence of FEAT_E2H0 in the CPU's ID register

   - Miscellaneous cleanups, fixes, and spelling corrections to KVM and
     selftests

  LoongArch:

   - Set reserved bits as zero in CPUCFG

   - Start SW timer only when vcpu is blocking

   - Do not restart SW timer when it is expired

   - Remove unnecessary CSR register saving during enter guest

   - Misc cleanups and fixes as usual

  Generic:

   - Clean up Kconfig by removing CONFIG_HAVE_KVM, which was basically
     always true on all architectures except MIPS (where Kconfig
     determines the available depending on CPU capabilities). It is
     replaced either by an architecture-dependent symbol for MIPS, and
     IS_ENABLED(CONFIG_KVM) everywhere else

   - Factor common "select" statements in common code instead of
     requiring each architecture to specify it

   - Remove thoroughly obsolete APIs from the uapi headers

   - Move architecture-dependent stuff to uapi/asm/kvm.h

   - Always flush the async page fault workqueue when a work item is
     being removed, especially during vCPU destruction, to ensure that
     there are no workers running in KVM code when all references to
     KVM-the-module are gone, i.e. to prevent a very unlikely
     use-after-free if kvm.ko is unloaded

   - Grab a reference to the VM's mm_struct in the async #PF worker
     itself instead of gifting the worker a reference, so that there's
     no need to remember to *conditionally* clean up after the worker

  Selftests:

   - Reduce boilerplate especially when utilize selftest TAP
     infrastructure

   - Add basic smoke tests for SEV and SEV-ES, along with a pile of
     library support for handling private/encrypted/protected memory

   - Fix benign bugs where tests neglect to close() guest_memfd files"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (246 commits)
  selftests: kvm: remove meaningless assignments in Makefiles
  KVM: riscv: selftests: Add Zacas extension to get-reg-list test
  RISC-V: KVM: Allow Zacas extension for Guest/VM
  KVM: riscv: selftests: Add Ztso extension to get-reg-list test
  RISC-V: KVM: Allow Ztso extension for Guest/VM
  RISC-V: KVM: Forward SEED CSR access to user space
  KVM: riscv: selftests: Add sstc timer test
  KVM: riscv: selftests: Change vcpu_has_ext to a common function
  KVM: riscv: selftests: Add guest helper to get vcpu id
  KVM: riscv: selftests: Add exception handling support
  LoongArch: KVM: Remove unnecessary CSR register saving during enter guest
  LoongArch: KVM: Do not restart SW timer when it is expired
  LoongArch: KVM: Start SW timer only when vcpu is blocking
  LoongArch: KVM: Set reserved bits as zero in CPUCFG
  KVM: selftests: Explicitly close guest_memfd files in some gmem tests
  KVM: x86/xen: fix recursive deadlock in timer injection
  KVM: pfncache: simplify locking and make more self-contained
  KVM: x86/xen: remove WARN_ON_ONCE() with false positives in evtchn delivery
  KVM: x86/xen: inject vCPU upcall vector when local APIC is enabled
  KVM: x86/xen: improve accuracy of Xen timers
  ...
2024-03-15 13:03:13 -07:00
Linus Torvalds
685d982112 Core x86 changes for v6.9:
- The biggest change is the rework of the percpu code,
   to support the 'Named Address Spaces' GCC feature,
   by Uros Bizjak:
 
    - This allows C code to access GS and FS segment relative
      memory via variables declared with such attributes,
      which allows the compiler to better optimize those accesses
      than the previous inline assembly code.
 
    - The series also includes a number of micro-optimizations
      for various percpu access methods, plus a number of
      cleanups of %gs accesses in assembly code.
 
    - These changes have been exposed to linux-next testing for
      the last ~5 months, with no known regressions in this area.
 
 - Fix/clean up __switch_to()'s broken but accidentally
   working handling of FPU switching - which also generates
   better code.
 
 - Propagate more RIP-relative addressing in assembly code,
   to generate slightly better code.
 
 - Rework the CPU mitigations Kconfig space to be less idiosyncratic,
   to make it easier for distros to follow & maintain these options.
 
 - Rework the x86 idle code to cure RCU violations and
   to clean up the logic.
 
 - Clean up the vDSO Makefile logic.
 
 - Misc cleanups and fixes.
 
 [ Please note that there's a higher number of merge commits in
   this branch (three) than is usual in x86 topic trees. This happened
   due to the long testing lifecycle of the percpu changes that
   involved 3 merge windows, which generated a longer history
   and various interactions with other core x86 changes that we
   felt better about to carry in a single branch. ]
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmXvB0gRHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1jUqRAAqnEQPiabF5acQlHrwviX+cjSobDlqtH5
 9q2AQy9qaEHapzD0XMOxvFye6XIvehGOGxSPvk6CoviSxBND8rb56lvnsEZuLeBV
 Bo5QSIL2x42Zrvo11iPHwgXZfTIusU90sBuKDRFkYBAxY3HK2naMDZe8MAsYCUE9
 nwgHF8DDc/NYiSOXV8kosWoWpNIkoK/STyH5bvTQZMqZcwyZ49AIeP1jGZb/prbC
 e/rbnlrq5Eu6brpM7xo9kELO0Vhd34urV14KrrIpdkmUKytW2KIsyvW8D6fqgDBj
 NSaQLLcz0pCXbhF+8Nqvdh/1coR4L7Ymt08P1rfEjCsQgb/2WnSAGUQuC5JoGzaj
 ngkbFcZllIbD9gNzMQ1n4Aw5TiO+l9zxCqPC/r58Uuvstr+K9QKlwnp2+B3Q73Ft
 rojIJ04NJL6lCHdDgwAjTTks+TD2PT/eBWsDfJ/1pnUWttmv9IjMpnXD5sbHxoiU
 2RGGKnYbxXczYdq/ALYDWM6JXpfnJZcXL3jJi0IDcCSsb92xRvTANYFHnTfyzGfw
 EHkhbF4e4Vy9f6QOkSP3CvW5H26BmZS9DKG0J9Il5R3u2lKdfbb5vmtUmVTqHmAD
 Ulo5cWZjEznlWCAYSI/aIidmBsp9OAEvYd+X7Z5SBIgTfSqV7VWHGt0BfA1heiVv
 F/mednG0gGc=
 =3v4F
 -----END PGP SIGNATURE-----

Merge tag 'x86-core-2024-03-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull core x86 updates from Ingo Molnar:

 - The biggest change is the rework of the percpu code, to support the
   'Named Address Spaces' GCC feature, by Uros Bizjak:

      - This allows C code to access GS and FS segment relative memory
        via variables declared with such attributes, which allows the
        compiler to better optimize those accesses than the previous
        inline assembly code.

      - The series also includes a number of micro-optimizations for
        various percpu access methods, plus a number of cleanups of %gs
        accesses in assembly code.

      - These changes have been exposed to linux-next testing for the
        last ~5 months, with no known regressions in this area.

 - Fix/clean up __switch_to()'s broken but accidentally working handling
   of FPU switching - which also generates better code

 - Propagate more RIP-relative addressing in assembly code, to generate
   slightly better code

 - Rework the CPU mitigations Kconfig space to be less idiosyncratic, to
   make it easier for distros to follow & maintain these options

 - Rework the x86 idle code to cure RCU violations and to clean up the
   logic

 - Clean up the vDSO Makefile logic

 - Misc cleanups and fixes

* tag 'x86-core-2024-03-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (52 commits)
  x86/idle: Select idle routine only once
  x86/idle: Let prefer_mwait_c1_over_halt() return bool
  x86/idle: Cleanup idle_setup()
  x86/idle: Clean up idle selection
  x86/idle: Sanitize X86_BUG_AMD_E400 handling
  sched/idle: Conditionally handle tick broadcast in default_idle_call()
  x86: Increase brk randomness entropy for 64-bit systems
  x86/vdso: Move vDSO to mmap region
  x86/vdso/kbuild: Group non-standard build attributes and primary object file rules together
  x86/vdso: Fix rethunk patching for vdso-image-{32,64}.o
  x86/retpoline: Ensure default return thunk isn't used at runtime
  x86/vdso: Use CONFIG_COMPAT_32 to specify vdso32
  x86/vdso: Use $(addprefix ) instead of $(foreach )
  x86/vdso: Simplify obj-y addition
  x86/vdso: Consolidate targets and clean-files
  x86/bugs: Rename CONFIG_RETHUNK              => CONFIG_MITIGATION_RETHUNK
  x86/bugs: Rename CONFIG_CPU_SRSO             => CONFIG_MITIGATION_SRSO
  x86/bugs: Rename CONFIG_CPU_IBRS_ENTRY       => CONFIG_MITIGATION_IBRS_ENTRY
  x86/bugs: Rename CONFIG_CPU_UNRET_ENTRY      => CONFIG_MITIGATION_UNRET_ENTRY
  x86/bugs: Rename CONFIG_SLS                  => CONFIG_MITIGATION_SLS
  ...
2024-03-11 19:53:15 -07:00
Linus Torvalds
38b334fc76 - Add the x86 part of the SEV-SNP host support. This will allow the
kernel to be used as a KVM hypervisor capable of running SNP (Secure
   Nested Paging) guests. Roughly speaking, SEV-SNP is the ultimate goal
   of the AMD confidential computing side, providing the most
   comprehensive confidential computing environment up to date.
 
   This is the x86 part and there is a KVM part which did not get ready
   in time for the merge window so latter will be forthcoming in the next
   cycle.
 
 - Rework the early code's position-dependent SEV variable references in
   order to allow building the kernel with clang and -fPIE/-fPIC and
   -mcmodel=kernel
 
 - The usual set of fixes, cleanups and improvements all over the place
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmXvH0wACgkQEsHwGGHe
 VUrzmA//VS/n6dhHRnm/nAGngr4PeegkgV1OhyKYFfiZ272rT6P9QvblQrgcY0dc
 Ij1DOhEKlke51pTHvMOQ33B3P4Fuc0mx3dpCLY0up5V26kzQiKCjRKEkC4U1bcw8
 W4GqMejaR89bE14bYibmwpSib9T/uVsV65eM3xf1iF5UvsnoUaTziymDoy+nb43a
 B1pdd5vcl4mBNqXeEvt0qjg+xkMLpWUI9tJDB8mbMl/cnIFGgMZzBaY8oktHSROK
 QpuUnKegOgp1RXpfLbNjmZ2Q4Rkk4MNazzDzWq3EIxaRjXL3Qp507ePK7yeA2qa0
 J3jCBQc9E2j7lfrIkUgNIzOWhMAXM2YH5bvH6UrIcMi1qsWJYDmkp2MF1nUedjdf
 Wj16/pJbeEw1aKKIywJGwsmViSQju158vY3SzXG83U/A/Iz7zZRHFmC/ALoxZptY
 Bi7VhfcOSpz98PE3axnG8CvvxRDWMfzBr2FY1VmQbg6VBNo1Xl1aP/IH1I8iQNKg
 /laBYl/qP+1286TygF1lthYROb1lfEIJprgi2xfO6jVYUqPb7/zq2sm78qZRfm7l
 25PN/oHnuidfVfI/H3hzcGubjOG9Zwra8WWYBB2EEmelf21rT0OLqq+eS4T6pxFb
 GNVfc0AzG77UmqbrpkAMuPqL7LrGaSee4NdU3hkEdSphlx1/YTo=
 =c1ps
 -----END PGP SIGNATURE-----

Merge tag 'x86_sev_for_v6.9_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 SEV updates from Borislav Petkov:

 - Add the x86 part of the SEV-SNP host support.

   This will allow the kernel to be used as a KVM hypervisor capable of
   running SNP (Secure Nested Paging) guests. Roughly speaking, SEV-SNP
   is the ultimate goal of the AMD confidential computing side,
   providing the most comprehensive confidential computing environment
   up to date.

   This is the x86 part and there is a KVM part which did not get ready
   in time for the merge window so latter will be forthcoming in the
   next cycle.

 - Rework the early code's position-dependent SEV variable references in
   order to allow building the kernel with clang and -fPIE/-fPIC and
   -mcmodel=kernel

 - The usual set of fixes, cleanups and improvements all over the place

* tag 'x86_sev_for_v6.9_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (36 commits)
  x86/sev: Disable KMSAN for memory encryption TUs
  x86/sev: Dump SEV_STATUS
  crypto: ccp - Have it depend on AMD_IOMMU
  iommu/amd: Fix failure return from snp_lookup_rmpentry()
  x86/sev: Fix position dependent variable references in startup code
  crypto: ccp: Make snp_range_list static
  x86/Kconfig: Remove CONFIG_AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT
  Documentation: virt: Fix up pre-formatted text block for SEV ioctls
  crypto: ccp: Add the SNP_SET_CONFIG command
  crypto: ccp: Add the SNP_COMMIT command
  crypto: ccp: Add the SNP_PLATFORM_STATUS command
  x86/cpufeatures: Enable/unmask SEV-SNP CPU feature
  KVM: SEV: Make AVIC backing, VMSA and VMCB memory allocation SNP safe
  crypto: ccp: Add panic notifier for SEV/SNP firmware shutdown on kdump
  iommu/amd: Clean up RMP entries for IOMMU pages during SNP shutdown
  crypto: ccp: Handle legacy SEV commands when SNP is enabled
  crypto: ccp: Handle non-volatile INIT_EX data when SNP is enabled
  crypto: ccp: Handle the legacy TMR allocation when SNP is enabled
  x86/sev: Introduce an SNP leaked pages list
  crypto: ccp: Provide an API to issue SEV and SNP commands
  ...
2024-03-11 17:44:11 -07:00
Linus Torvalds
720c857907 Support for x86 Fast Return and Event Delivery (FRED):
FRED is a replacement for IDT event delivery on x86 and addresses most of
 the technical nightmares which IDT exposes:
 
  1) Exception cause registers like CR2 need to be manually preserved in
     nested exception scenarios.
 
  2) Hardware interrupt stack switching is suboptimal for nested exceptions
     as the interrupt stack mechanism rewinds the stack on each entry which
     requires a massive effort in the low level entry of #NMI code to handle
     this.
 
  3) No hardware distinction between entry from kernel or from user which
     makes establishing kernel context more complex than it needs to be
     especially for unconditionally nestable exceptions like NMI.
 
  4) NMI nesting caused by IRET unconditionally reenabling NMIs, which is a
     problem when the perf NMI takes a fault when collecting a stack trace.
 
  5) Partial restore of ESP when returning to a 16-bit segment
 
  6) Limitation of the vector space which can cause vector exhaustion on
     large systems.
 
  7) Inability to differentiate NMI sources
 
 FRED addresses these shortcomings by:
 
  1) An extended exception stack frame which the CPU uses to save exception
     cause registers. This ensures that the meta information for each
     exception is preserved on stack and avoids the extra complexity of
     preserving it in software.
 
  2) Hardware interrupt stack switching is non-rewinding if a nested
     exception uses the currently interrupt stack.
 
  3) The entry points for kernel and user context are separate and GS BASE
     handling which is required to establish kernel context for per CPU
     variable access is done in hardware.
 
  4) NMIs are now nesting protected. They are only reenabled on the return
     from NMI.
 
  5) FRED guarantees full restore of ESP
 
  6) FRED does not put a limitation on the vector space by design because it
     uses a central entry points for kernel and user space and the CPUstores
     the entry type (exception, trap, interrupt, syscall) on the entry stack
     along with the vector number. The entry code has to demultiplex this
     information, but this removes the vector space restriction.
 
     The first hardware implementations will still have the current
     restricted vector space because lifting this limitation requires
     further changes to the local APIC.
 
  7) FRED stores the vector number and meta information on stack which
     allows having more than one NMI vector in future hardware when the
     required local APIC changes are in place.
 
 The series implements the initial FRED support by:
 
  - Reworking the existing entry and IDT handling infrastructure to
    accomodate for the alternative entry mechanism.
 
  - Expanding the stack frame to accomodate for the extra 16 bytes FRED
    requires to store context and meta information
 
  - Providing FRED specific C entry points for events which have information
    pushed to the extended stack frame, e.g. #PF and #DB.
 
  - Providing FRED specific C entry points for #NMI and #MCE
 
  - Implementing the FRED specific ASM entry points and the C code to
    demultiplex the events
 
  - Providing detection and initialization mechanisms and the necessary
    tweaks in context switching, GS BASE handling etc.
 
 The FRED integration aims for maximum code reuse vs. the existing IDT
 implementation to the extent possible and the deviation in hot paths like
 context switching are handled with alternatives to minimalize the
 impact. The low level entry and exit paths are seperate due to the extended
 stack frame and the hardware based GS BASE swichting and therefore have no
 impact on IDT based systems.
 
 It has been extensively tested on existing systems and on the FRED
 simulation and as of now there are know outstanding problems.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmXuKPgTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoWyUEACevJMHU+Ot9zqBPizSWxByM1uunHbp
 bjQXhaFeskd3mt7k7HU6GsPRSmC3q4lliP1Y9ypfbU0DvYSI2h/PhMWizjhmot2y
 nIvFpl51r/NsI+JHx1oXcFetz0eGHEqBui/4YQ/swgOCMymYgfqgHhazXTdldV3g
 KpH9/8W3AeGvw79uzXFH9tjBzTkbvywpam3v0LYNDJWTCuDkilyo8PjhsgRZD4x3
 V9f1nLD7nSHZW8XLoktdJJ38bKwI2Lhao91NQ0ErwopekA4/9WphZEKsDpidUSXJ
 sn1O148oQ8X92IO2OaQje8XC5pLGr5GqQBGPWzRH56P/Vd3+WOwBxaFoU6Drxc5s
 tIe23ZjkVcpA8EEG7BQBZV1Un/NX7XaCCnMniOt0RauXw+1NaslX7t/tnUAh5F1V
 TWCH4D0I0oJ0qJ7kNliGn2BP3agYXOVg81xVEUjT6KfHcYU4ImUrwi+BkeNXuXtL
 Ch5ADnbYAcUjWLFnAmEmaRtfmfNGY5T7PeGFHW2RRkaOJ88v5g14Voo6gPJaDUPn
 wMQ0nLq1xN4xZWF6ZgfRqAhArvh20k38ZujRku5vXEqnhOugQ76TF2UYiFEwOXbQ
 8jcM+yEBLGgBz7tGMwmIAml6kfxaFF1KPpdrtcPxNkGlbE6KTSuIolLx2YGUvlSU
 6/O8nwZy49ckmQ==
 =Ib7w
 -----END PGP SIGNATURE-----

Merge tag 'x86-fred-2024-03-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 FRED support from Thomas Gleixner:
 "Support for x86 Fast Return and Event Delivery (FRED).

  FRED is a replacement for IDT event delivery on x86 and addresses most
  of the technical nightmares which IDT exposes:

   1) Exception cause registers like CR2 need to be manually preserved
      in nested exception scenarios.

   2) Hardware interrupt stack switching is suboptimal for nested
      exceptions as the interrupt stack mechanism rewinds the stack on
      each entry which requires a massive effort in the low level entry
      of #NMI code to handle this.

   3) No hardware distinction between entry from kernel or from user
      which makes establishing kernel context more complex than it needs
      to be especially for unconditionally nestable exceptions like NMI.

   4) NMI nesting caused by IRET unconditionally reenabling NMIs, which
      is a problem when the perf NMI takes a fault when collecting a
      stack trace.

   5) Partial restore of ESP when returning to a 16-bit segment

   6) Limitation of the vector space which can cause vector exhaustion
      on large systems.

   7) Inability to differentiate NMI sources

  FRED addresses these shortcomings by:

   1) An extended exception stack frame which the CPU uses to save
      exception cause registers. This ensures that the meta information
      for each exception is preserved on stack and avoids the extra
      complexity of preserving it in software.

   2) Hardware interrupt stack switching is non-rewinding if a nested
      exception uses the currently interrupt stack.

   3) The entry points for kernel and user context are separate and GS
      BASE handling which is required to establish kernel context for
      per CPU variable access is done in hardware.

   4) NMIs are now nesting protected. They are only reenabled on the
      return from NMI.

   5) FRED guarantees full restore of ESP

   6) FRED does not put a limitation on the vector space by design
      because it uses a central entry points for kernel and user space
      and the CPUstores the entry type (exception, trap, interrupt,
      syscall) on the entry stack along with the vector number. The
      entry code has to demultiplex this information, but this removes
      the vector space restriction.

      The first hardware implementations will still have the current
      restricted vector space because lifting this limitation requires
      further changes to the local APIC.

   7) FRED stores the vector number and meta information on stack which
      allows having more than one NMI vector in future hardware when the
      required local APIC changes are in place.

  The series implements the initial FRED support by:

   - Reworking the existing entry and IDT handling infrastructure to
     accomodate for the alternative entry mechanism.

   - Expanding the stack frame to accomodate for the extra 16 bytes FRED
     requires to store context and meta information

   - Providing FRED specific C entry points for events which have
     information pushed to the extended stack frame, e.g. #PF and #DB.

   - Providing FRED specific C entry points for #NMI and #MCE

   - Implementing the FRED specific ASM entry points and the C code to
     demultiplex the events

   - Providing detection and initialization mechanisms and the necessary
     tweaks in context switching, GS BASE handling etc.

  The FRED integration aims for maximum code reuse vs the existing IDT
  implementation to the extent possible and the deviation in hot paths
  like context switching are handled with alternatives to minimalize the
  impact. The low level entry and exit paths are seperate due to the
  extended stack frame and the hardware based GS BASE swichting and
  therefore have no impact on IDT based systems.

  It has been extensively tested on existing systems and on the FRED
  simulation and as of now there are no outstanding problems"

* tag 'x86-fred-2024-03-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (38 commits)
  x86/fred: Fix init_task thread stack pointer initialization
  MAINTAINERS: Add a maintainer entry for FRED
  x86/fred: Fix a build warning with allmodconfig due to 'inline' failing to inline properly
  x86/fred: Invoke FRED initialization code to enable FRED
  x86/fred: Add FRED initialization functions
  x86/syscall: Split IDT syscall setup code into idt_syscall_init()
  KVM: VMX: Call fred_entry_from_kvm() for IRQ/NMI handling
  x86/entry: Add fred_entry_from_kvm() for VMX to handle IRQ/NMI
  x86/entry/calling: Allow PUSH_AND_CLEAR_REGS being used beyond actual entry code
  x86/fred: Fixup fault on ERETU by jumping to fred_entrypoint_user
  x86/fred: Let ret_from_fork_asm() jmp to asm_fred_exit_user when FRED is enabled
  x86/traps: Add sysvec_install() to install a system interrupt handler
  x86/fred: FRED entry/exit and dispatch code
  x86/fred: Add a machine check entry stub for FRED
  x86/fred: Add a NMI entry stub for FRED
  x86/fred: Add a debug fault entry stub for FRED
  x86/idtentry: Incorporate definitions/declarations of the FRED entries
  x86/fred: Make exc_page_fault() work for FRED
  x86/fred: Allow single-step trap and NMI when starting a new task
  x86/fred: No ESPFIX needed when FRED is enabled
  ...
2024-03-11 16:00:17 -07:00
Paolo Bonzini
7d8942d8e7 KVM GUEST_MEMFD fixes for 6.8:
- Make KVM_MEM_GUEST_MEMFD mutually exclusive with KVM_MEM_READONLY to
    avoid creating ABI that KVM can't sanely support.
 
  - Update documentation for KVM_SW_PROTECTED_VM to make it abundantly
    clear that such VMs are purely a development and testing vehicle, and
    come with zero guarantees.
 
  - Limit KVM_SW_PROTECTED_VM guests to the TDP MMU, as the long term plan
    is to support confidential VMs with deterministic private memory (SNP
    and TDX) only in the TDP MMU.
 
  - Fix a bug in a GUEST_MEMFD negative test that resulted in false passes
    when verifying that KVM_MEM_GUEST_MEMFD memslots can't be dirty logged.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEKTobbabEP7vbhhN9OlYIJqCjN/0FAmXZB/8ACgkQOlYIJqCj
 N/3XlQ//RIsvqr38k7kELSKhCMyWgF4J57itABrHpMqAZu3gaAo5sETX8AGcHEe5
 mxmquxyNQSf4cthhWy1kzxjGCy6+fk+Z0Z7wzfz0Yd5D+FI6vpo3HhkjovLb2gpt
 kSrHuhJyuj2vkftNvdaz0nHX1QalVyIEnXnR3oqTmxUUsg6lp1x/zr5SP0KBXjo8
 ZzJtyFd0fkRXWpA792T7XPRBWrzPV31HYZBLX8sPlYmJATcbIx9rYSThgCN6XuVN
 bfE6wATsC+mwv5BpCoDFpCKmFcqSqamag9NGe5qE5mOby5DQGYTCRMCQB8YXXBR0
 97ppaY9ZJV4nOVjrYJn6IMOSMVNfoG7nTRFfcd0eFP4tlPEgHwGr5BGDaBtQPkrd
 KcgWJw8nS02eCA2iOE+FtCXvGJwKhTTjQ45w7rU4EcfUk603L5J4GO1ddmjMhPcP
 upGGcWDK9vCGrSUFTm8pyWp/NKRJPvAQEiQd/BweSk9+isQHTX2RYCQgPAQnwlTS
 wTg7ZPNSLoUkRYmd6r+TUT32ELJGNc8GLftMnxIwweq6V7AgNMi0HE60eMovuBNO
 7DAWWzfBEZmJv+0mNNZPGXczHVv4YvMWysRdKkhztBc3+sO7P3AL1zWIDlm5qwoG
 LpFeeI3qo3o5ZNaqGzkSop2pUUGNGpWCH46WmP0AG7RpzW/Natw=
 =M0td
 -----END PGP SIGNATURE-----

Merge tag 'kvm-x86-guest_memfd_fixes-6.8' of https://github.com/kvm-x86/linux into HEAD

KVM GUEST_MEMFD fixes for 6.8:

 - Make KVM_MEM_GUEST_MEMFD mutually exclusive with KVM_MEM_READONLY to
   avoid creating ABI that KVM can't sanely support.

 - Update documentation for KVM_SW_PROTECTED_VM to make it abundantly
   clear that such VMs are purely a development and testing vehicle, and
   come with zero guarantees.

 - Limit KVM_SW_PROTECTED_VM guests to the TDP MMU, as the long term plan
   is to support confidential VMs with deterministic private memory (SNP
   and TDX) only in the TDP MMU.

 - Fix a bug in a GUEST_MEMFD negative test that resulted in false passes
   when verifying that KVM_MEM_GUEST_MEMFD memslots can't be dirty logged.
2024-03-09 11:48:35 -05:00
Ingo Molnar
4589f199eb Merge branch 'x86/bugs' into x86/core, to pick up pending changes before dependent patches
Merge in pending alternatives patching infrastructure changes, before
applying more patches.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2024-02-14 10:49:37 +01:00
Linus Torvalds
4356e9f841 work around gcc bugs with 'asm goto' with outputs
We've had issues with gcc and 'asm goto' before, and we created a
'asm_volatile_goto()' macro for that in the past: see commits
3f0116c323 ("compiler/gcc4: Add quirk for 'asm goto' miscompilation
bug") and a9f180345f ("compiler/gcc4: Make quirk for
asm_volatile_goto() unconditional").

Then, much later, we ended up removing the workaround in commit
43c249ea0b ("compiler-gcc.h: remove ancient workaround for gcc PR
58670") because we no longer supported building the kernel with the
affected gcc versions, but we left the macro uses around.

Now, Sean Christopherson reports a new version of a very similar
problem, which is fixed by re-applying that ancient workaround.  But the
problem in question is limited to only the 'asm goto with outputs'
cases, so instead of re-introducing the old workaround as-is, let's
rename and limit the workaround to just that much less common case.

It looks like there are at least two separate issues that all hit in
this area:

 (a) some versions of gcc don't mark the asm goto as 'volatile' when it
     has outputs:

        https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98619
        https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110420

     which is easy to work around by just adding the 'volatile' by hand.

 (b) Internal compiler errors:

        https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110422

     which are worked around by adding the extra empty 'asm' as a
     barrier, as in the original workaround.

but the problem Sean sees may be a third thing since it involves bad
code generation (not an ICE) even with the manually added 'volatile'.

but the same old workaround works for this case, even if this feels a
bit like voodoo programming and may only be hiding the issue.

Reported-and-tested-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/all/20240208220604.140859-1-seanjc@google.com/
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Uros Bizjak <ubizjak@gmail.com>
Cc: Jakub Jelinek <jakub@redhat.com>
Cc: Andrew Pinski <quic_apinski@quicinc.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2024-02-09 15:57:48 -08:00
Paolo Bonzini
dcf0926e9b x86: replace CONFIG_HAVE_KVM with IS_ENABLED(CONFIG_KVM)
It is more accurate to check if KVM is enabled, instead of having the
architecture say so.  Architectures always "have" KVM, so for example
checking CONFIG_HAVE_KVM in x86 code is pointless, but if KVM is disabled
in a specific build, there is no need for support code.

Alternatively, many of the #ifdefs could simply be deleted.  However,
this would add completely dead code.  For example, when KVM is disabled,
there should not be any posted interrupts, i.e. NOT wiring up the "dummy"
handlers and treating IRQs on those vectors as spurious is the right
thing to do.

Cc: x86@kernel.org
Cc: kbingham@kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-02-08 08:45:35 -05:00
H. Peter Anvin (Intel)
cd6df3f378 x86/cpu: Add MSR numbers for FRED configuration
Add MSR numbers for the FRED configuration registers per FRED spec 5.0.

Originally-by: Megha Dey <megha.dey@intel.com>
Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com>
Signed-off-by: Xin Li <xin3.li@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Tested-by: Shan Kang <shan.kang@intel.com>
Link: https://lore.kernel.org/r/20231205105030.8698-13-xin3.li@intel.com
2024-01-31 22:01:05 +01:00
Arnaldo Carvalho de Melo
15d6daad8f tools headers x86 cpufeatures: Sync with the kernel sources to pick TDX, Zen, APIC MSR fence changes
To pick the changes from:

  1e536e1068 ("x86/cpu: Detect TDX partial write machine check erratum")
  765a0542fd ("x86/virt/tdx: Detect TDX during kernel boot")
  30fa92832f ("x86/CPU/AMD: Add ZenX generations flags")
  04c3024560 ("x86/barrier: Do not serialize MSR accesses on AMD")

This causes these perf files to be rebuilt and brings some X86_FEATURE
that will be used when updating the copies of
tools/arch/x86/lib/mem{cpy,set}_64.S with the kernel sources:

      CC       /tmp/build/perf/bench/mem-memcpy-x86-64-asm.o
      CC       /tmp/build/perf/bench/mem-memset-x86-64-asm.o

And addresses this perf build warning:

  Warning: Kernel ABI header differences:
    diff -u tools/arch/x86/include/asm/cpufeatures.h arch/x86/include/asm/cpufeatures.h

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kai Huang <kai.huang@intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/lkml/
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-01-30 10:09:18 -03:00
Brijesh Singh
b6e0f6666f x86/cpufeatures: Add SEV-SNP CPU feature
Add CPU feature detection for Secure Encrypted Virtualization with
Secure Nested Paging. This feature adds a strong memory integrity
protection to help prevent malicious hypervisor-based attacks like
data replay, memory re-mapping, and more.

Since enabling the SNP CPU feature imposes a number of additional
requirements on host initialization and handling legacy firmware APIs
for SEV/SEV-ES guests, only introduce the CPU feature bit so that the
relevant handling can be added, but leave it disabled via a
disabled-features mask.

Once all the necessary changes needed to maintain legacy SEV/SEV-ES
support are introduced in subsequent patches, the SNP feature bit will
be unmasked/enabled.

Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Jarkko Sakkinen <jarkko@profian.com>
Signed-off-by: Ashish Kalra <Ashish.Kalra@amd.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20240126041126.1927228-2-michael.roth@amd.com
2024-01-29 17:13:16 +01:00
Arnaldo Carvalho de Melo
e30dca91e5 tools headers UAPI: Sync kvm headers with the kernel sources
To pick the changes in:

  a5d3df8ae1 ("KVM: remove deprecated UAPIs")
  6d72283526 ("KVM x86/xen: add an override for PVCLOCK_TSC_STABLE_BIT")
  89ea60c2c7 ("KVM: x86: Add support for "protected VMs" that can utilize private memory")
  8dd2eee9d5 ("KVM: x86/mmu: Handle page fault for private memory")
  a7800aa80e ("KVM: Add KVM_CREATE_GUEST_MEMFD ioctl() for guest-specific backing memory")
  5a475554db ("KVM: Introduce per-page memory attributes")
  16f95f3b95 ("KVM: Add KVM_EXIT_MEMORY_FAULT exit to report faults to userspace")
  bb58b90b1a ("KVM: Introduce KVM_SET_USER_MEMORY_REGION2")
  3f9cd0ca84 ("KVM: arm64: Allow userspace to get the writable masks for feature ID registers")

That automatically adds support for some new ioctls and remove a bunch
of deprecated ones.

This ends up making the new binary to forget about the deprecated one,
so when used in an older system it will not be able to resolve those
codes to strings.

  $ tools/perf/trace/beauty/kvm_ioctl.sh > before
  $ cp include/uapi/linux/kvm.h tools/include/uapi/linux/kvm.h
  $ tools/perf/trace/beauty/kvm_ioctl.sh > after
  $ diff -u before after
  --- before	2024-01-27 14:48:16.523014020 -0300
  +++ after	2024-01-27 14:48:24.183932866 -0300
  @@ -14,6 +14,7 @@
   	[0x46] = "SET_USER_MEMORY_REGION",
   	[0x47] = "SET_TSS_ADDR",
   	[0x48] = "SET_IDENTITY_MAP_ADDR",
  +	[0x49] = "SET_USER_MEMORY_REGION2",
   	[0x60] = "CREATE_IRQCHIP",
   	[0x61] = "IRQ_LINE",
   	[0x62] = "GET_IRQCHIP",
  @@ -22,14 +23,8 @@
   	[0x65] = "GET_PIT",
   	[0x66] = "SET_PIT",
   	[0x67] = "IRQ_LINE_STATUS",
  -	[0x69] = "ASSIGN_PCI_DEVICE",
   	[0x6a] = "SET_GSI_ROUTING",
  -	[0x70] = "ASSIGN_DEV_IRQ",
   	[0x71] = "REINJECT_CONTROL",
  -	[0x72] = "DEASSIGN_PCI_DEVICE",
  -	[0x73] = "ASSIGN_SET_MSIX_NR",
  -	[0x74] = "ASSIGN_SET_MSIX_ENTRY",
  -	[0x75] = "DEASSIGN_DEV_IRQ",
   	[0x76] = "IRQFD",
   	[0x77] = "CREATE_PIT2",
   	[0x78] = "SET_BOOT_CPU_ID",
  @@ -66,7 +61,6 @@
   	[0x9f] = "GET_VCPU_EVENTS",
   	[0xa0] = "SET_VCPU_EVENTS",
   	[0xa3] = "ENABLE_CAP",
  -	[0xa4] = "ASSIGN_SET_INTX_MASK",
   	[0xa5] = "SIGNAL_MSI",
   	[0xa6] = "GET_XCRS",
   	[0xa7] = "SET_XCRS",
  @@ -97,6 +91,8 @@
   	[0xcd] = "SET_SREGS2",
   	[0xce] = "GET_STATS_FD",
   	[0xd0] = "XEN_HVM_EVTCHN_SEND",
  +	[0xd2] = "SET_MEMORY_ATTRIBUTES",
  +	[0xd4] = "CREATE_GUEST_MEMFD",
   	[0xe0] = "CREATE_DEVICE",
   	[0xe1] = "SET_DEVICE_ATTR",
   	[0xe2] = "GET_DEVICE_ATTR",
  $

This silences these perf build warnings:

  Warning: Kernel ABI header differences:
    diff -u tools/include/uapi/linux/kvm.h include/uapi/linux/kvm.h
    diff -u tools/arch/x86/include/uapi/asm/kvm.h arch/x86/include/uapi/asm/kvm.h

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Chao Peng <chao.p.peng@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jing Zhang <jingzhangos@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Paul Durrant <pdurrant@amazon.com>
Cc: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/lkml/ZbVLbkngp4oq13qN@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-01-27 15:29:47 -03:00
Arnaldo Carvalho de Melo
1743726689 tools arch x86: Sync the msr-index.h copy with the kernel sources to pick IA32_MKTME_KEYID_PARTITIONING
To pick up the changes in:

  765a0542fd ("x86/virt/tdx: Detect TDX during kernel boot")

Addressing this tools/perf build warning:

  Warning: Kernel ABI header differences:
    diff -u tools/arch/x86/include/asm/msr-index.h arch/x86/include/asm/msr-index.h

That makes the beautification scripts to pick some new entries:

  $ tools/perf/trace/beauty/tracepoints/x86_msr.sh > before
  $ cp arch/x86/include/asm/msr-index.h tools/arch/x86/include/asm/msr-index.h
  $ tools/perf/trace/beauty/tracepoints/x86_msr.sh > after
  $ diff -u before after
  --- before	2024-01-25 11:08:12.363223880 -0300
  +++ after	2024-01-25 11:08:24.839307699 -0300
  @@ -21,6 +21,7 @@
   	[0x0000004f] = "PPIN",
   	[0x00000060] = "LBR_CORE_TO",
   	[0x00000079] = "IA32_UCODE_WRITE",
  +	[0x00000087] = "IA32_MKTME_KEYID_PARTITIONING",
   	[0x0000008b] = "AMD64_PATCH_LEVEL",
   	[0x0000008C] = "IA32_SGXLEPUBKEYHASH0",
   	[0x0000008D] = "IA32_SGXLEPUBKEYHASH1",
  $

Now one can trace systemwide asking to see backtraces to where that MSR
is being read/written, see this example with a previous update:

  # perf trace -e msr:*_msr/max-stack=32/ --filter="msr==IA32_MKTME_KEYID_PARTITIONING"
  ^C#

If we use -v (verbose mode) we can see what it does behind the scenes:

  # perf trace -v -e msr:*_msr/max-stack=32/ --filter="msr==IA32_MKTME_KEYID_PARTITIONING"
  Using CPUID GenuineIntel-6-8E-A
  0x87
  New filter for msr:read_msr: (msr==0x87) && (common_pid != 58627 && common_pid != 3792)
  0x87
  New filter for msr:write_msr: (msr==0x87) && (common_pid != 58627 && common_pid != 3792)
  mmap size 528384B
  ^C#

Example with a frequent msr:

  # perf trace -v -e msr:*_msr/max-stack=32/ --filter="msr==IA32_SPEC_CTRL" --max-events 2
  Using CPUID AuthenticAMD-25-21-0
  0x48
  New filter for msr:read_msr: (msr==0x48) && (common_pid != 2612129 && common_pid != 3841)
  0x48
  New filter for msr:write_msr: (msr==0x48) && (common_pid != 2612129 && common_pid != 3841)
  mmap size 528384B
  Looking at the vmlinux_path (8 entries long)
  symsrc__init: build id mismatch for vmlinux.
  Using /proc/kcore for kernel data
  Using /proc/kallsyms for symbols
   0.000 Timer/2525383 msr:write_msr(msr: IA32_SPEC_CTRL, val: 6)
  				   do_trace_write_msr ([kernel.kallsyms])
  				   do_trace_write_msr ([kernel.kallsyms])
  				   __switch_to_xtra ([kernel.kallsyms])
  				   __switch_to ([kernel.kallsyms])
  				   __schedule ([kernel.kallsyms])
  				   schedule ([kernel.kallsyms])
  				   futex_wait_queue_me ([kernel.kallsyms])
  				   futex_wait ([kernel.kallsyms])
  				   do_futex ([kernel.kallsyms])
  				   __x64_sys_futex ([kernel.kallsyms])
  				   do_syscall_64 ([kernel.kallsyms])
  				   entry_SYSCALL_64_after_hwframe ([kernel.kallsyms])
  				   __futex_abstimed_wait_common64 (/usr/lib64/libpthread-2.33.so)
   0.030 :0/0 msr:write_msr(msr: IA32_SPEC_CTRL, val: 2)
  				   do_trace_write_msr ([kernel.kallsyms])
  				   do_trace_write_msr ([kernel.kallsyms])
  				   __switch_to_xtra ([kernel.kallsyms])
  				   __switch_to ([kernel.kallsyms])
  				   __schedule ([kernel.kallsyms])
  				   schedule_idle ([kernel.kallsyms])
  				   do_idle ([kernel.kallsyms])
  				   cpu_startup_entry ([kernel.kallsyms])
  				   secondary_startup_64_no_verify ([kernel.kallsyms])
  #

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kai Huang <kai.huang@intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/lkml/ZbJt27rjkQVU1YoP@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-01-26 10:51:48 -03:00
H. Peter Anvin (Intel)
e554a8ca49 x86/fred: Disable FRED support if CONFIG_X86_FRED is disabled
Add CONFIG_X86_FRED to <asm/disabled-features.h> to make
cpu_feature_enabled() work correctly with FRED.

Originally-by: Megha Dey <megha.dey@intel.com>
Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com>
Signed-off-by: Xin Li <xin3.li@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Shan Kang <shan.kang@intel.com>
Link: https://lore.kernel.org/r/20231205105030.8698-8-xin3.li@intel.com
2024-01-25 19:10:30 +01:00
H. Peter Anvin (Intel)
51c158f7aa x86/cpufeatures: Add the CPU feature bit for FRED
Any FRED enabled CPU will always have the following features as its
baseline:

  1) LKGS, load attributes of the GS segment but the base address into
     the IA32_KERNEL_GS_BASE MSR instead of the GS segment’s descriptor
     cache.

  2) WRMSRNS, non-serializing WRMSR for faster MSR writes.

Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com>
Signed-off-by: Xin Li <xin3.li@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Shan Kang <shan.kang@intel.com>
Link: https://lore.kernel.org/r/20231205105030.8698-7-xin3.li@intel.com
2024-01-25 19:10:30 +01:00
Xin Li
a4cb5ece14 x86/cpufeatures,opcode,msr: Add the WRMSRNS instruction support
WRMSRNS is an instruction that behaves exactly like WRMSR, with
the only difference being that it is not a serializing instruction
by default. Under certain conditions, WRMSRNS may replace WRMSR to
improve performance.

Add its CPU feature bit, opcode to the x86 opcode map, and an
always inline API __wrmsrns() to embed WRMSRNS into the code.

Signed-off-by: Xin Li <xin3.li@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Shan Kang <shan.kang@intel.com>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Acked-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20231205105030.8698-2-xin3.li@intel.com
2024-01-25 19:10:29 +01:00
Breno Leitao
0911b8c52c x86/bugs: Rename CONFIG_RETHUNK => CONFIG_MITIGATION_RETHUNK
Step 10/10 of the namespace unification of CPU mitigations related Kconfig options.

[ mingo: Added one more case. ]

Suggested-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Breno Leitao <leitao@debian.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Josh Poimboeuf <jpoimboe@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20231121160740.1249350-11-leitao@debian.org
2024-01-10 10:52:29 +01:00
Breno Leitao
ac61d43983 x86/bugs: Rename CONFIG_CPU_UNRET_ENTRY => CONFIG_MITIGATION_UNRET_ENTRY
Step 7/10 of the namespace unification of CPU mitigations related Kconfig options.

Suggested-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Breno Leitao <leitao@debian.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Josh Poimboeuf <jpoimboe@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20231121160740.1249350-8-leitao@debian.org
2024-01-10 10:52:28 +01:00
Breno Leitao
aefb2f2e61 x86/bugs: Rename CONFIG_RETPOLINE => CONFIG_MITIGATION_RETPOLINE
Step 5/10 of the namespace unification of CPU mitigations related Kconfig options.

[ mingo: Converted a few more uses in comments/messages as well. ]

Suggested-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Breno Leitao <leitao@debian.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Ariel Miculas <amiculas@cisco.com>
Acked-by: Josh Poimboeuf <jpoimboe@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20231121160740.1249350-6-leitao@debian.org
2024-01-10 10:52:28 +01:00
Breno Leitao
ea4654e088 x86/bugs: Rename CONFIG_PAGE_TABLE_ISOLATION => CONFIG_MITIGATION_PAGE_TABLE_ISOLATION
Step 4/10 of the namespace unification of CPU mitigations related Kconfig options.

[ mingo: Converted new uses that got added since the series was posted. ]

Suggested-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Breno Leitao <leitao@debian.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Josh Poimboeuf <jpoimboe@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20231121160740.1249350-5-leitao@debian.org
2024-01-10 10:52:28 +01:00
Breno Leitao
5fa31af31e x86/bugs: Rename CONFIG_CALL_DEPTH_TRACKING => CONFIG_MITIGATION_CALL_DEPTH_TRACKING
Step 3/10 of the namespace unification of CPU mitigations related Kconfig options.

Suggested-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Breno Leitao <leitao@debian.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Josh Poimboeuf <jpoimboe@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20231121160740.1249350-4-leitao@debian.org
2024-01-10 10:52:28 +01:00
Linus Torvalds
bef91c28f2 - Add synthetic X86_FEATURE flags for the different AMD Zen generations
and use them everywhere instead of ad-hoc family/model checks. Drop an
   ancient AMD errata checking facility as a result
 
 - Fix a fragile initcall ordering in intel_epb
 
 - Do not issue the MFENCE+LFENCE barrier for the TSC deadline and X2APIC
   MSRs on AMD as it is not needed there
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmWalaEACgkQEsHwGGHe
 VUrDYg/+LjqjJv3OcVZZkx9WVds0kmBCajrf9JxRYgiSTIpiL/usH0QOms8FjHQ6
 tYcukj+RJDk2nP5ho3Vs1WNA0mvU0nxC+99u0Ph4zugSMSl0XGOA+YxxTBPXmDGB
 1IxH9IloMFPhwDoQ4/ear0IvjIrfE4ESV2Dafe45WzVSdG7/0ijurisaH1kPYraP
 wzuNn142Tk0eicaam30sdThXZraO9Paz5MOYbpYEAU4lxNtdH85sQa+Xk0tqJcjD
 IwEcQJLE6n3r8t/lNMIlhAsmOVGrD5WltDH9HvEmKT4mzTumSc9DLu3YHHRWyx2K
 TMpRYHlVuvGJkJV3CYXi8fhTsV6uMsHEe1+xZ/Rf0iQzOG25v+zen8WK4REWOr/o
 VmprG3j7LkEFeeH3CqSOtVSbYmxFILQb6pAbzSlI907b5C6PaEYuudjVXuX01urN
 IG3krWHGMJ3AWKDV2Z3hW1TYtbLJyqKPNhqcBJiOuWyCe8cQXfKQBTpP5HuAEZEd
 UXc4QpStMvuPqxyQhlPSTAtY7L/UVhBH8oHoXPYiBmcCo7VtJYW6HH9z1ISUc1av
 FgKdkpx6vaJiXlD/wI/B5T1oViWQ8udhHpit99rhKl623e7WC2rdguAOVDLn/YIe
 cZB+R05yknBWOavH0kcuz9R9xYKMSBcEsRnBKmeOg9R+tTK/7BM=
 =afTN
 -----END PGP SIGNATURE-----

Merge tag 'x86_cpu_for_v6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 cpu feature updates from Borislav Petkov:

 - Add synthetic X86_FEATURE flags for the different AMD Zen generations
   and use them everywhere instead of ad-hoc family/model checks. Drop
   an ancient AMD errata checking facility as a result

 - Fix a fragile initcall ordering in intel_epb

 - Do not issue the MFENCE+LFENCE barrier for the TSC deadline and
   X2APIC MSRs on AMD as it is not needed there

* tag 'x86_cpu_for_v6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/CPU/AMD: Add X86_FEATURE_ZEN1
  x86/CPU/AMD: Drop now unused CPU erratum checking function
  x86/CPU/AMD: Get rid of amd_erratum_1485[]
  x86/CPU/AMD: Get rid of amd_erratum_400[]
  x86/CPU/AMD: Get rid of amd_erratum_383[]
  x86/CPU/AMD: Get rid of amd_erratum_1054[]
  x86/CPU/AMD: Move the DIV0 bug detection to the Zen1 init function
  x86/CPU/AMD: Move Zenbleed check to the Zen2 init function
  x86/CPU/AMD: Rename init_amd_zn() to init_amd_zen_common()
  x86/CPU/AMD: Call the spectral chicken in the Zen2 init function
  x86/CPU/AMD: Move erratum 1076 fix into the Zen1 init function
  x86/CPU/AMD: Move the Zen3 BTC_NO detection to the Zen3 init function
  x86/CPU/AMD: Carve out the erratum 1386 fix
  x86/CPU/AMD: Add ZenX generations flags
  x86/cpu/intel_epb: Don't rely on link order
  x86/barrier: Do not serialize MSR accesses on AMD
2024-01-08 15:45:31 -08:00
Borislav Petkov (AMD)
232afb5578 x86/CPU/AMD: Add X86_FEATURE_ZEN1
Add a synthetic feature flag specifically for first generation Zen
machines. There's need to have a generic flag for all Zen generations so
make X86_FEATURE_ZEN be that flag.

Fixes: 30fa92832f ("x86/CPU/AMD: Add ZenX generations flags")
Suggested-by: Brian Gerst <brgerst@gmail.com>
Suggested-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/dc3835e3-0731-4230-bbb9-336bbe3d042b@amd.com
2023-12-12 11:17:37 +01:00
Namhyung Kim
c23708f376 tools headers: Update tools's copy of x86/asm headers
tldr; Just FYI, I'm carrying this on the perf tools tree.

Full explanation:

There used to be no copies, with tools/ code using kernel headers
directly. From time to time tools/perf/ broke due to legitimate kernel
hacking. At some point Linus complained about such direct usage. Then we
adopted the current model.

The way these headers are used in perf are not restricted to just
including them to compile something.

There are sometimes used in scripts that convert defines into string
tables, etc, so some change may break one of these scripts, or new MSRs
may use some different #define pattern, etc.

E.g.:

  $ ls -1 tools/perf/trace/beauty/*.sh | head -5
  tools/perf/trace/beauty/arch_errno_names.sh
  tools/perf/trace/beauty/drm_ioctl.sh
  tools/perf/trace/beauty/fadvise.sh
  tools/perf/trace/beauty/fsconfig.sh
  tools/perf/trace/beauty/fsmount.sh
  $
  $ tools/perf/trace/beauty/fadvise.sh
  static const char *fadvise_advices[] = {
        [0] = "NORMAL",
        [1] = "RANDOM",
        [2] = "SEQUENTIAL",
        [3] = "WILLNEED",
        [4] = "DONTNEED",
        [5] = "NOREUSE",
  };
  $

The tools/perf/check-headers.sh script, part of the tools/ build
process, points out changes in the original files.

So its important not to touch the copies in tools/ when doing changes in
the original kernel headers, that will be done later, when
check-headers.sh inform about the change to the perf tools hackers.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: x86@kernel.org
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20231121225650.390246-8-namhyung@kernel.org
2023-11-22 10:57:47 -08:00
Linus Torvalds
189b756271 seccomp fix for v6.6-rc7
- Fix seccomp_unotify perf benchmark for 32-bit (Jiri Slaby)
 -----BEGIN PGP SIGNATURE-----
 
 iQJKBAABCgA0FiEEpcP2jyKd1g9yPm4TiXL039xtwCYFAmUwfdQWHGtlZXNjb29r
 QGNocm9taXVtLm9yZwAKCRCJcvTf3G3AJixtD/98vbEvwKMPxQxTH17Y2XMA7uSg
 8/WV6reVxE+DCalieqO14XRP/I++88W5A+HJSxZ+C9Q1WFXdsGINr3XMcihWZIc1
 IkzJ7t+LTTn4Y8MxC0m7LyIqpRIVYTkKW6RwrUbsZj0+vrpXtSWz6Gy7Oz7YXvah
 Y3DHVyFpi8FNK5cEz0cmBIITE0CCR/eQuiMD1057esudruWCFiKKGCZN6hPdNTlE
 pzCYGKNlA3eXOzsW/N2qAIj2nlnEJCMpP92vnvpRhcNIa+MKCzAELSi73QScG/ja
 lrcvCWWpWmR5sSNhA2dWbamR/S6GrEtwo16J4Sg/sYqHrywsr0vVGjEHLzkpf8Ef
 zxW5mGXl78dL+zoq6woyOzt/ztQ1kqVbSXF0VMMhVjUQhTSkvivr6LX+mwvNJYim
 KLAaJmWgPj6wm9agpQ/hukNKaENiA6sDTb+wpcWKdrWl958DoSY8zl3ZdJXWSxnE
 EWXLRtPioMP9BR/9U63Fr2U/m7dXKXHRfn5J2+0WBdq4MsuPtwkIs6SAaec0AKMS
 jJnRcu/b1ub00WitnJZk8v4jFfHVernFVkUGhFrgdXzwF7hXaMvsAPK6wfg2MNpe
 9aRPS4Pf8xgF6Bz7XqNrw3qgNmQwZ5kn/ucuwSjfOc2BcwSKjYGVqwXUQ8hbQdNK
 HtrskXY14BxtSLo1Gg==
 =Ir3f
 -----END PGP SIGNATURE-----

Merge tag 'seccomp-v6.6-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull seccomp fix from Kees Cook:

 - Fix seccomp_unotify perf benchmark for 32-bit (Jiri Slaby)

* tag 'seccomp-v6.6-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  perf/benchmark: fix seccomp_unotify benchmark for 32-bit
2023-10-19 10:10:14 -07:00
Jiri Slaby (SUSE)
31c65705a8 perf/benchmark: fix seccomp_unotify benchmark for 32-bit
Commit 7d5cb68af6 (perf/benchmark: add a new benchmark for
seccom_unotify) added a reference to __NR_seccomp into perf. This is
fine as it added also a definition of __NR_seccomp for 64-bit. But it
failed to do so for 32-bit as instead of ifndef, ifdef was used.

Fix this typo (so fix the build of perf on 32-bit).

Fixes: 7d5cb68af6 (perf/benchmark: add a new benchmark for seccom_unotify)
Cc: Andrei Vagin <avagin@google.com>
Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: "Jiri Slaby (SUSE)" <jirislaby@kernel.org>
Link: https://lore.kernel.org/r/20231017083019.31733-1-jirislaby@kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
2023-10-18 17:47:18 -07:00
Arnaldo Carvalho de Melo
15ca35494e tools arch x86: Sync the msr-index.h copy with the kernel sources
To pick up the changes from these csets:

  1b5277c0ea ("x86/srso: Add SRSO_NO support")
  8974eb5882 ("x86/speculation: Add Gather Data Sampling mitigation")

That cause no changes to tooling:

  $ tools/perf/trace/beauty/tracepoints/x86_msr.sh > before
  $ cp arch/x86/include/asm/msr-index.h tools/arch/x86/include/asm/msr-index.h
  $ tools/perf/trace/beauty/tracepoints/x86_msr.sh > after
  $ diff -u before after
  $

Just silences this perf build warning:

  Warning: Kernel ABI header differences:
    diff -u tools/arch/x86/include/asm/msr-index.h arch/x86/include/asm/msr-index.h

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Borislav Petkov (AMD) <bp@alien8.de>
Cc: Daniel Sneddon <daniel.sneddon@linux.intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/lkml/ZQGismCqcDddjEIQ@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-09-13 08:53:37 -03:00
Linus Torvalds
0c02183427 ARM:
* Clean up vCPU targets, always returning generic v8 as the preferred target
 
 * Trap forwarding infrastructure for nested virtualization (used for traps
   that are taken from an L2 guest and are needed by the L1 hypervisor)
 
 * FEAT_TLBIRANGE support to only invalidate specific ranges of addresses
   when collapsing a table PTE to a block PTE.  This avoids that the guest
   refills the TLBs again for addresses that aren't covered by the table PTE.
 
 * Fix vPMU issues related to handling of PMUver.
 
 * Don't unnecessary align non-stack allocations in the EL2 VA space
 
 * Drop HCR_VIRT_EXCP_MASK, which was never used...
 
 * Don't use smp_processor_id() in kvm_arch_vcpu_load(),
   but the cpu parameter instead
 
 * Drop redundant call to kvm_set_pfn_accessed() in user_mem_abort()
 
 * Remove prototypes without implementations
 
 RISC-V:
 
 * Zba, Zbs, Zicntr, Zicsr, Zifencei, and Zihpm support for guest
 
 * Added ONE_REG interface for SATP mode
 
 * Added ONE_REG interface to enable/disable multiple ISA extensions
 
 * Improved error codes returned by ONE_REG interfaces
 
 * Added KVM_GET_REG_LIST ioctl() implementation for KVM RISC-V
 
 * Added get-reg-list selftest for KVM RISC-V
 
 s390:
 
 * PV crypto passthrough enablement (Tony, Steffen, Viktor, Janosch)
   Allows a PV guest to use crypto cards. Card access is governed by
   the firmware and once a crypto queue is "bound" to a PV VM every
   other entity (PV or not) looses access until it is not bound
   anymore. Enablement is done via flags when creating the PV VM.
 
 * Guest debug fixes (Ilya)
 
 x86:
 
 * Clean up KVM's handling of Intel architectural events
 
 * Intel bugfixes
 
 * Add support for SEV-ES DebugSwap, allowing SEV-ES guests to use debug
   registers and generate/handle #DBs
 
 * Clean up LBR virtualization code
 
 * Fix a bug where KVM fails to set the target pCPU during an IRTE update
 
 * Fix fatal bugs in SEV-ES intrahost migration
 
 * Fix a bug where the recent (architecturally correct) change to reinject
   #BP and skip INT3 broke SEV guests (can't decode INT3 to skip it)
 
 * Retry APIC map recalculation if a vCPU is added/enabled
 
 * Overhaul emergency reboot code to bring SVM up to par with VMX, tie the
   "emergency disabling" behavior to KVM actually being loaded, and move all of
   the logic within KVM
 
 * Fix user triggerable WARNs in SVM where KVM incorrectly assumes the TSC
   ratio MSR cannot diverge from the default when TSC scaling is disabled
   up related code
 
 * Add a framework to allow "caching" feature flags so that KVM can check if
   the guest can use a feature without needing to search guest CPUID
 
 * Rip out the ancient MMU_DEBUG crud and replace the useful bits with
   CONFIG_KVM_PROVE_MMU
 
 * Fix KVM's handling of !visible guest roots to avoid premature triple fault
   injection
 
 * Overhaul KVM's page-track APIs, and KVMGT's usage, to reduce the API surface
   that is needed by external users (currently only KVMGT), and fix a variety
   of issues in the process
 
 This last item had a silly one-character bug in the topic branch that
 was sent to me.  Because it caused pretty bad selftest failures in
 some configurations, I decided to squash in the fix.  So, while the
 exact commit ids haven't been in linux-next, the code has (from the
 kvm-x86 tree).
 
 Generic:
 
 * Wrap kvm_{gfn,hva}_range.pte in a union to allow mmu_notifier events to pass
   action specific data without needing to constantly update the main handlers.
 
 * Drop unused function declarations
 
 Selftests:
 
 * Add testcases to x86's sync_regs_test for detecting KVM TOCTOU bugs
 
 * Add support for printf() in guest code and covert all guest asserts to use
   printf-based reporting
 
 * Clean up the PMU event filter test and add new testcases
 
 * Include x86 selftests in the KVM x86 MAINTAINERS entry
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmT1m0kUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroMNgggAiN7nz6UC423FznuI+yO3TLm8tkx1
 CpKh5onqQogVtchH+vrngi97cfOzZb1/AtifY90OWQi31KEWhehkeofcx7G6ERhj
 5a9NFADY1xGBsX4exca/VHDxhnzsbDWaWYPXw5vWFWI6erft9Mvy3tp1LwTvOzqM
 v8X4aWz+g5bmo/DWJf4Wu32tEU6mnxzkrjKU14JmyqQTBawVmJ3RYvHVJ/Agpw+n
 hRtPAy7FU6XTdkmq/uCT+KUHuJEIK0E/l1js47HFAqSzwdW70UDg14GGo1o4ETxu
 RjZQmVNvL57yVgi6QU38/A0FWIsWQm5IlaX1Ug6x8pjZPnUKNbo9BY4T1g==
 =W+4p
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm updates from Paolo Bonzini:
 "ARM:

   - Clean up vCPU targets, always returning generic v8 as the preferred
     target

   - Trap forwarding infrastructure for nested virtualization (used for
     traps that are taken from an L2 guest and are needed by the L1
     hypervisor)

   - FEAT_TLBIRANGE support to only invalidate specific ranges of
     addresses when collapsing a table PTE to a block PTE. This avoids
     that the guest refills the TLBs again for addresses that aren't
     covered by the table PTE.

   - Fix vPMU issues related to handling of PMUver.

   - Don't unnecessary align non-stack allocations in the EL2 VA space

   - Drop HCR_VIRT_EXCP_MASK, which was never used...

   - Don't use smp_processor_id() in kvm_arch_vcpu_load(), but the cpu
     parameter instead

   - Drop redundant call to kvm_set_pfn_accessed() in user_mem_abort()

   - Remove prototypes without implementations

  RISC-V:

   - Zba, Zbs, Zicntr, Zicsr, Zifencei, and Zihpm support for guest

   - Added ONE_REG interface for SATP mode

   - Added ONE_REG interface to enable/disable multiple ISA extensions

   - Improved error codes returned by ONE_REG interfaces

   - Added KVM_GET_REG_LIST ioctl() implementation for KVM RISC-V

   - Added get-reg-list selftest for KVM RISC-V

  s390:

   - PV crypto passthrough enablement (Tony, Steffen, Viktor, Janosch)

     Allows a PV guest to use crypto cards. Card access is governed by
     the firmware and once a crypto queue is "bound" to a PV VM every
     other entity (PV or not) looses access until it is not bound
     anymore. Enablement is done via flags when creating the PV VM.

   - Guest debug fixes (Ilya)

  x86:

   - Clean up KVM's handling of Intel architectural events

   - Intel bugfixes

   - Add support for SEV-ES DebugSwap, allowing SEV-ES guests to use
     debug registers and generate/handle #DBs

   - Clean up LBR virtualization code

   - Fix a bug where KVM fails to set the target pCPU during an IRTE
     update

   - Fix fatal bugs in SEV-ES intrahost migration

   - Fix a bug where the recent (architecturally correct) change to
     reinject #BP and skip INT3 broke SEV guests (can't decode INT3 to
     skip it)

   - Retry APIC map recalculation if a vCPU is added/enabled

   - Overhaul emergency reboot code to bring SVM up to par with VMX, tie
     the "emergency disabling" behavior to KVM actually being loaded,
     and move all of the logic within KVM

   - Fix user triggerable WARNs in SVM where KVM incorrectly assumes the
     TSC ratio MSR cannot diverge from the default when TSC scaling is
     disabled up related code

   - Add a framework to allow "caching" feature flags so that KVM can
     check if the guest can use a feature without needing to search
     guest CPUID

   - Rip out the ancient MMU_DEBUG crud and replace the useful bits with
     CONFIG_KVM_PROVE_MMU

   - Fix KVM's handling of !visible guest roots to avoid premature
     triple fault injection

   - Overhaul KVM's page-track APIs, and KVMGT's usage, to reduce the
     API surface that is needed by external users (currently only
     KVMGT), and fix a variety of issues in the process

  Generic:

   - Wrap kvm_{gfn,hva}_range.pte in a union to allow mmu_notifier
     events to pass action specific data without needing to constantly
     update the main handlers.

   - Drop unused function declarations

  Selftests:

   - Add testcases to x86's sync_regs_test for detecting KVM TOCTOU bugs

   - Add support for printf() in guest code and covert all guest asserts
     to use printf-based reporting

   - Clean up the PMU event filter test and add new testcases

   - Include x86 selftests in the KVM x86 MAINTAINERS entry"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (279 commits)
  KVM: x86/mmu: Include mmu.h in spte.h
  KVM: x86/mmu: Use dummy root, backed by zero page, for !visible guest roots
  KVM: x86/mmu: Disallow guest from using !visible slots for page tables
  KVM: x86/mmu: Harden TDP MMU iteration against root w/o shadow page
  KVM: x86/mmu: Harden new PGD against roots without shadow pages
  KVM: x86/mmu: Add helper to convert root hpa to shadow page
  drm/i915/gvt: Drop final dependencies on KVM internal details
  KVM: x86/mmu: Handle KVM bookkeeping in page-track APIs, not callers
  KVM: x86/mmu: Drop @slot param from exported/external page-track APIs
  KVM: x86/mmu: Bug the VM if write-tracking is used but not enabled
  KVM: x86/mmu: Assert that correct locks are held for page write-tracking
  KVM: x86/mmu: Rename page-track APIs to reflect the new reality
  KVM: x86/mmu: Drop infrastructure for multiple page-track modes
  KVM: x86/mmu: Use page-track notifiers iff there are external users
  KVM: x86/mmu: Move KVM-only page-track declarations to internal header
  KVM: x86: Remove the unused page-track hook track_flush_slot()
  drm/i915/gvt: switch from ->track_flush_slot() to ->track_remove_region()
  KVM: x86: Add a new page-track hook to handle memslot deletion
  drm/i915/gvt: Don't bother removing write-protection on to-be-deleted slot
  KVM: x86: Reject memslot MOVE operations if KVMGT is attached
  ...
2023-09-07 13:52:20 -07:00
Paolo Bonzini
bd7fe98b35 KVM: x86: SVM changes for 6.6:
- Add support for SEV-ES DebugSwap, i.e. allow SEV-ES guests to use debug
    registers and generate/handle #DBs
 
  - Clean up LBR virtualization code
 
  - Fix a bug where KVM fails to set the target pCPU during an IRTE update
 
  - Fix fatal bugs in SEV-ES intrahost migration
 
  - Fix a bug where the recent (architecturally correct) change to reinject
    #BP and skip INT3 broke SEV guests (can't decode INT3 to skip it)
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCgAwFiEEMHr+pfEFOIzK+KY1YJEiAU0MEvkFAmTue8YSHHNlYW5qY0Bn
 b29nbGUuY29tAAoJEGCRIgFNDBL5aqUP/jF7DyMXyQGYMKoQhFxWyGRhfqV8Ov8i
 7sUpEKSx5WTxOsFHBgdGeNU+m9eBJHWVmrJM9imI4OCUvJmxasRRsnyhvEUvBIUE
 amQT45aVm2xqjRNRUkOCUUHiDKtUdwpSRlOSyhnDTKmlMbNoH+fO3SLJ1oB/fsae
 wnmyiF98j2vT/5mD6F/F87hlNMq4CqG/OZWJ9Kk8GfvfJpUcC8r/H0NsMgSMF2/L
 Q+Hn+r/XDfMSrBiyEzevWyPbJi7nL+WF9EQDJASf+aAkmFMHK6AU4XNITwVw3XcZ
 FGtSP/NzvnePhd5gqtbiW9hRQkWcKjqnydtyI3ZDVVBpEbJ6OJn3+UFoLZ8NoSE+
 D3EDs1PA7Qjty6kYx9/NZpXz5BAMd9mikkTL7PTrlrAZAEimToqoHx7mBjmLp4E+
 diKrpG2w1OTtO/Pafi0z0zZN6Yc9MJOyZVK78DpIiLey3rNip9SawWGh+wV14WNC
 nbn7Wpf8EGE1E8n00mwrGMRCuRm7LQhLbcVXITiGKrbpxUzam6sqDIgt73Q7xma2
 NWcPizeFNy47uurNOA2V9xHkbEAYjWaM12uyzmGzILvvmvNnpU0NuZ78cgV5ZWMk
 4US53CAQbG4+qUCJWhIDoriluaLXjL9tLiZgJW0T6cus3nL5NWYqvlq6TWYyK00J
 zjiK7vky77Pq
 =WC5V
 -----END PGP SIGNATURE-----

Merge tag 'kvm-x86-svm-6.6' of https://github.com/kvm-x86/linux into HEAD

KVM: x86: SVM changes for 6.6:

 - Add support for SEV-ES DebugSwap, i.e. allow SEV-ES guests to use debug
   registers and generate/handle #DBs

 - Clean up LBR virtualization code

 - Fix a bug where KVM fails to set the target pCPU during an IRTE update

 - Fix fatal bugs in SEV-ES intrahost migration

 - Fix a bug where the recent (architecturally correct) change to reinject
   #BP and skip INT3 broke SEV guests (can't decode INT3 to skip it)
2023-08-31 13:32:40 -04:00
Linus Torvalds
1687d8aca5 * Rework apic callbacks, getting rid of unnecessary ones and
coalescing lots of silly duplicates.
  * Use static_calls() instead of indirect calls for apic->foo()
  * Tons of cleanups an crap removal along the way
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEV76QKkVc4xCGURexaDWVMHDJkrAFAmTvfO8ACgkQaDWVMHDJ
 krAP2A//ccii/LuvtTnNEIMMR5w2rwTdHv91ancgFkC8pOeNk37Z8sSLq8tKuLFA
 vgjBIysVIqunuRcNCJ+eqwIIxYfU+UGCWHppzLwO+DY3Q7o9EoTL0BgytdAqxpQQ
 ntEVarqWq25QYXKFoAqbUTJ1UXa42/8HfiXAX/jvP+ACXfilkGPZre6ASxlXeOhm
 XbgPuNQPmXi2WYQH9GCQEsz2Nh80hKap8upK2WbQzzJ3lXsm+xA//4klab0HCYwl
 Uc302uVZozyXRMKbAlwmgasTFOLiV8KKriJ0oHoktBpWgkpdR9uv/RDeSaFR3DAl
 aFmecD4k/Hqezg4yVl+4YpEn2KjxiwARCm4PMW5AV7lpWBPBHAOOai65yJlAi9U6
 bP8pM0+aIx9xg7oWfsTnQ7RkIJ+GZ0w+KZ9LXFM59iu3eV1pAJE3UVyUehe/J1q9
 n8OcH0UeHRlAb8HckqVm1AC7IPvfHw4OAPtUq7z3NFDwbq6i651Tu7f+i2bj31cX
 77Ames+fx6WjxUjyFbJwaK44E7Qez3waztdBfn91qw+m0b+gnKE3ieDNpJTqmm5b
 mKulV7KJwwS6cdqY3+Kr+pIlN+uuGAv7wGzVLcaEAXucDsVn/YAMJHY2+v97xv+n
 J9N+yeaYtmSXVlDsJ6dndMrTQMmcasK1CVXKxs+VYq5Lgf+A68w=
 =eoKm
 -----END PGP SIGNATURE-----

Merge tag 'x86_apic_for_6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 apic updates from Dave Hansen:
 "This includes a very thorough rework of the 'struct apic' handlers.
  Quite a variety of them popped up over the years, especially in the
  32-bit days when odd apics were much more in vogue.

  The end result speaks for itself, which is a removal of a ton of code
  and static calls to replace indirect calls.

  If there's any breakage here, it's likely to be around the 32-bit
  museum pieces that get light to no testing these days.

  Summary:

   - Rework apic callbacks, getting rid of unnecessary ones and
     coalescing lots of silly duplicates.

   - Use static_calls() instead of indirect calls for apic->foo()

   - Tons of cleanups an crap removal along the way"

* tag 'x86_apic_for_6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (64 commits)
  x86/apic: Turn on static calls
  x86/apic: Provide static call infrastructure for APIC callbacks
  x86/apic: Wrap IPI calls into helper functions
  x86/apic: Mark all hotpath APIC callback wrappers __always_inline
  x86/xen/apic: Mark apic __ro_after_init
  x86/apic: Convert other overrides to apic_update_callback()
  x86/apic: Replace acpi_wake_cpu_handler_update() and apic_set_eoi_cb()
  x86/apic: Provide apic_update_callback()
  x86/xen/apic: Use standard apic driver mechanism for Xen PV
  x86/apic: Provide common init infrastructure
  x86/apic: Wrap apic->native_eoi() into a helper
  x86/apic: Nuke ack_APIC_irq()
  x86/apic: Remove pointless arguments from [native_]eoi_write()
  x86/apic/noop: Tidy up the code
  x86/apic: Remove pointless NULL initializations
  x86/apic: Sanitize APIC ID range validation
  x86/apic: Prepare x2APIC for using apic::max_apic_id
  x86/apic: Simplify X2APIC ID validation
  x86/apic: Add max_apic_id member
  x86/apic: Wrap APIC ID validation into an inline
  ...
2023-08-30 10:44:46 -07:00
Linus Torvalds
b03a434214 seccomp updates for v6.6-rc1
- Provide USER_NOTIFY flag for synchronous mode (Andrei Vagin, Peter
   Oskolkov). This touches the scheduler and perf but has been Acked by
   Peter Zijlstra.
 
 - Fix regression in syscall skipping and restart tracing on arm32.
   This touches arch/arm/ but has been Acked by Arnd Bergmann.
 -----BEGIN PGP SIGNATURE-----
 
 iQJKBAABCgA0FiEEpcP2jyKd1g9yPm4TiXL039xtwCYFAmTs418WHGtlZXNjb29r
 QGNocm9taXVtLm9yZwAKCRCJcvTf3G3AJohpD/4tEfRdnb/KDgwQ7uvqBonUJXcx
 wqw17LZCGTpBV3/Tp3+aEseD1NezOxiMJL88VyUHSy7nfDJShbL6QtyoenwEOeXJ
 HmBUfcIH3cqRutHEJ3drYBzBetpeeK2G+gTYVj+JoEfPWyPf+Egj+1JE2n1xLi92
 WC1miBAyBZ59kN+D1hcDzJu24CkAwbcUYlEzGejN5lBOwxYV3/fjARBVRvefOO5m
 jljSCIVJOFgCiybKhJ7Zw1+lkFc3cIlcOgr4/ZegSc8PxFVebnuImTHHp/gvoo6F
 7d1xe5Hk+PSfNvVq41MAeRB2vK2tY5efwjXRarThUaydPTO43KiQm0dzP0EYWK9a
 LcOg8zAXZnpvuWU5O2SqUKADcxe2TjS1WuQ/Q4ixxgKz2kJKDwrNU8Frf327eLSR
 acfZgMMiUfEXyXDV9B3LzNAtwdvwyxYrzEzxgKywhThIhZmQDat0rI2IaTV5QIc5
 pkxiFEe0TPwpzyUVO9dSzE+ughTmNQOKk5uAM9e2NwRwVdhEmlZAxo0kStJ1NoaA
 yDjYIKfaNBElchL4v2931KJFJseI+uRaWdW10JEV+1M69+gEAEs6wbmAxtcYS776
 xWsYp3slXzlmeVyvQp/ah8p0y55r+qTbcnhkvIdiwLYei4Bh3KOoJUlVmW0V5dKq
 b+7qspIvBA0kKRAqPw==
 =DI8R
 -----END PGP SIGNATURE-----

Merge tag 'seccomp-v6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull seccomp updates from Kees Cook:

 - Provide USER_NOTIFY flag for synchronous mode (Andrei Vagin, Peter
   Oskolkov). This touches the scheduler and perf but has been Acked by
   Peter Zijlstra.

 - Fix regression in syscall skipping and restart tracing on arm32. This
   touches arch/arm/ but has been Acked by Arnd Bergmann.

* tag 'seccomp-v6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  seccomp: Add missing kerndoc notations
  ARM: ptrace: Restore syscall skipping for tracers
  ARM: ptrace: Restore syscall restart tracing
  selftests/seccomp: Handle arm32 corner cases better
  perf/benchmark: add a new benchmark for seccom_unotify
  selftest/seccomp: add a new test for the sync mode of seccomp_user_notify
  seccomp: add the synchronous mode for seccomp_unotify
  sched: add a few helpers to wake up tasks on the current cpu
  sched: add WF_CURRENT_CPU and externise ttwu
  seccomp: don't use semaphore and wait_queue together
2023-08-28 12:38:26 -07:00
Linus Torvalds
b4f63b0f2d perf tools fixes for v6.5: 3rd batch
- Revert a patch that unconditionally resolved addresses to inlines in
   callchains, something that was done before when DWARF mode was asked
   for, but could as well be done when just frame pointers (the default)
   was selected. This enriches the callchains with inlines but the way to
   resolve it is gross right now, relying on addr2line, and even if we come
   up with an efficient way of processing all the associated DWARF info for
   a big file as vmlinux is, this has to be something people opt-in, as it
   will still result in overheads, so revert it until we get this done in a
   saner way.
 
 - Update the x86 msr-index.h header with the kernel original, no change
   in tooling output, just addresses a tools/perf build warning.
 
 - Resolve a regression where special "tool events", such as
   "duration_time" were being presented for all CPUs, when it only makes
   sense to show it for the workload, that is, just once.
 
 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQR2GiIUctdOfX2qHhGyPKLppCJ+JwUCZNP/OAAKCRCyPKLppCJ+
 J7cGAQDgNpsAqGk+/Xkk7lPcp8aJ7q+5oaxv8iaGhdblq7V52gD+L2t8sNPQYWE3
 sy2QQ+9tsZiONfpdxknsduxoyfE+Vgs=
 =CRYB
 -----END PGP SIGNATURE-----

Merge tag 'perf-tools-fixes-for-v6.5-3-2023-08-09' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux

Pull perf tools fixes from Arnaldo Carvalho de Melo:

 - Revert a patch that unconditionally resolved addresses to inlines in
   callchains, something that was done before when DWARF mode was asked
   for, but could as well be done when just frame pointers (the default)
   was selected.

   This enriches the callchains with inlines but the way to resolve it
   is gross right now, relying on addr2line, and even if we come up with
   an efficient way of processing all the associated DWARF info for a
   big file as vmlinux is, this has to be something people opt-in, as it
   will still result in overheads, so revert it until we get this done
   in a saner way.

 - Update the x86 msr-index.h header with the kernel original, no change
   in tooling output, just addresses a tools/perf build warning.

  - Resolve a regression where special "tool events", such as
    "duration_time" were being presented for all CPUs, when it only
    makes sense to show it for the workload, that is, just once.

* tag 'perf-tools-fixes-for-v6.5-3-2023-08-09' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux:
  perf stat: Don't display zero tool counts
  tools arch x86: Sync the msr-index.h copy with the kernel sources
  Revert "perf report: Append inlines to non-DWARF callchains"
2023-08-09 21:06:14 -07:00
Arnaldo Carvalho de Melo
8cdd4aeff2 tools arch x86: Sync the msr-index.h copy with the kernel sources
To pick up the changes from these csets:

  522b1d6921 ("x86/cpu/amd: Add a Zenbleed fix")

That cause no changes to tooling:

  $ tools/perf/trace/beauty/tracepoints/x86_msr.sh > before
  $ cp arch/x86/include/asm/msr-index.h tools/arch/x86/include/asm/msr-index.h
  $ tools/perf/trace/beauty/tracepoints/x86_msr.sh > after
  $ diff -u before after
  $

Just silences this perf build warning:

  Warning: Kernel ABI header differences:
    diff -u tools/arch/x86/include/asm/msr-index.h arch/x86/include/asm/msr-index.h

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Borislav Petkov (AMD) <bp@alien8.de>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/lkml/ZND17H7BI4ariERn@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-08-08 10:52:32 -03:00
Xin Li
6e3edb0fb5 tools: Get rid of IRQ_MOVE_CLEANUP_VECTOR from tools
IRQ_MOVE_CLEANUP_VECTOR is not longer in use. Remove the last traces.

Signed-off-by: Xin Li <xin3.li@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20230621171248.6805-4-xin3.li@intel.com
2023-08-06 14:15:10 +02:00
Alexey Kardashevskiy
d1f85fbe83 KVM: SEV: Enable data breakpoints in SEV-ES
Add support for "DebugSwap for SEV-ES guests", which provides support
for swapping DR[0-3] and DR[0-3]_ADDR_MASK on VMRUN and VMEXIT, i.e.
allows KVM to expose debug capabilities to SEV-ES guests. Without
DebugSwap support, the CPU doesn't save/load most _guest_ debug
registers (except DR6/7), and KVM cannot manually context switch guest
DRs due the VMSA being encrypted.

Enable DebugSwap if and only if the CPU also supports NoNestedDataBp,
which causes the CPU to ignore nested #DBs, i.e. #DBs that occur when
vectoring a #DB.  Without NoNestedDataBp, a malicious guest can DoS
the host by putting the CPU into an infinite loop of vectoring #DBs
(see https://bugzilla.redhat.com/show_bug.cgi?id=1278496)

Set the features bit in sev_es_sync_vmsa() which is the last point
when VMSA is not encrypted yet as sev_(es_)init_vmcb() (where the most
init happens) is called not only when VCPU is initialised but also on
intrahost migration when VMSA is encrypted.

Eliminate DR7 intercepts as KVM can't modify guest DR7, and intercepting
DR7 would completely defeat the purpose of enabling DebugSwap.

Make X86_FEATURE_DEBUG_SWAP appear in /proc/cpuinfo (by not adding "") to
let the operator know if the VM can debug.

Signed-off-by: Alexey Kardashevskiy <aik@amd.com>
Link: https://lore.kernel.org/r/20230615063757.3039121-7-aik@amd.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-07-28 16:12:56 -07:00
Borislav Petkov (AMD)
0e52740ffd x86/bugs: Increase the x86 bugs vector size to two u32s
There was never a doubt in my mind that they would not fit into a single
u32 eventually.

Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
2023-07-18 09:35:38 +02:00
Andrei Vagin
7d5cb68af6 perf/benchmark: add a new benchmark for seccom_unotify
The benchmark is similar to the pipe benchmark. It creates two processes,
one is calling syscalls, and another process is handling them via seccomp
user notifications. It measures the time required to run a specified number
of interations.

 $ ./perf bench sched  seccomp-notify --sync-mode --loop 1000000
 # Running 'sched/seccomp-notify' benchmark:
 # Executed 1000000 system calls

     Total time: 2.769 [sec]

       2.769629 usecs/op
         361059 ops/sec

 $ ./perf bench sched  seccomp-notify
 # Running 'sched/seccomp-notify' benchmark:
 # Executed 1000000 system calls

     Total time: 8.571 [sec]

       8.571119 usecs/op
         116670 ops/sec

Signed-off-by: Andrei Vagin <avagin@google.com>
Acked-by: "Peter Zijlstra (Intel)" <peterz@infradead.org>
Link: https://lore.kernel.org/r/20230308073201.3102738-7-avagin@google.com
Link: https://lore.kernel.org/r/20230630051953.454638-1-avagin@gmail.com
[kees: Added PRIu64 format string]
Signed-off-by: Kees Cook <keescook@chromium.org>
2023-07-17 16:08:08 -07:00
Linus Torvalds
8c69e7afe9 - Up until now the Fast Short Rep Mov optimizations implied the presence
of the ERMS CPUID flag. AMD decoupled them with a BIOS setting so decouple
   that dependency in the kernel code too
 
 - Teach the alternatives machinery to handle relocations
 
 - Make debug_alternative accept flags in order to see only that set of
   patching done one is interested in
 
 - Other fixes, cleanups and optimizations to the patching code
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmSZi2AACgkQEsHwGGHe
 VUqhGw/9EC/m5HTFBlCy9PS5Qy6pPLzmHR5Tuy4meqlnB1gN+5wzfxdYEwHm46hH
 SR6WqR12yVaCMIzh66y8nTJyMbIykaBbfFJb3WesdDrBIYUZ9f+7O+Xd0JS6Jykd
 2HBHOyaVS1/W75+y6w9JhTExBH5xieCpJVIYyAvifbn/pB8XmuTTwJ1Z3EJ8DzkK
 AN16i46bUiKNBdTYZUMhtKL4vHVfqLYMskgWe6IG7DmRLOwikR0uRVhuVqP/bmUj
 U128cUacGJT2AYbZarTAKmOa42nDj3TpJqRp1qit3y6Cun4vxKH+1A91UPd7IHTa
 M5H1bNSgfXMm8rU+JgfvXKqrCTckGn2OqlCkJfPV3RBeP9IcQBBF0vE3dnM/X2We
 dwbXeDfJvc+1s4/M41MOhyahTUbW+4iRK5UCZEt1mprTbtzHTlN7RROo7QLpFsWx
 T0Jqvsd1raAutPTgTjU7ToQwDpSQNnn4Y/KoEdpvOCXR8wU7Wo5/+Qa4tEkIY3W6
 mUFpJcgFC9QEKLuaNAofPIhMuZ/vzRVtpK7wbLn4KR5JZA8AxznenMFVg8YPWRFI
 4oga0kMFJ7t6z/CXHtrxFaLQ9e7WAUSRU6gPiz8As1F/K9N0JWMUfjuTJcgjUsF8
 bwdCNinwG8y3rrPUCrqbO5N766ZkLYd6NksKlmIyUvtCcS0ksbg=
 =mH38
 -----END PGP SIGNATURE-----

Merge tag 'x86_alternatives_for_v6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 instruction alternatives updates from Borislav Petkov:

 - Up until now the Fast Short Rep Mov optimizations implied the
   presence of the ERMS CPUID flag. AMD decoupled them with a BIOS
   setting so decouple that dependency in the kernel code too

 - Teach the alternatives machinery to handle relocations

 - Make debug_alternative accept flags in order to see only that set of
   patching done one is interested in

 - Other fixes, cleanups and optimizations to the patching code

* tag 'x86_alternatives_for_v6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/alternative: PAUSE is not a NOP
  x86/alternatives: Add cond_resched() to text_poke_bp_batch()
  x86/nospec: Shorten RESET_CALL_DEPTH
  x86/alternatives: Add longer 64-bit NOPs
  x86/alternatives: Fix section mismatch warnings
  x86/alternative: Optimize returns patching
  x86/alternative: Complicate optimize_nops() some more
  x86/alternative: Rewrite optimize_nops() some
  x86/lib/memmove: Decouple ERMS from FSRM
  x86/alternative: Support relocations in alternatives
  x86/alternative: Make debug-alternative selective
2023-06-26 15:14:55 -07:00
Peter Zijlstra
df25edbac3 x86/alternatives: Add longer 64-bit NOPs
By adding support for longer NOPs there are a few more alternatives
that can turn into a single instruction.

Add up to NOP11, the same limit where GNU as .nops also stops
generating longer nops. This is because a number of uarchs have severe
decode penalties for more than 3 prefixes.

  [ bp: Sync up with the version in tools/ while at it. ]

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230515093020.661756940@infradead.org
2023-05-31 10:21:21 +02:00
Tiezhu Yang
4e111f0cf0 perf bench syscall: Fix __NR_execve undeclared build error
The __NR_execve definition for i386 was deleted by mistake
in the commit ece7f7c050 ("perf bench syscall: Add fork
syscall benchmark"), add it to fix the build error on i386.

Fixes: ece7f7c050 ("perf bench syscall: Add fork syscall benchmark")
Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Tested-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tiezhu Yang <yangtiezhu@loongson.cn>
Cc: loongson-kernel@lists.loongnix.cn
Closes: https://lore.kernel.org/all/CA+G9fYvgBR1iB0CorM8OC4AM_w_tFzyQKHc+rF6qPzJL=TbfDQ@mail.gmail.com/
Link: https://lore.kernel.org/r/1684480657-2375-1-git-send-email-yangtiezhu@loongson.cn
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-05-19 12:08:00 -03:00
Arnaldo Carvalho de Melo
1b5f159ce8 tools headers disabled-features: Sync with the kernel sources
To pick the changes from:

  e0bddc19ba ("x86/mm: Reduce untagged_addr() overhead for systems without LAM")

This only causes these perf files to be rebuilt:

  CC       /tmp/build/perf/bench/mem-memcpy-x86-64-asm.o
  CC       /tmp/build/perf/bench/mem-memset-x86-64-asm.o

And addresses this perf build warning:

  Warning: Kernel ABI header at 'tools/arch/x86/include/asm/disabled-features.h' differs from latest version at 'arch/x86/include/asm/disabled-features.h'
  diff -u tools/arch/x86/include/asm/disabled-features.h arch/x86/include/asm/disabled-features.h

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/lkml/ZGTpdlzrlRjjnY6K@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-05-17 11:49:47 -03:00
Arnaldo Carvalho de Melo
29719e3198 tools headers UAPI: Sync arch prctl headers with the kernel sources
To pick the changes in this cset:

  a03c376eba ("x86/arch_prctl: Add AMX feature numbers as ABI constants")
  23e5d9ec2b ("x86/mm/iommu/sva: Make LAM and SVA mutually exclusive")
  2f8794bd08 ("x86/mm: Provide arch_prctl() interface for LAM")

This picks these new prctls in a third range, that was also added to the
tools/perf/trace/beauty/arch_prctl.c beautifier.

  $ tools/perf/trace/beauty/x86_arch_prctl.sh > /tmp/before
  $ cp arch/x86/include/uapi/asm/prctl.h tools/arch/x86/include/uapi/asm/prctl.h
  $ tools/perf/trace/beauty/x86_arch_prctl.sh > /tmp/after
  $ diff -u /tmp/before /tmp/after
  @@ -20,3 +20,11 @@
   	[0x2003 - 0x2001]= "MAP_VDSO_64",
   };

  +#define x86_arch_prctl_codes_3_offset 0x4001
  +static const char *x86_arch_prctl_codes_3[] = {
  +	[0x4001 - 0x4001]= "GET_UNTAG_MASK",
  +	[0x4002 - 0x4001]= "ENABLE_TAGGED_ADDR",
  +	[0x4003 - 0x4001]= "GET_MAX_TAG_BITS",
  +	[0x4004 - 0x4001]= "FORCE_TAGGED_SVA",
  +};
  +
  $

With this 'perf trace' can translate those numbers into strings and use
the strings in filter expressions:

  # perf trace -e prctl
       0.000 ( 0.011 ms): DOM Worker/3722622 prctl(option: SET_NAME, arg2: 0x7f9c014b7df5)     = 0
       0.032 ( 0.002 ms): DOM Worker/3722622 prctl(option: SET_NAME, arg2: 0x7f9bb6b51580)     = 0
       5.452 ( 0.003 ms): StreamT~ns #30/3722623 prctl(option: SET_NAME, arg2: 0x7f9bdbdfeb70) = 0
       5.468 ( 0.002 ms): StreamT~ns #30/3722623 prctl(option: SET_NAME, arg2: 0x7f9bdbdfea70) = 0
      24.494 ( 0.009 ms): IndexedDB #556/3722624 prctl(option: SET_NAME, arg2: 0x7f562a32ae28) = 0
      24.540 ( 0.002 ms): IndexedDB #556/3722624 prctl(option: SET_NAME, arg2: 0x7f563c6d4b30) = 0
     670.281 ( 0.008 ms): systemd-userwo/3722339 prctl(option: SET_NAME, arg2: 0x564be30805c8) = 0
     670.293 ( 0.002 ms): systemd-userwo/3722339 prctl(option: SET_NAME, arg2: 0x564be30800f0) = 0
  ^C#

This addresses this perf build warning:

  Warning: Kernel ABI header at 'tools/arch/x86/include/uapi/asm/prctl.h' differs from latest version at 'arch/x86/include/uapi/asm/prctl.h'
  diff -u tools/arch/x86/include/uapi/asm/prctl.h arch/x86/include/uapi/asm/prctl.h

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Chang S. Bae <chang.seok.bae@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/lkml/ZGTjNPpD3FOWfetM@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-05-17 11:23:43 -03:00
Arnaldo Carvalho de Melo
9bc83d6e38 tools headers x86 cpufeatures: Sync with the kernel sources
To pick the changes from:

  3d8f61bf8b ("x86: KVM: Add common feature flag for AMD's PSFD")
  3763bf5802 ("x86/cpufeatures: Redefine synthetic virtual NMI bit as AMD's "real" vNMI")
  6449dcb0ca ("x86: CPUID and CR3/CR4 flags for Linear Address Masking")
  be8de49bea ("x86/speculation: Identify processors vulnerable to SMT RSB predictions")
  e7862eda30 ("x86/cpu: Support AMD Automatic IBRS")
  faabfcb194 ("x86/cpu, kvm: Add the SMM_CTL MSR not present feature")
  5b909d4ae5 ("x86/cpu, kvm: Add the Null Selector Clears Base feature")
  84168ae786 ("x86/cpu, kvm: Move X86_FEATURE_LFENCE_RDTSC to its native leaf")
  a9dc9ec5a1 ("x86/cpu, kvm: Add the NO_NESTED_DATA_BP feature")
  f8df91e73a ("x86/cpufeatures: Add macros for Intel's new fast rep string features")
  78335aac61 ("x86/cpufeatures: Add Bandwidth Monitoring Event Configuration feature flag")
  f334f723a6 ("x86/cpufeatures: Add Slow Memory Bandwidth Allocation feature flag")
  a018d2e3d4 ("x86/cpufeatures: Add Architectural PerfMon Extension bit")

This causes these perf files to be rebuilt and brings some X86_FEATURE
that will be used when updating the copies of
tools/arch/x86/lib/mem{cpy,set}_64.S with the kernel sources:

  CC       /tmp/build/perf/bench/mem-memcpy-x86-64-asm.o
  CC       /tmp/build/perf/bench/mem-memset-x86-64-asm.o

And addresses this perf build warning:

  Warning: Kernel ABI header at 'tools/arch/x86/include/asm/cpufeatures.h' differs from latest version at 'arch/x86/include/asm/cpufeatures.h'
  diff -u tools/arch/x86/include/asm/cpufeatures.h arch/x86/include/asm/cpufeatures.h

Cc: Borislav Petkov (AMD) <bp@alien8.de>
Cc: Kim Phillips <kim.phillips@amd.com>
Cc: Jim Mattson <jmattson@google.com>
Cc: Babu Moger <babu.moger@amd.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Link: https://lore.kernel.org/lkml/ZGTTw642q8mWgv2Y@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-05-17 10:39:40 -03:00
Yanteng Si
34e82891d9 tools arch x86: Sync the msr-index.h copy with the kernel sources
Picking the changes from:

  c68e3d4739 ("x86/include/asm/msr-index.h: Add IFS Array test bits")

Silencing these perf build warnings:

  Warning: Kernel ABI header at 'tools/arch/x86/include/asm/msr-index.h' differs from latest version at 'arch/x86/include/asm/msr-index.h'
  diff -u tools/arch/x86/include/asm/msr-index.h arch/x86/include/asm/msr-index.h

Signed-off-by: Yanteng Si <siyanteng@loongson.cn>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: loongson-kernel@lists.loongnix.cn
Link: https://lore.kernel.org/r/05778ab3c168c8030f6b20e60375dc803f0cd300.1683712945.git.siyanteng@loongson.cn
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-05-10 14:19:20 -03:00
Yanteng Si
705049ca4f tools headers kvm: Sync uapi/{asm/linux} kvm.h headers with the kernel sources
Picking the changes from:

  e65733b5c5 ("KVM: x86: Redefine 'longmode' as a flag for KVM_EXIT_HYPERCALL")
  30ec7997d1 ("KVM: arm64: timers: Allow userspace to set the global counter offset")
  821d935c87 ("KVM: arm64: Introduce support for userspace SMCCC filtering")
  81dc9504a7 ("KVM: arm64: nv: timers: Support hyp timer emulation")
  a8308b3fc9 ("KVM: arm64: Refactor hvc filtering to support different actions")
  0e5c9a9d65 ("KVM: arm64: Expose SMC/HVC width to userspace")

Silencing these perf build warnings:

 Warning: Kernel ABI header at 'tools/include/uapi/linux/kvm.h' differs from latest version at 'include/uapi/linux/kvm.h'
 diff -u tools/include/uapi/linux/kvm.h include/uapi/linux/kvm.h
 Warning: Kernel ABI header at 'tools/arch/x86/include/uapi/asm/kvm.h' differs from latest version at 'arch/x86/include/uapi/asm/kvm.h'
 diff -u tools/arch/x86/include/uapi/asm/kvm.h arch/x86/include/uapi/asm/kvm.h
 Warning: Kernel ABI header at 'tools/arch/arm64/include/uapi/asm/kvm.h' differs from latest version at 'arch/arm64/include/uapi/asm/kvm.h'
 diff -u tools/arch/arm64/include/uapi/asm/kvm.h arch/arm64/include/uapi/asm/kvm.h

Signed-off-by: Yanteng Si <siyanteng@loongson.cn>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: loongson-kernel@lists.loongnix.cn
Link: https://lore.kernel.org/r/ac5adb58411d23b3360d436a65038fefe91c32a8.1683712945.git.siyanteng@loongson.cn
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-05-10 14:19:20 -03:00
Linus Torvalds
f085df1be6 Disable building BPF based features by default for v6.4.
We need to better polish building with BPF skels, so revert back to
 making it an experimental feature that has to be explicitely enabled
 using BUILD_BPF_SKEL=1.
 
 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQR2GiIUctdOfX2qHhGyPKLppCJ+JwUCZFbCXwAKCRCyPKLppCJ+
 J7cHAP97erKY4hBXArjpfzcvpFmboh/oqhbTLntyIpS6TEnOyQEAyervAPGIjQYC
 DCo4foyXmOWn3dhNtK9M+YiRl3o2SgQ=
 =7G78
 -----END PGP SIGNATURE-----

Merge tag 'perf-tools-for-v6.4-3-2023-05-06' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux

Pull perf tool updates from Arnaldo Carvalho de Melo:
 "Third version of perf tool updates, with the build problems with with
  using a 'vmlinux.h' generated from the main build fixed, and the bpf
  skeleton build disabled by default.

  Build:

   - Require libtraceevent to build, one can disable it using
     NO_LIBTRACEEVENT=1.

     It is required for tools like 'perf sched', 'perf kvm', 'perf
     trace', etc.

     libtraceevent is available in most distros so installing
     'libtraceevent-devel' should be a one-time event to continue
     building perf as usual.

     Using NO_LIBTRACEEVENT=1 produces tooling that is functional and
     sufficient for lots of users not interested in those libtraceevent
     dependent features.

   - Allow Python support in 'perf script' when libtraceevent isn't
     linked, as not all features requires it, for instance Intel PT does
     not use tracepoints.

   - Error if the python interpreter needed for jevents to work isn't
     available and NO_JEVENTS=1 isn't set, preventing a build without
     support for JSON vendor events, which is a rare but possible
     condition. The two check error messages:

        $(error ERROR: No python interpreter needed for jevents generation. Install python or build with NO_JEVENTS=1.)
        $(error ERROR: Python interpreter needed for jevents generation too old (older than 3.6). Install a newer python or build with NO_JEVENTS=1.)

   - Make libbpf 1.0 the minimum required when building with out of
     tree, distro provided libbpf.

   - Use libsdtc++'s and LLVM's libcxx's __cxa_demangle, a portable C++
     demangler, add 'perf test' entry for it.

   - Make binutils libraries opt in, as distros disable building with it
     due to licensing, they were used for C++ demangling, for instance.

   - Switch libpfm4 to opt-out rather than opt-in, if libpfm-devel (or
     equivalent) isn't installed, we'll just have a build warning:

       Makefile.config:1144: libpfm4 not found, disables libpfm4 support. Please install libpfm4-dev

   - Add a feature test for scandirat(), that is not implemented so far
     in musl and uclibc, disabling features that need it, such as
     scanning for tracepoints in /sys/kernel/tracing/events.

  perf BPF filters:

   - New feature where BPF can be used to filter samples, for instance:

      $ sudo ./perf record -e cycles --filter 'period > 1000' true
      $ sudo ./perf script
           perf-exec 2273949 546850.708501:       5029 cycles:  ffffffff826f9e25 finish_wait+0x5 ([kernel.kallsyms])
           perf-exec 2273949 546850.708508:      32409 cycles:  ffffffff826f9e25 finish_wait+0x5 ([kernel.kallsyms])
           perf-exec 2273949 546850.708526:     143369 cycles:  ffffffff82b4cdbf xas_start+0x5f ([kernel.kallsyms])
           perf-exec 2273949 546850.708600:     372650 cycles:  ffffffff8286b8f7 __pagevec_lru_add+0x117 ([kernel.kallsyms])
           perf-exec 2273949 546850.708791:     482953 cycles:  ffffffff829190de __mod_memcg_lruvec_state+0x4e ([kernel.kallsyms])
                true 2273949 546850.709036:     501985 cycles:  ffffffff828add7c tlb_gather_mmu+0x4c ([kernel.kallsyms])
                true 2273949 546850.709292:     503065 cycles:      7f2446d97c03 _dl_map_object_deps+0x973 (/usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2)

   - In addition to 'period' (PERF_SAMPLE_PERIOD), the other
     PERF_SAMPLE_ can be used for filtering, and also some other sample
     accessible values, from tools/perf/Documentation/perf-record.txt:

        Essentially the BPF filter expression is:

        <term> <operator> <value> (("," | "||") <term> <operator> <value>)*

     The <term> can be one of:
        ip, id, tid, pid, cpu, time, addr, period, txn, weight, phys_addr,
        code_pgsz, data_pgsz, weight1, weight2, weight3, ins_lat, retire_lat,
        p_stage_cyc, mem_op, mem_lvl, mem_snoop, mem_remote, mem_lock,
        mem_dtlb, mem_blk, mem_hops

     The <operator> can be one of:
        ==, !=, >, >=, <, <=, &

     The <value> can be one of:
        <number> (for any term)
        na, load, store, pfetch, exec (for mem_op)
        l1, l2, l3, l4, cxl, io, any_cache, lfb, ram, pmem (for mem_lvl)
        na, none, hit, miss, hitm, fwd, peer (for mem_snoop)
        remote (for mem_remote)
        na, locked (for mem_locked)
        na, l1_hit, l1_miss, l2_hit, l2_miss, any_hit, any_miss, walk, fault (for mem_dtlb)
        na, by_data, by_addr (for mem_blk)
        hops0, hops1, hops2, hops3 (for mem_hops)

  perf lock contention:

   - Show lock type with address.

   - Track and show mmap_lock, siglock and per-cpu rq_lock with address.
     This is done for mmap_lock by following the current->mm pointer:

      $ sudo ./perf lock con -abl -- sleep 10
       contended   total wait     max wait     avg wait            address   symbol
       ...
           16344    312.30 ms      2.22 ms     19.11 us   ffff8cc702595640
           17686    310.08 ms      1.49 ms     17.53 us   ffff8cc7025952c0
               3     84.14 ms     45.79 ms     28.05 ms   ffff8cc78114c478   mmap_lock
            3557     76.80 ms     68.75 us     21.59 us   ffff8cc77ca3af58
               1     68.27 ms     68.27 ms     68.27 ms   ffff8cda745dfd70
               9     54.53 ms      7.96 ms      6.06 ms   ffff8cc7642a48b8   mmap_lock
           14629     44.01 ms     60.00 us      3.01 us   ffff8cc7625f9ca0
            3481     42.63 ms    140.71 us     12.24 us   ffffffff937906ac   vmap_area_lock
           16194     38.73 ms     42.15 us      2.39 us   ffff8cd397cbc560
              11     38.44 ms     10.39 ms      3.49 ms   ffff8ccd6d12fbb8   mmap_lock
               1      5.43 ms      5.43 ms      5.43 ms   ffff8cd70018f0d8
            1674      5.38 ms    422.93 us      3.21 us   ffffffff92e06080   tasklist_lock
             581      4.51 ms    130.68 us      7.75 us   ffff8cc9b1259058
               5      3.52 ms      1.27 ms    703.23 us   ffff8cc754510070
             112      3.47 ms     56.47 us     31.02 us   ffff8ccee38b3120
             381      3.31 ms     73.44 us      8.69 us   ffffffff93790690   purge_vmap_area_lock
             255      3.19 ms     36.35 us     12.49 us   ffff8d053ce30c80

   - Update default map size to 16384.

   - Allocate single letter option -M for --map-nr-entries, as it is
     proving being frequently used.

   - Fix struct rq lock access for older kernels with BPF's CO-RE
     (Compile once, run everywhere).

   - Fix problems found with MSAn.

  perf report/top:

   - Add inline information when using --call-graph=fp or lbr, as was
     already done to the --call-graph=dwarf callchain mode.

   - Improve the 'srcfile' sort key performance by really using an
     optimization introduced in 6.2 for the 'srcline' sort key that
     avoids calling addr2line for comparision with each sample.

  perf sched:

   - Make 'perf sched latency/map/replay' to use "sched:sched_waking"
     instead of "sched:sched_waking", consistent with 'perf record'
     since d566a9c2d4 ("perf sched: Prefer sched_waking event when it
     exists").

  perf ftrace:

   - Make system wide the default target for latency subcommand, run the
     following command then generate some network traffic and press
     control+C:

       # perf ftrace latency -T __kfree_skb
     ^C
         DURATION     |      COUNT | GRAPH                                          |
          0 - 1    us |         27 | #############                                  |
          1 - 2    us |         22 | ###########                                    |
          2 - 4    us |          8 | ####                                           |
          4 - 8    us |          5 | ##                                             |
          8 - 16   us |         24 | ############                                   |
         16 - 32   us |          2 | #                                              |
         32 - 64   us |          1 |                                                |
         64 - 128  us |          0 |                                                |
        128 - 256  us |          0 |                                                |
        256 - 512  us |          0 |                                                |
        512 - 1024 us |          0 |                                                |
          1 - 2    ms |          0 |                                                |
          2 - 4    ms |          0 |                                                |
          4 - 8    ms |          0 |                                                |
          8 - 16   ms |          0 |                                                |
         16 - 32   ms |          0 |                                                |
         32 - 64   ms |          0 |                                                |
         64 - 128  ms |          0 |                                                |
        128 - 256  ms |          0 |                                                |
        256 - 512  ms |          0 |                                                |
        512 - 1024 ms |          0 |                                                |
          1 - ...   s |          0 |                                                |
       #

  perf top:

   - Add --branch-history (LBR: Last Branch Record) option, just like
     already available for 'perf record'.

   - Fix segfault in thread__comm_len() where thread->comm was being
     used outside thread->comm_lock.

  perf annotate:

   - Allow configuring objdump and addr2line in ~/.perfconfig., so that
     you can use alternative binaries, such as llvm's.

  perf kvm:

   - Add TUI mode for 'perf kvm stat report'.

  Reference counting:

   - Add reference count checking infrastructure to check for use after
     free, done to the 'cpumap', 'namespaces', 'maps' and 'map' structs,
     more to come.

     To build with it use -DREFCNT_CHECKING=1 in the make command line
     to build tools/perf. Documented at:

       https://perf.wiki.kernel.org/index.php/Reference_Count_Checking

   - The above caught, for instance, fix, present in this series:

        - Fix maps use after put in 'perf test "Share thread maps"':

          'maps' is copied from leader, but the leader is put on line 79
          and then 'maps' is used to read the reference count below - so
          a use after put, with the put of maps happening within
          thread__put.

     Fixed by reversing the order of puts so that the leader is put
     last.

   - Also several fixes were made to places where reference counts were
     not being held.

   - Make this one of the tests in 'make -C tools/perf build-test' to
     regularly build test it and to make sure no direct access to the
     reference counted structs are made, doing that via accessors to
     check the validity of the struct pointer.

  ARM64:

   - Fix 'perf report' segfault when filtering coresight traces by
     sparse lists of CPUs.

   - Add support for 'simd' as a sort field for 'perf report', to show
     ARM's NEON SIMD's predicate flags: "partial" and "empty".

  arm64 vendor events:

   - Add N1 metrics.

  Intel vendor events:

   - Add graniterapids, grandridge and sierraforrest events.

   - Refresh events for: alderlake, aldernaken, broadwell, broadwellde,
     broadwellx, cascadelakx, haswell, haswellx, icelake, icelakex,
     jaketown, meteorlake, knightslanding, sandybridge, sapphirerapids,
     silvermont, skylake, tigerlake and westmereep-dp

   - Refresh metrics for alderlake-n, broadwell, broadwellde,
     broadwellx, haswell, haswellx, icelakex, ivybridge, ivytown and
     skylakex.

  perf stat:

   - Implement --topdown using JSON metrics.

   - Add TopdownL1 JSON metric as a default if present, but disable it
     for now for some Intel hybrid architectures, a series of patches
     addressing this is being reviewed and will be submitted for v6.5.

   - Use metrics for --smi-cost.

   - Update topdown documentation.

  Vendor events (JSON) infrastructure:

   - Add support for computing and printing metric threshold values. For
     instance, here is one found in thesapphirerapids json file:

       {
           "BriefDescription": "Percentage of cycles spent in System Management Interrupts.",
           "MetricExpr": "((msr@aperf@ - cycles) / msr@aperf@ if msr@smi@ > 0 else 0)",
           "MetricGroup": "smi",
           "MetricName": "smi_cycles",
           "MetricThreshold": "smi_cycles > 0.1",
           "ScaleUnit": "100%"
       },

   - Test parsing metric thresholds with the fake PMU in 'perf test
     pmu-events'.

   - Support for printing metric thresholds in 'perf list'.

   - Add --metric-no-threshold option to 'perf stat'.

   - Add rand (reverse and) and has_pmem (optane memory) support to
     metrics.

   - Sort list of input files to avoid depending on the order from
     readdir() helping in obtaining reproducible builds.

  S/390:

   - Add common metrics: - CPI (cycles per instruction), prbstate (ratio
     of instructions executed in problem state compared to total number
     of instructions), l1mp (Level one instruction and data cache misses
     per 100 instructions).

   - Add cache metrics for z13, z14, z15 and z16.

   - Add metric for TLB and cache.

  ARM:

   - Add raw decoding for SPE (Statistical Profiling Extension) v1.3 MTE
     (Memory Tagging Extension) and MOPS (Memory Operations) load/store.

  Intel PT hardware tracing:

   - Add event type names UINTR (User interrupt delivered) and UIRET
     (Exiting from user interrupt routine), documented in table 32-50
     "CFE Packet Type and Vector Fields Details" in the Intel Processor
     Trace chapter of The Intel SDM Volume 3 version 078.

   - Add support for new branch instructions ERETS and ERETU.

   - Fix CYC timestamps after standalone CBR

  ARM CoreSight hardware tracing:

   - Allow user to override timestamp and contextid settings.

   - Fix segfault in dso lookup.

   - Fix timeless decode mode detection.

   - Add separate decode paths for timeless and per-thread modes.

  auxtrace:

   - Fix address filter entire kernel size.

  Miscellaneous:

   - Fix use-after-free and unaligned bugs in the PLT handling routines.

   - Use zfree() to reduce chances of use after free.

   - Add missing 0x prefix for addresses printed in hexadecimal in 'perf
     probe'.

   - Suppress massive unsupported target platform errors in the unwind
     code.

   - Fix return incorrect build_id size in elf_read_build_id().

   - Fix 'perf scripts intel-pt-events.py' IPC output for Python 2 .

   - Add missing new parameter in kfree_skb tracepoint to the python
     scripts using it.

   - Add 'perf bench syscall fork' benchmark.

   - Add support for printing PERF_MEM_LVLNUM_UNC (Uncached access) in
     'perf mem'.

   - Fix wrong size expectation for perf test 'Setup struct
     perf_event_attr' caused by the patch adding
     perf_event_attr::config3.

   - Fix some spelling mistakes"

* tag 'perf-tools-for-v6.4-3-2023-05-06' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: (365 commits)
  Revert "perf build: Make BUILD_BPF_SKEL default, rename to NO_BPF_SKEL"
  Revert "perf build: Warn for BPF skeletons if endian mismatches"
  perf metrics: Fix SEGV with --for-each-cgroup
  perf bpf skels: Stop using vmlinux.h generated from BTF, use subset of used structs + CO-RE
  perf stat: Separate bperf from bpf_profiler
  perf test record+probe_libc_inet_pton: Fix call chain match on x86_64
  perf test record+probe_libc_inet_pton: Fix call chain match on s390
  perf tracepoint: Fix memory leak in is_valid_tracepoint()
  perf cs-etm: Add fix for coresight trace for any range of CPUs
  perf build: Fix unescaped # in perf build-test
  perf unwind: Suppress massive unsupported target platform errors
  perf script: Add new parameter in kfree_skb tracepoint to the python scripts using it
  perf script: Print raw ip instead of binary offset for callchain
  perf symbols: Fix return incorrect build_id size in elf_read_build_id()
  perf list: Modify the warning message about scandirat(3)
  perf list: Fix memory leaks in print_tracepoint_events()
  perf lock contention: Rework offset calculation with BPF CO-RE
  perf lock contention: Fix struct rq lock access
  perf stat: Disable TopdownL1 on hybrid
  perf stat: Avoid SEGV on counter->name
  ...
2023-05-07 11:32:18 -07:00
Linus Torvalds
2aff7c706c Objtool changes for v6.4:
- Mark arch_cpu_idle_dead() __noreturn, make all architectures & drivers that did
    this inconsistently follow this new, common convention, and fix all the fallout
    that objtool can now detect statically.
 
  - Fix/improve the ORC unwinder becoming unreliable due to UNWIND_HINT_EMPTY ambiguity,
    split it into UNWIND_HINT_END_OF_STACK and UNWIND_HINT_UNDEFINED to resolve it.
 
  - Fix noinstr violations in the KCSAN code and the lkdtm/stackleak code.
 
  - Generate ORC data for __pfx code
 
  - Add more __noreturn annotations to various kernel startup/shutdown/panic functions.
 
  - Misc improvements & fixes.
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmRK1x0RHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1ghxQ/+IkCynMYtdF5OG9YwbcGJqsPSfOPMEcEM
 pUSFYg+gGPBDT/fJfcVSqvUtdnWbLC2kXt9yiswXz3X3J2nmNkBk5YKQftsNDcul
 TmKeqIIAK51XTncpegKH0EGnOX63oZ9Vxa8CTPdDlb+YF23Km2FoudGRI9F5qbUd
 LoraXqGYeiaeySkGyWmZVl6Uc8dIxnMkTN3H/oI9aB6TOrsi059hAtFcSaFfyemP
 c4LqXXCH7k2baiQt+qaLZ8cuZVG/+K5r2N2cmjO5kmJc6ynIaFnfMe4XxZLjp5LT
 /PulYI15bXkvSARKx5CRh/CDHMOx5Blw+ASO0RhWbdy0WH4ZhhcaVF5AeIpPW86a
 1LBcz97rMp72WmvKgrJeVO1r9+ll4SI6/YKGJRsxsCMdP3hgFpqntXyVjTFNdTM1
 0gH6H5v55x06vJHvhtTk8SR3PfMTEM2fRU5jXEOrGowoGifx+wNUwORiwj6LE3KQ
 SKUdT19RNzoW3VkFxhgk65ThK1S7YsJUKRoac3YdhttpqqqtFV//erenrZoR4k/p
 vzvKy68EQ7RCNyD5wNWNFe0YjeJl5G8gQ8bUm4Xmab7djjgz+pn4WpQB8yYKJLAo
 x9dqQ+6eUbw3Hcgk6qQ9E+r/svbulnAL0AeALAWK/91DwnZ2mCzKroFkLN7napKi
 fRho4CqzrtM=
 =NwEV
 -----END PGP SIGNATURE-----

Merge tag 'objtool-core-2023-04-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull objtool updates from Ingo Molnar:

 - Mark arch_cpu_idle_dead() __noreturn, make all architectures &
   drivers that did this inconsistently follow this new, common
   convention, and fix all the fallout that objtool can now detect
   statically

 - Fix/improve the ORC unwinder becoming unreliable due to
   UNWIND_HINT_EMPTY ambiguity, split it into UNWIND_HINT_END_OF_STACK
   and UNWIND_HINT_UNDEFINED to resolve it

 - Fix noinstr violations in the KCSAN code and the lkdtm/stackleak code

 - Generate ORC data for __pfx code

 - Add more __noreturn annotations to various kernel startup/shutdown
   and panic functions

 - Misc improvements & fixes

* tag 'objtool-core-2023-04-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (52 commits)
  x86/hyperv: Mark hv_ghcb_terminate() as noreturn
  scsi: message: fusion: Mark mpt_halt_firmware() __noreturn
  x86/cpu: Mark {hlt,resume}_play_dead() __noreturn
  btrfs: Mark btrfs_assertfail() __noreturn
  objtool: Include weak functions in global_noreturns check
  cpu: Mark nmi_panic_self_stop() __noreturn
  cpu: Mark panic_smp_self_stop() __noreturn
  arm64/cpu: Mark cpu_park_loop() and friends __noreturn
  x86/head: Mark *_start_kernel() __noreturn
  init: Mark start_kernel() __noreturn
  init: Mark [arch_call_]rest_init() __noreturn
  objtool: Generate ORC data for __pfx code
  x86/linkage: Fix padding for typed functions
  objtool: Separate prefix code from stack validation code
  objtool: Remove superfluous dead_end_function() check
  objtool: Add symbol iteration helpers
  objtool: Add WARN_INSN()
  scripts/objdump-func: Support multiple functions
  context_tracking: Fix KCSAN noinstr violation
  objtool: Add stackleak instrumentation to uaccess safe list
  ...
2023-04-28 14:02:54 -07:00
Tiezhu Yang
ece7f7c050 perf bench syscall: Add fork syscall benchmark
This is a follow up patch for the execve bench which is actually
fork + execve, it makes sense to add the fork syscall benchmark
to compare the execve part precisely.

Some archs have no __NR_fork definition which is used only as a
check condition to call test_fork(), let us just define it as -1
to avoid build error.

Suggested-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: loongson-kernel@lists.loongnix.cn
Link: https://lore.kernel.org/r/1679381821-22736-1-git-send-email-yangtiezhu@loongson.cn
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-04-04 09:39:55 -03:00
Josh Poimboeuf
fb799447ae x86,objtool: Split UNWIND_HINT_EMPTY in two
Mark reported that the ORC unwinder incorrectly marks an unwind as
reliable when the unwind terminates prematurely in the dark corners of
return_to_handler() due to lack of information about the next frame.

The problem is UNWIND_HINT_EMPTY is used in two different situations:

  1) The end of the kernel stack unwind before hitting user entry, boot
     code, or fork entry

  2) A blind spot in ORC coverage where the unwinder has to bail due to
     lack of information about the next frame

The ORC unwinder has no way to tell the difference between the two.
When it encounters an undefined stack state with 'end=1', it blindly
marks the stack reliable, which can break the livepatch consistency
model.

Fix it by splitting UNWIND_HINT_EMPTY into UNWIND_HINT_UNDEFINED and
UNWIND_HINT_END_OF_STACK.

Reported-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/fd6212c8b450d3564b855e1cb48404d6277b4d9f.1677683419.git.jpoimboe@kernel.org
2023-03-23 23:18:58 +01:00
Josh Poimboeuf
f902cfdd46 x86,objtool: Introduce ORC_TYPE_*
Unwind hints and ORC entry types are two distinct things.  Separate them
out more explicitly.

Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/cc879d38fff8a43f8f7beb2fd56e35a5a384d7cd.1677683419.git.jpoimboe@kernel.org
2023-03-23 23:18:57 +01:00
Linus Torvalds
49be4fb281 perf tools fixes for v6.3:
- Add Adrian Hunter to MAINTAINERS as a perf tools reviewer.
 
 - Sync various tools/ copies of kernel headers with the kernel sources, this
   time trying to avoid first merging with upstream to then update but instead
   copy from upstream so that a merge is avoided and the end result after merging
   this pull request is the one expected, tools/perf/check-headers.sh (mostly)
   happy, less warnings while building tools/perf/.
 
 - Fix counting when initial delay configured by setting
   perf_attr.enable_on_exec when starting workloads from the perf command line.
 
 - Don't avoid emitting a PERF_RECORD_MMAP2 in 'perf inject --buildid-all' when
   that record comes with a build-id, otherwise we end up not being able to
   resolve symbols.
 
 - Don't use comma as the CSV output separator the "stat+csv_output" test, as
   comma can appear on some tests as a modifier for an event, use @ instead,
   ditto for the JSON linter test.
 
 - The offcpu test was looking for some bits being set on
   task_struct->prev_state without masking other bits not important for this
   specific 'perf test', fix it.
 
 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQR2GiIUctdOfX2qHhGyPKLppCJ+JwUCZApKjQAKCRCyPKLppCJ+
 JzdfAQDRnwDCxhb4cvx7lVR32L1XMIFW6qLWRBJWoxC2SJi6lgD/SoQgKswkxrJv
 XnBP7jEaIsh3M3ak82MxLKbjSAEvnwk=
 =jup7
 -----END PGP SIGNATURE-----

Merge tag 'perf-tools-fixes-for-v6.3-1-2023-03-09' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux

Pull perf tools fixes from Arnaldo Carvalho de Melo:

 - Add Adrian Hunter to MAINTAINERS as a perf tools reviewer

 - Sync various tools/ copies of kernel headers with the kernel sources,
   this time trying to avoid first merging with upstream to then update
   but instead copy from upstream so that a merge is avoided and the end
   result after merging this pull request is the one expected,
   tools/perf/check-headers.sh (mostly) happy, less warnings while
   building tools/perf/

 - Fix counting when initial delay configured by setting
   perf_attr.enable_on_exec when starting workloads from the perf
   command line

 - Don't avoid emitting a PERF_RECORD_MMAP2 in 'perf inject
   --buildid-all' when that record comes with a build-id, otherwise we
   end up not being able to resolve symbols

 - Don't use comma as the CSV output separator the "stat+csv_output"
   test, as comma can appear on some tests as a modifier for an event,
   use @ instead, ditto for the JSON linter test

 - The offcpu test was looking for some bits being set on
   task_struct->prev_state without masking other bits not important for
   this specific 'perf test', fix it

* tag 'perf-tools-fixes-for-v6.3-1-2023-03-09' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux:
  perf tools: Add Adrian Hunter to MAINTAINERS as a reviewer
  tools headers UAPI: Sync linux/perf_event.h with the kernel sources
  tools headers x86 cpufeatures: Sync with the kernel sources
  tools include UAPI: Sync linux/vhost.h with the kernel sources
  tools arch x86: Sync the msr-index.h copy with the kernel sources
  tools headers kvm: Sync uapi/{asm/linux} kvm.h headers with the kernel sources
  tools include UAPI: Synchronize linux/fcntl.h with the kernel sources
  tools headers: Synchronize {linux,vdso}/bits.h with the kernel sources
  tools headers UAPI: Sync linux/prctl.h with the kernel sources
  tools headers: Update the copy of x86's mem{cpy,set}_64.S used in 'perf bench'
  perf stat: Fix counting when initial delay configured
  tools headers svm: Sync svm headers with the kernel sources
  perf test: Avoid counting commas in json linter
  perf tests stat+csv_output: Switch CSV separator to @
  perf inject: Fix --buildid-all not to eat up MMAP2
  tools arch x86: Sync the msr-index.h copy with the kernel sources
  perf test: Fix offcpu test prev_state check
2023-03-10 08:18:46 -08:00