Commit Graph

474 Commits

Author SHA1 Message Date
Fuad Tabba
d7396a72ea KVM: arm64: Make EL2 exception entry and exit context-synchronization events
SCTLR_EL2.EIS and SCTLR_EL2.EOS control whether exception entry and
exit at EL2 are Context Synchronisation Events (CSEs). Per ARM DDI
0487 M.b D24.2.175 (p. D24-9754):

  - !FEAT_ExS: the bit is RES1, so the entry/exit is unconditionally
    a CSE.
  - FEAT_ExS: the reset value is architecturally UNKNOWN; software
    must set the bit to make the entry/exit a CSE.

INIT_SCTLR_EL2_MMU_ON in arch/arm64/include/asm/sysreg.h sets neither
bit. KVM/arm64 hot paths rely on ERET from EL2 being a CSE, and on
synchronous EL1->EL2 entry being a CSE, to elide explicit ISBs after
MSRs to context-switching system registers (HCR_EL2, ZCR_EL2,
ptrauth keys, etc.). On FEAT_ExS hardware those reliances are not
architecturally backed unless EOS=1 (and, for entry, EIS=1).

Until commit 0a35bd285f ("arm64: Convert SCTLR_EL2 to sysreg
infrastructure"), SCTLR_EL2_RES1 was a hand-rolled mask that
included BIT(11) (EOS) and BIT(22) (EIS), so INIT_SCTLR_EL2_MMU_ON
was setting both unconditionally. The conversion made
SCTLR_EL2_RES1 auto-generated; because the sysreg tooling only
models unconditionally-RES1 fields and EIS/EOS are RES1 only when
FEAT_ExS is absent, the auto-generated mask is UL(0). The seven
other bits dropped from the old mask (positions 4, 5, 16, 18, 23,
28, 29) are unconditionally RES1 in the E2H=0 SCTLR_EL2 layout per
DDI 0487 M.b D24.2.175, so dropping them is harmless. EIS and EOS
are the only bits whose semantics changed for FEAT_ExS hardware
and where the kernel relies on the value being 1.

Make the guarantee explicit: include SCTLR_ELx_EIS | SCTLR_ELx_EOS in
INIT_SCTLR_EL2_MMU_ON so that EL2 exception entry and exit are
unconditionally CSEs regardless of whether FEAT_ExS is implemented.
This matches the pairing in arch/arm64/kvm/config.c which treats EIS
and EOS together as RES1 under !FEAT_ExS.

Fixes: 0a35bd285f ("arm64: Convert SCTLR_EL2 to sysreg infrastructure")
Reviewed-by: Yuan Yao <yaoyuan@linux.alibaba.com>
Assisted-by: Gemini:gemini-3.1-pro review-prompts
Signed-off-by: Fuad Tabba <tabba@google.com>
Link: https://patch.msgid.link/20260501112149.2824881-2-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-05-07 14:12:41 +01:00
Marc Zyngier
b693940e81 Merge branch kvm-arm64/pkvm-psci into kvmarm-master/next
* kvm-arm64/pkvm-psci:
  : .
  : Cleanup of the pKVM PSCI relay CPU entry code, making it slightly
  : easier to follow, should someone have to wade into these waters
  : ever again.
  : .
  KVM: arm64: Remove extra ISBs when using msr_hcr_el2
  KVM: arm64: pkvm: Use direct function pointers for cpu_{on,resume}
  KVM: arm64: pkvm: Turn __kvm_hyp_init_cpu into an inner label
  KVM: arm64: pkvm: Simplify BTI handling on CPU boot
  KVM: arm64: pkvm: Move error handling to the end of kvm_hyp_cpu_entry

Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-04-08 12:23:24 +01:00
Marc Zyngier
54a3cc1454 KVM: arm64: Remove extra ISBs when using msr_hcr_el2
The msr_hcr_el2 macro is slightly awkward, as it provides an ISB
when CONFIG_AMPERE_ERRATUM_AC04_CPU_23 is present, and none
otherwise. Note that this this option is 'default y', meaning that
it is likely to be selected.

Most instances of msr_hcr_el2 are also immediately followed by an ISB,
meaning that in most cases, you end-up with two back-to-back ISBs.
This isn't a big deal, but once you have seen that, you can't unsee it.

Rework the msr_hcr_el2 macro to always provide the ISB, and drop
the superfluous ISBs everywhere else.

Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Link: https://patch.msgid.link/20260321212419.2803972-6-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-03-23 11:03:53 +00:00
Sascha Bischoff
59991153f0 arm64/sysreg: Add GICR CDNMIA encoding
The encoding for the GICR CDNMIA system instruction is thus far unused
(and shall remain unused for the time being). However, in order to
plumb the FGTs into KVM correctly, KVM needs to be made aware of the
encoding of this system instruction.

Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Link: https://patch.msgid.link/20260319154937.3619520-8-sascha.bischoff@arm.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-03-19 18:21:27 +00:00
Marc Zyngier
1df3f01ebf Merge branch kvm-arm64/resx into kvmarm-master/next
* kvm-arm64/resx:
  : .
  : Add infrastructure to deal with the full gamut of RESx bits
  : for NV. As a result, it is now possible to have the expected
  : semantics for some bits such as SCTLR_EL2.SPAN.
  : .
  KVM: arm64: Add debugfs file dumping computed RESx values
  KVM: arm64: Add sanitisation to SCTLR_EL2
  KVM: arm64: Remove all traces of HCR_EL2.MIOCNCE
  KVM: arm64: Remove all traces of FEAT_TME
  KVM: arm64: Simplify handling of full register invalid constraint
  KVM: arm64: Get rid of FIXED_VALUE altogether
  KVM: arm64: Simplify handling of HCR_EL2.E2H RESx
  KVM: arm64: Move RESx into individual register descriptors
  KVM: arm64: Add RES1_WHEN_E2Hx constraints as configuration flags
  KVM: arm64: Add REQUIRES_E2H1 constraint as configuration flags
  KVM: arm64: Simplify FIXED_VALUE handling
  KVM: arm64: Convert HCR_EL2.RW to AS_RES1
  KVM: arm64: Correctly handle SCTLR_EL1 RES1 bits for unsupported features
  KVM: arm64: Allow RES1 bits to be inferred from configuration
  KVM: arm64: Inherit RESx bits from FGT register descriptors
  KVM: arm64: Extend unified RESx handling to runtime sanitisation
  KVM: arm64: Introduce data structure tracking both RES0 and RES1 bits
  KVM: arm64: Introduce standalone FGU computing primitive
  KVM: arm64: Remove duplicate configuration for SCTLR_EL1.{EE,E0E}
  arm64: Convert SCTLR_EL2 to sysreg infrastructure

Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-02-05 09:17:48 +00:00
Marc Zyngier
47e89febd3 Merge branch kvm-arm64/gicv5-prologue into kvmarm-master/next
* kvm-arm64/gicv5-prologue:
  : .
  : Prologue to GICv5 support, courtesy of Sascha Bischoff.
  :
  : This is preliminary work that sets the scene for the full-blow
  : support.
  : .
  irqchip/gic-v5: Check if impl is virt capable
  KVM: arm64: gic: Set vgic_model before initing private IRQs
  arm64/sysreg: Drop ICH_HFGRTR_EL2.ICC_HAPR_EL1 and make RES1
  KVM: arm64: gic-v3: Switch vGIC-v3 to use generated ICH_VMCR_EL2

Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-02-05 09:17:30 +00:00
Marc Zyngier
0a35bd285f arm64: Convert SCTLR_EL2 to sysreg infrastructure
Convert SCTLR_EL2 to the sysreg infrastructure, as per the 2025-12_rel
revision of the Registers.json file.

Note that we slightly deviate from the above, as we stick to the ARM
ARM M.a definition of SCTLR_EL2[9], which is RES0, in order to avoid
dragging the POE2 definitions...

Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Link: https://patch.msgid.link/20260202184329.2724080-2-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-02-05 08:59:28 +00:00
Sascha Bischoff
4a03431b74 KVM: arm64: gic-v3: Switch vGIC-v3 to use generated ICH_VMCR_EL2
The VGIC-v3 code relied on hand-written definitions for the
ICH_VMCR_EL2 register. This register, and the associated fields, is
now generated as part of the sysreg framework. Move to using the
generated definitions instead of the hand-written ones.

There are no functional changes as part of this change.

Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Link: https://patch.msgid.link/20260128175919.3828384-3-sascha.bischoff@arm.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-01-30 11:10:46 +00:00
Marc Zyngier
a035001dea arm64: Convert VTCR_EL2 to sysreg infratructure
Our definition of VTCR_EL2 is both partial (tons of fields are
missing) and totally inconsistent (some constants are shifted,
some are not). They are also expressed in terms of TCR, which is
rather inconvenient.

Replace the ad-hoc definitions with the the generated version.
This results in a bunch of additional changes to make the code
with the unshifted nature of generated enumerations.

The register data was extracted from the BSD licenced AARCHMRS
(AARCHMRS_OPENSOURCE_A_profile_FAT-2025-09_ASL0).

Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Link: https://patch.msgid.link/20251210173024.561160-4-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-01-15 10:58:20 +00:00
Marc Zyngier
86364832ba KVM: arm64: Don't blindly set set PSTATE.PAN on guest exit
We set PSTATE.PAN to 1 on exiting from a guest if PAN support has
been compiled in and that it exists on the HW. However, this is not
necessarily correct.

In a nVHE configuration, there is no notion of PAN at EL2, so setting
PSTATE.PAN to anything is pointless.

Furthermore, not setting PAN to 0 when CONFIG_ARM64_PAN isn't set
means we run with the *guest's* PSTATE.PAN (which might be set to 1),
and we will explode on the next userspace access. Yes, the architecture
is delightful in that particular corner.

Fix the whole thing by always setting PAN to something when running
VHE (which implies PAN support), and only ignore it when running nVHE.

Reported-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://msgid.link/20260107124600.2736328-1-maz@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
2026-01-09 15:43:14 -08:00
Linus Torvalds
44fc84337b arm64 updates for 6.19:
Core features:
 
  - Basic Arm MPAM (Memory system resource Partitioning And Monitoring)
    driver under drivers/resctrl/ which makes use of the fs/rectrl/ API
 
 Perf and PMU:
 
  - Avoid cycle counter on multi-threaded CPUs
 
  - Extend CSPMU device probing and add additional filtering support for
    NVIDIA implementations
 
  - Add support for the PMUs on the NoC S3 interconnect
 
  - Add additional compatible strings for new Cortex and C1 CPUs
 
  - Add support for data source filtering to the SPE driver
 
  - Add support for i.MX8QM and "DB" PMU in the imx PMU driver
 
 Memory managemennt:
 
  - Avoid broadcast TLBI if page reused in write fault
 
  - Elide TLB invalidation if the old PTE was not valid
 
  - Drop redundant cpu_set_*_tcr_t0sz() macros
 
  - Propagate pgtable_alloc() errors outside of __create_pgd_mapping()
 
  - Propagate return value from __change_memory_common()
 
 ACPI and EFI:
 
  - Call EFI runtime services without disabling preemption
 
  - Remove unused ACPI function
 
 Miscellaneous:
 
  - ptrace support to disable streaming on SME-only systems
 
  - Improve sysreg generation to include a 'Prefix' descriptor
 
  - Replace __ASSEMBLY__ with __ASSEMBLER__
 
  - Align register dumps in the kselftest zt-test
 
  - Remove some no longer used macros/functions
 
  - Various spelling corrections
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE5RElWfyWxS+3PLO2a9axLQDIXvEFAmkvMjkACgkQa9axLQDI
 XvGaGg//dtT/ZAqrWa6Yniv1LOlh837C07YdxAYTTuJ+I87DnrxIqjwbW+ye+bF+
 61RTkioeCUm3PH+ncO9gPVNi4ASZ1db3/Rc8Fb6rr1TYOI1sMIeBsbbVdRJgsbX6
 zu9197jOBHscTAeDceB6jZBDyW8iSLINPZ7LN6lGxXsZM/Vn5zfE0heKEEio6Fsx
 +AzO2vos0XcwBR9vFGXtiCDx57T+/cXUtrWfA0Cjz4nvHSgD8+ghS+Jwv+kHMt1L
 zrarqbeQfj+Iixm9PVHiazv+8THo9QdNl1yGLxDmJ4LEVPewjW5jBs8+5e8e3/Gj
 p5JEvmSyWvKTTbFoM5vhxC72A7yuT1QwAk2iCyFIxMbQ25PndHboKVp/569DzOkT
 +6CjI88sVSP6D7bVlN6pFlzc/Fa07YagnDMnMCSfk4LBjUfE3jYb+usaFydyv/rl
 jwZbJrnSF/H+uQlyoJFgOEXSoQdDsll3dv6yEsUCwbd8RqXbAe3svbguOUHSdvIj
 sCViezGZQ7Rkn6D21AfF9j6e7ceaSDaf5DWMxPI3dAxFKG8TJbCBsToR59NnoSj+
 bNEozbZ1mCxmwH8i43wZ6P0RkClvJnoXcvRA+TJj02fSZACO39d3XDNswfXWL41r
 KiWGUJZyn2lPKtiAWVX6pSBtDJ+5rFhuoFgADLX6trkxDe9/EMQ=
 =4Sb6
 -----END PGP SIGNATURE-----

Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 updates from Catalin Marinas:
 "These are the arm64 updates for 6.19.

  The biggest part is the Arm MPAM driver under drivers/resctrl/.
  There's a patch touching mm/ to handle spurious faults for huge pmd
  (similar to the pte version). The corresponding arm64 part allows us
  to avoid the TLB maintenance if a (huge) page is reused after a write
  fault. There's EFI refactoring to allow runtime services with
  preemption enabled and the rest is the usual perf/PMU updates and
  several cleanups/typos.

  Summary:

  Core features:

   - Basic Arm MPAM (Memory system resource Partitioning And Monitoring)
     driver under drivers/resctrl/ which makes use of the fs/rectrl/ API

  Perf and PMU:

   - Avoid cycle counter on multi-threaded CPUs

   - Extend CSPMU device probing and add additional filtering support
     for NVIDIA implementations

   - Add support for the PMUs on the NoC S3 interconnect

   - Add additional compatible strings for new Cortex and C1 CPUs

   - Add support for data source filtering to the SPE driver

   - Add support for i.MX8QM and "DB" PMU in the imx PMU driver

  Memory managemennt:

   - Avoid broadcast TLBI if page reused in write fault

   - Elide TLB invalidation if the old PTE was not valid

   - Drop redundant cpu_set_*_tcr_t0sz() macros

   - Propagate pgtable_alloc() errors outside of __create_pgd_mapping()

   - Propagate return value from __change_memory_common()

  ACPI and EFI:

   - Call EFI runtime services without disabling preemption

   - Remove unused ACPI function

  Miscellaneous:

   - ptrace support to disable streaming on SME-only systems

   - Improve sysreg generation to include a 'Prefix' descriptor

   - Replace __ASSEMBLY__ with __ASSEMBLER__

   - Align register dumps in the kselftest zt-test

   - Remove some no longer used macros/functions

   - Various spelling corrections"

* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (94 commits)
  arm64/mm: Document why linear map split failure upon vm_reset_perms is not problematic
  arm64/pageattr: Propagate return value from __change_memory_common
  arm64/sysreg: Remove unused define ARM64_FEATURE_FIELD_BITS
  KVM: arm64: selftests: Consider all 7 possible levels of cache
  KVM: arm64: selftests: Remove ARM64_FEATURE_FIELD_BITS and its last user
  arm64: atomics: lse: Remove unused parameters from ATOMIC_FETCH_OP_AND macros
  Documentation/arm64: Fix the typo of register names
  ACPI: GTDT: Get rid of acpi_arch_timer_mem_init()
  perf: arm_spe: Add support for filtering on data source
  perf: Add perf_event_attr::config4
  perf/imx_ddr: Add support for PMU in DB (system interconnects)
  perf/imx_ddr: Get and enable optional clks
  perf/imx_ddr: Move ida_alloc() from ddr_perf_init() to ddr_perf_probe()
  dt-bindings: perf: fsl-imx-ddr: Add compatible string for i.MX8QM, i.MX8QXP and i.MX8DXL
  arm64: remove duplicate ARCH_HAS_MEM_ENCRYPT
  arm64: mm: use untagged address to calculate page index
  MAINTAINERS: new entry for MPAM Driver
  arm_mpam: Add kunit tests for props_mismatch()
  arm_mpam: Add kunit test for bitmap reset
  arm_mpam: Add helper to reset saved mbwu state
  ...
2025-12-02 17:03:55 -08:00
Catalin Marinas
52c4d1d624 Merge branch 'for-next/sysreg' into for-next/core
* for-next/sysreg:
  : arm64 sysreg updates/cleanups
  arm64/sysreg: Remove unused define ARM64_FEATURE_FIELD_BITS
  KVM: arm64: selftests: Consider all 7 possible levels of cache
  KVM: arm64: selftests: Remove ARM64_FEATURE_FIELD_BITS and its last user
  arm64/sysreg: Add ICH_VMCR_EL2
  arm64/sysreg: Move generation of RES0/RES1/UNKN to function
  arm64/sysreg: Support feature-specific fields with 'Prefix' descriptor
  arm64/sysreg: Fix checks for incomplete sysreg definitions
  arm64/sysreg: Replace TCR_EL1 field macros
2025-11-28 15:47:53 +00:00
Ben Horgan
27abb1ee5a arm64/sysreg: Remove unused define ARM64_FEATURE_FIELD_BITS
The define ARM64_FEATURE_FIELD_BITS is now unused and feature id
fields don't always have 4 bits. Remove it.

Signed-off-by: Ben Horgan <ben.horgan@arm.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-11-27 18:17:59 +00:00
Thomas Huth
287d163322 arm64: Replace __ASSEMBLY__ with __ASSEMBLER__ in non-uapi headers
While the GCC and Clang compilers already define __ASSEMBLER__
automatically when compiling assembly code, __ASSEMBLY__ is a
macro that only gets defined by the Makefiles in the kernel.
This can be very confusing when switching between userspace
and kernelspace coding, or when dealing with uapi headers that
rather should use __ASSEMBLER__ instead. So let's standardize now
on the __ASSEMBLER__ macro that is provided by the compilers.

This is a mostly mechanical patch (done with a simple "sed -i"
statement), except for the following files where comments with
mis-spelled macros were tweaked manually:

 arch/arm64/include/asm/stacktrace/frame.h
 arch/arm64/include/asm/kvm_ptrauth.h
 arch/arm64/include/asm/debug-monitors.h
 arch/arm64/include/asm/esr.h
 arch/arm64/include/asm/scs.h
 arch/arm64/include/asm/memory.h

Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-11-11 19:35:59 +00:00
Lorenzo Pieralisi
e9ad390a48 arm64/sysreg: Fix GIC CDEOI instruction encoding
The GIC CDEOI system instruction requires the Rt field to be set to 0b11111
otherwise the instruction behaviour becomes CONSTRAINED UNPREDICTABLE.

Currenly, its usage is encoded as a system register write, with a constant
0 value:

write_sysreg_s(0, GICV5_OP_GIC_CDEOI)

While compiling with GCC, the 0 constant value, through these asm
constraints and modifiers ('x' modifier and 'Z' constraint combo):

asm volatile(__msr_s(r, "%x0") : : "rZ" (__val));

forces the compiler to issue the XZR register for the MSR operation (ie
that corresponds to Rt == 0b11111) issuing the right instruction encoding.

Unfortunately LLVM does not yet understand that modifier/constraint
combo so it ends up issuing a different register from XZR for the MSR
source, which in turns means that it encodes the GIC CDEOI instruction
wrongly and the instruction behaviour becomes CONSTRAINED UNPREDICTABLE
that we must prevent.

Add a conditional to write_sysreg_s() macro that detects whether it
is passed a constant 0 value and issues an MSR write with XZR as source
register - explicitly doing what the asm modifier/constraint is meant to
achieve through constraints/modifiers, fixing the LLVM compilation issue.

Fixes: 7ec80fb3f0 ("irqchip/gic-v5: Add GICv5 PPI support")
Suggested-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Lorenzo Pieralisi <lpieralisi@kernel.org>
Acked-by: Marc Zyngier <maz@kernel.org>
Cc: stable@vger.kernel.org
Cc: Sascha Bischoff <sascha.bischoff@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-10-17 17:55:01 +01:00
Linus Torvalds
feafee2845 arm64 updates for 6.18
Confidential computing:
  - Add support for accepting secrets from firmware (e.g. ACPI CCEL)
    and mapping them with appropriate attributes.
 
 CPU features:
  - Advertise atomic floating-point instructions to userspace.
 
  - Extend Spectre workarounds to cover additional Arm CPU variants.
 
  - Extend list of CPUs that support break-before-make level 2 and
    guarantee not to generate TLB conflict aborts for changes of mapping
    granularity (BBML2_NOABORT).
 
  - Add GCS support to our uprobes implementation.
 
 Documentation:
  - Remove bogus SME documentation concerning register state when
    entering/exiting streaming mode.
 
 Entry code:
  - Switch over to the generic IRQ entry code (GENERIC_IRQ_ENTRY).
 
  - Micro-optimise syscall entry path with a compiler branch hint.
 
 Memory management:
  - Enable huge mappings in vmalloc space even when kernel page-table
    dumping is enabled.
 
  - Tidy up the types used in our early MMU setup code.
 
  - Rework rodata= for closer parity with the behaviour on x86.
 
  - For CPUs implementing BBML2_NOABORT, utilise block mappings in the
    linear map even when rodata= applies to virtual aliases.
 
  - Don't re-allocate the virtual region between '_text' and '_stext',
    as doing so confused tools parsing /proc/vmcore.
 
 Miscellaneous:
  - Clean-up Kconfig menuconfig text for architecture features.
 
  - Avoid redundant bitmap_empty() during determination of supported
    SME vector lengths.
 
  - Re-enable warnings when building the 32-bit vDSO object.
 
  - Avoid breaking our eggs at the wrong end.
 
 Perf and PMUs:
  - Support for v3 of the Hisilicon L3C PMU.
 
  - Support for Hisilicon's MN and NoC PMUs.
 
  - Support for Fujitsu's Uncore PMU.
 
  - Support for SPE's extended event filtering feature.
 
  - Preparatory work to enable data source filtering in SPE.
 
  - Support for multiple lanes in the DWC PCIe PMU.
 
  - Support for i.MX94 in the IMX DDR PMU driver.
 
  - MAINTAINERS update (Thank you, Yicong).
 
  - Minor driver fixes (PERF_IDX2OFF() overflow, CMN register offsets).
 
 Selftests:
  - Add basic LSFE check to the existing hwcaps test.
 
  - Support nolibc in GCS tests.
 
  - Extend SVE ptrace test to pass unsupported regsets and invalid vector
    lengths.
 
  - Minor cleanups (typos, cosmetic changes).
 
 System registers:
  - Fix ID_PFR1_EL1 definition.
 
  - Fix incorrect signedness of some fields in ID_AA64MMFR4_EL1.
 
  - Sync TCR_EL1 definition with the latest Arm ARM (L.b).
 
  - Be stricter about the input fed into our AWK sysreg generator script.
 
  - Typo fixes and removal of redundant definitions.
 
 ACPI, EFI and PSCI:
  - Decouple Arm's "Software Delegated Exception Interface" (SDEI)
    support from the ACPI GHES code so that it can be used by platforms
    booted with device-tree.
 
  - Remove unnecessary per-CPU tracking of the FPSIMD state across EFI
    runtime calls.
 
  - Fix a node refcount imbalance in the PSCI device-tree code.
 
 CPU Features:
  - Ensure register sanitisation is applied to fields in ID_AA64MMFR4.
 
  - Expose AIDR_EL1 to userspace via sysfs, primarily so that KVM guests
    can reliably query the underlying CPU types from the VMM.
 
  - Re-enabling of SME support (CONFIG_ARM64_SME) as a result of fixes
    to our context-switching, signal handling and ptrace code.
 -----BEGIN PGP SIGNATURE-----
 
 iQFEBAABCgAuFiEEPxTL6PPUbjXGY88ct6xw3ITBYzQFAmjWlMIQHHdpbGxAa2Vy
 bmVsLm9yZwAKCRC3rHDchMFjNIVYB/sHaFYmToEaK4ldMfTn4ZQ8yr9ZuTMLr/ji
 zOm7wULP08wKIcp6t+1N6e7A8Wb3YSErxTywc4b9MqctIel1WvBMewjJd38xb2qO
 hFCuRuWemfyt6Rw/EGMCP54yueLzKbnN4q+/Aks8jedWtS+AXY6McGCJOzMjuECL
 zKb68Hqqb09YQ2b2BPSAiU9g42rotiIYppLaLbEdxUnw/eqfEN3wSrl92/LuBlqH
 r2wZDnJwA5q8Iy9HWlnhg4Jax0Cs86jSJPIQLB6v3WWhc3HR49pm5cqYzYkUht9L
 nA6eaWJiBl/+k3S+ftPKNzU8n04r15lqyNZE2FYfRk+0BPSacJOf
 =kO15
 -----END PGP SIGNATURE-----

Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 updates from Will Deacon:
 "There's good stuff across the board, including some nice mm
  improvements for CPUs with the 'noabort' BBML2 feature and a clever
  patch to allow ptdump to play nicely with block mappings in the
  vmalloc area.

  Confidential computing:

   - Add support for accepting secrets from firmware (e.g. ACPI CCEL)
     and mapping them with appropriate attributes.

  CPU features:

   - Advertise atomic floating-point instructions to userspace

   - Extend Spectre workarounds to cover additional Arm CPU variants

   - Extend list of CPUs that support break-before-make level 2 and
     guarantee not to generate TLB conflict aborts for changes of
     mapping granularity (BBML2_NOABORT)

   - Add GCS support to our uprobes implementation.

  Documentation:

   - Remove bogus SME documentation concerning register state when
     entering/exiting streaming mode.

  Entry code:

   - Switch over to the generic IRQ entry code (GENERIC_IRQ_ENTRY)

   - Micro-optimise syscall entry path with a compiler branch hint.

  Memory management:

   - Enable huge mappings in vmalloc space even when kernel page-table
     dumping is enabled

   - Tidy up the types used in our early MMU setup code

   - Rework rodata= for closer parity with the behaviour on x86

   - For CPUs implementing BBML2_NOABORT, utilise block mappings in the
     linear map even when rodata= applies to virtual aliases

   - Don't re-allocate the virtual region between '_text' and '_stext',
     as doing so confused tools parsing /proc/vmcore.

  Miscellaneous:

   - Clean-up Kconfig menuconfig text for architecture features

   - Avoid redundant bitmap_empty() during determination of supported
     SME vector lengths

   - Re-enable warnings when building the 32-bit vDSO object

   - Avoid breaking our eggs at the wrong end.

  Perf and PMUs:

   - Support for v3 of the Hisilicon L3C PMU

   - Support for Hisilicon's MN and NoC PMUs

   - Support for Fujitsu's Uncore PMU

   - Support for SPE's extended event filtering feature

   - Preparatory work to enable data source filtering in SPE

   - Support for multiple lanes in the DWC PCIe PMU

   - Support for i.MX94 in the IMX DDR PMU driver

   - MAINTAINERS update (Thank you, Yicong)

   - Minor driver fixes (PERF_IDX2OFF() overflow, CMN register offsets).

  Selftests:

   - Add basic LSFE check to the existing hwcaps test

   - Support nolibc in GCS tests

   - Extend SVE ptrace test to pass unsupported regsets and invalid
     vector lengths

   - Minor cleanups (typos, cosmetic changes).

  System registers:

   - Fix ID_PFR1_EL1 definition

   - Fix incorrect signedness of some fields in ID_AA64MMFR4_EL1

   - Sync TCR_EL1 definition with the latest Arm ARM (L.b)

   - Be stricter about the input fed into our AWK sysreg generator
     script

   - Typo fixes and removal of redundant definitions.

  ACPI, EFI and PSCI:

   - Decouple Arm's "Software Delegated Exception Interface" (SDEI)
     support from the ACPI GHES code so that it can be used by platforms
     booted with device-tree

   - Remove unnecessary per-CPU tracking of the FPSIMD state across EFI
     runtime calls

   - Fix a node refcount imbalance in the PSCI device-tree code.

  CPU Features:

   - Ensure register sanitisation is applied to fields in ID_AA64MMFR4

   - Expose AIDR_EL1 to userspace via sysfs, primarily so that KVM
     guests can reliably query the underlying CPU types from the VMM

   - Re-enabling of SME support (CONFIG_ARM64_SME) as a result of fixes
     to our context-switching, signal handling and ptrace code"

* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (93 commits)
  arm64: cpufeature: Remove duplicate asm/mmu.h header
  arm64: Kconfig: Make CPU_BIG_ENDIAN depend on BROKEN
  perf/dwc_pcie: Fix use of uninitialized variable
  arm/syscalls: mark syscall invocation as likely in invoke_syscall
  Documentation: hisi-pmu: Add introduction to HiSilicon V3 PMU
  Documentation: hisi-pmu: Fix of minor format error
  drivers/perf: hisi: Add support for L3C PMU v3
  drivers/perf: hisi: Refactor the event configuration of L3C PMU
  drivers/perf: hisi: Extend the field of tt_core
  drivers/perf: hisi: Extract the event filter check of L3C PMU
  drivers/perf: hisi: Simplify the probe process of each L3C PMU version
  drivers/perf: hisi: Export hisi_uncore_pmu_isr()
  drivers/perf: hisi: Relax the event ID check in the framework
  perf: Fujitsu: Add the Uncore PMU driver
  arm64: map [_text, _stext) virtual address range non-executable+read-only
  arm64/sysreg: Update TCR_EL1 register
  arm64: Enable vmalloc-huge with ptdump
  arm64: cpufeature: add Neoverse-V3AE to BBML2 allow list
  arm64: errata: Apply workarounds for Neoverse-V3AE
  arm64: cputype: Add Neoverse-V3AE definitions
  ...
2025-09-29 18:48:39 -07:00
Will Deacon
c7c7eb4f0e Merge branch 'for-next/sysregs' into for-next/core
* for-next/sysregs:
  arm64/sysreg: Update TCR_EL1 register
  arm64: sysreg: Add validation checks to sysreg header generation script
  arm64: sysreg: Correct sign definitions for EIESB and DoubleLock
  arm64: sysreg: Fix and tidy up sysreg field definitions
2025-09-24 16:35:01 +01:00
Anshuman Khandual
14f158552e arm64/sysreg: Update TCR_EL1 register
Update TCR_EL1 register fields as per latest ARM ARM DDI 0487 L.B and while
here drop an explicit sysreg definition SYS_TCR_EL1 from sysreg.h, which is
now redundant.

Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Mark Brown <broonie@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Reviewed-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
2025-09-22 11:57:15 +01:00
James Clark
b4401403af perf: arm_spe: Support FEAT_SPEv1p4 filters
FEAT_SPEv1p4 (optional from Armv8.8) adds some new filter bits and also
makes some previously available bits unavailable again e.g:

  E[30], bit [30]
  When FEAT_SPEv1p4 is _not_ implemented ...

Continuing to hard code the valid filter bits for each version isn't
scalable, and it also doesn't work for filter bits that aren't related
to SPE version. For example most bits have a further condition:

  E[15], bit [15]
  When ... and filtering on event 15 is supported:

Whether "filtering on event 15" is implemented or not is only
discoverable from the TRM of that specific CPU or by probing
PMSEVFR_EL1.

Instead of hard coding them, write all 1s to the PMSEVFR_EL1 register
and read it back to discover the RES0 bits. Unsupported bits are RAZ/WI
so should read as 0s.

For any hardware that doesn't strictly follow RAZ/WI for unsupported
filters: Any bits that should have been supported in a specific SPE
version but now incorrectly appear to be RES0 wouldn't have worked
anyway, so it's better to fail to open events that request them rather
than behaving unexpectedly. Bits that aren't implemented but also aren't
RAZ/WI will be incorrectly reported as supported, but allowing them to
be used is harmless.

Testing on N1SDP shows the probed RES0 bits to be the same as the hard
coded ones. The FVP with SPEv1p4 shows only additional new RES0 bits,
i.e. no previously hard coded RES0 bits are missing.

Tested-by: Leo Yan <leo.yan@arm.com>
Signed-off-by: James Clark <james.clark@linaro.org>
Signed-off-by: Will Deacon <will@kernel.org>
2025-09-18 14:17:02 +01:00
Paolo Bonzini
42a0305ab1 KVM/arm64 changes for 6.17, take #2
- Correctly handle 'invariant' system registers for protected VMs
 
  - Improved handling of VNCR data aborts, including external aborts
 
  - Fixes for handling of FEAT_RAS for NV guests, providing a sane
    fault context during SEA injection and preventing the use of
    RASv1p1 fault injection hardware
 
  - Ensure that page table destruction when a VM is destroyed gives an
    opportunity to reschedule
 
  - Large fix to KVM's infrastructure for managing guest context loaded
    on the CPU, addressing issues where the output of AT emulation
    doesn't get reflected to the guest
 
  - Fix AT S12 emulation to actually perform stage-2 translation when
    necessary
 
  - Avoid attempting vLPI irqbypass when GICv4 has been explicitly
    disabled for a VM
 
  - Minor KVM + selftest fixes
 -----BEGIN PGP SIGNATURE-----
 
 iI0EABYIADUWIQSNXHjWXuzMZutrKNKivnWIJHzdFgUCaLC0JBccb2xpdmVyLnVw
 dG9uQGxpbnV4LmRldgAKCRCivnWIJHzdFogJAQCyxHd5tuvXWWT/iC2EYFlPWYkU
 LOQbNhus16QjQ9f2ggD8CoA+6UAxzYW7ZU6IzYkDhJkN/3dKQEQhh8Cx0GXXRAs=
 =uky+
 -----END PGP SIGNATURE-----

Merge tag 'kvmarm-fixes-6.17-1' of https://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/arm64 changes for 6.17, take #2

 - Correctly handle 'invariant' system registers for protected VMs

 - Improved handling of VNCR data aborts, including external aborts

 - Fixes for handling of FEAT_RAS for NV guests, providing a sane
   fault context during SEA injection and preventing the use of
   RASv1p1 fault injection hardware

 - Ensure that page table destruction when a VM is destroyed gives an
   opportunity to reschedule

 - Large fix to KVM's infrastructure for managing guest context loaded
   on the CPU, addressing issues where the output of AT emulation
   doesn't get reflected to the guest

 - Fix AT S12 emulation to actually perform stage-2 translation when
   necessary

 - Avoid attempting vLPI irqbypass when GICv4 has been explicitly
   disabled for a VM

 - Minor KVM + selftest fixes
2025-08-29 12:57:31 -04:00
Marc Zyngier
0843e0ced3 KVM: arm64: Get rid of ARM64_FEATURE_MASK()
The ARM64_FEATURE_MASK() macro was a hack introduce whilst the
automatic generation of sysreg encoding was introduced, and was
too unreliable to be entirely trusted.

We are in a better place now, and we could really do without this
macro. Get rid of it altogether.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250817202158.395078-7-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-08-21 16:31:56 -07:00
Linus Torvalds
63eb28bb14 ARM:
- Host driver for GICv5, the next generation interrupt controller for
   arm64, including support for interrupt routing, MSIs, interrupt
   translation and wired interrupts.
 
 - Use FEAT_GCIE_LEGACY on GICv5 systems to virtualize GICv3 VMs on
   GICv5 hardware, leveraging the legacy VGIC interface.
 
 - Userspace control of the 'nASSGIcap' GICv3 feature, allowing
   userspace to disable support for SGIs w/o an active state on hardware
   that previously advertised it unconditionally.
 
 - Map supporting endpoints with cacheable memory attributes on systems
   with FEAT_S2FWB and DIC where KVM no longer needs to perform cache
   maintenance on the address range.
 
 - Nested support for FEAT_RAS and FEAT_DoubleFault2, allowing the guest
   hypervisor to inject external aborts into an L2 VM and take traps of
   masked external aborts to the hypervisor.
 
 - Convert more system register sanitization to the config-driven
   implementation.
 
 - Fixes to the visibility of EL2 registers, namely making VGICv3 system
   registers accessible through the VGIC device instead of the ONE_REG
   vCPU ioctls.
 
 - Various cleanups and minor fixes.
 
 LoongArch:
 
 - Add stat information for in-kernel irqchip
 
 - Add tracepoints for CPUCFG and CSR emulation exits
 
 - Enhance in-kernel irqchip emulation
 
 - Various cleanups.
 
 RISC-V:
 
 - Enable ring-based dirty memory tracking
 
 - Improve perf kvm stat to report interrupt events
 
 - Delegate illegal instruction trap to VS-mode
 
 - MMU improvements related to upcoming nested virtualization
 
 s390x
 
 - Fixes
 
 x86:
 
 - Add CONFIG_KVM_IOAPIC for x86 to allow disabling support for I/O APIC,
   PIC, and PIT emulation at compile time.
 
 - Share device posted IRQ code between SVM and VMX and
   harden it against bugs and runtime errors.
 
 - Use vcpu_idx, not vcpu_id, for GA log tag/metadata, to make lookups O(1)
   instead of O(n).
 
 - For MMIO stale data mitigation, track whether or not a vCPU has access to
   (host) MMIO based on whether the page tables have MMIO pfns mapped; using
   VFIO is prone to false negatives
 
 - Rework the MSR interception code so that the SVM and VMX APIs are more or
   less identical.
 
 - Recalculate all MSR intercepts from scratch on MSR filter changes,
   instead of maintaining shadow bitmaps.
 
 - Advertise support for LKGS (Load Kernel GS base), a new instruction
   that's loosely related to FRED, but is supported and enumerated
   independently.
 
 - Fix a user-triggerable WARN that syzkaller found by setting the vCPU
   in INIT_RECEIVED state (aka wait-for-SIPI), and then putting the vCPU
   into VMX Root Mode (post-VMXON).  Trying to detect every possible path
   leading to architecturally forbidden states is hard and even risks
   breaking userspace (if it goes from valid to valid state but passes
   through invalid states), so just wait until KVM_RUN to detect that
   the vCPU state isn't allowed.
 
 - Add KVM_X86_DISABLE_EXITS_APERFMPERF to allow disabling interception of
   APERF/MPERF reads, so that a "properly" configured VM can access
   APERF/MPERF.  This has many caveats (APERF/MPERF cannot be zeroed
   on vCPU creation or saved/restored on suspend and resume, or preserved
   over thread migration let alone VM migration) but can be useful whenever
   you're interested in letting Linux guests see the effective physical CPU
   frequency in /proc/cpuinfo.
 
 - Reject KVM_SET_TSC_KHZ for vm file descriptors if vCPUs have been
   created, as there's no known use case for changing the default
   frequency for other VM types and it goes counter to the very reason
   why the ioctl was added to the vm file descriptor.  And also, there
   would be no way to make it work for confidential VMs with a "secure"
   TSC, so kill two birds with one stone.
 
 - Dynamically allocation the shadow MMU's hashed page list, and defer
   allocating the hashed list until it's actually needed (the TDP MMU
   doesn't use the list).
 
 - Extract many of KVM's helpers for accessing architectural local APIC
   state to common x86 so that they can be shared by guest-side code for
   Secure AVIC.
 
 - Various cleanups and fixes.
 
 x86 (Intel):
 
 - Preserve the host's DEBUGCTL.FREEZE_IN_SMM when running the guest.
   Failure to honor FREEZE_IN_SMM can leak host state into guests.
 
 - Explicitly check vmcs12.GUEST_DEBUGCTL on nested VM-Enter to prevent
   L1 from running L2 with features that KVM doesn't support, e.g. BTF.
 
 x86 (AMD):
 
 - WARN and reject loading kvm-amd.ko instead of panicking the kernel if the
   nested SVM MSRPM offsets tracker can't handle an MSR (which is pretty
   much a static condition and therefore should never happen, but still).
 
 - Fix a variety of flaws and bugs in the AVIC device posted IRQ code.
 
 - Inhibit AVIC if a vCPU's ID is too big (relative to what hardware
   supports) instead of rejecting vCPU creation.
 
 - Extend enable_ipiv module param support to SVM, by simply leaving
   IsRunning clear in the vCPU's physical ID table entry.
 
 - Disable IPI virtualization, via enable_ipiv, if the CPU is affected by
   erratum #1235, to allow (safely) enabling AVIC on such CPUs.
 
 - Request GA Log interrupts if and only if the target vCPU is blocking,
   i.e. only if KVM needs a notification in order to wake the vCPU.
 
 - Intercept SPEC_CTRL on AMD if the MSR shouldn't exist according to the
   vCPU's CPUID model.
 
 - Accept any SNP policy that is accepted by the firmware with respect to
   SMT and single-socket restrictions.  An incompatible policy doesn't put
   the kernel at risk in any way, so there's no reason for KVM to care.
 
 - Drop a superfluous WBINVD (on all CPUs!) when destroying a VM and
   use WBNOINVD instead of WBINVD when possible for SEV cache maintenance.
 
 - When reclaiming memory from an SEV guest, only do cache flushes on CPUs
   that have ever run a vCPU for the guest, i.e. don't flush the caches for
   CPUs that can't possibly have cache lines with dirty, encrypted data.
 
 Generic:
 
 - Rework irqbypass to track/match producers and consumers via an xarray
   instead of a linked list.  Using a linked list leads to O(n^2) insertion
   times, which is hugely problematic for use cases that create large
   numbers of VMs.  Such use cases typically don't actually use irqbypass,
   but eliminating the pointless registration is a future problem to
   solve as it likely requires new uAPI.
 
 - Track irqbypass's "token" as "struct eventfd_ctx *" instead of a "void *",
   to avoid making a simple concept unnecessarily difficult to understand.
 
 - Decouple device posted IRQs from VFIO device assignment, as binding a VM
   to a VFIO group is not a requirement for enabling device posted IRQs.
 
 - Clean up and document/comment the irqfd assignment code.
 
 - Disallow binding multiple irqfds to an eventfd with a priority waiter,
   i.e.  ensure an eventfd is bound to at most one irqfd through the entire
   host, and add a selftest to verify eventfd:irqfd bindings are globally
   unique.
 
 - Add a tracepoint for KVM_SET_MEMORY_ATTRIBUTES to help debug issues
   related to private <=> shared memory conversions.
 
 - Drop guest_memfd's .getattr() implementation as the VFS layer will call
   generic_fillattr() if inode_operations.getattr is NULL.
 
 - Fix issues with dirty ring harvesting where KVM doesn't bound the
   processing of entries in any way, which allows userspace to keep KVM
   in a tight loop indefinitely.
 
 - Kill off kvm_arch_{start,end}_assignment() and x86's associated tracking,
   now that KVM no longer uses assigned_device_count as a heuristic for
   either irqbypass usage or MDS mitigation.
 
 Selftests:
 
 - Fix a comment typo.
 
 - Verify KVM is loaded when getting any KVM module param so that attempting
   to run a selftest without kvm.ko loaded results in a SKIP message about
   KVM not being loaded/enabled (versus some random parameter not existing).
 
 - Skip tests that hit EACCES when attempting to access a file, and rpint
   a "Root required?" help message.  In most cases, the test just needs to
   be run with elevated permissions.
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmiKXMgUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroMhMQf/QDhC/CP1aGXph2whuyeD2NMqPKiU
 9KdnDNST+ftPwjg9QxZ9mTaa8zeVz/wly6XlxD9OQHy+opM1wcys3k0GZAFFEEQm
 YrThgURdzEZ3nwJZgb+m0t4wjJQtpiFIBwAf7qq6z1VrqQBEmHXJ/8QxGuqO+BNC
 j5q/X+q6KZwehKI6lgFBrrOKWFaxqhnRAYfW6rGBxRXxzTJuna37fvDpodQnNceN
 zOiq+avfriUMArTXTqOteJNKU0229HjiPSnjILLnFQ+B3akBlwNG0jk7TMaAKR6q
 IZWG1EIS9q1BAkGXaw6DE1y6d/YwtXCR5qgAIkiGwaPt5yj9Oj6kRN2Ytw==
 =j2At
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm updates from Paolo Bonzini:
 "ARM:

   - Host driver for GICv5, the next generation interrupt controller for
     arm64, including support for interrupt routing, MSIs, interrupt
     translation and wired interrupts

   - Use FEAT_GCIE_LEGACY on GICv5 systems to virtualize GICv3 VMs on
     GICv5 hardware, leveraging the legacy VGIC interface

   - Userspace control of the 'nASSGIcap' GICv3 feature, allowing
     userspace to disable support for SGIs w/o an active state on
     hardware that previously advertised it unconditionally

   - Map supporting endpoints with cacheable memory attributes on
     systems with FEAT_S2FWB and DIC where KVM no longer needs to
     perform cache maintenance on the address range

   - Nested support for FEAT_RAS and FEAT_DoubleFault2, allowing the
     guest hypervisor to inject external aborts into an L2 VM and take
     traps of masked external aborts to the hypervisor

   - Convert more system register sanitization to the config-driven
     implementation

   - Fixes to the visibility of EL2 registers, namely making VGICv3
     system registers accessible through the VGIC device instead of the
     ONE_REG vCPU ioctls

   - Various cleanups and minor fixes

  LoongArch:

   - Add stat information for in-kernel irqchip

   - Add tracepoints for CPUCFG and CSR emulation exits

   - Enhance in-kernel irqchip emulation

   - Various cleanups

  RISC-V:

   - Enable ring-based dirty memory tracking

   - Improve perf kvm stat to report interrupt events

   - Delegate illegal instruction trap to VS-mode

   - MMU improvements related to upcoming nested virtualization

  s390x

   - Fixes

  x86:

   - Add CONFIG_KVM_IOAPIC for x86 to allow disabling support for I/O
     APIC, PIC, and PIT emulation at compile time

   - Share device posted IRQ code between SVM and VMX and harden it
     against bugs and runtime errors

   - Use vcpu_idx, not vcpu_id, for GA log tag/metadata, to make lookups
     O(1) instead of O(n)

   - For MMIO stale data mitigation, track whether or not a vCPU has
     access to (host) MMIO based on whether the page tables have MMIO
     pfns mapped; using VFIO is prone to false negatives

   - Rework the MSR interception code so that the SVM and VMX APIs are
     more or less identical

   - Recalculate all MSR intercepts from scratch on MSR filter changes,
     instead of maintaining shadow bitmaps

   - Advertise support for LKGS (Load Kernel GS base), a new instruction
     that's loosely related to FRED, but is supported and enumerated
     independently

   - Fix a user-triggerable WARN that syzkaller found by setting the
     vCPU in INIT_RECEIVED state (aka wait-for-SIPI), and then putting
     the vCPU into VMX Root Mode (post-VMXON). Trying to detect every
     possible path leading to architecturally forbidden states is hard
     and even risks breaking userspace (if it goes from valid to valid
     state but passes through invalid states), so just wait until
     KVM_RUN to detect that the vCPU state isn't allowed

   - Add KVM_X86_DISABLE_EXITS_APERFMPERF to allow disabling
     interception of APERF/MPERF reads, so that a "properly" configured
     VM can access APERF/MPERF. This has many caveats (APERF/MPERF
     cannot be zeroed on vCPU creation or saved/restored on suspend and
     resume, or preserved over thread migration let alone VM migration)
     but can be useful whenever you're interested in letting Linux
     guests see the effective physical CPU frequency in /proc/cpuinfo

   - Reject KVM_SET_TSC_KHZ for vm file descriptors if vCPUs have been
     created, as there's no known use case for changing the default
     frequency for other VM types and it goes counter to the very reason
     why the ioctl was added to the vm file descriptor. And also, there
     would be no way to make it work for confidential VMs with a
     "secure" TSC, so kill two birds with one stone

   - Dynamically allocation the shadow MMU's hashed page list, and defer
     allocating the hashed list until it's actually needed (the TDP MMU
     doesn't use the list)

   - Extract many of KVM's helpers for accessing architectural local
     APIC state to common x86 so that they can be shared by guest-side
     code for Secure AVIC

   - Various cleanups and fixes

  x86 (Intel):

   - Preserve the host's DEBUGCTL.FREEZE_IN_SMM when running the guest.
     Failure to honor FREEZE_IN_SMM can leak host state into guests

   - Explicitly check vmcs12.GUEST_DEBUGCTL on nested VM-Enter to
     prevent L1 from running L2 with features that KVM doesn't support,
     e.g. BTF

  x86 (AMD):

   - WARN and reject loading kvm-amd.ko instead of panicking the kernel
     if the nested SVM MSRPM offsets tracker can't handle an MSR (which
     is pretty much a static condition and therefore should never
     happen, but still)

   - Fix a variety of flaws and bugs in the AVIC device posted IRQ code

   - Inhibit AVIC if a vCPU's ID is too big (relative to what hardware
     supports) instead of rejecting vCPU creation

   - Extend enable_ipiv module param support to SVM, by simply leaving
     IsRunning clear in the vCPU's physical ID table entry

   - Disable IPI virtualization, via enable_ipiv, if the CPU is affected
     by erratum #1235, to allow (safely) enabling AVIC on such CPUs

   - Request GA Log interrupts if and only if the target vCPU is
     blocking, i.e. only if KVM needs a notification in order to wake
     the vCPU

   - Intercept SPEC_CTRL on AMD if the MSR shouldn't exist according to
     the vCPU's CPUID model

   - Accept any SNP policy that is accepted by the firmware with respect
     to SMT and single-socket restrictions. An incompatible policy
     doesn't put the kernel at risk in any way, so there's no reason for
     KVM to care

   - Drop a superfluous WBINVD (on all CPUs!) when destroying a VM and
     use WBNOINVD instead of WBINVD when possible for SEV cache
     maintenance

   - When reclaiming memory from an SEV guest, only do cache flushes on
     CPUs that have ever run a vCPU for the guest, i.e. don't flush the
     caches for CPUs that can't possibly have cache lines with dirty,
     encrypted data

  Generic:

   - Rework irqbypass to track/match producers and consumers via an
     xarray instead of a linked list. Using a linked list leads to
     O(n^2) insertion times, which is hugely problematic for use cases
     that create large numbers of VMs. Such use cases typically don't
     actually use irqbypass, but eliminating the pointless registration
     is a future problem to solve as it likely requires new uAPI

   - Track irqbypass's "token" as "struct eventfd_ctx *" instead of a
     "void *", to avoid making a simple concept unnecessarily difficult
     to understand

   - Decouple device posted IRQs from VFIO device assignment, as binding
     a VM to a VFIO group is not a requirement for enabling device
     posted IRQs

   - Clean up and document/comment the irqfd assignment code

   - Disallow binding multiple irqfds to an eventfd with a priority
     waiter, i.e. ensure an eventfd is bound to at most one irqfd
     through the entire host, and add a selftest to verify eventfd:irqfd
     bindings are globally unique

   - Add a tracepoint for KVM_SET_MEMORY_ATTRIBUTES to help debug issues
     related to private <=> shared memory conversions

   - Drop guest_memfd's .getattr() implementation as the VFS layer will
     call generic_fillattr() if inode_operations.getattr is NULL

   - Fix issues with dirty ring harvesting where KVM doesn't bound the
     processing of entries in any way, which allows userspace to keep
     KVM in a tight loop indefinitely

   - Kill off kvm_arch_{start,end}_assignment() and x86's associated
     tracking, now that KVM no longer uses assigned_device_count as a
     heuristic for either irqbypass usage or MDS mitigation

  Selftests:

   - Fix a comment typo

   - Verify KVM is loaded when getting any KVM module param so that
     attempting to run a selftest without kvm.ko loaded results in a
     SKIP message about KVM not being loaded/enabled (versus some random
     parameter not existing)

   - Skip tests that hit EACCES when attempting to access a file, and
     print a "Root required?" help message. In most cases, the test just
     needs to be run with elevated permissions"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (340 commits)
  Documentation: KVM: Use unordered list for pre-init VGIC registers
  RISC-V: KVM: Avoid re-acquiring memslot in kvm_riscv_gstage_map()
  RISC-V: KVM: Use find_vma_intersection() to search for intersecting VMAs
  RISC-V: perf/kvm: Add reporting of interrupt events
  RISC-V: KVM: Enable ring-based dirty memory tracking
  RISC-V: KVM: Fix inclusion of Smnpm in the guest ISA bitmap
  RISC-V: KVM: Delegate illegal instruction fault to VS mode
  RISC-V: KVM: Pass VMID as parameter to kvm_riscv_hfence_xyz() APIs
  RISC-V: KVM: Factor-out g-stage page table management
  RISC-V: KVM: Add vmid field to struct kvm_riscv_hfence
  RISC-V: KVM: Introduce struct kvm_gstage_mapping
  RISC-V: KVM: Factor-out MMU related declarations into separate headers
  RISC-V: KVM: Use ncsr_xyz() in kvm_riscv_vcpu_trap_redirect()
  RISC-V: KVM: Implement kvm_arch_flush_remote_tlbs_range()
  RISC-V: KVM: Don't flush TLB when PTE is unchanged
  RISC-V: KVM: Replace KVM_REQ_HFENCE_GVMA_VMID_ALL with KVM_REQ_TLB_FLUSH
  RISC-V: KVM: Rename and move kvm_riscv_local_tlb_sanitize()
  RISC-V: KVM: Drop the return value of kvm_riscv_vcpu_aia_init()
  RISC-V: KVM: Check kvm_riscv_vcpu_alloc_vector_context() return value
  KVM: arm64: selftests: Add FEAT_RAS EL2 registers to get-reg-list
  ...
2025-07-30 17:14:01 -07:00
Lorenzo Pieralisi
0f01013258 irqchip/gic-v5: Add GICv5 LPI/IPI support
An IRS supports Logical Peripheral Interrupts (LPIs) and implement
Linux IPIs on top of it.

LPIs are used for interrupt signals that are translated by a
GICv5 ITS (Interrupt Translation Service) but also for software
generated IRQs - namely interrupts that are not driven by a HW
signal, ie IPIs.

LPIs rely on memory storage for interrupt routing and state.

LPIs state and routing information is kept in the Interrupt
State Table (IST).

IRSes provide support for 1- or 2-level IST tables configured
to support a maximum number of interrupts that depend on the
OS configuration and the HW capabilities.

On systems that provide 2-level IST support, always allow
the maximum number of LPIs; On systems with only 1-level
support, limit the number of LPIs to 2^12 to prevent
wasting memory (presumably a system that supports a 1-level
only IST is not expecting a large number of interrupts).

On a 2-level IST system, L2 entries are allocated on
demand.

The IST table memory is allocated using the kmalloc() interface;
the allocation required may be smaller than a page and must be
made up of contiguous physical pages if larger than a page.

On systems where the IRS is not cache-coherent with the CPUs,
cache mainteinance operations are executed to clean and
invalidate the allocated memory to the point of coherency
making it visible to the IRS components.

On GICv5 systems, IPIs are implemented using LPIs.

Add an LPI IRQ domain and implement an IPI-specific IRQ domain created
as a child/subdomain of the LPI domain to allocate the required number
of LPIs needed to implement the IPIs.

IPIs are backed by LPIs, add LPIs allocation/de-allocation
functions.

The LPI INTID namespace is managed using an IDA to alloc/free LPI INTIDs.

Associate an IPI irqchip with IPI IRQ descriptors to provide
core code with the irqchip.ipi_send_single() method required
to raise an IPI.

Co-developed-by: Sascha Bischoff <sascha.bischoff@arm.com>
Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com>
Co-developed-by: Timothy Hayes <timothy.hayes@arm.com>
Signed-off-by: Timothy Hayes <timothy.hayes@arm.com>
Signed-off-by: Lorenzo Pieralisi <lpieralisi@kernel.org>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20250703-gicv5-host-v7-22-12e71f1b3528@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-07-08 18:35:51 +01:00
Lorenzo Pieralisi
5cb1b6dab2 irqchip/gic-v5: Add GICv5 IRS/SPI support
The GICv5 Interrupt Routing Service (IRS) component implements
interrupt management and routing in the GICv5 architecture.

A GICv5 system comprises one or more IRSes, that together
handle the interrupt routing and state for the system.

An IRS supports Shared Peripheral Interrupts (SPIs), that are
interrupt sources directly connected to the IRS; they do not
rely on memory for storage. The number of supported SPIs is
fixed for a given implementation and can be probed through IRS
IDR registers.

SPI interrupt state and routing are managed through GICv5
instructions.

Each core (PE in GICv5 terms) in a GICv5 system is identified with
an Interrupt AFFinity ID (IAFFID).

An IRS manages a set of cores that are connected to it.

Firmware provides a topology description that the driver uses
to detect to which IRS a CPU (ie an IAFFID) is associated with.

Use probeable information and firmware description to initialize
the IRSes and implement GICv5 IRS SPIs support through an
SPI-specific IRQ domain.

The GICv5 IRS driver:

- Probes IRSes in the system to detect SPI ranges
- Associates an IRS with a set of cores connected to it
- Adds an IRQchip structure for SPI handling

SPIs priority is set to a value corresponding to the lowest
permissible priority in the system (taking into account the
implemented priority bits of the IRS and CPU interface).

Since all IRQs are set to the same priority value, the value
itself does not matter as long as it is a valid one.

Co-developed-by: Sascha Bischoff <sascha.bischoff@arm.com>
Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com>
Co-developed-by: Timothy Hayes <timothy.hayes@arm.com>
Signed-off-by: Timothy Hayes <timothy.hayes@arm.com>
Signed-off-by: Lorenzo Pieralisi <lpieralisi@kernel.org>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20250703-gicv5-host-v7-21-12e71f1b3528@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-07-08 18:35:51 +01:00
Lorenzo Pieralisi
7ec80fb3f0 irqchip/gic-v5: Add GICv5 PPI support
The GICv5 CPU interface implements support for PE-Private Peripheral
Interrupts (PPI), that are handled (enabled/prioritized/delivered)
entirely within the CPU interface hardware.

To enable PPI interrupts, implement the baseline GICv5 host kernel
driver infrastructure required to handle interrupts on a GICv5 system.

Add the exception handling code path and definitions for GICv5
instructions.

Add GICv5 PPI handling code as a specific IRQ domain to:

- Set-up PPI priority
- Manage PPI configuration and state
- Manage IRQ flow handler
- IRQs allocation/free
- Hook-up a PPI specific IRQchip to provide the relevant methods

PPI IRQ priority is chosen as the minimum allowed priority by the
system design (after probing the number of priority bits implemented
by the CPU interface).

Co-developed-by: Sascha Bischoff <sascha.bischoff@arm.com>
Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com>
Co-developed-by: Timothy Hayes <timothy.hayes@arm.com>
Signed-off-by: Timothy Hayes <timothy.hayes@arm.com>
Signed-off-by: Lorenzo Pieralisi <lpieralisi@kernel.org>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20250703-gicv5-host-v7-20-12e71f1b3528@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-07-08 18:35:51 +01:00
Lorenzo Pieralisi
e62e1e9493 arm64: Add support for GICv5 GSB barriers
The GICv5 architecture introduces two barriers instructions
(GSB SYS, GSB ACK) that are used to manage interrupt effects.

Rework macro used to emit the SB barrier instruction and implement
the GSB barriers on top of it.

Suggested-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Lorenzo Pieralisi <lpieralisi@kernel.org>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20250703-gicv5-host-v7-19-12e71f1b3528@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-07-08 18:35:51 +01:00
Anshuman Khandual
52e4a56ab8 arm64/sysreg: Add BRBE registers and fields
This patch adds definitions related to the Branch Record Buffer Extension
(BRBE) as per ARM DDI 0487K.a. These will be used by KVM and a BRBE driver
in subsequent patches.

Some existing BRBE definitions in asm/sysreg.h are replaced with equivalent
generated definitions.

Cc: Marc Zyngier <maz@kernel.org>
Reviewed-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: James Clark <james.clark@linaro.org>
Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
tested-by: Adam Young <admiyo@os.amperecomputing.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20250611-arm-brbe-v19-v23-1-e7775563036e@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
2025-07-08 16:11:27 +01:00
Marc Zyngier
94d889713d arm64: sysreg: Drag linux/kconfig.h to work around vdso build issue
Broonie reports that fed55f49fa ("arm64: errata: Work around
AmpereOne's erratum AC04_CPU_23") breaks one of the vdso selftests
(vdso_test_chacha) as it indirectly drags asm/sysreg.h.

It is rather unfortunate (and worrying) that userspace gets built
with non-UAPI headers. In any case, paper over the issue by dragging
linux/kconfig.h in asm/sysreg.h. It is the right thing to do, at
least from the kernel perspective.

Reported-by: Mark Brown <broonie@kernel.org>
Fixes: fed55f49fa ("arm64: errata: Work around AmpereOne's erratum AC04_CPU_23")
Link: https://lore.kernel.org/r/aDCDGZ-G-nCP3hJI@finisterre.sirena.org.uk
Cc: D Scott Phillips <scott@os.amperecomputing.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Will Deacon <will@kernel.org>
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20250523170208.530818-1-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-30 09:09:01 +01:00
Marc Zyngier
1b85d923ba Merge branch kvm-arm64/misc-6.16 into kvmarm-master/next
* kvm-arm64/misc-6.16:
  : .
  : Misc changes and improvements for 6.16:
  :
  : - Add a new selftest for the SVE host state being corrupted by a guest
  :
  : - Keep HCR_EL2.xMO set at all times for systems running with the kernel at EL2,
  :   ensuring that the window for interrupts is slightly bigger, and avoiding
  :   a pretty bad erratum on the AmpereOne HW
  :
  : - Replace a couple of open-coded on/off strings with str_on_off()
  :
  : - Get rid of the pKVM memblock sorting, which now appears to be superflous
  :
  : - Drop superflous clearing of ICH_LR_EOI in the LR when nesting
  :
  : - Add workaround for AmpereOne's erratum AC04_CPU_23, which suffers from
  :   a pretty bad case of TLB corruption unless accesses to HCR_EL2 are
  :   heavily synchronised
  :
  : - Add a per-VM, per-ITS debugfs entry to dump the state of the ITS tables
  :   in a human-friendly fashion
  : .
  KVM: arm64: Fix documentation for vgic_its_iter_next()
  KVM: arm64: vgic-its: Add debugfs interface to expose ITS tables
  arm64: errata: Work around AmpereOne's erratum AC04_CPU_23
  KVM: arm64: nv: Remove clearing of ICH_LR<n>.EOI if ICH_LR<n>.HW == 1
  KVM: arm64: Drop sort_memblock_regions()
  KVM: arm64: selftests: Add test for SVE host corruption
  KVM: arm64: Force HCR_EL2.xMO to 1 at all times in VHE mode
  KVM: arm64: Replace ternary flags with str_on_off() helper

Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-23 10:59:43 +01:00
Marc Zyngier
7f3225fe8b Merge branch kvm-arm64/nv-nv into kvmarm-master/next
* kvm-arm64/nv-nv:
  : .
  : Flick the switch on the NV support by adding the missing piece
  : in the form of the VNCR page management. From the cover letter:
  :
  : "This is probably the most interesting bit of the whole NV adventure.
  : So far, everything else has been a walk in the park, but this one is
  : where the real fun takes place.
  :
  : With FEAT_NV2, most of the NV support revolves around tricking a guest
  : into accessing memory while it tries to access system registers. The
  : hypervisor's job is to handle the context switch of the actual
  : registers with the state in memory as needed."
  : .
  KVM: arm64: nv: Release faulted-in VNCR page from mmu_lock critical section
  KVM: arm64: nv: Handle TLBI S1E2 for VNCR invalidation with mmu_lock held
  KVM: arm64: nv: Hold mmu_lock when invalidating VNCR SW-TLB before translating
  KVM: arm64: Document NV caps and vcpu flags
  KVM: arm64: Allow userspace to request KVM_ARM_VCPU_EL2*
  KVM: arm64: nv: Remove dead code from ERET handling
  KVM: arm64: nv: Plumb TLBI S1E2 into system instruction dispatch
  KVM: arm64: nv: Add S1 TLB invalidation primitive for VNCR_EL2
  KVM: arm64: nv: Program host's VNCR_EL2 to the fixmap address
  KVM: arm64: nv: Handle VNCR_EL2 invalidation from MMU notifiers
  KVM: arm64: nv: Handle mapping of VNCR_EL2 at EL2
  KVM: arm64: nv: Handle VNCR_EL2-triggered faults
  KVM: arm64: nv: Add userspace and guest handling of VNCR_EL2
  KVM: arm64: nv: Add pseudo-TLB backing VNCR_EL2
  KVM: arm64: nv: Don't adjust PSTATE.M when L2 is nesting
  KVM: arm64: nv: Move TLBI range decoding to a helper
  KVM: arm64: nv: Snapshot S1 ASID tagging information during walk
  KVM: arm64: nv: Extract translation helper from the AT code
  KVM: arm64: nv: Allocate VNCR page when required
  arm64: sysreg: Add layout for VNCR_EL2

Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-23 10:58:57 +01:00
D Scott Phillips
fed55f49fa arm64: errata: Work around AmpereOne's erratum AC04_CPU_23
On AmpereOne AC04, updates to HCR_EL2 can rarely corrupt simultaneous
translations for data addresses initiated by load/store instructions.
Only instruction initiated translations are vulnerable, not translations
from prefetches for example. A DSB before the store to HCR_EL2 is
sufficient to prevent older instructions from hitting the window for
corruption, and an ISB after is sufficient to prevent younger
instructions from hitting the window for corruption.

Signed-off-by: D Scott Phillips <scott@os.amperecomputing.com>
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20250513184514.2678288-1-scott@os.amperecomputing.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-19 12:46:26 +01:00
Marc Zyngier
fb3066904a arm64: sysreg: Add layout for VNCR_EL2
Now that we're about to emulate VNCR_EL2, we need its full layout.
Add it to the sysreg file.

Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20250514103501.2225951-2-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-19 07:59:46 +01:00
Marc Zyngier
7a11d98d6e arm64: Remove duplicated sysreg encodings
A bunch of sysregs are now generated from the sysreg file, so no
need to carry separate definitions.

Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-06 17:35:03 +01:00
Marc Zyngier
7c9cb893ae arm64: sysreg: Add system instructions trapped by HFGIRT2_EL2
Add the new CMOs trapped by HFGITR2_EL2.

Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-06 17:35:03 +01:00
Marc Zyngier
dd161dc2df arm64: sysreg: Add registers trapped by HDFG{R,W}TR2_EL2
Bulk addition of all the system registers trapped by HDFG{R,W}TR2_EL2.

The descriptions are extracted from the BSD-licenced JSON file part
of the 2025-03 drop from ARM.

Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-06 17:35:03 +01:00
Linus Torvalds
edb0e8f6e2 ARM:
* Nested virtualization support for VGICv3, giving the nested
 hypervisor control of the VGIC hardware when running an L2 VM
 
 * Removal of 'late' nested virtualization feature register masking,
   making the supported feature set directly visible to userspace
 
 * Support for emulating FEAT_PMUv3 on Apple silicon, taking advantage
   of an IMPLEMENTATION DEFINED trap that covers all PMUv3 registers
 
 * Paravirtual interface for discovering the set of CPU implementations
   where a VM may run, addressing a longstanding issue of guest CPU
   errata awareness in big-little systems and cross-implementation VM
   migration
 
 * Userspace control of the registers responsible for identifying a
   particular CPU implementation (MIDR_EL1, REVIDR_EL1, AIDR_EL1),
   allowing VMs to be migrated cross-implementation
 
 * pKVM updates, including support for tracking stage-2 page table
   allocations in the protected hypervisor in the 'SecPageTable' stat
 
 * Fixes to vPMU, ensuring that userspace updates to the vPMU after
   KVM_RUN are reflected into the backing perf events
 
 LoongArch:
 
 * Remove unnecessary header include path
 
 * Assume constant PGD during VM context switch
 
 * Add perf events support for guest VM
 
 RISC-V:
 
 * Disable the kernel perf counter during configure
 
 * KVM selftests improvements for PMU
 
 * Fix warning at the time of KVM module removal
 
 x86:
 
 * Add support for aging of SPTEs without holding mmu_lock.  Not taking mmu_lock
   allows multiple aging actions to run in parallel, and more importantly avoids
   stalling vCPUs.  This includes an implementation of per-rmap-entry locking;
   aging the gfn is done with only a per-rmap single-bin spinlock taken, whereas
   locking an rmap for write requires taking both the per-rmap spinlock and
   the mmu_lock.
 
   Note that this decreases slightly the accuracy of accessed-page information,
   because changes to the SPTE outside aging might not use atomic operations
   even if they could race against a clear of the Accessed bit.  This is
   deliberate because KVM and mm/ tolerate false positives/negatives for
   accessed information, and testing has shown that reducing the latency of
   aging is far more beneficial to overall system performance than providing
   "perfect" young/old information.
 
 * Defer runtime CPUID updates until KVM emulates a CPUID instruction, to
   coalesce updates when multiple pieces of vCPU state are changing, e.g. as
   part of a nested transition.
 
 * Fix a variety of nested emulation bugs, and add VMX support for synthesizing
   nested VM-Exit on interception (instead of injecting #UD into L2).
 
 * Drop "support" for async page faults for protected guests that do not set
   SEND_ALWAYS (i.e. that only want async page faults at CPL3)
 
 * Bring a bit of sanity to x86's VM teardown code, which has accumulated
   a lot of cruft over the years.  Particularly, destroy vCPUs before
   the MMU, despite the latter being a VM-wide operation.
 
 * Add common secure TSC infrastructure for use within SNP and in the
   future TDX
 
 * Block KVM_CAP_SYNC_REGS if guest state is protected.  It does not make
   sense to use the capability if the relevant registers are not
   available for reading or writing.
 
 * Don't take kvm->lock when iterating over vCPUs in the suspend notifier to
   fix a largely theoretical deadlock.
 
 * Use the vCPU's actual Xen PV clock information when starting the Xen timer,
   as the cached state in arch.hv_clock can be stale/bogus.
 
 * Fix a bug where KVM could bleed PVCLOCK_GUEST_STOPPED across different
   PV clocks; restrict PVCLOCK_GUEST_STOPPED to kvmclock, as KVM's suspend
   notifier only accounts for kvmclock, and there's no evidence that the
   flag is actually supported by Xen guests.
 
 * Clean up the per-vCPU "cache" of its reference pvclock, and instead only
   track the vCPU's TSC scaling (multipler+shift) metadata (which is moderately
   expensive to compute, and rarely changes for modern setups).
 
 * Don't write to the Xen hypercall page on MSR writes that are initiated by
   the host (userspace or KVM) to fix a class of bugs where KVM can write to
   guest memory at unexpected times, e.g. during vCPU creation if userspace has
   set the Xen hypercall MSR index to collide with an MSR that KVM emulates.
 
 * Restrict the Xen hypercall MSR index to the unofficial synthetic range to
   reduce the set of possible collisions with MSRs that are emulated by KVM
   (collisions can still happen as KVM emulates Hyper-V MSRs, which also reside
   in the synthetic range).
 
 * Clean up and optimize KVM's handling of Xen MSR writes and xen_hvm_config.
 
 * Update Xen TSC leaves during CPUID emulation instead of modifying the CPUID
   entries when updating PV clocks; there is no guarantee PV clocks will be
   updated between TSC frequency changes and CPUID emulation, and guest reads
   of the TSC leaves should be rare, i.e. are not a hot path.
 
 x86 (Intel):
 
 * Fix a bug where KVM unnecessarily reads XFD_ERR from hardware and thus
   modifies the vCPU's XFD_ERR on a #NM due to CR0.TS=1.
 
 * Pass XFD_ERR as the payload when injecting #NM, as a preparatory step
   for upcoming FRED virtualization support.
 
 * Decouple the EPT entry RWX protection bit macros from the EPT Violation
   bits, both as a general cleanup and in anticipation of adding support for
   emulating Mode-Based Execution Control (MBEC).
 
 * Reject KVM_RUN if userspace manages to gain control and stuff invalid guest
   state while KVM is in the middle of emulating nested VM-Enter.
 
 * Add a macro to handle KVM's sanity checks on entry/exit VMCS control pairs
   in anticipation of adding sanity checks for secondary exit controls (the
   primary field is out of bits).
 
 x86 (AMD):
 
 * Ensure the PSP driver is initialized when both the PSP and KVM modules are
   built-in (the initcall framework doesn't handle dependencies).
 
 * Use long-term pins when registering encrypted memory regions, so that the
   pages are migrated out of MIGRATE_CMA/ZONE_MOVABLE and don't lead to
   excessive fragmentation.
 
 * Add macros and helpers for setting GHCB return/error codes.
 
 * Add support for Idle HLT interception, which elides interception if the vCPU
   has a pending, unmasked virtual IRQ when HLT is executed.
 
 * Fix a bug in INVPCID emulation where KVM fails to check for a non-canonical
   address.
 
 * Don't attempt VMRUN for SEV-ES+ guests if the vCPU's VMSA is invalid, e.g.
   because the vCPU was "destroyed" via SNP's AP Creation hypercall.
 
 * Reject SNP AP Creation if the requested SEV features for the vCPU don't
   match the VM's configured set of features.
 
 Selftests:
 
 * Fix again the Intel PMU counters test; add a data load and do CLFLUSH{OPT} on the data
   instead of executing code.  The theory is that modern Intel CPUs have
   learned new code prefetching tricks that bypass the PMU counters.
 
 * Fix a flaw in the Intel PMU counters test where it asserts that an event is
   counting correctly without actually knowing what the event counts on the
   underlying hardware.
 
 * Fix a variety of flaws, bugs, and false failures/passes dirty_log_test, and
   improve its coverage by collecting all dirty entries on each iteration.
 
 * Fix a few minor bugs related to handling of stats FDs.
 
 * Add infrastructure to make vCPU and VM stats FDs available to tests by
   default (open the FDs during VM/vCPU creation).
 
 * Relax an assertion on the number of HLT exits in the xAPIC IPI test when
   running on a CPU that supports AMD's Idle HLT (which elides interception of
   HLT if a virtual IRQ is pending and unmasked).
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmfcTkEUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroMnQAf/cPx72hJOdNy4Qrm8M33YLXVRVV00
 yEZ8eN8TWdOclr0ltE/w/ELGh/qS4CU8pjURAk0A6lPioU+mdcTn3dPEqMDMVYom
 uOQ2lusEHw0UuSnGZSEjvZJsE/Ro2NSAsHIB6PWRqig1ZBPJzyu0frce34pMpeQH
 diwriJL9lKPAhBWXnUQ9BKoi1R0P5OLW9ahX4SOWk7cAFg4DLlDE66Nqf6nKqViw
 DwEucTiUEg5+a3d93gihdD4JNl+fb3vI2erxrMxjFjkacl0qgqRu3ei3DG0MfdHU
 wNcFSG5B1n0OECKxr80lr1Ip1KTVNNij0Ks+w6Gc6lSg9c4PptnNkfLK3A==
 =nnCN
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm updates from Paolo Bonzini:
 "ARM:

   - Nested virtualization support for VGICv3, giving the nested
     hypervisor control of the VGIC hardware when running an L2 VM

   - Removal of 'late' nested virtualization feature register masking,
     making the supported feature set directly visible to userspace

   - Support for emulating FEAT_PMUv3 on Apple silicon, taking advantage
     of an IMPLEMENTATION DEFINED trap that covers all PMUv3 registers

   - Paravirtual interface for discovering the set of CPU
     implementations where a VM may run, addressing a longstanding issue
     of guest CPU errata awareness in big-little systems and
     cross-implementation VM migration

   - Userspace control of the registers responsible for identifying a
     particular CPU implementation (MIDR_EL1, REVIDR_EL1, AIDR_EL1),
     allowing VMs to be migrated cross-implementation

   - pKVM updates, including support for tracking stage-2 page table
     allocations in the protected hypervisor in the 'SecPageTable' stat

   - Fixes to vPMU, ensuring that userspace updates to the vPMU after
     KVM_RUN are reflected into the backing perf events

  LoongArch:

   - Remove unnecessary header include path

   - Assume constant PGD during VM context switch

   - Add perf events support for guest VM

  RISC-V:

   - Disable the kernel perf counter during configure

   - KVM selftests improvements for PMU

   - Fix warning at the time of KVM module removal

  x86:

   - Add support for aging of SPTEs without holding mmu_lock.

     Not taking mmu_lock allows multiple aging actions to run in
     parallel, and more importantly avoids stalling vCPUs. This includes
     an implementation of per-rmap-entry locking; aging the gfn is done
     with only a per-rmap single-bin spinlock taken, whereas locking an
     rmap for write requires taking both the per-rmap spinlock and the
     mmu_lock.

     Note that this decreases slightly the accuracy of accessed-page
     information, because changes to the SPTE outside aging might not
     use atomic operations even if they could race against a clear of
     the Accessed bit.

     This is deliberate because KVM and mm/ tolerate false
     positives/negatives for accessed information, and testing has shown
     that reducing the latency of aging is far more beneficial to
     overall system performance than providing "perfect" young/old
     information.

   - Defer runtime CPUID updates until KVM emulates a CPUID instruction,
     to coalesce updates when multiple pieces of vCPU state are
     changing, e.g. as part of a nested transition

   - Fix a variety of nested emulation bugs, and add VMX support for
     synthesizing nested VM-Exit on interception (instead of injecting
     #UD into L2)

   - Drop "support" for async page faults for protected guests that do
     not set SEND_ALWAYS (i.e. that only want async page faults at CPL3)

   - Bring a bit of sanity to x86's VM teardown code, which has
     accumulated a lot of cruft over the years. Particularly, destroy
     vCPUs before the MMU, despite the latter being a VM-wide operation

   - Add common secure TSC infrastructure for use within SNP and in the
     future TDX

   - Block KVM_CAP_SYNC_REGS if guest state is protected. It does not
     make sense to use the capability if the relevant registers are not
     available for reading or writing

   - Don't take kvm->lock when iterating over vCPUs in the suspend
     notifier to fix a largely theoretical deadlock

   - Use the vCPU's actual Xen PV clock information when starting the
     Xen timer, as the cached state in arch.hv_clock can be stale/bogus

   - Fix a bug where KVM could bleed PVCLOCK_GUEST_STOPPED across
     different PV clocks; restrict PVCLOCK_GUEST_STOPPED to kvmclock, as
     KVM's suspend notifier only accounts for kvmclock, and there's no
     evidence that the flag is actually supported by Xen guests

   - Clean up the per-vCPU "cache" of its reference pvclock, and instead
     only track the vCPU's TSC scaling (multipler+shift) metadata (which
     is moderately expensive to compute, and rarely changes for modern
     setups)

   - Don't write to the Xen hypercall page on MSR writes that are
     initiated by the host (userspace or KVM) to fix a class of bugs
     where KVM can write to guest memory at unexpected times, e.g.
     during vCPU creation if userspace has set the Xen hypercall MSR
     index to collide with an MSR that KVM emulates

   - Restrict the Xen hypercall MSR index to the unofficial synthetic
     range to reduce the set of possible collisions with MSRs that are
     emulated by KVM (collisions can still happen as KVM emulates
     Hyper-V MSRs, which also reside in the synthetic range)

   - Clean up and optimize KVM's handling of Xen MSR writes and
     xen_hvm_config

   - Update Xen TSC leaves during CPUID emulation instead of modifying
     the CPUID entries when updating PV clocks; there is no guarantee PV
     clocks will be updated between TSC frequency changes and CPUID
     emulation, and guest reads of the TSC leaves should be rare, i.e.
     are not a hot path

  x86 (Intel):

   - Fix a bug where KVM unnecessarily reads XFD_ERR from hardware and
     thus modifies the vCPU's XFD_ERR on a #NM due to CR0.TS=1

   - Pass XFD_ERR as the payload when injecting #NM, as a preparatory
     step for upcoming FRED virtualization support

   - Decouple the EPT entry RWX protection bit macros from the EPT
     Violation bits, both as a general cleanup and in anticipation of
     adding support for emulating Mode-Based Execution Control (MBEC)

   - Reject KVM_RUN if userspace manages to gain control and stuff
     invalid guest state while KVM is in the middle of emulating nested
     VM-Enter

   - Add a macro to handle KVM's sanity checks on entry/exit VMCS
     control pairs in anticipation of adding sanity checks for secondary
     exit controls (the primary field is out of bits)

  x86 (AMD):

   - Ensure the PSP driver is initialized when both the PSP and KVM
     modules are built-in (the initcall framework doesn't handle
     dependencies)

   - Use long-term pins when registering encrypted memory regions, so
     that the pages are migrated out of MIGRATE_CMA/ZONE_MOVABLE and
     don't lead to excessive fragmentation

   - Add macros and helpers for setting GHCB return/error codes

   - Add support for Idle HLT interception, which elides interception if
     the vCPU has a pending, unmasked virtual IRQ when HLT is executed

   - Fix a bug in INVPCID emulation where KVM fails to check for a
     non-canonical address

   - Don't attempt VMRUN for SEV-ES+ guests if the vCPU's VMSA is
     invalid, e.g. because the vCPU was "destroyed" via SNP's AP
     Creation hypercall

   - Reject SNP AP Creation if the requested SEV features for the vCPU
     don't match the VM's configured set of features

  Selftests:

   - Fix again the Intel PMU counters test; add a data load and do
     CLFLUSH{OPT} on the data instead of executing code. The theory is
     that modern Intel CPUs have learned new code prefetching tricks
     that bypass the PMU counters

   - Fix a flaw in the Intel PMU counters test where it asserts that an
     event is counting correctly without actually knowing what the event
     counts on the underlying hardware

   - Fix a variety of flaws, bugs, and false failures/passes
     dirty_log_test, and improve its coverage by collecting all dirty
     entries on each iteration

   - Fix a few minor bugs related to handling of stats FDs

   - Add infrastructure to make vCPU and VM stats FDs available to tests
     by default (open the FDs during VM/vCPU creation)

   - Relax an assertion on the number of HLT exits in the xAPIC IPI test
     when running on a CPU that supports AMD's Idle HLT (which elides
     interception of HLT if a virtual IRQ is pending and unmasked)"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (216 commits)
  RISC-V: KVM: Optimize comments in kvm_riscv_vcpu_isa_disable_allowed
  RISC-V: KVM: Teardown riscv specific bits after kvm_exit
  LoongArch: KVM: Register perf callbacks for guest
  LoongArch: KVM: Implement arch-specific functions for guest perf
  LoongArch: KVM: Add stub for kvm_arch_vcpu_preempted_in_kernel()
  LoongArch: KVM: Remove PGD saving during VM context switch
  LoongArch: KVM: Remove unnecessary header include path
  KVM: arm64: Tear down vGIC on failed vCPU creation
  KVM: arm64: PMU: Reload when resetting
  KVM: arm64: PMU: Reload when user modifies registers
  KVM: arm64: PMU: Fix SET_ONE_REG for vPMC regs
  KVM: arm64: PMU: Assume PMU presence in pmu-emul.c
  KVM: arm64: PMU: Set raw values from user to PM{C,I}NTEN{SET,CLR}, PMOVS{SET,CLR}
  KVM: arm64: Create each pKVM hyp vcpu after its corresponding host vcpu
  KVM: arm64: Factor out pKVM hyp vcpu creation to separate function
  KVM: arm64: Initialize HCRX_EL2 traps in pKVM
  KVM: arm64: Factor out setting HCRX_EL2 traps into separate function
  KVM: x86: block KVM_CAP_SYNC_REGS if guest state is protected
  KVM: x86: Add infrastructure for secure TSC
  KVM: x86: Push down setting vcpu.arch.user_set_tsc
  ...
2025-03-25 14:22:07 -07:00
Kevin Brodsky
650701e4ea arm64/sysreg: Move POR_EL0_INIT to asm/por.h
The value of POR_EL0_INIT is not architectural, it is a software
decision. Since we have a dedicated header for POR_ELx, we might as
well define POR_EL0_INIT there.

While at it also define POR_EL0_INIT using POR_ELx_PERM_PREP(),
making it clearer that we are setting permissions for POIndex/pkey
0.

Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com>
Link: https://lore.kernel.org/r/20250219164029.2309119-4-kevin.brodsky@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-03-10 18:10:41 +00:00
Kevin Brodsky
83d78bbfd2 arm64/sysreg: Rename POE_RXW to POE_RWX
It is customary to list R, W, X permissions in that order. In fact
this is already the case for PIE constants (PIE_RWX). Rename POE_RXW
accordingly, as well as POE_XW (currently unused).

While at it also swap the W/X lines in
compute_s1_overlay_permissions() to follow the R, W, X order.

Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com>
Link: https://lore.kernel.org/r/20250219164029.2309119-3-kevin.brodsky@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-03-10 18:10:40 +00:00
Kevin Brodsky
f91a3a6088 arm64/sysreg: Improve PIR/POR helpers
We currently have one helper to set a PIRx_ELx's permission field to
a given value, PIRx_ELx_PERM(), and another helper to extract a
permission field from POR_ELx, POR_ELx_IDX(). The naming is pretty
confusing - it isn't clear at all that "_PERM" corresponds to a
setter and "_IDX" to a getter.

This patch aims at improving the situation by using the same
suffixes as FIELD_PREP()/FIELD_GET(), which we have already adopted
for SYS_FIELD_{PREP,GET}():

* PIRx_ELx_PERM_PREP(), POR_ELx_PERM_PREP() create a register value
  where the permission field for a given index is set to a given value.

* POR_ELx_PERM_GET() extracts the permission field from a given
  register value for a given index.

These helpers are not implemented using FIELD_PREP()/FIELD_GET()
because the mask may not be constant, and they need to be usable in
assembly. They are all defined in asm/sysreg.h, as one would expect
for basic sysreg-related helpers.

Finally the new POR_ELx_PERM_* macros are used for existing
calculations in signal.c and mmu.c.

Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com>
Link: https://lore.kernel.org/r/20250219164029.2309119-2-kevin.brodsky@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-03-10 18:10:40 +00:00
Marc Zyngier
b7a252e881 arm64: sysreg: Add layout for ICH_MISR_EL2
The ICH_MISR_EL2-related macros are missing a number of status
bits that we are about to handle. Take this opportunity to fully
describe the layout of that register as part of the automatic
generation infrastructure.

Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250225172930.1850838-4-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-03-03 14:51:51 -08:00
Marc Zyngier
5815fb82dc arm64: sysreg: Add layout for ICH_VTR_EL2
The ICH_VTR_EL2-related macros are missing a number of config
bits that we are about to handle. Take this opportunity to fully
describe the layout of that register as part of the automatic
generation infrastructure.

This results in a bit of churn to repaint constants that are now
generated with a different format.

Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250225172930.1850838-3-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-03-03 14:51:51 -08:00
Marc Zyngier
22513c0d2a arm64: sysreg: Add layout for ICH_HCR_EL2
The ICH_HCR_EL2-related macros are missing a number of control
bits that we are about to handle. Take this opportunity to fully
describe the layout of that register as part of the automatic
generation infrastructure.

This results in a bit of churn, unfortunately.

Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250225172930.1850838-2-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-03-03 14:51:51 -08:00
Marc Zyngier
946904e728 Merge branch kvm-arm64/coresight-6.14 into kvmarm-master/next
* kvm-arm64/coresight-6.14:
  : .
  : Trace filtering update from James Clark. From the cover letter:
  :
  : "The guest filtering rules from the Perf session are now honored for both
  : nVHE and VHE modes. This is done by either writing to TRFCR_EL12 at the
  : start of the Perf session and doing nothing else further, or caching the
  : guest value and writing it at guest switch for nVHE. In pKVM, trace is
  : now be disabled for both protected and unprotected guests."
  : .
  KVM: arm64: Fix selftests after sysreg field name update
  coresight: Pass guest TRFCR value to KVM
  KVM: arm64: Support trace filtering for guests
  KVM: arm64: coresight: Give TRBE enabled state to KVM
  coresight: trbe: Remove redundant disable call
  arm64/sysreg/tools: Move TRFCR definitions to sysreg
  tools: arm64: Update sysreg.h header files

Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-01-17 11:05:44 +00:00
James Clark
c382ee674c arm64/sysreg/tools: Move TRFCR definitions to sysreg
Convert TRFCR to automatic generation. Add separate definitions for ELx
and EL2 as TRFCR_EL1 doesn't have CX. This also mirrors the previous
definition so no code change is required.

Also add TRFCR_EL12 which will start to be used in a later commit.

Unfortunately, to avoid breaking the Perf build with duplicate
definition errors, the tools copy of the sysreg.h header needs to be
updated at the same time rather than the usual second commit. This is
because the generated version of sysreg
(arch/arm64/include/generated/asm/sysreg-defs.h), is currently shared
and tools/ does not have its own copy.

Reviewed-by: Mark Brown <broonie@kernel.org>
Signed-off-by: James Clark <james.clark@arm.com>
Signed-off-by: James Clark <james.clark@linaro.org>
Link: https://lore.kernel.org/r/20250106142446.628923-4-james.clark@linaro.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-01-12 12:50:11 +00:00
Marc Zyngier
b59dbb91f7 KVM: arm64: nv: Add handling of EL2-specific timer registers
Add the required handling for EL2 and EL02 registers, as
well as EL1 registers used in the E2H context. This includes
handling the virtual timer accesses when CNTHCTL_EL2.EL1TVT
or CNTHCTL_EL2.EL1TVCT are set.

Acked-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20241217142321.763801-2-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-01-02 19:19:09 +00:00
Linus Torvalds
9f16d5e6f2 The biggest change here is eliminating the awful idea that KVM had, of
essentially guessing which pfns are refcounted pages.  The reason to
 do so was that KVM needs to map both non-refcounted pages (for example
 BARs of VFIO devices) and VM_PFNMAP/VM_MIXMEDMAP VMAs that contain
 refcounted pages.  However, the result was security issues in the past,
 and more recently the inability to map VM_IO and VM_PFNMAP memory
 that _is_ backed by struct page but is not refcounted.  In particular
 this broke virtio-gpu blob resources (which directly map host graphics
 buffers into the guest as "vram" for the virtio-gpu device) with the
 amdgpu driver, because amdgpu allocates non-compound higher order pages
 and the tail pages could not be mapped into KVM.
 
 This requires adjusting all uses of struct page in the per-architecture
 code, to always work on the pfn whenever possible.  The large series that
 did this, from David Stevens and Sean Christopherson, also cleaned up
 substantially the set of functions that provided arch code with the
 pfn for a host virtual addresses.  The previous maze of twisty little
 passages, all different, is replaced by five functions (__gfn_to_page,
 __kvm_faultin_pfn, the non-__ versions of these two, and kvm_prefetch_pages)
 saving almost 200 lines of code.
 
 ARM:
 
 * Support for stage-1 permission indirection (FEAT_S1PIE) and
   permission overlays (FEAT_S1POE), including nested virt + the
   emulated page table walker
 
 * Introduce PSCI SYSTEM_OFF2 support to KVM + client driver. This call
   was introduced in PSCIv1.3 as a mechanism to request hibernation,
   similar to the S4 state in ACPI
 
 * Explicitly trap + hide FEAT_MPAM (QoS controls) from KVM guests. As
   part of it, introduce trivial initialization of the host's MPAM
   context so KVM can use the corresponding traps
 
 * PMU support under nested virtualization, honoring the guest
   hypervisor's trap configuration and event filtering when running a
   nested guest
 
 * Fixes to vgic ITS serialization where stale device/interrupt table
   entries are not zeroed when the mapping is invalidated by the VM
 
 * Avoid emulated MMIO completion if userspace has requested synchronous
   external abort injection
 
 * Various fixes and cleanups affecting pKVM, vCPU initialization, and
   selftests
 
 LoongArch:
 
 * Add iocsr and mmio bus simulation in kernel.
 
 * Add in-kernel interrupt controller emulation.
 
 * Add support for virtualization extensions to the eiointc irqchip.
 
 PPC:
 
 * Drop lingering and utterly obsolete references to PPC970 KVM, which was
   removed 10 years ago.
 
 * Fix incorrect documentation references to non-existing ioctls
 
 RISC-V:
 
 * Accelerate KVM RISC-V when running as a guest
 
 * Perf support to collect KVM guest statistics from host side
 
 s390:
 
 * New selftests: more ucontrol selftests and CPU model sanity checks
 
 * Support for the gen17 CPU model
 
 * List registers supported by KVM_GET/SET_ONE_REG in the documentation
 
 x86:
 
 * Cleanup KVM's handling of Accessed and Dirty bits to dedup code, improve
   documentation, harden against unexpected changes.  Even if the hardware
   A/D tracking is disabled, it is possible to use the hardware-defined A/D
   bits to track if a PFN is Accessed and/or Dirty, and that removes a lot
   of special cases.
 
 * Elide TLB flushes when aging secondary PTEs, as has been done in x86's
   primary MMU for over 10 years.
 
 * Recover huge pages in-place in the TDP MMU when dirty page logging is
   toggled off, instead of zapping them and waiting until the page is
   re-accessed to create a huge mapping.  This reduces vCPU jitter.
 
 * Batch TLB flushes when dirty page logging is toggled off.  This reduces
   the time it takes to disable dirty logging by ~3x.
 
 * Remove the shrinker that was (poorly) attempting to reclaim shadow page
   tables in low-memory situations.
 
 * Clean up and optimize KVM's handling of writes to MSR_IA32_APICBASE.
 
 * Advertise CPUIDs for new instructions in Clearwater Forest
 
 * Quirk KVM's misguided behavior of initialized certain feature MSRs to
   their maximum supported feature set, which can result in KVM creating
   invalid vCPU state.  E.g. initializing PERF_CAPABILITIES to a non-zero
   value results in the vCPU having invalid state if userspace hides PDCM
   from the guest, which in turn can lead to save/restore failures.
 
 * Fix KVM's handling of non-canonical checks for vCPUs that support LA57
   to better follow the "architecture", in quotes because the actual
   behavior is poorly documented.  E.g. most MSR writes and descriptor
   table loads ignore CR4.LA57 and operate purely on whether the CPU
   supports LA57.
 
 * Bypass the register cache when querying CPL from kvm_sched_out(), as
   filling the cache from IRQ context is generally unsafe; harden the
   cache accessors to try to prevent similar issues from occuring in the
   future.  The issue that triggered this change was already fixed in 6.12,
   but was still kinda latent.
 
 * Advertise AMD_IBPB_RET to userspace, and fix a related bug where KVM
   over-advertises SPEC_CTRL when trying to support cross-vendor VMs.
 
 * Minor cleanups
 
 * Switch hugepage recovery thread to use vhost_task.  These kthreads can
   consume significant amounts of CPU time on behalf of a VM or in response
   to how the VM behaves (for example how it accesses its memory); therefore
   KVM tried to place the thread in the VM's cgroups and charge the CPU
   time consumed by that work to the VM's container.  However the kthreads
   did not process SIGSTOP/SIGCONT, and therefore cgroups which had KVM
   instances inside could not complete freezing.  Fix this by replacing the
   kthread with a PF_USER_WORKER thread, via the vhost_task abstraction.
   Another 100+ lines removed, with generally better behavior too like
   having these threads properly parented in the process tree.
 
 * Revert a workaround for an old CPU erratum (Nehalem/Westmere) that didn't
   really work; there was really nothing to work around anyway: the broken
   patch was meant to fix nested virtualization, but the PERF_GLOBAL_CTRL
   MSR is virtualized and therefore unaffected by the erratum.
 
 * Fix 6.12 regression where CONFIG_KVM will be built as a module even
   if asked to be builtin, as long as neither KVM_INTEL nor KVM_AMD is 'y'.
 
 x86 selftests:
 
 * x86 selftests can now use AVX.
 
 Documentation:
 
 * Use rST internal links
 
 * Reorganize the introduction to the API document
 
 Generic:
 
 * Protect vcpu->pid accesses outside of vcpu->mutex with a rwlock instead
   of RCU, so that running a vCPU on a different task doesn't encounter long
   due to having to wait for all CPUs become quiescent.  In general both reads
   and writes are rare, but userspace that supports confidential computing is
   introducing the use of "helper" vCPUs that may jump from one host processor
   to another.  Those will be very happy to trigger a synchronize_rcu(), and
   the effect on performance is quite the disaster.
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmc9MRYUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroP00QgArxqxBIGLCW5t7bw7vtNq63QYRyh4
 dTiDguLiYQJ+AXmnRu11R6aPC7HgMAvlFCCmH+GEce4WEgt26hxCmncJr/aJOSwS
 letCS7TrME16PeZvh25A1nhPBUw6mTF1qqzgcdHMrqXG8LuHoGcKYGSRVbkf3kfI
 1ZoMq1r8ChXbVVmCx9DQ3gw1TVr5Dpjs2voLh8rDSE9Xpw0tVVabHu3/NhQEz/F+
 t8/nRaqH777icCHIf9PCk5HnarHxLAOvhM2M0Yj09PuBcE5fFQxpxltw/qiKQqqW
 ep4oquojGl87kZnhlDaac2UNtK90Ws+WxxvCwUmbvGN0ZJVaQwf4FvTwig==
 =lWpE
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm updates from Paolo Bonzini:
 "The biggest change here is eliminating the awful idea that KVM had of
  essentially guessing which pfns are refcounted pages.

  The reason to do so was that KVM needs to map both non-refcounted
  pages (for example BARs of VFIO devices) and VM_PFNMAP/VM_MIXMEDMAP
  VMAs that contain refcounted pages.

  However, the result was security issues in the past, and more recently
  the inability to map VM_IO and VM_PFNMAP memory that _is_ backed by
  struct page but is not refcounted. In particular this broke virtio-gpu
  blob resources (which directly map host graphics buffers into the
  guest as "vram" for the virtio-gpu device) with the amdgpu driver,
  because amdgpu allocates non-compound higher order pages and the tail
  pages could not be mapped into KVM.

  This requires adjusting all uses of struct page in the
  per-architecture code, to always work on the pfn whenever possible.
  The large series that did this, from David Stevens and Sean
  Christopherson, also cleaned up substantially the set of functions
  that provided arch code with the pfn for a host virtual addresses.

  The previous maze of twisty little passages, all different, is
  replaced by five functions (__gfn_to_page, __kvm_faultin_pfn, the
  non-__ versions of these two, and kvm_prefetch_pages) saving almost
  200 lines of code.

  ARM:

   - Support for stage-1 permission indirection (FEAT_S1PIE) and
     permission overlays (FEAT_S1POE), including nested virt + the
     emulated page table walker

   - Introduce PSCI SYSTEM_OFF2 support to KVM + client driver. This
     call was introduced in PSCIv1.3 as a mechanism to request
     hibernation, similar to the S4 state in ACPI

   - Explicitly trap + hide FEAT_MPAM (QoS controls) from KVM guests. As
     part of it, introduce trivial initialization of the host's MPAM
     context so KVM can use the corresponding traps

   - PMU support under nested virtualization, honoring the guest
     hypervisor's trap configuration and event filtering when running a
     nested guest

   - Fixes to vgic ITS serialization where stale device/interrupt table
     entries are not zeroed when the mapping is invalidated by the VM

   - Avoid emulated MMIO completion if userspace has requested
     synchronous external abort injection

   - Various fixes and cleanups affecting pKVM, vCPU initialization, and
     selftests

  LoongArch:

   - Add iocsr and mmio bus simulation in kernel.

   - Add in-kernel interrupt controller emulation.

   - Add support for virtualization extensions to the eiointc irqchip.

  PPC:

   - Drop lingering and utterly obsolete references to PPC970 KVM, which
     was removed 10 years ago.

   - Fix incorrect documentation references to non-existing ioctls

  RISC-V:

   - Accelerate KVM RISC-V when running as a guest

   - Perf support to collect KVM guest statistics from host side

  s390:

   - New selftests: more ucontrol selftests and CPU model sanity checks

   - Support for the gen17 CPU model

   - List registers supported by KVM_GET/SET_ONE_REG in the
     documentation

  x86:

   - Cleanup KVM's handling of Accessed and Dirty bits to dedup code,
     improve documentation, harden against unexpected changes.

     Even if the hardware A/D tracking is disabled, it is possible to
     use the hardware-defined A/D bits to track if a PFN is Accessed
     and/or Dirty, and that removes a lot of special cases.

   - Elide TLB flushes when aging secondary PTEs, as has been done in
     x86's primary MMU for over 10 years.

   - Recover huge pages in-place in the TDP MMU when dirty page logging
     is toggled off, instead of zapping them and waiting until the page
     is re-accessed to create a huge mapping. This reduces vCPU jitter.

   - Batch TLB flushes when dirty page logging is toggled off. This
     reduces the time it takes to disable dirty logging by ~3x.

   - Remove the shrinker that was (poorly) attempting to reclaim shadow
     page tables in low-memory situations.

   - Clean up and optimize KVM's handling of writes to
     MSR_IA32_APICBASE.

   - Advertise CPUIDs for new instructions in Clearwater Forest

   - Quirk KVM's misguided behavior of initialized certain feature MSRs
     to their maximum supported feature set, which can result in KVM
     creating invalid vCPU state. E.g. initializing PERF_CAPABILITIES to
     a non-zero value results in the vCPU having invalid state if
     userspace hides PDCM from the guest, which in turn can lead to
     save/restore failures.

   - Fix KVM's handling of non-canonical checks for vCPUs that support
     LA57 to better follow the "architecture", in quotes because the
     actual behavior is poorly documented. E.g. most MSR writes and
     descriptor table loads ignore CR4.LA57 and operate purely on
     whether the CPU supports LA57.

   - Bypass the register cache when querying CPL from kvm_sched_out(),
     as filling the cache from IRQ context is generally unsafe; harden
     the cache accessors to try to prevent similar issues from occuring
     in the future. The issue that triggered this change was already
     fixed in 6.12, but was still kinda latent.

   - Advertise AMD_IBPB_RET to userspace, and fix a related bug where
     KVM over-advertises SPEC_CTRL when trying to support cross-vendor
     VMs.

   - Minor cleanups

   - Switch hugepage recovery thread to use vhost_task.

     These kthreads can consume significant amounts of CPU time on
     behalf of a VM or in response to how the VM behaves (for example
     how it accesses its memory); therefore KVM tried to place the
     thread in the VM's cgroups and charge the CPU time consumed by that
     work to the VM's container.

     However the kthreads did not process SIGSTOP/SIGCONT, and therefore
     cgroups which had KVM instances inside could not complete freezing.

     Fix this by replacing the kthread with a PF_USER_WORKER thread, via
     the vhost_task abstraction. Another 100+ lines removed, with
     generally better behavior too like having these threads properly
     parented in the process tree.

   - Revert a workaround for an old CPU erratum (Nehalem/Westmere) that
     didn't really work; there was really nothing to work around anyway:
     the broken patch was meant to fix nested virtualization, but the
     PERF_GLOBAL_CTRL MSR is virtualized and therefore unaffected by the
     erratum.

   - Fix 6.12 regression where CONFIG_KVM will be built as a module even
     if asked to be builtin, as long as neither KVM_INTEL nor KVM_AMD is
     'y'.

  x86 selftests:

   - x86 selftests can now use AVX.

  Documentation:

   - Use rST internal links

   - Reorganize the introduction to the API document

  Generic:

   - Protect vcpu->pid accesses outside of vcpu->mutex with a rwlock
     instead of RCU, so that running a vCPU on a different task doesn't
     encounter long due to having to wait for all CPUs become quiescent.

     In general both reads and writes are rare, but userspace that
     supports confidential computing is introducing the use of "helper"
     vCPUs that may jump from one host processor to another. Those will
     be very happy to trigger a synchronize_rcu(), and the effect on
     performance is quite the disaster"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (298 commits)
  KVM: x86: Break CONFIG_KVM_X86's direct dependency on KVM_INTEL || KVM_AMD
  KVM: x86: add back X86_LOCAL_APIC dependency
  Revert "KVM: VMX: Move LOAD_IA32_PERF_GLOBAL_CTRL errata handling out of setup_vmcs_config()"
  KVM: x86: switch hugepage recovery thread to vhost_task
  KVM: x86: expose MSR_PLATFORM_INFO as a feature MSR
  x86: KVM: Advertise CPUIDs for new instructions in Clearwater Forest
  Documentation: KVM: fix malformed table
  irqchip/loongson-eiointc: Add virt extension support
  LoongArch: KVM: Add irqfd support
  LoongArch: KVM: Add PCHPIC user mode read and write functions
  LoongArch: KVM: Add PCHPIC read and write functions
  LoongArch: KVM: Add PCHPIC device support
  LoongArch: KVM: Add EIOINTC user mode read and write functions
  LoongArch: KVM: Add EIOINTC read and write functions
  LoongArch: KVM: Add EIOINTC device support
  LoongArch: KVM: Add IPI user mode read and write function
  LoongArch: KVM: Add IPI read and write function
  LoongArch: KVM: Add IPI device support
  LoongArch: KVM: Add iocsr and mmio bus simulation in kernel
  KVM: arm64: Pass on SVE mapping failures
  ...
2024-11-23 16:00:50 -08:00
James Morse
83732ce6a0 arm64/sysreg: Convert existing MPAM sysregs and add the remaining entries
Move the existing MPAM system register defines from sysreg.h to
tools/sysreg and add the remaining system registers.

Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Joey Gouly <joey.gouly@arm.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20241030160317.2528209-2-joey.gouly@arm.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-10-31 18:09:38 +00:00
Mark Brown
ce0641d48d arm64/sysreg: Add definitions for architected GCS caps
The architecture defines a format for guarded control stack caps, used
to mark the top of an unused GCS in order to limit the potential for
exploitation via stack switching. Add definitions associated with these.

Reviewed-by: Thiago Jung Bauermann <thiago.bauermann@linaro.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20241001-arm64-gcs-v13-8-222b78d87eee@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-10-04 12:04:34 +01:00
Linus Torvalds
64dd3b6a79 ARM:
* New Stage-2 page table dumper, reusing the main ptdump infrastructure
 
 * FP8 support
 
 * Nested virtualization now supports the address translation (FEAT_ATS1A)
   family of instructions
 
 * Add selftest checks for a bunch of timer emulation corner cases
 
 * Fix multiple cases where KVM/arm64 doesn't correctly handle the guest
   trying to use a GICv3 that wasn't advertised
 
 * Remove REG_HIDDEN_USER from the sysreg infrastructure, making
   things little simpler
 
 * Prevent MTE tags being restored by userspace if we are actively
   logging writes, as that's a recipe for disaster
 
 * Correct the refcount on a page that is not considered for MTE tag
   copying (such as a device)
 
 * When walking a page table to split block mappings, synchronize only
   at the end the walk rather than on every store
 
 * Fix boundary check when transfering memory using FFA
 
 * Fix pKVM TLB invalidation, only affecting currently out of tree
   code but worth addressing for peace of mind
 
 LoongArch:
 
 * Revert qspinlock to test-and-set simple lock on VM.
 
 * Add Loongson Binary Translation extension support.
 
 * Add PMU support for guest.
 
 * Enable paravirt feature control from VMM.
 
 * Implement function kvm_para_has_feature().
 
 RISC-V:
 
 * Fix sbiret init before forwarding to userspace
 
 * Don't zero-out PMU snapshot area before freeing data
 
 * Allow legacy PMU access from guest
 
 * Fix to allow hpmcounter31 from the guest
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmbmghAUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroPFQgf+Ijeqlx90BGy96pyzo/NkYKPeEc8G
 gKhlm8PdtdZYaRdJ53MVRLLpzbLuzqbwrn0ZX2tvoDRLzuAqTt2GTFoT6e2HtY5B
 Sf7KQMFwHWGtGklC1EmZ1fXsCocswpuAcexCLKLRBoWUcKABlgwV3N3vJo5gx/Ag
 8XXhYpcLTh+p7bjMdJShQy019pTwEDE68pPVnL2NPzla1G6Qox7ZJIdOEMZXuyJA
 MJ4jbFWE/T8vLFUf/8MGQ/+bo+4140kzB8N9wkazNcBRoodY6Hx+Lm1LiZjNudO1
 ilIdB4P3Ht+D8UuBv2DO5XTakfJz9T9YsoRcPlwrOWi/8xBRbt236gFB3Q==
 =sHTI
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-non-x86' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm updates from Paolo Bonzini:
 "These are the non-x86 changes (mostly ARM, as is usually the case).
  The generic and x86 changes will come later"

  ARM:

   - New Stage-2 page table dumper, reusing the main ptdump
     infrastructure

   - FP8 support

   - Nested virtualization now supports the address translation
     (FEAT_ATS1A) family of instructions

   - Add selftest checks for a bunch of timer emulation corner cases

   - Fix multiple cases where KVM/arm64 doesn't correctly handle the
     guest trying to use a GICv3 that wasn't advertised

   - Remove REG_HIDDEN_USER from the sysreg infrastructure, making
     things little simpler

   - Prevent MTE tags being restored by userspace if we are actively
     logging writes, as that's a recipe for disaster

   - Correct the refcount on a page that is not considered for MTE tag
     copying (such as a device)

   - When walking a page table to split block mappings, synchronize only
     at the end the walk rather than on every store

   - Fix boundary check when transfering memory using FFA

   - Fix pKVM TLB invalidation, only affecting currently out of tree
     code but worth addressing for peace of mind

  LoongArch:

   - Revert qspinlock to test-and-set simple lock on VM.

   - Add Loongson Binary Translation extension support.

   - Add PMU support for guest.

   - Enable paravirt feature control from VMM.

   - Implement function kvm_para_has_feature().

  RISC-V:

   - Fix sbiret init before forwarding to userspace

   - Don't zero-out PMU snapshot area before freeing data

   - Allow legacy PMU access from guest

   - Fix to allow hpmcounter31 from the guest"

* tag 'for-linus-non-x86' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (64 commits)
  LoongArch: KVM: Implement function kvm_para_has_feature()
  LoongArch: KVM: Enable paravirt feature control from VMM
  LoongArch: KVM: Add PMU support for guest
  KVM: arm64: Get rid of REG_HIDDEN_USER visibility qualifier
  KVM: arm64: Simplify visibility handling of AArch32 SPSR_*
  KVM: arm64: Simplify handling of CNTKCTL_EL12
  LoongArch: KVM: Add vm migration support for LBT registers
  LoongArch: KVM: Add Binary Translation extension support
  LoongArch: KVM: Add VM feature detection function
  LoongArch: Revert qspinlock to test-and-set simple lock on VM
  KVM: arm64: Register ptdump with debugfs on guest creation
  arm64: ptdump: Don't override the level when operating on the stage-2 tables
  arm64: ptdump: Use the ptdump description from a local context
  arm64: ptdump: Expose the attribute parsing functionality
  KVM: arm64: Add memory length checks and remove inline in do_ffa_mem_xfer
  KVM: arm64: Move pagetable definitions to common header
  KVM: arm64: nv: Add support for FEAT_ATS1A
  KVM: arm64: nv: Plumb handling of AT S1* traps from EL2
  KVM: arm64: nv: Make AT+PAN instructions aware of FEAT_PAN3
  KVM: arm64: nv: Sanitise SCTLR_EL1.EPAN according to VM configuration
  ...
2024-09-16 07:38:18 +02:00
Will Deacon
982a847c71 Merge branch 'for-next/poe' into for-next/core
* for-next/poe: (31 commits)
  arm64: pkeys: remove redundant WARN
  kselftest/arm64: Add test case for POR_EL0 signal frame records
  kselftest/arm64: parse POE_MAGIC in a signal frame
  kselftest/arm64: add HWCAP test for FEAT_S1POE
  selftests: mm: make protection_keys test work on arm64
  selftests: mm: move fpregs printing
  kselftest/arm64: move get_header()
  arm64: add Permission Overlay Extension Kconfig
  arm64: enable PKEY support for CPUs with S1POE
  arm64: enable POE and PIE to coexist
  arm64/ptrace: add support for FEAT_POE
  arm64: add POE signal support
  arm64: implement PKEYS support
  arm64: add pte_access_permitted_no_overlay()
  arm64: handle PKEY/POE faults
  arm64: mask out POIndex when modifying a PTE
  arm64: convert protection key into vm_flags and pgprot values
  arm64: add POIndex defines
  arm64: re-order MTE VM_ flags
  arm64: enable the Permission Overlay Extension for EL0
  ...
2024-09-12 13:43:41 +01:00