linux

mirror of https://github.com/torvalds/linux.git synced 2026-06-03 20:14:06 +02:00

Author	SHA1	Message	Date
Mark Brown	8372633074	KVM: arm64: Correctly cap ZCR_EL2 provided by a guest hypervisor ZCR_EL2 can be updated by a VHE guest hypervisor either using ZCR_EL2 (which traps) or ZCR_EL1 (which does not trap). KVM handles both in different way: - on ZCR_EL2 trap, ZCR_EL2.LEN is immediately capped at the VM's own VL limit. This has the potential to break existing SW that relies on the full LEN field to be stateful. - on ZCR_EL1 access, we do absolutely nothing. On restoring the SVE context for an L2 guest, we directly restore the guest hypervisor's view of ZCR_EL2 into the physical ZCR_EL2. If the guest's view of the register was updated using the ZCR_EL2 accessor, the value has already been sanitised (with the caveat mentioned above). But if the guest used ZCR_EL1, the raw value is written into the HW, and the L2 guest can now access VLs that it shouldn't. Fix all the above by moving the VL capping to the restore points, ensuring that: - the HW is always programmed with a capped value, irrespective of the accessor being used, - the ZCR_EL2.LEN field is always completely stateful, irrespective of the accessor being used. Additionally, move ZCR_EL2 to be a sanitised register, ensuring that only the LEN field is actually stateful. This requires some creative construction of the RES0 mask, as the sysreg generation script does not yet generate RAZ/WI fields. Fixes: `b3d29a8230` ("KVM: arm64: nv: Handle ZCR_EL2 traps") Signed-off-by: Mark Brown <broonie@kernel.org> Cc: stable@vger.kernel.org Link: https://patch.msgid.link/20260529-kvm-arm64-fix-zcr-len-nv-v2-1-86cad51992bd@kernel.org [maz: rewrote commit message, tidy up access_zcr_el2()] Signed-off-by: Marc Zyngier <maz@kernel.org>	2026-05-29 10:04:00 +01:00
Linus Torvalds	01f492e181	Arm: - Add support for tracing in the standalone EL2 hypervisor code, which should help both debugging and performance analysis. This uses the new infrastructure for 'remote' trace buffers that can be exposed by non-kernel entities such as firmware, and which came through the tracing tree. - Add support for GICv5 Per Processor Interrupts (PPIs), as the starting point for supporting the new GIC architecture in KVM. - Finally add support for pKVM protected guests, where pages are unmapped from the host as they are faulted into the guest and can be shared back from the guest using pKVM hypercalls. Protected guests are created using a new machine type identifier. As the elusive guestmem has not yet delivered on its promises, anonymous memory is also supported. This is only a first step towards full isolation from the host; for example, the CPU register state and DMA accesses are not yet isolated. Because this does not really yet bring fully what it promises, it is hidden behind CONFIG_ARM_PKVM_GUEST + 'kvm-arm.mode=protected', and also triggers TAINT_USER when a VM is created. Caveat emptor. - Rework the dreaded user_mem_abort() function to make it more maintainable, reducing the amount of state being exposed to the various helpers and rendering a substantial amount of state immutable. - Expand the Stage-2 page table dumper to support NV shadow page tables on a per-VM basis. - Tidy up the pKVM PSCI proxy code to be slightly less hard to follow. - Fix both SPE and TRBE in non-VHE configurations so that they do not generate spurious, out of context table walks that ultimately lead to very bad HW lockups. - A small set of patches fixing the Stage-2 MMU freeing in error cases. - Tighten-up accepted SMC immediate value to be only #0 for host SMCCC calls. - The usual cleanups and other selftest churn. LoongArch: - Use CSR_CRMD_PLV for kvm_arch_vcpu_in_kernel(). - Add DMSINTC irqchip in kernel support. RISC-V: - Fix steal time shared memory alignment checks - Fix vector context allocation leak - Fix array out-of-bounds in pmu_ctr_read() and pmu_fw_ctr_read_hi() - Fix double-free of sdata in kvm_pmu_clear_snapshot_area() - Fix integer overflow in kvm_pmu_validate_counter_mask() - Fix shift-out-of-bounds in make_xfence_request() - Fix lost write protection on huge pages during dirty logging - Split huge pages during fault handling for dirty logging - Skip CSR restore if VCPU is reloaded on the same core - Implement kvm_arch_has_default_irqchip() for KVM selftests - Factored-out ISA checks into separate sources - Added hideleg to struct kvm_vcpu_config - Factored-out VCPU config into separate sources - Support configuration of per-VM HGATP mode from KVM user space s390: - Support for ESA (31-bit) guests inside nested hypervisors. - Remove restriction on memslot alignment, which is not needed anymore with the new gmap code. - Fix LPSW/E to update the bear (which of course is the breaking event address register). x86: - Shut up various UBSAN warnings on reading module parameter before they were initialized. - Don't zero-allocate page tables that are used for splitting hugepages in the TDP MMU, as KVM is guaranteed to set all SPTEs in the page table and thus write all bytes. - As an optimization, bail early when trying to unsync 4KiB mappings if the target gfn can just be mapped with a 2MiB hugepage. x86 generic: - Copy single-chunk MMIO write values into struct kvm_vcpu (more precisely struct kvm_mmio_fragment) to fix use-after-free stack bugs where KVM would dereference stack pointer after an exit to userspace. - Clean up and comment the emulated MMIO code to try to make it easier to maintain (not necessarily "easy", but "easier"). - Move VMXON+VMXOFF and EFER.SVME toggling out of KVM (not all of VMX and SVM enabling) as it is needed for trusted I/O. - Advertise support for AVX512 Bit Matrix Multiply (BMM) instructions - Immediately fail the build if a required #define is missing in one of KVM's headers that is included multiple times. - Reject SET_GUEST_DEBUG with -EBUSY if there's an already injected exception, mostly to prevent syzkaller from abusing the uAPI to trigger WARNs, but also because it can help prevent userspace from unintentionally crashing the VM. - Exempt SMM from CPUID faulting on Intel, as per the spec. - Misc hardening and cleanup changes. x86 (AMD): - Fix and optimize IRQ window inhibit handling for AVIC; make it per-vCPU so that KVM doesn't prematurely re-enable AVIC if multiple vCPUs have to-be-injected IRQs. - Clean up and optimize the OSVW handling, avoiding a bug in which KVM would overwrite state when enabling virtualization on multiple CPUs in parallel. This should not be a problem because OSVW should usually be the same for all CPUs. - Drop a WARN in KVM_MEMORY_ENCRYPT_REG_REGION where KVM complains about a "too large" size based purely on user input. - Clean up and harden the pinning code for KVM_MEMORY_ENCRYPT_REG_REGION. - Disallow synchronizing a VMSA of an already-launched/encrypted vCPU, as doing so for an SNP guest will crash the host due to an RMP violation page fault. - Overhaul KVM's APIs for detecting SEV+ guests so that VM-scoped queries are required to hold kvm->lock, and enforce it by lockdep. Fix various bugs where sev_guest() was not ensured to be stable for the whole duration of a function or ioctl. - Convert a pile of kvm->lock SEV code to guard(). - Play nicer with userspace that does not enable KVM_CAP_EXCEPTION_PAYLOAD, for which KVM needs to set CR2 and DR6 as a response to ioctls such as KVM_GET_VCPU_EVENTS (even if the payload would end up in EXITINFO2 rather than CR2, for example). Only set CR2 and DR6 when consumption of the payload is imminent, but on the other hand force delivery of the payload in all paths where userspace retrieves CR2 or DR6. - Use vcpu->arch.cr2 when updating vmcb12's CR2 on nested #VMEXIT instead of vmcb02->save.cr2. The value is out of sync after a save/restore or after a #PF is injected into L2. - Fix a class of nSVM bugs where some fields written by the CPU are not synchronized from vmcb02 to cached vmcb12 after VMRUN, and so are not up-to-date when saved by KVM_GET_NESTED_STATE. - Fix a class of bugs where the ordering between KVM_SET_NESTED_STATE and KVM_SET_{S}REGS could cause vmcb02 to be incorrectly initialized after save+restore. - Add a variety of missing nSVM consistency checks. - Fix several bugs where KVM failed to correctly update VMCB fields on nested #VMEXIT. - Fix several bugs where KVM failed to correctly synthesize #UD or #GP for SVM-related instructions. - Add support for save+restore of virtualized LBRs (on SVM). - Refactor various helpers and macros to improve clarity and (hopefully) make the code easier to maintain. - Aggressively sanitize fields when copying from vmcb12, to guard against unintentionally allowing L1 to utilize yet-to-be-defined features. - Fix several bugs where KVM botched rAX legality checks when emulating SVM instructions. There are remaining issues in that KVM doesn't handle size prefix overrides for 64-bit guests. - Fail emulation of VMRUN/VMLOAD/VMSAVE if mapping vmcb12 fails instead of somewhat arbitrarily synthesizing #GP (i.e. don't double down on AMD's architectural but sketchy behavior of generating #GP for "unsupported" addresses). - Cache all used vmcb12 fields to further harden against TOCTOU bugs. x86 (Intel): - Drop obsolete branch hint prefixes from the VMX instruction macros. - Use ASM_INPUT_RM() in __vmcs_writel() to coerce clang into using a register input when appropriate. - Code cleanups. guest_memfd: - Don't mark guest_memfd folios as accessed, as guest_memfd doesn't support reclaim, the memory is unevictable, and there is no storage to write back to. LoongArch selftests: - Add KVM PMU test cases s390 selftests: - Enable more memory selftests. x86 selftests: - Add support for Hygon CPUs in KVM selftests. - Fix a bug in the MSR test where it would get false failures on AMD/Hygon CPUs with exactly one of RDPID or RDTSCP. - Add an MADV_COLLAPSE testcase for guest_memfd as a regression test for a bug where the kernel would attempt to collapse guest_memfd folios against KVM's will. -----BEGIN PGP SIGNATURE----- iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmnftRQUHHBib256aW5p QHJlZGhhdC5jb20ACgkQv/vSX3jHroPAzwf+NKO4Ktv+7A22ImN0SBl0nlUuulsz vTcw3+hxdRoIw83GdNS+hG5js0wrpMDnbv3t4+VliDNBSSxrBzcSWX2wpilW0Xtw qGo1MWhs2lKPy1NlaRVOwPS6j7uF3AR0TQ1iQLGMedQuCU9WpiKJxyhNXJdbLrt3 8EgFzsvtEsv+jKNRUNDf9+d0j4gZsFyIe+Brhianbw+u3/UCiUClLCdsKPc4+5ZX 08otYXytacGNIf/5Ev1vT4pHkHL0yqKXAtX7LEtaS3+0KrPuLjV4slemivzE9vf5 Evafm5AhA4wpaNMb1ZerhY3T94lsMaJpWxotjR//0Q7C9B59pCQnXCm8mg== =CcE0 -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm Pull kvm updates from Paolo Bonzini: "Arm: - Add support for tracing in the standalone EL2 hypervisor code, which should help both debugging and performance analysis. This uses the new infrastructure for 'remote' trace buffers that can be exposed by non-kernel entities such as firmware, and which came through the tracing tree - Add support for GICv5 Per Processor Interrupts (PPIs), as the starting point for supporting the new GIC architecture in KVM - Finally add support for pKVM protected guests, where pages are unmapped from the host as they are faulted into the guest and can be shared back from the guest using pKVM hypercalls. Protected guests are created using a new machine type identifier. As the elusive guestmem has not yet delivered on its promises, anonymous memory is also supported This is only a first step towards full isolation from the host; for example, the CPU register state and DMA accesses are not yet isolated. Because this does not really yet bring fully what it promises, it is hidden behind CONFIG_ARM_PKVM_GUEST + 'kvm-arm.mode=protected', and also triggers TAINT_USER when a VM is created. Caveat emptor - Rework the dreaded user_mem_abort() function to make it more maintainable, reducing the amount of state being exposed to the various helpers and rendering a substantial amount of state immutable - Expand the Stage-2 page table dumper to support NV shadow page tables on a per-VM basis - Tidy up the pKVM PSCI proxy code to be slightly less hard to follow - Fix both SPE and TRBE in non-VHE configurations so that they do not generate spurious, out of context table walks that ultimately lead to very bad HW lockups - A small set of patches fixing the Stage-2 MMU freeing in error cases - Tighten-up accepted SMC immediate value to be only #0 for host SMCCC calls - The usual cleanups and other selftest churn LoongArch: - Use CSR_CRMD_PLV for kvm_arch_vcpu_in_kernel() - Add DMSINTC irqchip in kernel support RISC-V: - Fix steal time shared memory alignment checks - Fix vector context allocation leak - Fix array out-of-bounds in pmu_ctr_read() and pmu_fw_ctr_read_hi() - Fix double-free of sdata in kvm_pmu_clear_snapshot_area() - Fix integer overflow in kvm_pmu_validate_counter_mask() - Fix shift-out-of-bounds in make_xfence_request() - Fix lost write protection on huge pages during dirty logging - Split huge pages during fault handling for dirty logging - Skip CSR restore if VCPU is reloaded on the same core - Implement kvm_arch_has_default_irqchip() for KVM selftests - Factored-out ISA checks into separate sources - Added hideleg to struct kvm_vcpu_config - Factored-out VCPU config into separate sources - Support configuration of per-VM HGATP mode from KVM user space s390: - Support for ESA (31-bit) guests inside nested hypervisors - Remove restriction on memslot alignment, which is not needed anymore with the new gmap code - Fix LPSW/E to update the bear (which of course is the breaking event address register) x86: - Shut up various UBSAN warnings on reading module parameter before they were initialized - Don't zero-allocate page tables that are used for splitting hugepages in the TDP MMU, as KVM is guaranteed to set all SPTEs in the page table and thus write all bytes - As an optimization, bail early when trying to unsync 4KiB mappings if the target gfn can just be mapped with a 2MiB hugepage x86 generic: - Copy single-chunk MMIO write values into struct kvm_vcpu (more precisely struct kvm_mmio_fragment) to fix use-after-free stack bugs where KVM would dereference stack pointer after an exit to userspace - Clean up and comment the emulated MMIO code to try to make it easier to maintain (not necessarily "easy", but "easier") - Move VMXON+VMXOFF and EFER.SVME toggling out of KVM (not all of VMX and SVM enabling) as it is needed for trusted I/O - Advertise support for AVX512 Bit Matrix Multiply (BMM) instructions - Immediately fail the build if a required #define is missing in one of KVM's headers that is included multiple times - Reject SET_GUEST_DEBUG with -EBUSY if there's an already injected exception, mostly to prevent syzkaller from abusing the uAPI to trigger WARNs, but also because it can help prevent userspace from unintentionally crashing the VM - Exempt SMM from CPUID faulting on Intel, as per the spec - Misc hardening and cleanup changes x86 (AMD): - Fix and optimize IRQ window inhibit handling for AVIC; make it per-vCPU so that KVM doesn't prematurely re-enable AVIC if multiple vCPUs have to-be-injected IRQs - Clean up and optimize the OSVW handling, avoiding a bug in which KVM would overwrite state when enabling virtualization on multiple CPUs in parallel. This should not be a problem because OSVW should usually be the same for all CPUs - Drop a WARN in KVM_MEMORY_ENCRYPT_REG_REGION where KVM complains about a "too large" size based purely on user input - Clean up and harden the pinning code for KVM_MEMORY_ENCRYPT_REG_REGION - Disallow synchronizing a VMSA of an already-launched/encrypted vCPU, as doing so for an SNP guest will crash the host due to an RMP violation page fault - Overhaul KVM's APIs for detecting SEV+ guests so that VM-scoped queries are required to hold kvm->lock, and enforce it by lockdep. Fix various bugs where sev_guest() was not ensured to be stable for the whole duration of a function or ioctl - Convert a pile of kvm->lock SEV code to guard() - Play nicer with userspace that does not enable KVM_CAP_EXCEPTION_PAYLOAD, for which KVM needs to set CR2 and DR6 as a response to ioctls such as KVM_GET_VCPU_EVENTS (even if the payload would end up in EXITINFO2 rather than CR2, for example). Only set CR2 and DR6 when consumption of the payload is imminent, but on the other hand force delivery of the payload in all paths where userspace retrieves CR2 or DR6 - Use vcpu->arch.cr2 when updating vmcb12's CR2 on nested #VMEXIT instead of vmcb02->save.cr2. The value is out of sync after a save/restore or after a #PF is injected into L2 - Fix a class of nSVM bugs where some fields written by the CPU are not synchronized from vmcb02 to cached vmcb12 after VMRUN, and so are not up-to-date when saved by KVM_GET_NESTED_STATE - Fix a class of bugs where the ordering between KVM_SET_NESTED_STATE and KVM_SET_{S}REGS could cause vmcb02 to be incorrectly initialized after save+restore - Add a variety of missing nSVM consistency checks - Fix several bugs where KVM failed to correctly update VMCB fields on nested #VMEXIT - Fix several bugs where KVM failed to correctly synthesize #UD or #GP for SVM-related instructions - Add support for save+restore of virtualized LBRs (on SVM) - Refactor various helpers and macros to improve clarity and (hopefully) make the code easier to maintain - Aggressively sanitize fields when copying from vmcb12, to guard against unintentionally allowing L1 to utilize yet-to-be-defined features - Fix several bugs where KVM botched rAX legality checks when emulating SVM instructions. There are remaining issues in that KVM doesn't handle size prefix overrides for 64-bit guests - Fail emulation of VMRUN/VMLOAD/VMSAVE if mapping vmcb12 fails instead of somewhat arbitrarily synthesizing #GP (i.e. don't double down on AMD's architectural but sketchy behavior of generating #GP for "unsupported" addresses) - Cache all used vmcb12 fields to further harden against TOCTOU bugs x86 (Intel): - Drop obsolete branch hint prefixes from the VMX instruction macros - Use ASM_INPUT_RM() in __vmcs_writel() to coerce clang into using a register input when appropriate - Code cleanups guest_memfd: - Don't mark guest_memfd folios as accessed, as guest_memfd doesn't support reclaim, the memory is unevictable, and there is no storage to write back to LoongArch selftests: - Add KVM PMU test cases s390 selftests: - Enable more memory selftests x86 selftests: - Add support for Hygon CPUs in KVM selftests - Fix a bug in the MSR test where it would get false failures on AMD/Hygon CPUs with exactly one of RDPID or RDTSCP - Add an MADV_COLLAPSE testcase for guest_memfd as a regression test for a bug where the kernel would attempt to collapse guest_memfd folios against KVM's will" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (373 commits) KVM: x86: use inlines instead of macros for is_sev_*guest x86/virt: Treat SVM as unsupported when running as an SEV+ guest KVM: SEV: Goto an existing error label if charging misc_cg for an ASID fails KVM: SVM: Move lock-protected allocation of SEV ASID into a separate helper KVM: SEV: use mutex guard in snp_handle_guest_req() KVM: SEV: use mutex guard in sev_mem_enc_unregister_region() KVM: SEV: use mutex guard in sev_mem_enc_ioctl() KVM: SEV: use mutex guard in snp_launch_update() KVM: SEV: Assert that kvm->lock is held when querying SEV+ support KVM: SEV: Document that checking for SEV+ guests when reclaiming memory is "safe" KVM: SEV: Hide "struct kvm_sev_info" behind CONFIG_KVM_AMD_SEV=y KVM: SEV: WARN on unhandled VM type when initializing VM KVM: LoongArch: selftests: Add PMU overflow interrupt test KVM: LoongArch: selftests: Add basic PMU event counting test KVM: LoongArch: selftests: Add cpucfg read/write helpers LoongArch: KVM: Add DMSINTC inject msi to vCPU LoongArch: KVM: Add DMSINTC device support LoongArch: KVM: Make vcpu_is_preempted() as a macro rather than function LoongArch: KVM: Move host CSR_GSTAT save and restore in context switch LoongArch: KVM: Move host CSR_EENTRY save and restore in context switch ...	2026-04-17 07:18:03 -07:00
Catalin Marinas	480a9e57cc	Merge branches 'for-next/misc', 'for-next/tlbflush', 'for-next/ttbr-macros-cleanup', 'for-next/kselftest', 'for-next/feat_lsui', 'for-next/mpam', 'for-next/hotplug-batched-tlbi', 'for-next/bbml2-fixes', 'for-next/sysreg', 'for-next/generic-entry' and 'for-next/acpi', remote-tracking branches 'arm64/for-next/perf' and 'arm64/for-next/read-once' into for-next/core * arm64/for-next/perf: : Perf updates perf/arm-cmn: Fix resource_size_t printk specifier in arm_cmn_init_dtc() perf/arm-cmn: Fix incorrect error check for devm_ioremap() perf: add NVIDIA Tegra410 C2C PMU perf: add NVIDIA Tegra410 CPU Memory Latency PMU perf/arm_cspmu: nvidia: Add Tegra410 PCIE-TGT PMU perf/arm_cspmu: nvidia: Add Tegra410 PCIE PMU perf/arm_cspmu: Add arm_cspmu_acpi_dev_get perf/arm_cspmu: nvidia: Add Tegra410 UCF PMU perf/arm_cspmu: nvidia: Rename doc to Tegra241 perf/arm-cmn: Stop claiming entire iomem region arm64: cpufeature: Use pmuv3_implemented() function arm64: cpufeature: Make PMUVer and PerfMon unsigned KVM: arm64: Read PMUVer as unsigned * arm64/for-next/read-once: : Fixes for __READ_ONCE() with CONFIG_LTO=y arm64, compiler-context-analysis: Permit alias analysis through __READ_ONCE() with CONFIG_LTO=y arm64: Optimize __READ_ONCE() with CONFIG_LTO=y * for-next/misc: : Miscellaneous cleanups/fixes arm64: rsi: use linear-map alias for realm config buffer arm64: Kconfig: fix duplicate word in CMDLINE help text arm64: mte: Skip TFSR_EL1 checks and barriers in synchronous tag check mode arm64/hwcap: Generate the KERNEL_HWCAP_ definitions for the hwcaps arm64: kexec: Remove duplicate allocation for trans_pgd arm64: mm: Use generic enum pgtable_level arm64: scs: Remove redundant save/restore of SCS SP on entry to/from EL0 arm64: remove ARCH_INLINE_* * for-next/tlbflush: : Refactor the arm64 TLB invalidation API and implementation arm64: mm: __ptep_set_access_flags must hint correct TTL arm64: mm: Provide level hint for flush_tlb_page() arm64: mm: Wrap flush_tlb_page() around __do_flush_tlb_range() arm64: mm: More flags for __flush_tlb_range() arm64: mm: Refactor __flush_tlb_range() to take flags arm64: mm: Refactor flush_tlb_page() to use __tlbi_level_asid() arm64: mm: Simplify __flush_tlb_range_limit_excess() arm64: mm: Simplify __TLBI_RANGE_NUM() macro arm64: mm: Re-implement the __flush_tlb_range_op macro in C arm64: mm: Inline __TLBI_VADDR_RANGE() into __tlbi_range() arm64: mm: Push __TLBI_VADDR() into __tlbi_level() arm64: mm: Implicitly invalidate user ASID based on TLBI operation arm64: mm: Introduce a C wrapper for by-range TLB invalidation arm64: mm: Re-implement the __tlbi_level macro as a C function * for-next/ttbr-macros-cleanup: : Cleanups of the TTBR1_* macros arm64/mm: Directly use TTBRx_EL1_CnP arm64/mm: Directly use TTBRx_EL1_ASID_MASK arm64/mm: Describe TTBR1_BADDR_4852_OFFSET * for-next/kselftest: : arm64 kselftest updates selftests/arm64: Implement cmpbr_sigill() to hwcap test * for-next/feat_lsui: : Futex support using FEAT_LSUI instructions to avoid toggling PAN arm64: armv8_deprecated: Disable swp emulation when FEAT_LSUI present arm64: Kconfig: Add support for LSUI KVM: arm64: Use CAST instruction for swapping guest descriptor arm64: futex: Support futex with FEAT_LSUI arm64: futex: Refactor futex atomic operation KVM: arm64: kselftest: set_id_regs: Add test for FEAT_LSUI KVM: arm64: Expose FEAT_LSUI to guests arm64: cpufeature: Add FEAT_LSUI * for-next/mpam: (40 commits) : Expose MPAM to user-space via resctrl: : - Add architecture context-switch and hiding of the feature from KVM. : - Add interface to allow MPAM to be exposed to user-space using resctrl. : - Add errata workaoround for some existing platforms. : - Add documentation for using MPAM and what shape of platforms can use resctrl arm64: mpam: Add initial MPAM documentation arm_mpam: Quirk CMN-650's CSU NRDY behaviour arm_mpam: Add workaround for T241-MPAM-6 arm_mpam: Add workaround for T241-MPAM-4 arm_mpam: Add workaround for T241-MPAM-1 arm_mpam: Add quirk framework arm_mpam: resctrl: Call resctrl_init() on platforms that can support resctrl arm64: mpam: Select ARCH_HAS_CPU_RESCTRL arm_mpam: resctrl: Add empty definitions for assorted resctrl functions arm_mpam: resctrl: Update the rmid reallocation limit arm_mpam: resctrl: Add resctrl_arch_rmid_read() arm_mpam: resctrl: Allow resctrl to allocate monitors arm_mpam: resctrl: Add support for csu counters arm_mpam: resctrl: Add monitor initialisation and domain boilerplate arm_mpam: resctrl: Add kunit test for control format conversions arm_mpam: resctrl: Add support for 'MB' resource arm_mpam: resctrl: Wait for cacheinfo to be ready arm_mpam: resctrl: Add rmid index helpers arm_mpam: resctrl: Convert to/from MPAMs fixed-point formats arm_mpam: resctrl: Hide CDP emulation behind CONFIG_EXPERT ... * for-next/hotplug-batched-tlbi: : arm64/mm: Enable batched TLB flush in unmap_hotplug_range() arm64/mm: Reject memory removal that splits a kernel leaf mapping arm64/mm: Enable batched TLB flush in unmap_hotplug_range() * for-next/bbml2-fixes: : Fixes for realm guest and BBML2_NOABORT arm64: mm: Remove pmd_sect() and pud_sect() arm64: mm: Handle invalid large leaf mappings correctly arm64: mm: Fix rodata=full block mapping support for realm guests * for-next/sysreg: : arm64 sysreg updates arm64/sysreg: Update ID_AA64SMFR0_EL1 description to DDI0601 2025-12 arm64/sysreg: Update ID_AA64ZFR0_EL1 description to DDI0601 2025-12 arm64/sysreg: Update ID_AA64FPFR0_EL1 description to DDI0601 2025-12 arm64/sysreg: Update ID_AA64ISAR2_EL1 description to DDI0601 2025-12 arm64/sysreg: Update ID_AA64ISAR0_EL1 description to DDI0601 2025-12 arm64/sysreg: Update SMIDR_EL1 to DDI0601 2025-06 * for-next/generic-entry: : More arm64 refactoring towards using the generic entry code arm64: Check DAIF (and PMR) at task-switch time arm64: entry: Use split preemption logic arm64: entry: Use irqentry_{enter_from,exit_to}_kernel_mode() arm64: entry: Consistently prefix arm64-specific wrappers arm64: entry: Don't preempt with SError or Debug masked entry: Split preemption from irqentry_exit_to_kernel_mode() entry: Split kernel mode logic from irqentry_{enter,exit}() entry: Move irqentry_enter() prototype later entry: Remove local_irq_{enable,disable}_exit_to_user() entry: Fix stale comment for irqentry_enter() * for-next/acpi: : arm64 ACPI updates ACPI: AGDI: fix missing newline in error message	2026-04-10 14:22:24 +01:00
Marc Zyngier	76efe94b1c	KVM: arm64: Fix writeable mask for ID_AA64PFR2_EL1 The writeable mask for fields in ID_AA64PFR2_EL1 has been accidentally inverted, which isn't a very good idea. Restore the expected polarity. Reviewed-by: Sascha Bischoff <sascha.bischoff@arm.com> Fixes: `a258a383b9` ("KVM: arm64: gic-v5: Sanitize ID_AA64PFR2_EL1.GCIE") Link: https://sashiko.dev/#/patchset/20260319154937.3619520-1-sascha.bischoff%40arm.com Link: https://patch.msgid.link/20260401103611.357092-5-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>	2026-04-01 15:42:26 +01:00
Marc Zyngier	d82d09d5ba	KVM: arm64: Don't skip per-vcpu NV initialisation Some GICv5-related rework have resulted in the NV sanitisation of registers being skipped for secondary vcpus, which is a pretty bad idea. Hoist the NV init early so that it is always executed. Reviewed-by: Sascha Bischoff <sascha.bischoff@arm.com> Fixes: `cbd8c958be` ("KVM: arm64: Return early from kvm_finalize_sys_regs() if guest has run") Link: https://sashiko.dev/#/patchset/20260319154937.3619520-1-sascha.bischoff%40arm.com Link: https://patch.msgid.link/20260401103611.357092-3-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>	2026-04-01 15:42:25 +01:00
Ben Horgan	2e7c684bdb	KVM: arm64: Make MPAMSM_EL1 accesses UNDEF The MPAMSM_EL1 register controls the MPAM labeling for an SMCU, Streaming Mode Compute Unit. As there is no MPAM support in KVM, make sure MPAMSM_EL1 accesses trigger an UNDEF. Tested-by: Gavin Shan <gshan@redhat.com> Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Tested-by: Peter Newman <peternewman@google.com> Tested-by: Zeng Heng <zengheng4@huawei.com> Tested-by: Punit Agrawal <punit.agrawal@oss.qualcomm.com> Tested-by: Jesse Chick <jessechick@os.amperecomputing.com> Reviewed-by: Zeng Heng <zengheng4@huawei.com> Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Acked-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Ben Horgan <ben.horgan@arm.com> Signed-off-by: James Morse <james.morse@arm.com>	2026-03-27 15:27:47 +00:00
Yeoreum Yun	f6bff18d05	KVM: arm64: Expose FEAT_LSUI to guests Expose the FEAT_LSUI ID register field to guests. Signed-off-by: Yeoreum Yun <yeoreum.yun@arm.com> Acked-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2026-03-26 18:19:28 +00:00
Sascha Bischoff	d1328c6151	KVM: arm64: gic-v5: Trap and mask guest ICC_PPI_ENABLERx_EL1 writes A guest should not be able to detect if a PPI that is not exposed to the guest is implemented or not. Avoid the guest enabling any PPIs that are not implemented as far as the guest is concerned by trapping and masking writes to the two ICC_PPI_ENABLERx_EL1 registers. When a guest writes these registers, the write is masked with the set of PPIs actually exposed to the guest, and the state is written back to KVM's shadow state. As there is now no way for the guest to change the PPI enable state without it being trapped, saving of the PPI Enable state is dropped from guest exit. Reads for the above registers are not masked. When the guest is running and reads from the above registers, it is presented with what KVM provides in the ICH_PPI_ENABLERx_EL2 registers, which is the masked version of what the guest last wrote. Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Link: https://patch.msgid.link/20260319154937.3619520-25-sascha.bischoff@arm.com Signed-off-by: Marc Zyngier <maz@kernel.org>	2026-03-19 18:21:28 +00:00
Sascha Bischoff	070543a85a	KVM: arm64: gic-v5: Trap and emulate ICC_IDR0_EL1 accesses Unless accesses to the ICC_IDR0_EL1 are trapped by KVM, the guest reads the same state as the host. This isn't desirable as it limits the migratability of VMs and means that KVM can't hide hardware features such as FEAT_GCIE_LEGACY. Trap and emulate accesses to the register, and present KVM's chosen ID bits and Priority bits (which is 5, as GICv5 only supports 5 bits of priority in the CPU interface). FEAT_GCIE_LEGACY is never presented to the guest as it is only relevant for nested guests doing mixed GICv5 and GICv3 support. Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Link: https://patch.msgid.link/20260319154937.3619520-16-sascha.bischoff@arm.com Signed-off-by: Marc Zyngier <maz@kernel.org>	2026-03-19 18:21:27 +00:00
Sascha Bischoff	607871ce63	KVM: arm64: gic-v5: Add emulation for ICC_IAFFIDR_EL1 accesses GICv5 doesn't provide an ICV_IAFFIDR_EL1 or ICH_IAFFIDR_EL2 for providing the IAFFID to the guest. A guest access to the ICC_IAFFIDR_EL1 must therefore be trapped and emulated to avoid the guest accessing the host's ICC_IAFFIDR_EL1. The virtual IAFFID is provided to the guest when it reads ICC_IAFFIDR_EL1 (which always traps back to the hypervisor). Writes are rightly ignored. KVM treats the GICv5 VPEID, the virtual IAFFID, and the vcpu_id as the same, and so the vcpu_id is returned. The trapping for the ICC_IAFFIDR_EL1 is always enabled when in a guest context. Co-authored-by: Timothy Hayes <timothy.hayes@arm.com> Signed-off-by: Timothy Hayes <timothy.hayes@arm.com> Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Link: https://patch.msgid.link/20260319154937.3619520-15-sascha.bischoff@arm.com Signed-off-by: Marc Zyngier <maz@kernel.org>	2026-03-19 18:21:27 +00:00
Sascha Bischoff	9d6d9514c0	KVM: arm64: gic-v5: Support GICv5 FGTs & FGUs Extend the existing FGT/FGU infrastructure to include the GICv5 trap registers (ICH_HFGRTR_EL2, ICH_HFGWTR_EL2, ICH_HFGITR_EL2). This involves mapping the trap registers and their bits to the corresponding feature that introduces them (FEAT_GCIE for all, in this case), and mapping each trap bit to the system register/instruction controlled by it. As of this change, none of the GICv5 instructions or register accesses are being trapped. Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Link: https://patch.msgid.link/20260319154937.3619520-14-sascha.bischoff@arm.com Signed-off-by: Marc Zyngier <maz@kernel.org>	2026-03-19 18:21:27 +00:00
Sascha Bischoff	a258a383b9	KVM: arm64: gic-v5: Sanitize ID_AA64PFR2_EL1.GCIE Add in a sanitization function for ID_AA64PFR2_EL1, preserving the already-present behaviour for the FPMR, MTEFAR, and MTESTOREONLY fields. Add sanitisation for the GCIE field, which is set to IMP if the host supports a GICv5 guest and NI, otherwise. Extend the sanitisation that takes place in kvm_vgic_create() to zero the ID_AA64PFR2.GCIE field when a non-GICv5 GIC is created. More importantly, move this sanitisation to a separate function, kvm_vgic_finalize_sysregs(), and call it from kvm_finalize_sys_regs(). We are required to finalize the GIC and GCIE fields a second time in kvm_finalize_sys_regs() due to how QEMU blindly reads out then verbatim restores the system register state. This avoids the issue where both the GCIE and GIC features are marked as present (an architecturally invalid combination), and hence guests fall over. See the comment in kvm_finalize_sys_regs() for more details. Overall, the following happens: * Before an irqchip is created, FEAT_GCIE is presented if the host supports GICv5-based guests. * Once an irqchip is created, all other supported irqchips are hidden from the guest; system register state reflects the guest's irqchip. * Userspace is allowed to set invalid irqchip feature combinations in the system registers, but... * ...invalid combinations are removed a second time prior to the first run of the guest, and things hopefully just work. All of this extra work is required to make sure that "legacy" GICv3 guests based on QEMU transparently work on compatible GICv5 hosts without modification. Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Link: https://patch.msgid.link/20260319154937.3619520-13-sascha.bischoff@arm.com Signed-off-by: Marc Zyngier <maz@kernel.org>	2026-03-19 18:21:27 +00:00
Sascha Bischoff	cbd8c958be	KVM: arm64: Return early from kvm_finalize_sys_regs() if guest has run If the guest has already run, we have no business finalizing the system register state - it is too late. Therefore, check early and bail if the VM has already run. This change also stops kvm_init_nv_sysregs() from being called once the RM has run once. Although this looks like a behavioural change, the function returns early once it has been called the first time. Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com> Link: https://patch.msgid.link/20260319154937.3619520-4-sascha.bischoff@arm.com Signed-off-by: Marc Zyngier <maz@kernel.org>	2026-03-19 16:25:55 +00:00
Sascha Bischoff	3a2857da94	KVM: arm64: vgic: Rework vgic_is_v3() and add vgic_host_has_gicvX() The GIC version checks used to determine host capabilities and guest configuration have become somewhat conflated (in part due to the addition of GICv5 support). vgic_is_v3() is a prime example, which prior to this change has been a combination of guest configuration and host cabability. Split out the host capability check from vgic_is_v3(), which now only checks if the vgic model itself is GICv3. Add two new functions: vgic_host_has_gicv3() and vgic_host_has_gicv5(). These explicitly check the host capabilities, i.e., can the host system run a GICvX guest or not. The vgic_is_v3() check in vcpu_set_ich_hcr() has been replaced with vgic_host_has_gicv3() as this only applies on GICv3-capable hardware, and isn't strictly only applicable for a GICv3 guest (it is actually vital for vGICv2 on GICv3 hosts). Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com> Link: https://patch.msgid.link/20260319154937.3619520-3-sascha.bischoff@arm.com Signed-off-by: Marc Zyngier <maz@kernel.org>	2026-03-19 16:25:55 +00:00
Sascha Bischoff	90f0155f87	KVM: arm64: vgic-v3: Drop userspace write sanitization for ID_AA64PFR0.GIC on GICv5 Drop a check that blocked userspace writes to ID_AA64PFR0_EL1 for writes that set the GIC field to 0 (NI) on GICv5 hosts. There is no such check for GICv3 native systems, and having inconsistent behaviour both complicates the logic and risks breaking existing userspace software that expects to be able to write the register. This means that userspace is now able to create a GICv3 guest on GICv5 hosts, and disable the guest from seeing that it has a GICv3. This matches the already existing behaviour for GICv3-native VMs, allowing for fewer issues when migrating from GICv3 hosts to compatible GICv5 hosts. Additionally, this allows the trap and FGU infrastucture to kick in as these rely on the state of the feature bits that have been set. Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com> Link: https://patch.msgid.link/20260319154937.3619520-2-sascha.bischoff@arm.com Signed-off-by: Marc Zyngier <maz@kernel.org>	2026-03-19 16:25:55 +00:00
Fuad Tabba	f66857bafd	KVM: arm64: Hide S1POE from guests when not supported by the host When CONFIG_ARM64_POE is disabled, KVM does not save/restore POR_EL1. However, ID_AA64MMFR3_EL1 sanitisation currently exposes the feature to guests whenever the hardware supports it, ignoring the host kernel configuration. If a guest detects this feature and attempts to use it, the host will fail to context-switch POR_EL1, potentially leading to state corruption. Fix this by masking ID_AA64MMFR3_EL1.S1POE in the sanitised system registers, preventing KVM from advertising the feature when the host does not support it (i.e. system_supports_poe() is false). Fixes: `70ed723829` ("KVM: arm64: Sanitise ID_AA64MMFR3_EL1") Signed-off-by: Fuad Tabba <tabba@google.com> Link: https://patch.msgid.link/20260213143815.1732675-2-tabba@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>	2026-02-13 14:54:48 +00:00
Marc Zyngier	1df3f01ebf	Merge branch kvm-arm64/resx into kvmarm-master/next * kvm-arm64/resx: : . : Add infrastructure to deal with the full gamut of RESx bits : for NV. As a result, it is now possible to have the expected : semantics for some bits such as SCTLR_EL2.SPAN. : . KVM: arm64: Add debugfs file dumping computed RESx values KVM: arm64: Add sanitisation to SCTLR_EL2 KVM: arm64: Remove all traces of HCR_EL2.MIOCNCE KVM: arm64: Remove all traces of FEAT_TME KVM: arm64: Simplify handling of full register invalid constraint KVM: arm64: Get rid of FIXED_VALUE altogether KVM: arm64: Simplify handling of HCR_EL2.E2H RESx KVM: arm64: Move RESx into individual register descriptors KVM: arm64: Add RES1_WHEN_E2Hx constraints as configuration flags KVM: arm64: Add REQUIRES_E2H1 constraint as configuration flags KVM: arm64: Simplify FIXED_VALUE handling KVM: arm64: Convert HCR_EL2.RW to AS_RES1 KVM: arm64: Correctly handle SCTLR_EL1 RES1 bits for unsupported features KVM: arm64: Allow RES1 bits to be inferred from configuration KVM: arm64: Inherit RESx bits from FGT register descriptors KVM: arm64: Extend unified RESx handling to runtime sanitisation KVM: arm64: Introduce data structure tracking both RES0 and RES1 bits KVM: arm64: Introduce standalone FGU computing primitive KVM: arm64: Remove duplicate configuration for SCTLR_EL1.{EE,E0E} arm64: Convert SCTLR_EL2 to sysreg infrastructure Signed-off-by: Marc Zyngier <maz@kernel.org>	2026-02-05 09:17:48 +00:00
Marc Zyngier	3ef5ba663a	Merge branch kvm-arm64/debugfs-fixes into kvmarm-master/next * kvm-arm64/debugfs-fixes: : . : Cleanup of the debugfs iterator, which are way more complicated : than they ought to be, courtesy of Fuad Tabba. From the cover letter: : : "This series refactors the debugfs implementations for `idregs` and : `vgic-state` to use standard `seq_file` iterator patterns. : : The existing implementations relied on storing iterator state within : global VM structures (`kvm_arch` and `vgic_dist`). This approach : prevented concurrent reads of the debugfs files (returning -EBUSY) and : created improper dependencies between transient file operations and : long-lived VM state." : . KVM: arm64: Use standard seq_file iterator for vgic-debug debugfs KVM: arm64: Reimplement vgic-debug XArray iteration KVM: arm64: Use standard seq_file iterator for idregs debugfs Signed-off-by: Marc Zyngier <maz@kernel.org>	2026-02-05 09:17:44 +00:00
Marc Zyngier	edba407843	KVM: arm64: Add debugfs file dumping computed RESx values Computing RESx values is hard. Verifying that they are correct is harder. Add a debugfs file called "resx" that will dump all the RESx values for a given VM. I found it useful, maybe you will too. Co-developed-by: Fuad Tabba <tabba@google.com> Signed-off-by: Fuad Tabba <tabba@google.com> Tested-by: Fuad Tabba <tabba@google.com> Link: https://patch.msgid.link/20260202184329.2724080-21-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>	2026-02-05 09:02:13 +00:00
Fuad Tabba	dcd79ed450	KVM: arm64: Use standard seq_file iterator for idregs debugfs The current implementation uses `idreg_debugfs_iter` in `struct kvm_arch` to track the sequence position. This effectively makes the iterator shared across all open file descriptors for the VM. This approach has significant drawbacks: - It enforces mutual exclusion, preventing concurrent reads of the debugfs file (returning -EBUSY). - It relies on storing transient iterator state in the long-lived VM structure (`kvm_arch`). - The use of `u8` for the iterator index imposes an implicit limit of 255 registers. While not currently exceeded, this is fragile against future architectural growth. Switching to `loff_t` eliminates this overflow risk. Refactor the implementation to use the standard `seq_file` iterator. Instead of storing state in `kvm_arch`, rely on the `pos` argument passed to the `start` and `next` callbacks, which tracks the logical index specific to the file descriptor. This change enables concurrent access and eliminates the `idreg_debugfs_iter` field from `struct kvm_arch`. Signed-off-by: Fuad Tabba <tabba@google.com> Link: https://patch.msgid.link/20260202085721.3954942-2-tabba@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>	2026-02-02 10:59:25 +00:00
Marc Zyngier	cb6cd8a86d	Merge branch kvm-arm64/feat_idst into kvmarm-master/next * kvm-arm64/feat_idst: : . : Add support for FEAT_IDST, allowing ID registers that are not implemented : to be reported as a normal trap rather than as an UNDEF exception. : . KVM: arm64: selftests: Add a test for FEAT_IDST KVM: arm64: pkvm: Report optional ID register traps with a 0x18 syndrome KVM: arm64: pkvm: Add a generic synchronous exception injection primitive KVM: arm64: Force trap of GMID_EL1 when the guest doesn't have MTE KVM: arm64: Handle CSSIDR2_EL1 and SMIDR_EL1 in a generic way KVM: arm64: Handle FEAT_IDST for sysregs without specific handlers KVM: arm64: Add a generic synchronous exception injection primitive KVM: arm64: Add trap routing for GMID_EL1 arm64: Repaint ID_AA64MMFR2_EL1.IDS description Signed-off-by: Marc Zyngier <maz@kernel.org>	2026-01-23 10:04:35 +00:00
Marc Zyngier	70a5ce4efc	KVM: arm64: Force trap of GMID_EL1 when the guest doesn't have MTE If our host has MTE, but the guest doesn't, make sure we set HCR_EL2.TID5 to force GMID_EL1 being trapped. Such trap will be handled by the FEAT_IDST handling. Reviewed-by: Joey Gouly <joey.gouly@arm.com> Reviewed-by: Yuan Yao <yaoyuan@linux.alibaba.com> Link: https://patch.msgid.link/20260108173233.2911955-7-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>	2026-01-15 11:58:57 +00:00
Marc Zyngier	f07ef1bef6	KVM: arm64: Handle CSSIDR2_EL1 and SMIDR_EL1 in a generic way Now that we can handle ID registers using the FEAT_IDST infrastrcuture, get rid of the handling of CSSIDR2_EL1 and SMIDR_EL1. Reviewed-by: Yuan Yao <yaoyuan@linux.alibaba.com> Link: https://patch.msgid.link/20260108173233.2911955-6-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>	2026-01-15 11:58:57 +00:00
Alexandru Elisei	aba963cb98	KVM: arm64: Inject UNDEF for a register trap without accessor Configuring a register trap without specifying an accessor function is abviously a bug. Instead of calling die() when that happens, let's be a bit more helpful and print the register encoding. Also inject an undefined instruction exception in the guest, similar to other unhandled register accesses. Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com> Reviewed-by: Marc Zyngier <maz@kernel.org> Tested-by: Fuad Tabba <tabba@google.com> Reviewed-by: Fuad Tabba <tabba@google.com> Link: https://msgid.link/20251216103053.47224-3-alexandru.elisei@arm.com Signed-off-by: Oliver Upton <oupton@kernel.org>	2026-01-08 12:56:17 -08:00
Paolo Bonzini	f58e70cc31	KVM/arm64 updates for 6.19 - Support for userspace handling of synchronous external aborts (SEAs), allowing the VMM to potentially handle the abort in a non-fatal manner. - Large rework of the VGIC's list register handling with the goal of supporting more active/pending IRQs than available list registers in hardware. In addition, the VGIC now supports EOImode==1 style deactivations for IRQs which may occur on a separate vCPU than the one that acked the IRQ. - Support for FEAT_XNX (user / privileged execute permissions) and FEAT_HAF (hardware update to the Access Flag) in the software page table walkers and shadow MMU. - Allow page table destruction to reschedule, fixing long need_resched latencies observed when destroying a large VM. - Minor fixes to KVM and selftests -----BEGIN PGP SIGNATURE----- iIgEABYKADAWIQSNXHjWXuzMZutrKNKivnWIJHzdFgUCaS3m5RIcb3VwdG9uQGtl cm5lbC5vcmcACgkQor51iCR83Rb4NAD8C1fGoiCErb6htQMHf1I7ua0ThdIx7OnY Mk1EysNWu94BAI/VKEYgz+UC5uapHh+gnsoOdVTMJZedI/OPrnKa3QIA =/Vl1 -----END PGP SIGNATURE----- Merge tag 'kvmarm-6.19' of https://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 updates for 6.19 - Support for userspace handling of synchronous external aborts (SEAs), allowing the VMM to potentially handle the abort in a non-fatal manner. - Large rework of the VGIC's list register handling with the goal of supporting more active/pending IRQs than available list registers in hardware. In addition, the VGIC now supports EOImode==1 style deactivations for IRQs which may occur on a separate vCPU than the one that acked the IRQ. - Support for FEAT_XNX (user / privileged execute permissions) and FEAT_HAF (hardware update to the Access Flag) in the software page table walkers and shadow MMU. - Allow page table destruction to reschedule, fixing long need_resched latencies observed when destroying a large VM. - Minor fixes to KVM and selftests	2025-12-02 18:36:26 +01:00
Oliver Upton	3eef0c83c3	Merge branch 'kvm-arm64/nv-xnx-haf' into kvmarm/next * kvm-arm64/nv-xnx-haf: (22 commits) : Support for FEAT_XNX and FEAT_HAF in nested : : Add support for a couple of MMU-related features that weren't : implemented by KVM's software page table walk: : : - FEAT_XNX: Allows the hypervisor to describe execute permissions : separately for EL0 and EL1 : : - FEAT_HAF: Hardware update of the Access Flag, which in the context of : nested means software walkers must also set the Access Flag. : : The series also adds some basic support for testing KVM's emulation of : the AT instruction, including the implementation detail that AT sets the : Access Flag in KVM. KVM: arm64: at: Update AF on software walk only if VM has FEAT_HAFDBS KVM: arm64: at: Use correct HA bit in TCR_EL2 when regime is EL2 KVM: arm64: Document KVM_PGTABLE_PROT_{UX,PX} KVM: arm64: Fix spelling mistake "Unexpeced" -> "Unexpected" KVM: arm64: Add break to default case in kvm_pgtable_stage2_pte_prot() KVM: arm64: Add endian casting to kvm_swap_s[12]_desc() KVM: arm64: Fix compilation when CONFIG_ARM64_USE_LSE_ATOMICS=n KVM: arm64: selftests: Add test for AT emulation KVM: arm64: nv: Expose hardware access flag management to NV guests KVM: arm64: nv: Implement HW access flag management in stage-2 SW PTW KVM: arm64: Implement HW access flag management in stage-1 SW PTW KVM: arm64: Propagate PTW errors up to AT emulation KVM: arm64: Add helper for swapping guest descriptor KVM: arm64: nv: Use pgtable definitions in stage-2 walk KVM: arm64: Handle endianness in read helper for emulated PTW KVM: arm64: nv: Stop passing vCPU through void ptr in S2 PTW KVM: arm64: Call helper for reading descriptors directly KVM: arm64: nv: Advertise support for FEAT_XNX KVM: arm64: Teach ptdump about FEAT_XNX permissions KVM: arm64: nv: Forward FEAT_XNX permissions to the shadow stage-2 ... Signed-off-by: Oliver Upton <oupton@kernel.org>	2025-12-01 00:47:41 -08:00
Oliver Upton	92c6443222	KVM: arm64: Propagate PTW errors up to AT emulation KVM's software PTW will soon support 'hardware' updates to the access flag. Similar to fault handling, races to update the descriptor will be handled by restarting the instruction. Prepare for this by propagating errors up to the AT emulation, only retiring the instruction if the walk succeeds. Reviewed-by: Marc Zyngier <maz@kernel.org> Tested-by: Marc Zyngier <maz@kernel.org> Link: https://msgid.link/20251124190158.177318-12-oupton@kernel.org Signed-off-by: Oliver Upton <oupton@kernel.org>	2025-12-01 00:44:02 -08:00
Marc Zyngier	cd4f6ee99b	KVM: arm64: GICv3: Handle deactivation via ICV_DIR_EL1 traps Deactivation via ICV_DIR_EL1 is both relatively straightforward (we have the interrupt that needs deactivation) and really awkward. The main issue is that the interrupt may either be in an LR on another CPU, or ourside of any LR. In the former case, we process the deactivation is if ot was a write to GICD_CACTIVERn, which is already implemented as a big hammer IPI'ing all vcpus. In the latter case, we just perform a normal deactivation, similar to what we do for EOImode==0. Another annoying aspect is that we need to tell the CPU owning the interrupt that its ap_list needs laudering. We use a brand new vcpu request to that effect. Note that this doesn't address deactivation via the GICV MMIO view, which will be taken care of in a later change. Tested-by: Fuad Tabba <tabba@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Tested-by: Mark Brown <broonie@kernel.org> Link: https://msgid.link/20251120172540.2267180-29-maz@kernel.org Signed-off-by: Oliver Upton <oupton@kernel.org>	2025-11-24 14:29:13 -08:00
Marc Zyngier	0f559cd91e	KVM: arm64: Finalize ID registers only once per VM Owing to the ID registers being global to the VM, there is no point in computing them more than once. However, recent changes making use of kvm_set_vm_id_reg() outlined that we repeatedly hammer the ID registers when we shouldn't. Gate the ID reg update on the VM having never run. Fixes: `50e7cce81b` ("KVM: arm64: Limit clearing of ID_{AA64PFR0,PFR1}_EL1.GIC to userspace irqchip") Fixes: `5cb57a1aff` ("KVM: arm64: Zero ID_AA64PFR0_EL1.GIC when no GICv3 is presented to the guest") Closes: https://lore.kernel.org/r/aRHf6x5umkTYhYJ3@finisterre.sirena.org.uk Reported-by: Mark Brown <broonie@kernel.org> Tested-by: Mark Brown <broonie@kernel.org> Link: https://patch.msgid.link/20251110173010.1918424-1-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-11-11 12:24:22 +00:00
Marc Zyngier	50e7cce81b	KVM: arm64: Limit clearing of ID_{AA64PFR0,PFR1}_EL1.GIC to userspace irqchip Now that the idreg's GIC field is in sync with the irqchip, limit the runtime clearing of these fields to the pathological case where we do not have an in-kernel GIC. While we're at it, use the existing API instead of open-coded accessors to access the ID regs. Fixes: `5cb57a1aff` ("KVM: arm64: Zero ID_AA64PFR0_EL1.GIC when no GICv3 is presented to the guest") Reviewed-by: Oliver Upton <oupton@kernel.org> Link: https://patch.msgid.link/20251030122707.2033690-4-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-11-08 11:17:28 +00:00
Marc Zyngier	3f9eacf4f0	KVM: arm64: Make all 32bit ID registers fully writable 32bit ID registers aren't getting much love these days, and are often missed in updates. One of these updates broke restoring a GICv2 guest on a GICv3 machine. Instead of performing a piecemeal fix, just bite the bullet and make all 32bit ID regs fully writable. KVM itself never relies on them for anything, and if the VMM wants to mess up the guest, so be it. Fixes: `5cb57a1aff` ("KVM: arm64: Zero ID_AA64PFR0_EL1.GIC when no GICv3 is presented to the guest") Reported-by: Peter Maydell <peter.maydell@linaro.org> Cc: stable@vger.kernel.org Reviewed-by: Oliver Upton <oupton@kernel.org> Link: https://patch.msgid.link/20251030122707.2033690-2-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-11-08 11:17:28 +00:00
Marc Zyngier	c3be3a48fb	KVM: arm64: Move CNT*CT_EL0 userspace accessors to generic infrastructure Moving the counter registers is a bit more involved than for the control and comparator (there is no shadow data for the counter), but still pretty manageable. Reviewed-by: Joey Gouly <joey.gouly@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-10-13 14:42:41 +01:00
Marc Zyngier	8af198980e	KVM: arm64: Move CNT*_CVAL_EL0 userspace accessors to generic infrastructure As for the control registers, move the comparator registers to the common infrastructure. Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-10-13 14:42:41 +01:00
Marc Zyngier	09424d5d7d	KVM: arm64: Move CNT_CTL_EL0 userspace accessors to generic infrastructure Remove the handling of CNT_CTL_EL0 from guest.c, and move it to sys_regs.c, using a new TIMER_REG() definition to encapsulate it. Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-10-13 14:42:40 +01:00
Marc Zyngier	77a0c42eaf	KVM: arm64: Add timer UAPI workaround to sysreg infrastructure Amongst the numerous bugs that plague the KVM/arm64 UAPI, one of the most annoying thing is that the userspace view of the virtual timer has its CVAL and CNT encodings swapped. In order to reduce the amount of code that has to know about this, start by adding handling for this bug in the sys_reg code. Nothing is making use of it yet, as the code responsible for userspace interaction is catching the accesses early. Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-10-13 14:42:40 +01:00
Marc Zyngier	4cab5c857d	KVM: arm64: Hide CNTHV__EL2 from userspace for nVHE guests Although we correctly UNDEF any CNTHV__EL2 access from the guest when E2H==0, we still expose these registers to userspace, which is a bad idea. Drop the ad-hoc UNDEF injection and switch to a .visibility() callback which will also hide the register from userspace. Fixes: `0e45981028` ("KVM: arm64: timer: Don't adjust the EL2 virtual timer offset") Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-10-13 14:42:40 +01:00
Marc Zyngier	9a1950f977	KVM: arm64: nv: Don't advance PC when pending an SVE exception Jan reports that running a nested guest on Neoverse-V2 leads to a WARN in the host due to simultaneously pending an exception and PC increment after an access to ZCR_EL2. Returning true from a sysreg accessor is an indication that the sysreg instruction has been retired. Of course this isn't the case when we've pended a synchronous SVE exception for the guest. Fix the return value and let the exception propagate to the guest as usual. Reported-by: Jan Kotas <jank@cadence.com> Closes: https://lore.kernel.org/kvmarm/865xd61tt5.wl-maz@kernel.org/ Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-10-13 14:17:03 +01:00
Oliver Upton	ed25dcfbc4	KVM: arm64: nv: Don't treat ZCR_EL2 as a 'mapped' register Unlike the other mapped EL2 sysregs ZCR_EL2 isn't guaranteed to be resident when a vCPU is loaded as it actually follows the SVE context. As such, the contents of ZCR_EL1 may belong to another guest if the vCPU has been preempted before reaching sysreg emulation. Unconditionally use the in-memory value of ZCR_EL2 and switch to the memory-only accessors. The in-memory value is guaranteed to be valid as fpsimd_lazy_switch_to_{guest,host}() will restore/save the register appropriately. Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-10-13 14:17:02 +01:00
Marc Zyngier	181ce6b01a	Merge branch kvm-arm64/misc-6.18 into kvmarm-master/next * kvm-arm64/misc-6.18: : . : . : Misc improvements and bug fixes: : : - Fix XN handling in the S2 page table dumper : (20250809135356.1003520-1-r09922117@csie.ntu.edu.tw) : : - Fix sanitity checks for huge mapping with pKVM running np guests : (20250815162655.121108-1-ben.horgan@arm.com) : : - Fix use of TRBE when KVM is disabled, and Linux running under : a lesser hypervisor (20250902-etm_crash-v2-1-aa9713a7306b@oss.qualcomm.com) : : - Fix out of date MTE-related comments (20250915155234.196288-1-alexandru.elisei@arm.com) : : - Fix PSCI BE support when running a NV guest (20250916161103.1040727-1-maz@kernel.org) : : - Fix page reference leak when refusing to map a page due to mismatched attributes : (20250917130737.2139403-1-tabba@google.com) : : - Add trap handling for PMSDSFR_EL1 : (20250901-james-perf-feat_spe_eft-v8-7-2e2738f24559@linaro.org) : : - Add advertisement from FEAT_LSFE (Large System Float Extension) : (20250918-arm64-lsfe-v4-1-0abc712101c7@kernel.org) : . KVM: arm64: Expose FEAT_LSFE to guests KVM: arm64: Add trap configs for PMSDSFR_EL1 KVM: arm64: Fix page leak in user_mem_abort() KVM: arm64: Fix kvm_vcpu_{set,is}_be() to deal with EL2 state KVM: arm64: Update stale comment for sanitise_mte_tags() KVM: arm64: Return early from trace helpers when KVM isn't available KVM: arm64: Fix debug checking for np-guests using huge mappings KVM: arm64: ptdump: Don't test PTE_VALID alongside other attributes Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-09-20 12:26:29 +01:00
Marc Zyngier	47f15744fc	Merge branch kvm-arm64/nv-misc-6.18 into kvmarm-master/next * kvm-arm64/nv-misc-6.18: : . : Various NV-related fixes: : : - Relax KVM's SError injection to consider that HCR_EL2.AMO's : effective value is 1 when HCR_EL2.{E2H,TGE)=={1,0}. : (20250918164632.410404-1-oliver.upton@linux.dev) : : - Allow userspace to disable some S2 base granule sizes : (20250918165505.415017-1-oliver.upton@linux.dev) : . KVM: arm64: nv: Allow userspace to de-feature stage-2 TGRANs KVM: arm64: nv: Treat AMO as 1 when at EL2 and {E2H,TGE} = {1, 0} Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-09-20 12:26:24 +01:00
Mark Brown	5d20605c8e	KVM: arm64: Expose FEAT_LSFE to guests FEAT_LSFE (Large System Float Extension), providing atomic floating point memory operations, is optional from v9.5. This feature adds no new architectural state, expose the relevant ID register field to guests so they can discover it. Signed-off-by: Mark Brown <broonie@kernel.org> Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-09-19 15:34:40 +01:00
James Clark	557c82a448	KVM: arm64: Add trap configs for PMSDSFR_EL1 SPE data source filtering (SPE_FEAT_FDS) adds a new register PMSDSFR_EL1, add the trap configs for it. PMSNEVFR_EL1 was also missing its VNCR offset so add it along with PMSDSFR_EL1. Tested-by: Leo Yan <leo.yan@arm.com> Signed-off-by: James Clark <james.clark@linaro.org> Reviewed-by: Joey Gouly <joey.gouly@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-09-19 14:30:49 +01:00
Oliver Upton	49da9872a6	KVM: arm64: nv: Don't erroneously claim FEAT_DoubleLock for NV VMs ID_AA64DFR0_EL1.DoubleLock is one of those annoying signed feature fields where a non-negative value implies that a feature is implemented and a negative value implies that it is not. While the intention of masking this field was likely to hide the feature, KVM actually advertises it, even on unsupporting hardware. Remove FEAT_DoubleLock from the mask, making the NI value visible to the VM. Take care to accept the old, incorrect values for this field as we've lied to userspace. Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-09-19 14:01:35 +01:00
Jinqian Yang	1a0b2bf6ff	KVM: arm64: Make ID_AA64MMFR1_EL1.{HCX, TWED} writable from userspace Allow userspace to downgrade {HCX, TWED} in ID_AA64MMFR1_EL1. Userspace can only change the value from high to low. Signed-off-by: Jinqian Yang <yangjinqian1@huawei.com> Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-09-19 13:55:57 +01:00
Oliver Upton	5aea409638	KVM: arm64: nv: Allow userspace to de-feature stage-2 TGRANs KVM advertises the stage-2 TGRAN fields as writable to userspace but prevents any modification for NV-enabled VMs. Update the special-cased sanitization to permit de-featuring a particular TGRAN without allowing the legacy value which refers to the stage-1 field for support. Reported-by: Itaru Kitayama <itaru.kitayama@linux.dev> Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-09-19 10:52:01 +01:00
Paolo Bonzini	42a0305ab1	KVM/arm64 changes for 6.17, take #2 - Correctly handle 'invariant' system registers for protected VMs - Improved handling of VNCR data aborts, including external aborts - Fixes for handling of FEAT_RAS for NV guests, providing a sane fault context during SEA injection and preventing the use of RASv1p1 fault injection hardware - Ensure that page table destruction when a VM is destroyed gives an opportunity to reschedule - Large fix to KVM's infrastructure for managing guest context loaded on the CPU, addressing issues where the output of AT emulation doesn't get reflected to the guest - Fix AT S12 emulation to actually perform stage-2 translation when necessary - Avoid attempting vLPI irqbypass when GICv4 has been explicitly disabled for a VM - Minor KVM + selftest fixes -----BEGIN PGP SIGNATURE----- iI0EABYIADUWIQSNXHjWXuzMZutrKNKivnWIJHzdFgUCaLC0JBccb2xpdmVyLnVw dG9uQGxpbnV4LmRldgAKCRCivnWIJHzdFogJAQCyxHd5tuvXWWT/iC2EYFlPWYkU LOQbNhus16QjQ9f2ggD8CoA+6UAxzYW7ZU6IzYkDhJkN/3dKQEQhh8Cx0GXXRAs= =uky+ -----END PGP SIGNATURE----- Merge tag 'kvmarm-fixes-6.17-1' of https://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 changes for 6.17, take #2 - Correctly handle 'invariant' system registers for protected VMs - Improved handling of VNCR data aborts, including external aborts - Fixes for handling of FEAT_RAS for NV guests, providing a sane fault context during SEA injection and preventing the use of RASv1p1 fault injection hardware - Ensure that page table destruction when a VM is destroyed gives an opportunity to reschedule - Large fix to KVM's infrastructure for managing guest context loaded on the CPU, addressing issues where the output of AT emulation doesn't get reflected to the guest - Fix AT S12 emulation to actually perform stage-2 translation when necessary - Avoid attempting vLPI irqbypass when GICv4 has been explicitly disabled for a VM - Minor KVM + selftest fixes	2025-08-29 12:57:31 -04:00
Marc Zyngier	3328d17e70	KVM: arm64: Remove __vcpu_{read,write}_sys_reg_{from,to}_cpu() There is no point having __vcpu_{read,write}_sys_reg_{from,to}_cpu() exposed to the rest of the kernel, as the only callers are in sys_regs.c. Move them where they below, which is another opportunity to simplify things a bit. Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20250817121926.217900-5-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>	2025-08-28 11:39:48 -07:00
Marc Zyngier	ec0ab059d4	KVM: arm64: Fix vcpu_{read,write}_sys_reg() accessors Volodymyr reports (again!) that under some circumstances (E2H==0, walking S1 PTs), PAR_EL1 doesn't report the value of the latest walk in the CPU register, but that instead the value is written to the backing store. Further investigation indicates that the root cause of this is that a group of registers (PAR_EL1, TPIDR_EL{0,1}, the 32_EL2 dregs) should always be considered as "on CPU", as they are not remapped between EL1 and EL2. We fail to treat them accordingly, and end-up considering that the register (PAR_EL1 in this example) should be written to memory instead of in the register. While it would be possible to quickly work around it, it is obvious that the way we track these things at the moment is pretty horrible, and could do with some improvement. Revamp the whole thing by: - defining a location for a register (memory, cpu), potentially depending on the state of the vcpu - define a transformation for this register (mapped register, potential translation, special register needing some particular attention) - convey this information in a structure that can be easily passed around As a result, the accessors themselves become much simpler, as the state is explicit instead of being driven by hard-to-understand conventions. We get rid of the "pure EL2 register" notion, which wasn't very useful, and add sanitisation of the values by applying the RESx masks as required, something that was missing so far. And of course, we add the missing registers to the list, with the indication that they are always loaded. Reported-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Fixes: `fedc612314` ("KVM: arm64: nv: Handle virtual EL2 registers in vcpu_read/write_sys_reg()") Link: https://lore.kernel.org/r/20250806141707.3479194-3-volodymyr_babchuk@epam.com Link: https://lore.kernel.org/r/20250817121926.217900-4-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>	2025-08-28 11:39:48 -07:00
Marc Zyngier	0843e0ced3	KVM: arm64: Get rid of ARM64_FEATURE_MASK() The ARM64_FEATURE_MASK() macro was a hack introduce whilst the automatic generation of sysreg encoding was introduced, and was too unreliable to be entirely trusted. We are in a better place now, and we could really do without this macro. Get rid of it altogether. Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20250817202158.395078-7-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>	2025-08-21 16:31:56 -07:00
Marc Zyngier	7a765aa88e	KVM: arm64: Make ID_AA64PFR1_EL1.RAS_frac writable Allow userspace to write to RAS_frac, under the condition that the host supports RASv1p1 with RAS_frac==1. Other configurations will result in RAS_frac being exposed as 0, and therefore implicitly not writable. To avoid the clutter, the ID_AA64PFR1_EL1 sanitisation is moved to its own function. Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Link: https://lore.kernel.org/r/20250817202158.395078-6-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>	2025-08-21 16:31:56 -07:00

1 2 3 4 5 ...

634 Commits