linux

mirror of https://github.com/torvalds/linux.git synced 2026-05-29 17:43:52 +02:00

Author	SHA1	Message	Date
Vincent Donnefort	1702da76e0	KVM: arm64: Fix nVHE/pKVM hyp tracing error on invalid desc pKVM must validate the host-provided tracing buffer descriptor. However, if an error is found, the hypervisor would just return 0 to the host. Fix the return value on validation failure. While at it, rename the function to hyp_trace_desc_is_valid() and skip validation for the nVHE mode as we trust host-provided data in that case. Signed-off-by: Vincent Donnefort <vdonnefort@google.com> Fixes: `680a04c333` ("KVM: arm64: Add tracing capability for the nVHE/pKVM hyp") Link: https://lore.kernel.org/r/20260514162624.3477857-1-vdonnefort@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>	2026-05-20 08:08:37 +01:00
Michael Bommarito	f19c354dbd	KVM: arm64: vgic: Free private_irqs when init fails after allocation Companion to commit `250f25367b` ("KVM: arm64: Tear down vGIC on failed vCPU creation"), which added the missing kvm_vgic_vcpu_destroy() call to the kvm_share_hyp() failure path in kvm_arch_vcpu_create(). The kvm_vgic_vcpu_init() failure path immediately above it has the same shape and still needs the same cleanup. Call kvm_vgic_vcpu_destroy() when kvm_vgic_vcpu_init() fails so private IRQs allocated before a redistributor iodev registration failure are released before the failed vCPU is freed. Fixes: `03b3d00a70` ("KVM: arm64: vgic: Allocate private interrupts on demand") Cc: stable@vger.kernel.org Cc: Will Deacon <will@kernel.org> Reviewed-by: Yuan Yao <yaoyuan@linux.alibaba.com> Assisted-by: Claude:claude-opus-4-7 Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com> Link: https://lore.kernel.org/r/20260519135042.2219239-1-michael.bommarito@gmail.com Signed-off-by: Marc Zyngier <maz@kernel.org>	2026-05-20 08:08:17 +01:00
Michael Bommarito	9ce754ed8e	KVM: arm64: vgic-its: Reject restored DTE with out-of-range num_eventid_bits Userspace can restore an ITS Device Table Entry whose Size field encodes more EventID bits than the virtual ITS supports. The live MAPD path rejects that state, but vgic_its_restore_dte() accepts it and stores the out-of-range value in dev->num_eventid_bits. Reject restored DTEs with num_eventid_bits > VITS_TYPER_IDBITS before allocating the device. This mirrors the MAPD check and prevents the restored state from reaching vgic_its_restore_itt(), where the unchecked value can be converted into an oversized scan_its_table() range. Fixes: `57a9a11715` ("KVM: arm64: vgic-its: Device table save/restore") Assisted-by: Claude:claude-opus-4-7 Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com> Link: https://lore.kernel.org/r/20260519132519.2142458-1-michael.bommarito@gmail.com Signed-off-by: Marc Zyngier <maz@kernel.org> Cc: stable@vger.kernel.org	2026-05-20 08:08:11 +01:00
Fuad Tabba	effc0a39b8	KVM: arm64: Pre-check vcpu memcache for host->guest donate __pkvm_host_donate_guest() flips the host stage-2 PTE for the donated page to a non-valid annotation via host_stage2_set_owner_metadata_locked() and then calls kvm_pgtable_stage2_map() to install the matching guest stage-2 mapping. The map's return value is wrapped in WARN_ON() and otherwise discarded, asserting that the call cannot fail. WARN_ON() at nVHE EL2 panics, so this assertion is only correct if the call genuinely cannot fail. kvm_pgtable_stage2_map() can fail with -ENOMEM even at PAGE_SIZE granularity: the donate path verifies PKVM_NOPAGE for the guest IPA before the map, so the walker must allocate fresh page-table pages from the vcpu memcache, and the host controls the vcpu memcache via the topup interface. An under-provisioned donation request would otherwise turn a recoverable -ENOMEM into a fatal hyp panic. Bound the worst-case walker allocation alongside the existing __host_check_page_state_range() / __guest_check_page_state_range() pre-checks, using the helper introduced for host->guest share. If the vcpu memcache holds fewer pages than kvm_mmu_cache_min_pages(), return -ENOMEM before any state mutation. Fixes: `1e579adca1` ("KVM: arm64: Introduce __pkvm_host_donate_guest()") Assisted-by: Gemini:gemini-3.1-pro review-prompts Signed-off-by: Fuad Tabba <tabba@google.com> Link: https://patch.msgid.link/20260501112149.2824881-7-tabba@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>	2026-05-07 14:12:42 +01:00
Fuad Tabba	8234409ffb	KVM: arm64: Pre-check vcpu memcache for host->guest share __pkvm_host_share_guest() ends with kvm_pgtable_stage2_map() to install the guest stage-2 mapping, after a forward pass that mutates the host vmemmap (sets PKVM_PAGE_SHARED_OWNED and increments host_share_guest_count) for every page in the range. The map's return value is wrapped in WARN_ON() and otherwise discarded, asserting that the call cannot fail. WARN_ON() at nVHE EL2 panics, so this assertion is only correct if the call genuinely cannot fail. kvm_pgtable_stage2_map() can fail with -ENOMEM when the stage-2 walker exhausts the caller's memcache, and the host controls the vcpu memcache via the topup interface, so an under-provisioned share request would otherwise turn a recoverable -ENOMEM into a fatal hyp panic. Bound the worst-case walker allocation in the existing pre-check pass so that kvm_pgtable_stage2_map() cannot fail at the call site, using kvm_mmu_cache_min_pages() -- the same bound host EL1 uses for its own stage-2 maps. If the vcpu memcache holds fewer pages, return -ENOMEM before any state mutation. Fixes: `d0bd3e6570` ("KVM: arm64: Introduce __pkvm_host_share_guest()") Assisted-by: Gemini:gemini-3.1-pro review-prompts Signed-off-by: Fuad Tabba <tabba@google.com> Link: https://patch.msgid.link/20260501112149.2824881-6-tabba@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>	2026-05-07 14:12:42 +01:00
Fuad Tabba	5130d450d1	KVM: arm64: Seed pkvm_ownership_selftest vcpu memcache The hypercall handlers call pkvm_refill_memcache() to top up the hyp_vcpu memcache before invoking __pkvm_host_{share,donate}_guest(). pkvm_ownership_selftest invokes those functions directly with a static selftest_vcpu that has an empty memcache. Seed selftest_vcpu's memcache from the prepopulated selftest pages, leaving the remainder for selftest_vm.pool. Required by the memcache-sufficiency pre-check added in the following patches. Assisted-by: Gemini:gemini-3.1-pro review-prompts Signed-off-by: Fuad Tabba <tabba@google.com> Link: https://patch.msgid.link/20260501112149.2824881-5-tabba@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>	2026-05-07 14:12:41 +01:00
Fuad Tabba	d4d215e5b8	KVM: arm64: Fix __deactivate_fgt macro parameter typo __deactivate_fgt() declares its first parameter as "htcxt" but the body references "hctxt". The parameter is unused; the macro silently captures "hctxt" from the enclosing scope. Both existing callers (__deactivate_traps_hfgxtr() and __deactivate_traps_ich_hfgxtr()) happen to define a local "struct kvm_cpu_context *hctxt", so the macro works by coincidence. A future caller without an "hctxt" local in scope, or naming it differently, would compile but bind to the wrong context. Align the parameter name with the sibling __activate_fgt() macro. The "vcpu" parameter remains unused in the body, kept for API symmetry with __activate_fgt() (which uses it). Fixes: `f5a5a406b4` ("KVM: arm64: Propagate and handle Fine-Grained UNDEF bits") Assisted-by: Gemini:gemini-3.1-pro review-prompts Signed-off-by: Fuad Tabba <tabba@google.com> Link: https://patch.msgid.link/20260501112149.2824881-4-tabba@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>	2026-05-07 14:12:41 +01:00
Fuad Tabba	300fac4cc2	KVM: arm64: Guard against NULL vcpu on VHE hyp panic path On VHE, __hyp_call_panic() unconditionally calls __deactivate_traps(vcpu) on the vcpu pointer read from host_ctxt->__hyp_running_vcpu. That pointer is cleared after every guest exit (and is never set when no guest is running), so an unexpected EL2 exception landing in _guest_exit_panic, e.g. via the el2t*_invalid / el2h_irq_invalid vectors - reaches this function with vcpu == NULL. __deactivate_traps() then dereferences vcpu via ___deactivate_traps() -> vserror_state_is_nested() -> vcpu_has_nv() -> vcpu->arch.features, faulting inside the panic handler and obscuring the original failure. The nVHE counterpart (hyp_panic() in arch/arm64/kvm/hyp/nvhe/switch.c) already guards its vcpu-using cleanup with "if (vcpu)"; mirror that here. sysreg_restore_host_state_vhe() does not depend on vcpu and continues to run unconditionally, preserving panic forensics. The trailing panic("...VCPU:%p", vcpu) prints "(null)" safely via printk's %p handling. Fixes: `6a0259ed29` ("KVM: arm64: Remove hyp_panic arguments") Assisted-by: Gemini:gemini-3.1-pro review-prompts Signed-off-by: Fuad Tabba <tabba@google.com> Link: https://patch.msgid.link/20260501112149.2824881-3-tabba@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>	2026-05-07 14:12:41 +01:00
Mostafa Saleh	9a624ea3f2	KVM: arm64: Remove potential UB on nvhe tracing clock update Sashiko(locally) reports possiblity of division by zero and out-of-bounds bitwise shift in trace_clock_update(). Although the clock update is untrusted, we should at least have some basic checks to avoid undefined behaviours. Reviewed-by: Vincent Donnefort <vdonnefort@google.com> Signed-off-by: Mostafa Saleh <smostafa@google.com> Link: https://patch.msgid.link/20260430103724.2151625-1-smostafa@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>	2026-05-06 17:09:48 +01:00
Alexandru Elisei	9be19df816	KVM: arm64: Handle permission faults with guest_memfd gmem_abort() calls kvm_pgtable_stage2_map() to make changes to stage 2. It does this for both relaxing permissions on an existing mapping and to install a missing mapping. kvm_pgtable_stage2_map() doesn't make changes to stage 2 if there is an existing, valid entry and the new entry modifies only the permissions. This is checked in: kvm_pgtable_stage2_map() stage2_map_walk_leaf() stage2_map_walker_try_leaf() stage2_pte_needs_update() and if only the permissions differ, kvm_pgtable_stage2_map() returns -EAGAIN and KVM returns to the guest to replay the instruction. The assumption is that a concurrent fault on a different VCPU already mapped the faulting IPA, and replaying the instruction will either succeed, or cause a permission fault, which should be handled with kvm_pgtable_stage2_relax_perms(). gmem_abort(), on a read or write fault on a system without DIC (instruction cache invalidation required for data to instruction coherence), installs a valid entry with read and write permissions, but without executable permissions. On an execution fault on the same page, gmem_abort() attempts to relax the permissions to allow execution, but calls kvm_pgtable_stage2_map() to change the existing, valid, entry. kvm_pgtable_stage2_map() returns -EAGAIN and KVM resumes execution from the faulting instruction, which leads to an infinite loop of permission faults on the same instruction. Allow the guest to make progress by using kvm_pgtable_stage2_relax_perms() to relax permissions. Fixes: `a7b57e0995` ("KVM: arm64: Handle guest_memfd-backed guest page faults") Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com> Reviewed-by: Fuad Tabba <tabba@google.com> Link: https://patch.msgid.link/20260505094913.75317-1-alexandru.elisei@arm.com Signed-off-by: Marc Zyngier <maz@kernel.org>	2026-05-06 17:08:39 +01:00
James Morse	1f7305d87a	KVM: arm64: Work around C1-Pro erratum 4193714 for protected guests C1-Pro cores with SME have an erratum where TLBI+DSB does not complete all outstanding SME accesses. Instead a DSB needs to be executed on the affected CPUs. The implication is that pages cannot be unmapped from the host Stage 2 and then provided to a protected guest or to the hypervisor. Host SME accesses may still complete after this point. This erratum breaks pKVM's guarantees, and the workaround is hard to implement as EL2 and EL1 share a security state meaning EL1 can mask IPIs sent by EL2, leading to interrupt blackouts. Instead, do this in EL3. This has the advantage of a separate security state, meaning lower EL cannot mask the IPI. It is also simpler for EL3 to know about CPUs that are off or in PSCI's CPU_SUSPEND. Add the needed hook to host_stage2_set_owner_metadata_locked(). This covers the cases where the host loses access to a page: __pkvm_host_donate_guest() __pkvm_guest_unshare_host() host_stage2_set_owner_locked() when owner_id == PKVM_ID_HYP Since pKVM relies on the firmware call for correctness, check for the firmware counterpart during protected KVM initialisation and fail the pKVM initialisation if it is missing. Signed-off-by: James Morse <james.morse@arm.com> Co-developed-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Oliver Upton <oupton@kernel.org> Cc: Will Deacon <will@kernel.org> Cc: Vincent Donnefort <vdonnefort@google.com> Cc: Lorenzo Pieralisi <lpieralisi@kernel.org> Cc: Sudeep Holla <sudeep.holla@kernel.org> Link: https://patch.msgid.link/20260505165205.2690919-1-catalin.marinas@arm.com Signed-off-by: Marc Zyngier <maz@kernel.org>	2026-05-06 17:08:39 +01:00
Paolo Bonzini	909eac682c	KVM/arm64 fixes for 7.1, take #1 - Allow tracing for non-pKVM, which was accidentally disabled when the series was merged - Rationalise the way the pKVM hypercall ranges are defined by using the same mechanism as already used for the vcpu_sysreg enum - Enforce that SMCCC function numbers relayed by the pKVM proxy are actually compliant with the specification - Fix a couple of feature to idreg mappings which resulted in the wrong sanitisation being applied - Fix the GICD_IIDR revision number field that could never been written correctly by userspace - Make kvm_vcpu_initialized() correctly use its parameter instead of relying on the surrounding context - Enforce correct ordering in __pkvm_init_vcpu(), plugging a potential pin leak at the same time - Move __pkvm_init_finalise() to a less dangerous spot, avoiding future problems - Restore functional userspace irqchip support after a four year breakage (last functional kernel was 5.18...). This is obviously ripe for garbage collection. - ... and the usual lot of spelling fixes -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEn9UcU+C1Yxj9lZw9I9DQutE9ekMFAmnrhPoACgkQI9DQutE9 ekNyIxAAgXhyAJzOEvL22uk8bsCNh+mkV/33cI6uEdxJDNWl6yqcaiWqh9PMK0b6 JtV/TqNwr9ydqbmJPhlpoRA7tRmoOPXVI7tU0BvqYMdG1FXSqVlPK+DAw/GnOwYD 2vBz3I6Rwm1C5GAggcZNbU+DWXXpFnnILxSEd0N5HHmhPYp3q20jXMcKKfe7WRVn DDn2BIAGe65y1pWrG6f2TMxHAg4SghHy0CCA1+v0cfLyklseUlRVbAjaDO4x/2vT qJnjd5dDAzktarOiKFe141HrX4UE13Y3vvOlWDSog3iuACrr09HM8wVEh/49cz+5 55UKoldaQokTOHhe5p560gfzvsIfIjFPrWBkHJ1rke4ajE4Igg1FQirfl+CaZ3L3 h8b6gLqu8/i2e+Nj45AoDcvoxCuxTTwPIW/X/yJYBUMCfl5DIRj9SO5W7FHv3iLv Aa0ZdDb0rgvg7IW6kiFwlysNPvMAHpigkj4hCEonfP7dQTXjaxWybB8I3a4pOL5Q 2wSkAcqaYo+UgMXo5r4rbsEWgdrql4jxT9xcEMdv9pxpPck2CWVG1zdmgbHW1rk/ Pyh0qWbvdnxY9tDCZFxIoNhrynrcZUaoWJPScEU7lHb7T8+Gcb7ylnoJQjNu3K7z ZDS2QccLncILTPJabGcFm0a0DnmFfyqwqSo5iMeBtQDnwlKSLko= =p/JR -----END PGP SIGNATURE----- Merge tag 'kvmarm-fixes-7.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 fixes for 7.1, take #1 - Allow tracing for non-pKVM, which was accidentally disabled when the series was merged - Rationalise the way the pKVM hypercall ranges are defined by using the same mechanism as already used for the vcpu_sysreg enum - Enforce that SMCCC function numbers relayed by the pKVM proxy are actually compliant with the specification - Fix a couple of feature to idreg mappings which resulted in the wrong sanitisation being applied - Fix the GICD_IIDR revision number field that could never been written correctly by userspace - Make kvm_vcpu_initialized() correctly use its parameter instead of relying on the surrounding context - Enforce correct ordering in __pkvm_init_vcpu(), plugging a potential pin leak at the same time - Move __pkvm_init_finalise() to a less dangerous spot, avoiding future problems - Restore functional userspace irqchip support after a four year breakage (last functional kernel was 5.18...). This is obviously ripe for garbage collection. - ... and the usual lot of spelling fixes	2026-04-27 04:24:34 -04:00
Marc Zyngier	4ce98bf086	KVM: arm64: Wake-up from WFI when iqrchip is in userspace It appears that there is nothing in the wake-up path that evaluates whether the in-kernel interrupts are pending unless we have a vgic. This means that the userspace irqchip support has been broken for about four years, and nobody noticed. It was also broken before as we wouldn't wake-up on a PMU interrupt, but hey, who cares... It is probably time to remove the feature altogether, because it was a terrible idea 10 years ago, and it still is. Fixes: `b57de4ffd7` ("KVM: arm64: Simplify kvm_cpu_has_pending_timer()") Link: https://patch.msgid.link/20260423163607.486345-1-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org> Cc: stable@vger.kernel.org	2026-04-24 12:03:57 +01:00
Quentin Perret	5bb0aed57b	KVM: arm64: Fix initialisation order in __pkvm_init_finalise() fix_host_ownership() walks the hypervisor's stage-1 page-table to adjust the host's stage-2 accordingly. Any such adjustment that requires cache maintenance operations depends on the per-CPU hyp fixmap being present. However, fix_host_ownership() is currently called before fix_hyp_pgtable_refcnt() and hyp_create_fixmap(), so the fixmap does not yet exist when it runs. This is benign today because the host stage-2 starts empty and no CMOs are needed, but it becomes a latent crash as soon as fix_host_ownership() is extended to operate on a non-empty page-table. Reorder the calls so that fix_hyp_pgtable_refcnt() and hyp_create_fixmap() complete before fix_host_ownership() is invoked. Fixes: `0d16d12eb2` ("KVM: arm64: Fix-up hyp stage-1 refcounts for all pages mapped at EL2") Signed-off-by: Quentin Perret <qperret@google.com> Signed-off-by: Fuad Tabba <tabba@google.com> Link: https://patch.msgid.link/20260424084908.370776-7-tabba@google.com Signed-off-by: Marc Zyngier <maz@kernel.org> Cc: stable@vger.kernel.org	2026-04-24 12:03:57 +01:00
Fuad Tabba	73b9c1e5da	KVM: arm64: Fix pin leak and publication ordering in __pkvm_init_vcpu() Two bugs exist in the vCPU initialisation path: 1. If a check fails after hyp_pin_shared_mem() succeeds, the cleanup path jumps to 'unlock' without calling unpin_host_vcpu() or unpin_host_sve_state(), permanently leaking pin references on the host vCPU and SVE state pages. Extract a register_hyp_vcpu() helper that performs the checks and the store. When register_hyp_vcpu() returns an error, call unpin_host_vcpu() and unpin_host_sve_state() inline before falling through to the existing 'unlock' label. 2. register_hyp_vcpu() publishes the new vCPU pointer into 'hyp_vm->vcpus[]' with a bare store, allowing a concurrent caller of pkvm_load_hyp_vcpu() to observe a partially initialised vCPU object. Ensure the store uses smp_store_release() and the load uses smp_load_acquire(). While 'vm_table_lock' currently serialises the store and the load, these barriers ensure the reader sees the fully initialised 'hyp_vcpu' object even if there were a lockless path or if the lock's own ordering guarantees were insufficient for nested object initialization. Fixes: 49af6ddb8e5c ("KVM: arm64: Add infrastructure to create and track pKVM instances at EL2") Reported-by: Ben Simner <ben.simner@cl.cam.ac.uk> Co-developed-by: Will Deacon <willdeacon@google.com> Signed-off-by: Will Deacon <willdeacon@google.com> Signed-off-by: Fuad Tabba <tabba@google.com> Link: https://patch.msgid.link/20260424084908.370776-6-tabba@google.com Signed-off-by: Marc Zyngier <maz@kernel.org> Cc: stable@vger.kernel.org	2026-04-24 12:03:57 +01:00
Fuad Tabba	08d7153382	KVM: arm64: Fix FEAT_SPE_FnE to use PMSIDR_EL1.FnE, not PMSVer FEAT_SPE_FnE is architecturally detected via PMSIDR_EL1.FnE [6], not ID_AA64DFR0_EL1.PMSVer. The FEAT_X macro form (register, field, value) cannot encode a PMSIDR_EL1-based feature, so FEAT_SPE_FnE was defined identically to FEAT_SPEv1p2 (ID_AA64DFR0_EL1, PMSVer, V1P2), producing a duplicate that used PMSVer >= V1P2 as a proxy. Replace the macro with feat_spe_fne(), following the same pattern as the sibling feat_spe_fds(): guard on FEAT_SPEv1p2 and read PMSIDR_EL1.FnE [6] directly. Wire the two NEEDS_FEAT consumers to use the new function. Remove the now-unused FEAT_SPE_FnE macro. Fixes: `63d423a763` ("KVM: arm64: Switch to table-driven FGU configuration") Signed-off-by: Fuad Tabba <tabba@google.com> Link: https://patch.msgid.link/20260424084908.370776-4-tabba@google.com Signed-off-by: Marc Zyngier <maz@kernel.org> Cc: stable@vger.kernel.org	2026-04-24 12:03:57 +01:00
Fuad Tabba	2a62340811	KVM: arm64: Fix typo in feature check comments Revists -> Revisit. The following patch will add another similar line. No functional change intended. Signed-off-by: Fuad Tabba <tabba@google.com> Link: https://patch.msgid.link/20260424084908.370776-3-tabba@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>	2026-04-24 12:03:57 +01:00
Fuad Tabba	7fe2cd4e1a	KVM: arm64: Fix FEAT_Debugv8p9 to check DebugVer, not PMUVer FEAT_Debugv8p9 is incorrectly defined against ID_AA64DFR0_EL1.PMUVer instead of ID_AA64DFR0_EL1.DebugVer. All three consumers of the macro gate features that are architecturally tied to FEAT_Debugv8p9 (DebugVer = 0b1011, DDI0487 M.b A2.2.10): - HDFGRTR2_EL2.nMDSELR_EL1, HDFGWTR2_EL2.nMDSELR_EL1: MDSELR_EL1 is present only when FEAT_Debugv8p9 is implemented (D24.3.21). - MDCR_EL2.EBWE: the Extended Breakpoint and Watchpoint Enable bit is RES0 unless FEAT_Debugv8p9 is implemented (D24.3.17). Neither register has any dependency on PMUVer. FEAT_Debugv8p9 and FEAT_PMUv3p9 are independent. Per DDI0487 M.b A2.2.10, FEAT_Debugv8p9 is unconditionally mandatory from Armv8.9, whereas FEAT_PMUv3p9 is mandatory only when FEAT_PMUv3 is implemented. An Armv8.9 CPU without a PMU has DebugVer = 0b1011 but PMUVer = 0b0000, so the wrong field check would cause KVM to incorrectly treat EBWE and MDSELR_EL1 as RES0 on such hardware. Fixes: `4bc0fe0898` ("KVM: arm64: Add sanitisation for FEAT_FGT2 registers") Signed-off-by: Fuad Tabba <tabba@google.com> Link: https://patch.msgid.link/20260424084908.370776-2-tabba@google.com Signed-off-by: Marc Zyngier <maz@kernel.org> Cc: stable@vger.kernel.org	2026-04-24 12:03:57 +01:00
Sebastian Ene	480ea48cad	KVM: arm64: Reject non compliant SMCCC function calls in pKVM Prevent the propagation of a function-id that has the top bits set since this is not compliant with the SMCCC spec and can overlap with the already known function-id decoders. (eg. if we invoke an smc with 0xffffffffc4000012 it will be decoded as a PSCI reset call). Instead, make it clear that we don't support it and return an error. Signed-off-by: Sebastian Ene <sebastianene@google.com> Link: https://patch.msgid.link/20260408114118.422604-1-sebastianene@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>	2026-04-24 12:03:57 +01:00
David Woodhouse	a0e6ae45af	KVM: arm64: vgic: Fix IIDR revision field extracted from wrong value The uaccess write handlers for GICD_IIDR in both GICv2 and GICv3 extract the revision field from 'reg' (the current IIDR value read back from the emulated distributor) instead of 'val' (the value userspace is trying to write). This means userspace can never actually change the implementation revision — the extracted value is always the current one. Fix the FIELD_GET to use 'val' so that userspace can select a different revision for migration compatibility. Fixes: `49a1a2c70a` ("KVM: arm64: vgic-v3: Advertise GICR_CTLR.{IR, CES} as a new GICD_IIDR revision") Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Link: https://patch.msgid.link/20260407210949.2076251-2-dwmw2@infradead.org Signed-off-by: Marc Zyngier <maz@kernel.org> Cc: stable@vger.kernel.org	2026-04-24 12:03:47 +01:00
Marc Zyngier	f05799491d	KVM: arm64: pkvm: Adopt MARKER() to define host hypercall ranges The EL2 code defines ranges of host hypercalls that are either enabled at boot-time only, used by [nh]VHE KVM, or reserved to pKVM. The way these ranges are delineated is error prone, as the enum symbols defining the limits are expressed in terms of actual function symbols. This means that should a new function be added, special care must be taken to also update the limit symbol. Improve this by reusing the mechanism introduced for the vcpu_sysreg enum, which uses a MARKER() macro and some extra trickery to make the limit symbol standalone. Crucially, the limit symbol has the same value as the following symbol. The handle_host_hcall() function is then updated to make use of the new limit definitions and get rid of the brittle default upper limit. This allows for some more strict checks at build time, and the removal of an comparison at run time. Tested-by: Fuad Tabba <tabba@google.com> Reviewed-by: Fuad Tabba <tabba@google.com> Link: https://patch.msgid.link/20260414160528.2218858-1-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>	2026-04-18 09:07:13 +01:00
Vincent Donnefort	ccab51d69b	KVM: arm64: Re-allow hyp tracing HVCs for [nh]VHE The introduction of __KVM_HOST_SMCCC_FUNC_MAX_NO_PKVM excluded hyp tracing HVCs from the common [nh]VHE/pKVM list. Re-allow them. Signed-off-by: Vincent Donnefort <vdonnefort@google.com> Link: https://patch.msgid.link/20260414100231.1859687-1-vdonnefort@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>	2026-04-17 15:21:05 +01:00
Linus Torvalds	01f492e181	Arm: - Add support for tracing in the standalone EL2 hypervisor code, which should help both debugging and performance analysis. This uses the new infrastructure for 'remote' trace buffers that can be exposed by non-kernel entities such as firmware, and which came through the tracing tree. - Add support for GICv5 Per Processor Interrupts (PPIs), as the starting point for supporting the new GIC architecture in KVM. - Finally add support for pKVM protected guests, where pages are unmapped from the host as they are faulted into the guest and can be shared back from the guest using pKVM hypercalls. Protected guests are created using a new machine type identifier. As the elusive guestmem has not yet delivered on its promises, anonymous memory is also supported. This is only a first step towards full isolation from the host; for example, the CPU register state and DMA accesses are not yet isolated. Because this does not really yet bring fully what it promises, it is hidden behind CONFIG_ARM_PKVM_GUEST + 'kvm-arm.mode=protected', and also triggers TAINT_USER when a VM is created. Caveat emptor. - Rework the dreaded user_mem_abort() function to make it more maintainable, reducing the amount of state being exposed to the various helpers and rendering a substantial amount of state immutable. - Expand the Stage-2 page table dumper to support NV shadow page tables on a per-VM basis. - Tidy up the pKVM PSCI proxy code to be slightly less hard to follow. - Fix both SPE and TRBE in non-VHE configurations so that they do not generate spurious, out of context table walks that ultimately lead to very bad HW lockups. - A small set of patches fixing the Stage-2 MMU freeing in error cases. - Tighten-up accepted SMC immediate value to be only #0 for host SMCCC calls. - The usual cleanups and other selftest churn. LoongArch: - Use CSR_CRMD_PLV for kvm_arch_vcpu_in_kernel(). - Add DMSINTC irqchip in kernel support. RISC-V: - Fix steal time shared memory alignment checks - Fix vector context allocation leak - Fix array out-of-bounds in pmu_ctr_read() and pmu_fw_ctr_read_hi() - Fix double-free of sdata in kvm_pmu_clear_snapshot_area() - Fix integer overflow in kvm_pmu_validate_counter_mask() - Fix shift-out-of-bounds in make_xfence_request() - Fix lost write protection on huge pages during dirty logging - Split huge pages during fault handling for dirty logging - Skip CSR restore if VCPU is reloaded on the same core - Implement kvm_arch_has_default_irqchip() for KVM selftests - Factored-out ISA checks into separate sources - Added hideleg to struct kvm_vcpu_config - Factored-out VCPU config into separate sources - Support configuration of per-VM HGATP mode from KVM user space s390: - Support for ESA (31-bit) guests inside nested hypervisors. - Remove restriction on memslot alignment, which is not needed anymore with the new gmap code. - Fix LPSW/E to update the bear (which of course is the breaking event address register). x86: - Shut up various UBSAN warnings on reading module parameter before they were initialized. - Don't zero-allocate page tables that are used for splitting hugepages in the TDP MMU, as KVM is guaranteed to set all SPTEs in the page table and thus write all bytes. - As an optimization, bail early when trying to unsync 4KiB mappings if the target gfn can just be mapped with a 2MiB hugepage. x86 generic: - Copy single-chunk MMIO write values into struct kvm_vcpu (more precisely struct kvm_mmio_fragment) to fix use-after-free stack bugs where KVM would dereference stack pointer after an exit to userspace. - Clean up and comment the emulated MMIO code to try to make it easier to maintain (not necessarily "easy", but "easier"). - Move VMXON+VMXOFF and EFER.SVME toggling out of KVM (not all of VMX and SVM enabling) as it is needed for trusted I/O. - Advertise support for AVX512 Bit Matrix Multiply (BMM) instructions - Immediately fail the build if a required #define is missing in one of KVM's headers that is included multiple times. - Reject SET_GUEST_DEBUG with -EBUSY if there's an already injected exception, mostly to prevent syzkaller from abusing the uAPI to trigger WARNs, but also because it can help prevent userspace from unintentionally crashing the VM. - Exempt SMM from CPUID faulting on Intel, as per the spec. - Misc hardening and cleanup changes. x86 (AMD): - Fix and optimize IRQ window inhibit handling for AVIC; make it per-vCPU so that KVM doesn't prematurely re-enable AVIC if multiple vCPUs have to-be-injected IRQs. - Clean up and optimize the OSVW handling, avoiding a bug in which KVM would overwrite state when enabling virtualization on multiple CPUs in parallel. This should not be a problem because OSVW should usually be the same for all CPUs. - Drop a WARN in KVM_MEMORY_ENCRYPT_REG_REGION where KVM complains about a "too large" size based purely on user input. - Clean up and harden the pinning code for KVM_MEMORY_ENCRYPT_REG_REGION. - Disallow synchronizing a VMSA of an already-launched/encrypted vCPU, as doing so for an SNP guest will crash the host due to an RMP violation page fault. - Overhaul KVM's APIs for detecting SEV+ guests so that VM-scoped queries are required to hold kvm->lock, and enforce it by lockdep. Fix various bugs where sev_guest() was not ensured to be stable for the whole duration of a function or ioctl. - Convert a pile of kvm->lock SEV code to guard(). - Play nicer with userspace that does not enable KVM_CAP_EXCEPTION_PAYLOAD, for which KVM needs to set CR2 and DR6 as a response to ioctls such as KVM_GET_VCPU_EVENTS (even if the payload would end up in EXITINFO2 rather than CR2, for example). Only set CR2 and DR6 when consumption of the payload is imminent, but on the other hand force delivery of the payload in all paths where userspace retrieves CR2 or DR6. - Use vcpu->arch.cr2 when updating vmcb12's CR2 on nested #VMEXIT instead of vmcb02->save.cr2. The value is out of sync after a save/restore or after a #PF is injected into L2. - Fix a class of nSVM bugs where some fields written by the CPU are not synchronized from vmcb02 to cached vmcb12 after VMRUN, and so are not up-to-date when saved by KVM_GET_NESTED_STATE. - Fix a class of bugs where the ordering between KVM_SET_NESTED_STATE and KVM_SET_{S}REGS could cause vmcb02 to be incorrectly initialized after save+restore. - Add a variety of missing nSVM consistency checks. - Fix several bugs where KVM failed to correctly update VMCB fields on nested #VMEXIT. - Fix several bugs where KVM failed to correctly synthesize #UD or #GP for SVM-related instructions. - Add support for save+restore of virtualized LBRs (on SVM). - Refactor various helpers and macros to improve clarity and (hopefully) make the code easier to maintain. - Aggressively sanitize fields when copying from vmcb12, to guard against unintentionally allowing L1 to utilize yet-to-be-defined features. - Fix several bugs where KVM botched rAX legality checks when emulating SVM instructions. There are remaining issues in that KVM doesn't handle size prefix overrides for 64-bit guests. - Fail emulation of VMRUN/VMLOAD/VMSAVE if mapping vmcb12 fails instead of somewhat arbitrarily synthesizing #GP (i.e. don't double down on AMD's architectural but sketchy behavior of generating #GP for "unsupported" addresses). - Cache all used vmcb12 fields to further harden against TOCTOU bugs. x86 (Intel): - Drop obsolete branch hint prefixes from the VMX instruction macros. - Use ASM_INPUT_RM() in __vmcs_writel() to coerce clang into using a register input when appropriate. - Code cleanups. guest_memfd: - Don't mark guest_memfd folios as accessed, as guest_memfd doesn't support reclaim, the memory is unevictable, and there is no storage to write back to. LoongArch selftests: - Add KVM PMU test cases s390 selftests: - Enable more memory selftests. x86 selftests: - Add support for Hygon CPUs in KVM selftests. - Fix a bug in the MSR test where it would get false failures on AMD/Hygon CPUs with exactly one of RDPID or RDTSCP. - Add an MADV_COLLAPSE testcase for guest_memfd as a regression test for a bug where the kernel would attempt to collapse guest_memfd folios against KVM's will. -----BEGIN PGP SIGNATURE----- iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmnftRQUHHBib256aW5p QHJlZGhhdC5jb20ACgkQv/vSX3jHroPAzwf+NKO4Ktv+7A22ImN0SBl0nlUuulsz vTcw3+hxdRoIw83GdNS+hG5js0wrpMDnbv3t4+VliDNBSSxrBzcSWX2wpilW0Xtw qGo1MWhs2lKPy1NlaRVOwPS6j7uF3AR0TQ1iQLGMedQuCU9WpiKJxyhNXJdbLrt3 8EgFzsvtEsv+jKNRUNDf9+d0j4gZsFyIe+Brhianbw+u3/UCiUClLCdsKPc4+5ZX 08otYXytacGNIf/5Ev1vT4pHkHL0yqKXAtX7LEtaS3+0KrPuLjV4slemivzE9vf5 Evafm5AhA4wpaNMb1ZerhY3T94lsMaJpWxotjR//0Q7C9B59pCQnXCm8mg== =CcE0 -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm Pull kvm updates from Paolo Bonzini: "Arm: - Add support for tracing in the standalone EL2 hypervisor code, which should help both debugging and performance analysis. This uses the new infrastructure for 'remote' trace buffers that can be exposed by non-kernel entities such as firmware, and which came through the tracing tree - Add support for GICv5 Per Processor Interrupts (PPIs), as the starting point for supporting the new GIC architecture in KVM - Finally add support for pKVM protected guests, where pages are unmapped from the host as they are faulted into the guest and can be shared back from the guest using pKVM hypercalls. Protected guests are created using a new machine type identifier. As the elusive guestmem has not yet delivered on its promises, anonymous memory is also supported This is only a first step towards full isolation from the host; for example, the CPU register state and DMA accesses are not yet isolated. Because this does not really yet bring fully what it promises, it is hidden behind CONFIG_ARM_PKVM_GUEST + 'kvm-arm.mode=protected', and also triggers TAINT_USER when a VM is created. Caveat emptor - Rework the dreaded user_mem_abort() function to make it more maintainable, reducing the amount of state being exposed to the various helpers and rendering a substantial amount of state immutable - Expand the Stage-2 page table dumper to support NV shadow page tables on a per-VM basis - Tidy up the pKVM PSCI proxy code to be slightly less hard to follow - Fix both SPE and TRBE in non-VHE configurations so that they do not generate spurious, out of context table walks that ultimately lead to very bad HW lockups - A small set of patches fixing the Stage-2 MMU freeing in error cases - Tighten-up accepted SMC immediate value to be only #0 for host SMCCC calls - The usual cleanups and other selftest churn LoongArch: - Use CSR_CRMD_PLV for kvm_arch_vcpu_in_kernel() - Add DMSINTC irqchip in kernel support RISC-V: - Fix steal time shared memory alignment checks - Fix vector context allocation leak - Fix array out-of-bounds in pmu_ctr_read() and pmu_fw_ctr_read_hi() - Fix double-free of sdata in kvm_pmu_clear_snapshot_area() - Fix integer overflow in kvm_pmu_validate_counter_mask() - Fix shift-out-of-bounds in make_xfence_request() - Fix lost write protection on huge pages during dirty logging - Split huge pages during fault handling for dirty logging - Skip CSR restore if VCPU is reloaded on the same core - Implement kvm_arch_has_default_irqchip() for KVM selftests - Factored-out ISA checks into separate sources - Added hideleg to struct kvm_vcpu_config - Factored-out VCPU config into separate sources - Support configuration of per-VM HGATP mode from KVM user space s390: - Support for ESA (31-bit) guests inside nested hypervisors - Remove restriction on memslot alignment, which is not needed anymore with the new gmap code - Fix LPSW/E to update the bear (which of course is the breaking event address register) x86: - Shut up various UBSAN warnings on reading module parameter before they were initialized - Don't zero-allocate page tables that are used for splitting hugepages in the TDP MMU, as KVM is guaranteed to set all SPTEs in the page table and thus write all bytes - As an optimization, bail early when trying to unsync 4KiB mappings if the target gfn can just be mapped with a 2MiB hugepage x86 generic: - Copy single-chunk MMIO write values into struct kvm_vcpu (more precisely struct kvm_mmio_fragment) to fix use-after-free stack bugs where KVM would dereference stack pointer after an exit to userspace - Clean up and comment the emulated MMIO code to try to make it easier to maintain (not necessarily "easy", but "easier") - Move VMXON+VMXOFF and EFER.SVME toggling out of KVM (not all of VMX and SVM enabling) as it is needed for trusted I/O - Advertise support for AVX512 Bit Matrix Multiply (BMM) instructions - Immediately fail the build if a required #define is missing in one of KVM's headers that is included multiple times - Reject SET_GUEST_DEBUG with -EBUSY if there's an already injected exception, mostly to prevent syzkaller from abusing the uAPI to trigger WARNs, but also because it can help prevent userspace from unintentionally crashing the VM - Exempt SMM from CPUID faulting on Intel, as per the spec - Misc hardening and cleanup changes x86 (AMD): - Fix and optimize IRQ window inhibit handling for AVIC; make it per-vCPU so that KVM doesn't prematurely re-enable AVIC if multiple vCPUs have to-be-injected IRQs - Clean up and optimize the OSVW handling, avoiding a bug in which KVM would overwrite state when enabling virtualization on multiple CPUs in parallel. This should not be a problem because OSVW should usually be the same for all CPUs - Drop a WARN in KVM_MEMORY_ENCRYPT_REG_REGION where KVM complains about a "too large" size based purely on user input - Clean up and harden the pinning code for KVM_MEMORY_ENCRYPT_REG_REGION - Disallow synchronizing a VMSA of an already-launched/encrypted vCPU, as doing so for an SNP guest will crash the host due to an RMP violation page fault - Overhaul KVM's APIs for detecting SEV+ guests so that VM-scoped queries are required to hold kvm->lock, and enforce it by lockdep. Fix various bugs where sev_guest() was not ensured to be stable for the whole duration of a function or ioctl - Convert a pile of kvm->lock SEV code to guard() - Play nicer with userspace that does not enable KVM_CAP_EXCEPTION_PAYLOAD, for which KVM needs to set CR2 and DR6 as a response to ioctls such as KVM_GET_VCPU_EVENTS (even if the payload would end up in EXITINFO2 rather than CR2, for example). Only set CR2 and DR6 when consumption of the payload is imminent, but on the other hand force delivery of the payload in all paths where userspace retrieves CR2 or DR6 - Use vcpu->arch.cr2 when updating vmcb12's CR2 on nested #VMEXIT instead of vmcb02->save.cr2. The value is out of sync after a save/restore or after a #PF is injected into L2 - Fix a class of nSVM bugs where some fields written by the CPU are not synchronized from vmcb02 to cached vmcb12 after VMRUN, and so are not up-to-date when saved by KVM_GET_NESTED_STATE - Fix a class of bugs where the ordering between KVM_SET_NESTED_STATE and KVM_SET_{S}REGS could cause vmcb02 to be incorrectly initialized after save+restore - Add a variety of missing nSVM consistency checks - Fix several bugs where KVM failed to correctly update VMCB fields on nested #VMEXIT - Fix several bugs where KVM failed to correctly synthesize #UD or #GP for SVM-related instructions - Add support for save+restore of virtualized LBRs (on SVM) - Refactor various helpers and macros to improve clarity and (hopefully) make the code easier to maintain - Aggressively sanitize fields when copying from vmcb12, to guard against unintentionally allowing L1 to utilize yet-to-be-defined features - Fix several bugs where KVM botched rAX legality checks when emulating SVM instructions. There are remaining issues in that KVM doesn't handle size prefix overrides for 64-bit guests - Fail emulation of VMRUN/VMLOAD/VMSAVE if mapping vmcb12 fails instead of somewhat arbitrarily synthesizing #GP (i.e. don't double down on AMD's architectural but sketchy behavior of generating #GP for "unsupported" addresses) - Cache all used vmcb12 fields to further harden against TOCTOU bugs x86 (Intel): - Drop obsolete branch hint prefixes from the VMX instruction macros - Use ASM_INPUT_RM() in __vmcs_writel() to coerce clang into using a register input when appropriate - Code cleanups guest_memfd: - Don't mark guest_memfd folios as accessed, as guest_memfd doesn't support reclaim, the memory is unevictable, and there is no storage to write back to LoongArch selftests: - Add KVM PMU test cases s390 selftests: - Enable more memory selftests x86 selftests: - Add support for Hygon CPUs in KVM selftests - Fix a bug in the MSR test where it would get false failures on AMD/Hygon CPUs with exactly one of RDPID or RDTSCP - Add an MADV_COLLAPSE testcase for guest_memfd as a regression test for a bug where the kernel would attempt to collapse guest_memfd folios against KVM's will" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (373 commits) KVM: x86: use inlines instead of macros for is_sev_*guest x86/virt: Treat SVM as unsupported when running as an SEV+ guest KVM: SEV: Goto an existing error label if charging misc_cg for an ASID fails KVM: SVM: Move lock-protected allocation of SEV ASID into a separate helper KVM: SEV: use mutex guard in snp_handle_guest_req() KVM: SEV: use mutex guard in sev_mem_enc_unregister_region() KVM: SEV: use mutex guard in sev_mem_enc_ioctl() KVM: SEV: use mutex guard in snp_launch_update() KVM: SEV: Assert that kvm->lock is held when querying SEV+ support KVM: SEV: Document that checking for SEV+ guests when reclaiming memory is "safe" KVM: SEV: Hide "struct kvm_sev_info" behind CONFIG_KVM_AMD_SEV=y KVM: SEV: WARN on unhandled VM type when initializing VM KVM: LoongArch: selftests: Add PMU overflow interrupt test KVM: LoongArch: selftests: Add basic PMU event counting test KVM: LoongArch: selftests: Add cpucfg read/write helpers LoongArch: KVM: Add DMSINTC inject msi to vCPU LoongArch: KVM: Add DMSINTC device support LoongArch: KVM: Make vcpu_is_preempted() as a macro rather than function LoongArch: KVM: Move host CSR_GSTAT save and restore in context switch LoongArch: KVM: Move host CSR_EENTRY save and restore in context switch ...	2026-04-17 07:18:03 -07:00
Linus Torvalds	c43267e679	arm64 updates for 7.1: Core features: - Add support for FEAT_LSUI, allowing futex atomic operations without toggling Privileged Access Never (PAN) - Further refactor the arm64 exception handling code towards the generic entry infrastructure - Optimise __READ_ONCE() with CONFIG_LTO=y and allow alias analysis through it Memory management: - Refactor the arm64 TLB invalidation API and implementation for better control over barrier placement and level-hinted invalidation - Enable batched TLB flushes during memory hot-unplug - Fix rodata=full block mapping support for realm guests (when BBML2_NOABORT is available) Perf and PMU: - Add support for a whole bunch of system PMUs featured in NVIDIA's Tegra410 SoC (cspmu extensions for the fabric and PCIe, new drivers for CPU/C2C memory latency PMUs) - Clean up iomem resource handling in the Arm CMN driver - Fix signedness handling of AA64DFR0.{PMUVer,PerfMon} MPAM (Memory Partitioning And Monitoring): - Add architecture context-switch and hiding of the feature from KVM - Add interface to allow MPAM to be exposed to user-space using resctrl - Add errata workaround for some existing platforms - Add documentation for using MPAM and what shape of platforms can use resctrl Miscellaneous: - Check DAIF (and PMR, where relevant) at task-switch time - Skip TFSR_EL1 checks and barriers in synchronous MTE tag check mode (only relevant to asynchronous or asymmetric tag check modes) - Remove a duplicate allocation in the kexec code - Remove redundant save/restore of SCS SP on entry to/from EL0 - Generate the KERNEL_HWCAP_ definitions from the arm64 hwcap descriptions - Add kselftest coverage for cmpbr_sigill() - Update sysreg definitions -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEE5RElWfyWxS+3PLO2a9axLQDIXvEFAmnc8DEACgkQa9axLQDI XvFauRAAhc1cIgoRpgtdZd7+3/g457teDPYA3L/CjJzI28aesIpV/ECrEw2GL4xs HrQfijF4oyCDbBwh0sAascO/H7RoyOranlbuc+fVJ6Bj6gP9STzR4GmscsWkAMSJ vA3Jd1DREdDBO2sjw+hGhht84nRlcfY1FyORJP+1JaFH4oWTWsRNeOZIiI3BhxR8 EtFP9E8r2Esxi/FmZb/47m7kYCEH+XsrzQvBQNLVCH899QX2Hn0kAY70ndq2ZiQl n+zLAe7FBFwKzUVmlgWuhjrWMmK+1TthK/XQuOtxg13dHmX+vE/j+A+dOqRWSfHY ktNcWaf6m4+TWKVeVTe4E1cnSuwTQTm4VQKd9zaeQxiZYyYJhCQjXuEZg3vDmDbq F6D3MpTaJHRRWp0rEurxnSBlmQPCBE2IxEBdSrjd/WJ6T9e1oYwWiSJSS7bGCgGr dd/XLsOY7Um5n4ooIFEZc1de6VO6/VTKjmxnBMgU+Sa1REbLpD438IX/6CjzG5qM l5Ulke/c6/a/faeVCEpZpD8JuvNOzo9RISDPrNg1KKAL+OSU+9tgmVjIFPhDDB0w zNTqT7YJIhxlJxnUGWDk8YNsTjT3OzyquY9UT1tBTBqC0k13J2i2ev30toUez7xj 2aV+9qMpunbLtwYhXNun1hBFiYrCxpX7I8ha0hXiXL0CywVOPTI= =CnVn -----END PGP SIGNATURE----- Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 updates from Catalin Marinas: "The biggest changes are MPAM enablement in drivers/resctrl and new PMU support under drivers/perf. On the core side, FEAT_LSUI lets futex atomic operations with EL0 permissions, avoiding PAN toggling. The rest is mostly TLB invalidation refactoring, further generic entry work, sysreg updates and a few fixes. Core features: - Add support for FEAT_LSUI, allowing futex atomic operations without toggling Privileged Access Never (PAN) - Further refactor the arm64 exception handling code towards the generic entry infrastructure - Optimise __READ_ONCE() with CONFIG_LTO=y and allow alias analysis through it Memory management: - Refactor the arm64 TLB invalidation API and implementation for better control over barrier placement and level-hinted invalidation - Enable batched TLB flushes during memory hot-unplug - Fix rodata=full block mapping support for realm guests (when BBML2_NOABORT is available) Perf and PMU: - Add support for a whole bunch of system PMUs featured in NVIDIA's Tegra410 SoC (cspmu extensions for the fabric and PCIe, new drivers for CPU/C2C memory latency PMUs) - Clean up iomem resource handling in the Arm CMN driver - Fix signedness handling of AA64DFR0.{PMUVer,PerfMon} MPAM (Memory Partitioning And Monitoring): - Add architecture context-switch and hiding of the feature from KVM - Add interface to allow MPAM to be exposed to user-space using resctrl - Add errata workaround for some existing platforms - Add documentation for using MPAM and what shape of platforms can use resctrl Miscellaneous: - Check DAIF (and PMR, where relevant) at task-switch time - Skip TFSR_EL1 checks and barriers in synchronous MTE tag check mode (only relevant to asynchronous or asymmetric tag check modes) - Remove a duplicate allocation in the kexec code - Remove redundant save/restore of SCS SP on entry to/from EL0 - Generate the KERNEL_HWCAP_ definitions from the arm64 hwcap descriptions - Add kselftest coverage for cmpbr_sigill() - Update sysreg definitions" * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (109 commits) arm64: rsi: use linear-map alias for realm config buffer arm64: Kconfig: fix duplicate word in CMDLINE help text arm64: mte: Skip TFSR_EL1 checks and barriers in synchronous tag check mode arm64/sysreg: Update ID_AA64SMFR0_EL1 description to DDI0601 2025-12 arm64/sysreg: Update ID_AA64ZFR0_EL1 description to DDI0601 2025-12 arm64/sysreg: Update ID_AA64FPFR0_EL1 description to DDI0601 2025-12 arm64/sysreg: Update ID_AA64ISAR2_EL1 description to DDI0601 2025-12 arm64/sysreg: Update ID_AA64ISAR0_EL1 description to DDI0601 2025-12 arm64/hwcap: Generate the KERNEL_HWCAP_ definitions for the hwcaps arm64: kexec: Remove duplicate allocation for trans_pgd ACPI: AGDI: fix missing newline in error message arm64: Check DAIF (and PMR) at task-switch time arm64: entry: Use split preemption logic arm64: entry: Use irqentry_{enter_from,exit_to}_kernel_mode() arm64: entry: Consistently prefix arm64-specific wrappers arm64: entry: Don't preempt with SError or Debug masked entry: Split preemption from irqentry_exit_to_kernel_mode() entry: Split kernel mode logic from irqentry_{enter,exit}() entry: Move irqentry_enter() prototype later entry: Remove local_irq_{enable,disable}_exit_to_user() ...	2026-04-14 16:48:56 -07:00
Paolo Bonzini	e74c3a8891	KVM/arm64 updates for 7.1 * New features: - Add support for tracing in the standalone EL2 hypervisor code, which should help both debugging and performance analysis. This comes with a full infrastructure for 'remote' trace buffers that can be exposed by non-kernel entities such as firmware. - Add support for GICv5 Per Processor Interrupts (PPIs), as the starting point for supporting the new GIC architecture in KVM. - Finally add support for pKVM protected guests, with anonymous memory being used as a backing store. About time! * Improvements and bug fixes: - Rework the dreaded user_mem_abort() function to make it more maintainable, reducing the amount of state being exposed to the various helpers and rendering a substantial amount of state immutable. - Expand the Stage-2 page table dumper to support NV shadow page tables on a per-VM basis. - Tidy up the pKVM PSCI proxy code to be slightly less hard to follow. - Fix both SPE and TRBE in non-VHE configurations so that they do not generate spurious, out of context table walks that ultimately lead to very bad HW lockups. - A small set of patches fixing the Stage-2 MMU freeing in error cases. - Tighten-up accepted SMC immediate value to be only #0 for host SMCCC calls. - The usual cleanups and other selftest churn. -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEn9UcU+C1Yxj9lZw9I9DQutE9ekMFAmnWdswACgkQI9DQutE9 ekNYvBAAxj5Zmsx8sJ2CYDTJc2w4XkEjSgDugA+J/s0TMgrzExeBlWCstdhVTncy 68nwOjQl3TotnIrt7q36kko9u7IdD0pHNrk34NtlggLjHfB61n9SNcAA6j4F6zJa GFkHpJSrSnZuUPqapkDnlyhuPkgTIAkEUk2Am9siksSfY4HvRyHZJm2FTdxsdIBn NN9wvQqw2wefTXOQ8gS+oHbPVp1cPbwrF2a3EhzXXv/6W3mUBstXgsijgo07UzCp W6vHCv2wqHbHdf67z3Q3hL+VXlVH6oHlyW99/swqISvqRkH/iSB90+oUojnMRrSm yB6Wmhh8jboCaajWMJhG+veZw+7GMXU4nOrGd1rbnY8cwRl/TQ5YibhRm7DIdvjO xeUluTLJ0NdweQUwE2k4OlgKOuGang3E2p0clmkUO4SstA48MdqR/kpST6guIlWw U5syuNaaaiuwP5QOi9qZmMCNmQ3ZfnZG3nseJFdoyGjhVhf5jyQyv4Du9vGZQFF/ Zkg7yTqC4OWiC+3GkW9YYAySM1MyetivLtd47PGzHPTdtaZziWhNvQ0y+8QjQ+R+ CJNvyS/DvsT7epSya4sLgMP1ZAlih9xkz5sQ6k8NJLBYYXi0v33qwqditErgLLyj S4Ci4WNhHHWIusvCVM7JUBkH0AElpmi506f7F6iHoFLlkYR4t9U= =/SuQ -----END PGP SIGNATURE----- Merge tag 'kvmarm-7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 updates for 7.1 * New features: - Add support for tracing in the standalone EL2 hypervisor code, which should help both debugging and performance analysis. This comes with a full infrastructure for 'remote' trace buffers that can be exposed by non-kernel entities such as firmware. - Add support for GICv5 Per Processor Interrupts (PPIs), as the starting point for supporting the new GIC architecture in KVM. - Finally add support for pKVM protected guests, with anonymous memory being used as a backing store. About time! * Improvements and bug fixes: - Rework the dreaded user_mem_abort() function to make it more maintainable, reducing the amount of state being exposed to the various helpers and rendering a substantial amount of state immutable. - Expand the Stage-2 page table dumper to support NV shadow page tables on a per-VM basis. - Tidy up the pKVM PSCI proxy code to be slightly less hard to follow. - Fix both SPE and TRBE in non-VHE configurations so that they do not generate spurious, out of context table walks that ultimately lead to very bad HW lockups. - A small set of patches fixing the Stage-2 MMU freeing in error cases. - Tighten-up accepted SMC immediate value to be only #0 for host SMCCC calls. - The usual cleanups and other selftest churn.	2026-04-13 11:49:54 +02:00
Catalin Marinas	480a9e57cc	Merge branches 'for-next/misc', 'for-next/tlbflush', 'for-next/ttbr-macros-cleanup', 'for-next/kselftest', 'for-next/feat_lsui', 'for-next/mpam', 'for-next/hotplug-batched-tlbi', 'for-next/bbml2-fixes', 'for-next/sysreg', 'for-next/generic-entry' and 'for-next/acpi', remote-tracking branches 'arm64/for-next/perf' and 'arm64/for-next/read-once' into for-next/core * arm64/for-next/perf: : Perf updates perf/arm-cmn: Fix resource_size_t printk specifier in arm_cmn_init_dtc() perf/arm-cmn: Fix incorrect error check for devm_ioremap() perf: add NVIDIA Tegra410 C2C PMU perf: add NVIDIA Tegra410 CPU Memory Latency PMU perf/arm_cspmu: nvidia: Add Tegra410 PCIE-TGT PMU perf/arm_cspmu: nvidia: Add Tegra410 PCIE PMU perf/arm_cspmu: Add arm_cspmu_acpi_dev_get perf/arm_cspmu: nvidia: Add Tegra410 UCF PMU perf/arm_cspmu: nvidia: Rename doc to Tegra241 perf/arm-cmn: Stop claiming entire iomem region arm64: cpufeature: Use pmuv3_implemented() function arm64: cpufeature: Make PMUVer and PerfMon unsigned KVM: arm64: Read PMUVer as unsigned * arm64/for-next/read-once: : Fixes for __READ_ONCE() with CONFIG_LTO=y arm64, compiler-context-analysis: Permit alias analysis through __READ_ONCE() with CONFIG_LTO=y arm64: Optimize __READ_ONCE() with CONFIG_LTO=y * for-next/misc: : Miscellaneous cleanups/fixes arm64: rsi: use linear-map alias for realm config buffer arm64: Kconfig: fix duplicate word in CMDLINE help text arm64: mte: Skip TFSR_EL1 checks and barriers in synchronous tag check mode arm64/hwcap: Generate the KERNEL_HWCAP_ definitions for the hwcaps arm64: kexec: Remove duplicate allocation for trans_pgd arm64: mm: Use generic enum pgtable_level arm64: scs: Remove redundant save/restore of SCS SP on entry to/from EL0 arm64: remove ARCH_INLINE_* * for-next/tlbflush: : Refactor the arm64 TLB invalidation API and implementation arm64: mm: __ptep_set_access_flags must hint correct TTL arm64: mm: Provide level hint for flush_tlb_page() arm64: mm: Wrap flush_tlb_page() around __do_flush_tlb_range() arm64: mm: More flags for __flush_tlb_range() arm64: mm: Refactor __flush_tlb_range() to take flags arm64: mm: Refactor flush_tlb_page() to use __tlbi_level_asid() arm64: mm: Simplify __flush_tlb_range_limit_excess() arm64: mm: Simplify __TLBI_RANGE_NUM() macro arm64: mm: Re-implement the __flush_tlb_range_op macro in C arm64: mm: Inline __TLBI_VADDR_RANGE() into __tlbi_range() arm64: mm: Push __TLBI_VADDR() into __tlbi_level() arm64: mm: Implicitly invalidate user ASID based on TLBI operation arm64: mm: Introduce a C wrapper for by-range TLB invalidation arm64: mm: Re-implement the __tlbi_level macro as a C function * for-next/ttbr-macros-cleanup: : Cleanups of the TTBR1_* macros arm64/mm: Directly use TTBRx_EL1_CnP arm64/mm: Directly use TTBRx_EL1_ASID_MASK arm64/mm: Describe TTBR1_BADDR_4852_OFFSET * for-next/kselftest: : arm64 kselftest updates selftests/arm64: Implement cmpbr_sigill() to hwcap test * for-next/feat_lsui: : Futex support using FEAT_LSUI instructions to avoid toggling PAN arm64: armv8_deprecated: Disable swp emulation when FEAT_LSUI present arm64: Kconfig: Add support for LSUI KVM: arm64: Use CAST instruction for swapping guest descriptor arm64: futex: Support futex with FEAT_LSUI arm64: futex: Refactor futex atomic operation KVM: arm64: kselftest: set_id_regs: Add test for FEAT_LSUI KVM: arm64: Expose FEAT_LSUI to guests arm64: cpufeature: Add FEAT_LSUI * for-next/mpam: (40 commits) : Expose MPAM to user-space via resctrl: : - Add architecture context-switch and hiding of the feature from KVM. : - Add interface to allow MPAM to be exposed to user-space using resctrl. : - Add errata workaoround for some existing platforms. : - Add documentation for using MPAM and what shape of platforms can use resctrl arm64: mpam: Add initial MPAM documentation arm_mpam: Quirk CMN-650's CSU NRDY behaviour arm_mpam: Add workaround for T241-MPAM-6 arm_mpam: Add workaround for T241-MPAM-4 arm_mpam: Add workaround for T241-MPAM-1 arm_mpam: Add quirk framework arm_mpam: resctrl: Call resctrl_init() on platforms that can support resctrl arm64: mpam: Select ARCH_HAS_CPU_RESCTRL arm_mpam: resctrl: Add empty definitions for assorted resctrl functions arm_mpam: resctrl: Update the rmid reallocation limit arm_mpam: resctrl: Add resctrl_arch_rmid_read() arm_mpam: resctrl: Allow resctrl to allocate monitors arm_mpam: resctrl: Add support for csu counters arm_mpam: resctrl: Add monitor initialisation and domain boilerplate arm_mpam: resctrl: Add kunit test for control format conversions arm_mpam: resctrl: Add support for 'MB' resource arm_mpam: resctrl: Wait for cacheinfo to be ready arm_mpam: resctrl: Add rmid index helpers arm_mpam: resctrl: Convert to/from MPAMs fixed-point formats arm_mpam: resctrl: Hide CDP emulation behind CONFIG_EXPERT ... * for-next/hotplug-batched-tlbi: : arm64/mm: Enable batched TLB flush in unmap_hotplug_range() arm64/mm: Reject memory removal that splits a kernel leaf mapping arm64/mm: Enable batched TLB flush in unmap_hotplug_range() * for-next/bbml2-fixes: : Fixes for realm guest and BBML2_NOABORT arm64: mm: Remove pmd_sect() and pud_sect() arm64: mm: Handle invalid large leaf mappings correctly arm64: mm: Fix rodata=full block mapping support for realm guests * for-next/sysreg: : arm64 sysreg updates arm64/sysreg: Update ID_AA64SMFR0_EL1 description to DDI0601 2025-12 arm64/sysreg: Update ID_AA64ZFR0_EL1 description to DDI0601 2025-12 arm64/sysreg: Update ID_AA64FPFR0_EL1 description to DDI0601 2025-12 arm64/sysreg: Update ID_AA64ISAR2_EL1 description to DDI0601 2025-12 arm64/sysreg: Update ID_AA64ISAR0_EL1 description to DDI0601 2025-12 arm64/sysreg: Update SMIDR_EL1 to DDI0601 2025-06 * for-next/generic-entry: : More arm64 refactoring towards using the generic entry code arm64: Check DAIF (and PMR) at task-switch time arm64: entry: Use split preemption logic arm64: entry: Use irqentry_{enter_from,exit_to}_kernel_mode() arm64: entry: Consistently prefix arm64-specific wrappers arm64: entry: Don't preempt with SError or Debug masked entry: Split preemption from irqentry_exit_to_kernel_mode() entry: Split kernel mode logic from irqentry_{enter,exit}() entry: Move irqentry_enter() prototype later entry: Remove local_irq_{enable,disable}_exit_to_user() entry: Fix stale comment for irqentry_enter() * for-next/acpi: : arm64 ACPI updates ACPI: AGDI: fix missing newline in error message	2026-04-10 14:22:24 +01:00
Marc Zyngier	94b4ae79eb	Merge branch kvm-arm64/misc-7.1 into kvmarm-master/next * kvm-arm64/misc-7.1: KVM: arm64: selftests: Avoid testing the IMPDEF behavior KVM: arm64: Destroy stage-2 page-table in kvm_arch_destroy_vm() KVM: arm64: Don't leave mmu->pgt dangling on kvm_init_stage2_mmu() error KVM: arm64: Prevent the host from using an smc with imm16 != 0 Signed-off-by: Marc Zyngier <maz@kernel.org>	2026-04-08 12:26:11 +01:00
Marc Zyngier	d77f4792db	Merge branch kvm-arm64/vgic-fixes-7.1 into kvmarm-master/next * kvm-arm64/vgic-fixes-7.1: : . : FIrst pass at fixing a number of vgic-v5 bugs that were found : after the merge of the initial series. : . KVM: arm64: Advertise ID_AA64PFR2_EL1.GCIE KVM: arm64: vgic-v5: Fold PPI state for all exposed PPIs KVM: arm64: set_id_regs: Allow GICv3 support to be set at runtime KVM: arm64: Don't advertises GICv3 in ID_PFR1_EL1 if AArch32 isn't supported KVM: arm64: Correctly plumb ID_AA64PFR2_EL1 into pkvm idreg handling KVM: arm64: Move GICv5 timer PPI validation into timer_irqs_are_valid() KVM: arm64: Remove evaluation of timer state in kvm_cpu_has_pending_timer() KVM: arm64: Kill arch_timer_context::direct field KVM: arm64: vgic-v5: Correctly set dist->ready once initialised KVM: arm64: vgic-v5: Make the effective priority mask a strict limit KVM: arm64: vgic-v5: Cast vgic_apr to u32 to avoid undefined behaviours KVM: arm64: vgic-v5: Transfer edge pending state to ICH_PPI_PENDRx_EL2 KVM: arm64: vgic-v5: Hold config_lock while finalizing GICv5 PPIs KVM: arm64: Account for RESx bits in __compute_fgt() KVM: arm64: Fix writeable mask for ID_AA64PFR2_EL1 arm64: Fix field references for ICH_PPI_DVIR[01]_EL2 KVM: arm64: Don't skip per-vcpu NV initialisation KVM: arm64: vgic: Don't reset cpuif/redist addresses at finalize time Signed-off-by: Marc Zyngier <maz@kernel.org>	2026-04-08 12:26:00 +01:00
Marc Zyngier	83a3980750	Merge branch kvm-arm64/pkvm-protected-guest into kvmarm-master/next * kvm-arm64/pkvm-protected-guest: (41 commits) : . : pKVM support for protected guests, implementing the very long : awaited support for anonymous memory, as the elusive guestmem : has failed to deliver on its promises despite a multi-year : effort. Patches courtesy of Will Deacon. From the initial cover : letter: : : "[...] this patch series implements support for protected guest : memory with pKVM, where pages are unmapped from the host as they are : faulted into the guest and can be shared back from the guest using pKVM : hypercalls. Protected guests are created using a new machine type : identifier and can be booted to a shell using the kvmtool patches : available at [2], which finally means that we are able to test the pVM : logic in pKVM. Since this is an incremental step towards full isolation : from the host (for example, the CPU register state and DMA accesses are : not yet isolated), creating a pVM requires a developer Kconfig option to : be enabled in addition to booting with 'kvm-arm.mode=protected' and : results in a kernel taint." : . KVM: arm64: Don't hold 'vm_table_lock' across guest page reclaim KVM: arm64: Allow get_pkvm_hyp_vm() to take a reference to a dying VM KVM: arm64: Prevent teardown finalisation of referenced 'hyp_vm' drivers/virt: pkvm: Add Kconfig dependency on DMA_RESTRICTED_POOL KVM: arm64: Rename PKVM_PAGE_STATE_MASK KVM: arm64: Extend pKVM page ownership selftests to cover guest hvcs KVM: arm64: Extend pKVM page ownership selftests to cover forced reclaim KVM: arm64: Register 'selftest_vm' in the VM table KVM: arm64: Extend pKVM page ownership selftests to cover guest donation KVM: arm64: Add some initial documentation for pKVM KVM: arm64: Allow userspace to create protected VMs when pKVM is enabled KVM: arm64: Implement the MEM_UNSHARE hypercall for protected VMs KVM: arm64: Implement the MEM_SHARE hypercall for protected VMs KVM: arm64: Add hvc handler at EL2 for hypercalls from protected VMs KVM: arm64: Return -EFAULT from VCPU_RUN on access to a poisoned pte KVM: arm64: Reclaim faulting page from pKVM in spurious fault handler KVM: arm64: Introduce hypercall to force reclaim of a protected page KVM: arm64: Annotate guest donations with handle and gfn in host stage-2 KVM: arm64: Change 'pkvm_handle_t' to u16 KVM: arm64: Introduce host_stage2_set_owner_metadata_locked() ... Signed-off-by: Marc Zyngier <maz@kernel.org>	2026-04-08 12:25:39 +01:00
Marc Zyngier	73bb0bc2f4	Merge branch kvm-arm64/spe-trbe-nvhe into kvmarm-master/next * kvm-arm64/spe-trbe-nvhe: : . : Fix SPE and TRBE nVHE world switch which can otherwise result in : pretty bad behaviours, as they have the nasty habit of performing : out of context speculative page table walks. : : Patches courtesy of Will Deacon. : . KVM: arm64: Don't pass host_debug_state to BRBE world-switch routines KVM: arm64: Disable SPE Profiling Buffer when running in guest context KVM: arm64: Disable TRBE Trace Buffer Unit when running in guest context Signed-off-by: Marc Zyngier <maz@kernel.org>	2026-04-08 12:23:58 +01:00
Marc Zyngier	64f2fa630d	Merge branch kvm-arm64/user_mem_abort-rework into kvmarm-master/next * kvm-arm64/user_mem_abort-rework: (30 commits) : . : user_mem_abort() has become an absolute pain to maintain, : to the point that each single fix is likely to introduce : two new bugs. : : Deconstruct the whole thing in logical units, reducing : the amount of visible and/or mutable state between functions, : and finally making the code a bit more maintainable. : . KVM: arm64: Convert gmem_abort() to struct kvm_s2_fault_desc KVM: arm64: Simplify integration of adjust_nested_*_perms() KVM: arm64: Directly expose mapping prot and kill kvm_s2_fault KVM: arm64: Move device mapping management into kvm_s2_fault_pin_pfn() KVM: arm64: Replace force_pte with a max_map_size attribute KVM: arm64: Move kvm_s2_fault.{pfn,page} to kvm_s2_vma_info KVM: arm64: Restrict the scope of the 'writable' attribute KVM: arm64: Kill logging_active from kvm_s2_fault KVM: arm64: Move VMA-related information to kvm_s2_fault_vma_info KVM: arm64: Kill topup_memcache from kvm_s2_fault KVM: arm64: Kill exec_fault from kvm_s2_fault KVM: arm64: Kill write_fault from kvm_s2_fault KVM: arm64: Constrain fault_granule to kvm_s2_fault_map() KVM: arm64: Replace fault_is_perm with a helper KVM: arm64: Move fault context to const structure KVM: arm64: Make fault_ipa immutable KVM: arm64: Kill fault->ipa KVM: arm64: Clean up control flow in kvm_s2_fault_map() KVM: arm64: Hoist MTE validation check out of MMU lock path KVM: arm64: Optimize early exit checks in kvm_s2_fault_pin_pfn() ... Signed-off-by: Marc Zyngier <maz@kernel.org>	2026-04-08 12:23:45 +01:00
Marc Zyngier	b693940e81	Merge branch kvm-arm64/pkvm-psci into kvmarm-master/next * kvm-arm64/pkvm-psci: : . : Cleanup of the pKVM PSCI relay CPU entry code, making it slightly : easier to follow, should someone have to wade into these waters : ever again. : . KVM: arm64: Remove extra ISBs when using msr_hcr_el2 KVM: arm64: pkvm: Use direct function pointers for cpu_{on,resume} KVM: arm64: pkvm: Turn __kvm_hyp_init_cpu into an inner label KVM: arm64: pkvm: Simplify BTI handling on CPU boot KVM: arm64: pkvm: Move error handling to the end of kvm_hyp_cpu_entry Signed-off-by: Marc Zyngier <maz@kernel.org>	2026-04-08 12:23:24 +01:00
Marc Zyngier	e85d1c0cc7	Merge branch kvm-arm64/nv-s2-debugfs into kvmarm-master/next * kvm-arm64/nv-s2-debugfs: : . : Expand the stage-2 ptdump infrastructure to be able to display : the content of the shadow s2 tables generated by nested virt. : : Patches courtesy of Wei-Lin Chang. : . KVM: arm64: ptdump: Initialize parser_state before pgtable walk KVM: arm64: nv: Expose shadow page tables in debugfs KVM: arm64: ptdump: Make KVM ptdump code s2 mmu aware Signed-off-by: Marc Zyngier <maz@kernel.org>	2026-04-08 12:22:55 +01:00
Marc Zyngier	f8078d51ee	Merge branch kvm-arm64/vgic-v5-ppi into kvmarm-master/next * kvm-arm64/vgic-v5-ppi: (40 commits) : . : Add initial GICv5 support for KVM guests, only adding PPI support : for the time being. Patches courtesy of Sascha Bischoff. : : From the cover letter: : : "This is v7 of the patch series to add the virtual GICv5 [1] device : (vgic_v5). Only PPIs are supported by this initial series, and the : vgic_v5 implementation is restricted to the CPU interface, : only. Further patch series are to follow in due course, and will add : support for SPIs, LPIs, the GICv5 IRS, and the GICv5 ITS." : . KVM: arm64: selftests: Add no-vgic-v5 selftest KVM: arm64: selftests: Introduce a minimal GICv5 PPI selftest KVM: arm64: gic-v5: Communicate userspace-driveable PPIs via a UAPI Documentation: KVM: Introduce documentation for VGICv5 KVM: arm64: gic-v5: Probe for GICv5 device KVM: arm64: gic-v5: Set ICH_VCTLR_EL2.En on boot KVM: arm64: gic-v5: Introduce kvm_arm_vgic_v5_ops and register them KVM: arm64: gic-v5: Hide FEAT_GCIE from NV GICv5 guests KVM: arm64: gic: Hide GICv5 for protected guests KVM: arm64: gic-v5: Mandate architected PPI for PMU emulation on GICv5 KVM: arm64: gic-v5: Enlighten arch timer for GICv5 irqchip/gic-v5: Introduce minimal irq_set_type() for PPIs KVM: arm64: gic-v5: Initialise ID and priority bits when resetting vcpu KVM: arm64: gic-v5: Create and initialise vgic_v5 KVM: arm64: gic-v5: Support GICv5 interrupts with KVM_IRQ_LINE KVM: arm64: gic-v5: Implement direct injection of PPIs KVM: arm64: Introduce set_direct_injection irq_op KVM: arm64: gic-v5: Trap and mask guest ICC_PPI_ENABLERx_EL1 writes KVM: arm64: gic-v5: Check for pending PPIs KVM: arm64: gic-v5: Clear TWI if single task running ... Signed-off-by: Marc Zyngier <maz@kernel.org>	2026-04-08 12:22:35 +01:00
Marc Zyngier	2de32a25a3	Merge branch kvm-arm64/hyp-tracing into kvmarm-master/next * kvm-arm64/hyp-tracing: (40 commits) : . : EL2 tracing support, adding both 'remote' ring-buffer : infrastructure and the tracing itself, courtesy of : Vincent Donnefort. From the cover letter: : : "The growing set of features supported by the hypervisor in protected : mode necessitates debugging and profiling tools. Tracefs is the : ideal candidate for this task: : : * It is simple to use and to script. : : * It is supported by various tools, from the trace-cmd CLI to the : Android web-based perfetto. : : * The ring-buffer, where are stored trace events consists of linked : pages, making it an ideal structure for sharing between kernel and : hypervisor. : : This series first introduces a new generic way of creating remote events and : remote buffers. Then it adds support to the pKVM hypervisor." : . tracing: selftests: Extend hotplug testing for trace remotes tracing: Non-consuming read for trace remotes with an offline CPU tracing: Adjust cmd_check_undefined to show unexpected undefined symbols tracing: Restore accidentally removed SPDX tag KVM: arm64: avoid unused-variable warning tracing: Generate undef symbols allowlist for simple_ring_buffer KVM: arm64: tracing: add ftrace dependency tracing: add more symbols to whitelist tracing: Update undefined symbols allow list for simple_ring_buffer KVM: arm64: Fix out-of-tree build for nVHE/pKVM tracing tracing: selftests: Add hypervisor trace remote tests KVM: arm64: Add selftest event support to nVHE/pKVM hyp KVM: arm64: Add hyp_enter/hyp_exit events to nVHE/pKVM hyp KVM: arm64: Add event support to the nVHE/pKVM hyp and trace remote KVM: arm64: Add trace reset to the nVHE/pKVM hyp KVM: arm64: Sync boot clock with the nVHE/pKVM hyp KVM: arm64: Add trace remote for the nVHE/pKVM hyp KVM: arm64: Add tracing capability for the nVHE/pKVM hyp KVM: arm64: Support unaligned fixmap in the pKVM hyp KVM: arm64: Initialise hyp_nr_cpus for nVHE hyp ... Signed-off-by: Marc Zyngier <maz@kernel.org>	2026-04-08 12:21:51 +01:00
Sascha Bischoff	9c1ac77ddf	KVM: arm64: vgic-v5: Fold PPI state for all exposed PPIs GICv5 supports up to 128 PPIs, which would introduce a large amount of overhead if all of them were actively tracked. Rather than keeping track of all 128 potential PPIs, we instead only consider the set of architected PPIs (the first 64). Moreover, we further reduce that set by only exposing a subset of the PPIs to a guest. In practice, this means that only 4 PPIs are typically exposed to a guest - the SW_PPI, PMUIRQ, and the timers. When folding the PPI state, changed bits in the active or pending were used to choose which state to sync back. However, this breaks badly for Edge interrupts when exiting the guest before it has consumed the edge. There is no change in pending state detected, and the edge is lost forever. Given the reduced set of PPIs exposed to the guest, and the issues around tracking the edges, drop the tracking of changed state, and instead iterate over the limited subset of PPIs exposed to the guest directly. This change drops the second copy of the PPI pending state used for detecting edges in the pending state, and reworks vgic_v5_fold_ppi_state() to iterate over the VM's PPI mask instead. Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com> Link: https://patch.msgid.link/20260401162152.932243-1-sascha.bischoff@arm.com Signed-off-by: Marc Zyngier <maz@kernel.org>	2026-04-01 17:52:17 +01:00
Will Deacon	a3ca3bfd01	KVM: arm64: Destroy stage-2 page-table in kvm_arch_destroy_vm() kvm_arch_destroy_vm() can be called on the kvm_create_vm() error path after we have failed to register the MMU notifiers for the new VM. In this case, we cannot rely on the MMU ->release() notifier to call kvm_arch_flush_shadow_all() and so the stage-2 page-table allocated in kvm_arch_init_vm() will be leaked. Explicitly destroy the stage-2 page-table in kvm_arch_destroy_vm(), so that we clean up after kvm_arch_destroy_vm() without relying on the MMU notifiers. Link: https://sashiko.dev/#/patchset/20260327140039.21228-1-will%40kernel.org?patch=12265 Signed-off-by: Will Deacon <will@kernel.org> Link: https://patch.msgid.link/20260327192758.21739-3-will@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>	2026-04-01 16:48:16 +01:00
Will Deacon	2fc0f3e2b9	KVM: arm64: Don't leave mmu->pgt dangling on kvm_init_stage2_mmu() error If kvm_init_stage2_mmu() fails to allocate 'mmu->last_vcpu_ran', it destroys the newly allocated stage-2 page-table before returning ENOMEM. Unfortunately, it also leaves a dangling pointer in 'mmu->pgt' which points at the freed 'kvm_pgtable' structure. This is likely to confuse the kvm_vcpu_init_nested() failure path which can double-free the structure if it finds it via kvm_free_stage2_pgd(). Ensure that the dangling 'mmu->pgt' pointer is cleared when returning an error from kvm_init_stage2_mmu(). Link: https://sashiko.dev/#/patchset/20260327140039.21228-1-will%40kernel.org?patch=12265 Signed-off-by: Will Deacon <will@kernel.org> Link: https://patch.msgid.link/20260327192758.21739-2-will@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>	2026-04-01 16:48:16 +01:00
Sebastian Ene	cf6348af64	KVM: arm64: Prevent the host from using an smc with imm16 != 0 The ARM Service Calling Convention (SMCCC) specifies that the function identifier and parameters should be passed in registers, leaving the 16-bit immediate field un-handled in pKVM when an SMC instruction is trapped. Since the HVC is a private interface between EL2 and the host, enforce the host kernel running under pKVM to use an immediate value of 0 only when using SMCs to make it clear for non-compliant software talking to Trustzone that we only use SMCCC. Signed-off-by: Sebastian Ene <sebastianene@google.com> Reviewed-by: Vincent Donnefort <vdonnefort@google.com> Link: https://patch.msgid.link/20260330105441.3226904-1-sebastianene@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>	2026-04-01 16:39:10 +01:00
Marc Zyngier	f4626281c6	KVM: arm64: Don't advertises GICv3 in ID_PFR1_EL1 if AArch32 isn't supported Although the AArch32 ID regs are architecturally UNKNOWN when AArch32 isn't supported at any EL, KVM makes a point in making them RAZ. Therefore, advertising GICv3 in ID_PFR1_EL1 must be gated on AArch32 being supported at least at EL0. Reviewed-by: Sascha Bischoff <sascha.bischoff@arm.com> Fixes: `a258a383b9` ("KVM: arm64: gic-v5: Sanitize ID_AA64PFR2_EL1.GCIE") Reported-by: Mark Brown <broonie@kernel.org> Tested-by: Mark Brown <broonie@kernel.org> Link: https://patch.msgid.link/20260401103611.357092-16-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>	2026-04-01 15:42:26 +01:00
Marc Zyngier	be46a408f3	KVM: arm64: Correctly plumb ID_AA64PFR2_EL1 into pkvm idreg handling While we now compute ID_AA64PFR2_EL1 to a glorious 0, we never use that data and instead return the 0 that corresponds to an allocated idreg. Not a big deal, but we might as well be consistent. Reviewed-by: Sascha Bischoff <sascha.bischoff@arm.com> Fixes: `5aefaf11f9` ("KVM: arm64: gic: Hide GICv5 for protected guests") Link: https://sashiko.dev/#/patchset/20260319154937.3619520-1-sascha.bischoff%40arm.com Link: https://patch.msgid.link/20260401103611.357092-15-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>	2026-04-01 15:42:26 +01:00
Marc Zyngier	06c85b58e0	KVM: arm64: Move GICv5 timer PPI validation into timer_irqs_are_valid() Userspace can set the timer PPI numbers way before a GIC has been created, leading to odd behaviours on GICv5 as we'd accept non architectural PPI numbers. Move the v5 check into timer_irqs_are_valid(), which aligns the behaviour with the pre-v5 GICs, and is also guaranteed to run only once a GIC has been configured. Reviewed-by: Sascha Bischoff <sascha.bischoff@arm.com> Fixes: `9491c63b6c` ("KVM: arm64: gic-v5: Enlighten arch timer for GICv5") Link: https://sashiko.dev/#/patchset/20260319154937.3619520-1-sascha.bischoff%40arm.com Link: https://patch.msgid.link/20260401103611.357092-14-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>	2026-04-01 15:42:26 +01:00
Marc Zyngier	fbcbf259d9	KVM: arm64: Remove evaluation of timer state in kvm_cpu_has_pending_timer() The vgic-v5 code added some evaluations of the timers in a helper funtion (kvm_cpu_has_pending_timer()) that is called to determine whether the vcpu can wake-up. But looking at the timer there is wrong: - we want to see timers that are signalling an interrupt to the vcpu, and not just that have a pending interrupt - we already have kvm_arch_vcpu_runnable() that evaluates the state of interrupts - kvm_cpu_has_pending_timer() really is about WFIT, as the timeout does not generate an interrupt, and is therefore distinct from the point above As a consequence, revert these changes and teach vgic_v5_has_pending_ppi() about checking for pending HW interrupts instead. Fixes: `9491c63b6c` ("KVM: arm64: gic-v5: Enlighten arch timer for GICv5") Link: https://sashiko.dev/#/patchset/20260319154937.3619520-1-sascha.bischoff%40arm.com Link: https://patch.msgid.link/20260401103611.357092-13-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>	2026-04-01 15:42:26 +01:00
Marc Zyngier	8fe30434a8	KVM: arm64: Kill arch_timer_context::direct field The newly introduced arch_timer_context::direct field is a bit pointless, as it is always set on timers that are... err... direct, while we already have a way to get to that by doing a get_map() operation. Additionally, this field is: - only set when get_map() is called - never cleared and the single point where it is actually checked doesn't call get_map() at all. At this stage, it is probably better to just kill it, and rely on get_map() to give us the correct information. Reviewed-by: Sascha Bischoff <sascha.bischoff@arm.com> Fixes: `9491c63b6c` ("KVM: arm64: gic-v5: Enlighten arch timer for GICv5") Link: https://sashiko.dev/#/patchset/20260319154937.3619520-1-sascha.bischoff%40arm.com Link: https://patch.msgid.link/20260401103611.357092-12-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>	2026-04-01 15:42:26 +01:00
Marc Zyngier	848fa8373a	KVM: arm64: vgic-v5: Correctly set dist->ready once initialised kvm_vgic_map_resources() targetting a v5 model results in vgic->dist_ready never being set. This doesn't result in anything really bad, only some more heavy locking as we go and re-init something for no good reason. Rejig the code to correctly set the ready flag in all non-failing cases. Reviewed-by: Sascha Bischoff <sascha.bischoff@arm.com> Fixes: `f4d37c7c35` ("KVM: arm64: gic-v5: Create and initialise vgic_v5") Link: https://sashiko.dev/#/patchset/20260319154937.3619520-1-sascha.bischoff%40arm.com Link: https://patch.msgid.link/20260401103611.357092-11-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>	2026-04-01 15:42:26 +01:00
Marc Zyngier	a4a6455847	KVM: arm64: vgic-v5: Make the effective priority mask a strict limit The way the effective priority mask is compared to the priority of an interrupt to decide whether to wake-up or not, is slightly odd, and breaks at the limits. This could result in spurious wake-ups that are undesirable. Make the computed priority mask comparison a strict inequality, so that interrupts that have the same priority as the mask are not signalled. Fixes: `933e5288fa` ("KVM: arm64: gic-v5: Check for pending PPIs") Link: https://sashiko.dev/#/patchset/20260319154937.3619520-1-sascha.bischoff%40arm.com Link: https://patch.msgid.link/20260401103611.357092-10-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>	2026-04-01 15:42:26 +01:00
Marc Zyngier	42d7eac829	KVM: arm64: vgic-v5: Cast vgic_apr to u32 to avoid undefined behaviours Passing a u64 to __builtin_ctz() is odd, and requires some digging to figure out why this construct is indeed safe as long as the HW is correct. But it is much easier to make it clear to the compiler by casting the u64 into an intermediate u32, and be done with the UD. Reviewed-by: Sascha Bischoff <sascha.bischoff@arm.com> Fixes: `933e5288fa` ("KVM: arm64: gic-v5: Check for pending PPIs") Link: https://sashiko.dev/#/patchset/20260319154937.3619520-1-sascha.bischoff%40arm.com Link: https://patch.msgid.link/20260401103611.357092-9-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>	2026-04-01 15:42:26 +01:00
Marc Zyngier	170a77b418	KVM: arm64: vgic-v5: Transfer edge pending state to ICH_PPI_PENDRx_EL2 While it is perfectly correct to leave the pending state of a level interrupt as is when queuing it (it is, after all, only driven by the line), edge pending state must be transfered, as nothing will lower it. Reviewed-by: Sascha Bischoff <sascha.bischoff@arm.com> Fixes: `4d591252ba` ("KVM: arm64: gic-v5: Implement PPI interrupt injection") Link: https://sashiko.dev/#/patchset/20260319154937.3619520-1-sascha.bischoff%40arm.com Link: https://patch.msgid.link/20260401103611.357092-8-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>	2026-04-01 15:42:26 +01:00
Marc Zyngier	e63d0a32e7	KVM: arm64: vgic-v5: Hold config_lock while finalizing GICv5 PPIs Finalizing the PPI state is done without holding any lock, which means that two vcpus can race against each other and have one zeroing the state while another one is setting it, or even maybe using it. Fixing this is done by: - holding the config lock while performing the initialisation - checking if SW_PPI has already been advertised, meaning that we have already completed the initialisation once Reviewed-by: Sascha Bischoff <sascha.bischoff@arm.com> Fixes: `8f1fbe2fd2` ("KVM: arm64: gic-v5: Finalize GICv5 PPIs and generate mask") Link: https://sashiko.dev/#/patchset/20260319154937.3619520-1-sascha.bischoff%40arm.com Link: https://patch.msgid.link/20260401103611.357092-7-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>	2026-04-01 15:42:26 +01:00
Marc Zyngier	d70d4323dd	KVM: arm64: Account for RESx bits in __compute_fgt() When computing Fine Grained Traps, it is preferable to account for the reserved bits. The HW will most probably ignore them, unless the bits have been repurposed to do something else. Use caution, and fold our view of the reserved bits in, Reviewed-by: Sascha Bischoff <sascha.bischoff@arm.com> Fixes: `c259d763e6` ("KVM: arm64: Account for RES1 bits in DECLARE_FEAT_MAP() and co") Link: https://sashiko.dev/#/patchset/20260319154937.3619520-1-sascha.bischoff%40arm.com Cc: stable@vger.kernel.org Link: https://patch.msgid.link/20260401103611.357092-6-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>	2026-04-01 15:42:26 +01:00

1 2 3 4 5 ...

3675 Commits