linux

mirror of https://github.com/torvalds/linux.git synced 2026-05-12 16:18:45 +02:00

Author	SHA1	Message	Date
Fuad Tabba	73b9c1e5da	KVM: arm64: Fix pin leak and publication ordering in __pkvm_init_vcpu() Two bugs exist in the vCPU initialisation path: 1. If a check fails after hyp_pin_shared_mem() succeeds, the cleanup path jumps to 'unlock' without calling unpin_host_vcpu() or unpin_host_sve_state(), permanently leaking pin references on the host vCPU and SVE state pages. Extract a register_hyp_vcpu() helper that performs the checks and the store. When register_hyp_vcpu() returns an error, call unpin_host_vcpu() and unpin_host_sve_state() inline before falling through to the existing 'unlock' label. 2. register_hyp_vcpu() publishes the new vCPU pointer into 'hyp_vm->vcpus[]' with a bare store, allowing a concurrent caller of pkvm_load_hyp_vcpu() to observe a partially initialised vCPU object. Ensure the store uses smp_store_release() and the load uses smp_load_acquire(). While 'vm_table_lock' currently serialises the store and the load, these barriers ensure the reader sees the fully initialised 'hyp_vcpu' object even if there were a lockless path or if the lock's own ordering guarantees were insufficient for nested object initialization. Fixes: 49af6ddb8e5c ("KVM: arm64: Add infrastructure to create and track pKVM instances at EL2") Reported-by: Ben Simner <ben.simner@cl.cam.ac.uk> Co-developed-by: Will Deacon <willdeacon@google.com> Signed-off-by: Will Deacon <willdeacon@google.com> Signed-off-by: Fuad Tabba <tabba@google.com> Link: https://patch.msgid.link/20260424084908.370776-6-tabba@google.com Signed-off-by: Marc Zyngier <maz@kernel.org> Cc: stable@vger.kernel.org	2026-04-24 12:03:57 +01:00
Will Deacon	bc20692f52	KVM: arm64: Don't hold 'vm_table_lock' across guest page reclaim Now that the teardown of a VM cannot be finalised as long as a reference is held on the VM, rework __pkvm_reclaim_dying_guest_page() to hold a reference to the dying VM rather than take the global 'vm_table_lock' during the reclaim operation. Signed-off-by: Will Deacon <will@kernel.org> Link: https://patch.msgid.link/20260331155056.28220-4-will@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>	2026-04-01 14:29:06 +01:00
Will Deacon	2400696883	KVM: arm64: Allow get_pkvm_hyp_vm() to take a reference to a dying VM Now that completion of the teardown path requires a refcount of zero for the target VM, we can allow get_pkvm_hyp_vm() to take a reference on a dying VM, which is necessary to unshare pages with a non-protected VM during the teardown process itself. Note that vCPUs belonging to a dying VM cannot be loaded and pages can only be reclaimed from a protected VM (via __pkvm_reclaim_dying_guest_page()) if the target VM is in the dying state. Signed-off-by: Will Deacon <will@kernel.org> Link: https://patch.msgid.link/20260331155056.28220-3-will@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>	2026-04-01 14:29:06 +01:00
Will Deacon	760299a1d8	KVM: arm64: Prevent teardown finalisation of referenced 'hyp_vm' Destroying a 'hyp_vm' with an elevated reference count in __pkvm_finalize_teardown_vm() is only going to lead to tears. In preparation for allowing limited references to be acquired on dying VMs during the teardown process, factor out the handle-to-vm logic for the teardown path and reuse it for both the 'start' and 'finalise' stages of the teardown process. Signed-off-by: Will Deacon <will@kernel.org> Link: https://patch.msgid.link/20260331155056.28220-2-will@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>	2026-04-01 14:29:06 +01:00
Will Deacon	8972a99160	KVM: arm64: Register 'selftest_vm' in the VM table In preparation for extending the pKVM page ownership selftests to cover forceful reclaim of donated pages, rework the creation of the 'selftest_vm' so that it is registered in the VM table while the tests are running. Tested-by: Fuad Tabba <tabba@google.com> Tested-by: Mostafa Saleh <smostafa@google.com> Signed-off-by: Will Deacon <will@kernel.org> Link: https://patch.msgid.link/20260330144841.26181-35-will@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>	2026-03-30 16:58:09 +01:00
Will Deacon	246c976c37	KVM: arm64: Implement the MEM_UNSHARE hypercall for protected VMs Implement the ARM_SMCCC_KVM_FUNC_MEM_UNSHARE hypercall to allow protected VMs to unshare memory that was previously shared with the host using the ARM_SMCCC_KVM_FUNC_MEM_SHARE hypercall. Reviewed-by: Vincent Donnefort <vdonnefort@google.com> Tested-by: Fuad Tabba <tabba@google.com> Tested-by: Mostafa Saleh <smostafa@google.com> Signed-off-by: Will Deacon <will@kernel.org> Link: https://patch.msgid.link/20260330144841.26181-31-will@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>	2026-03-30 16:58:09 +01:00
Will Deacon	03313efed5	KVM: arm64: Implement the MEM_SHARE hypercall for protected VMs Implement the ARM_SMCCC_KVM_FUNC_MEM_SHARE hypercall to allow protected VMs to share memory (e.g. the swiotlb bounce buffers) back to the host. Reviewed-by: Vincent Donnefort <vdonnefort@google.com> Tested-by: Fuad Tabba <tabba@google.com> Tested-by: Mostafa Saleh <smostafa@google.com> Signed-off-by: Will Deacon <will@kernel.org> Link: https://patch.msgid.link/20260330144841.26181-30-will@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>	2026-03-30 16:58:09 +01:00
Will Deacon	94c5250515	KVM: arm64: Add hvc handler at EL2 for hypercalls from protected VMs Add a hypercall handler at EL2 for hypercalls originating from protected VMs. For now, this implements only the FEATURES and MEMINFO calls, but subsequent patches will implement the SHARE and UNSHARE functions necessary for virtio. Unhandled hypercalls (including PSCI) are passed back to the host. Reviewed-by: Vincent Donnefort <vdonnefort@google.com> Tested-by: Fuad Tabba <tabba@google.com> Tested-by: Mostafa Saleh <smostafa@google.com> Signed-off-by: Will Deacon <will@kernel.org> Link: https://patch.msgid.link/20260330144841.26181-29-will@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>	2026-03-30 16:58:09 +01:00
Will Deacon	56080f53a6	KVM: arm64: Introduce hypercall to force reclaim of a protected page Introduce a new hypercall, __pkvm_force_reclaim_guest_page(), to allow the host to forcefully reclaim a physical page that was previously donated to a protected guest. This results in the page being zeroed and the previous guest mapping being poisoned so that new pages cannot be subsequently donated at the same IPA. Tested-by: Fuad Tabba <tabba@google.com> Tested-by: Mostafa Saleh <smostafa@google.com> Signed-off-by: Will Deacon <will@kernel.org> Link: https://patch.msgid.link/20260330144841.26181-26-will@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>	2026-03-30 16:58:09 +01:00
Will Deacon	44887977ab	KVM: arm64: Change 'pkvm_handle_t' to u16 'pkvm_handle_t' doesn't need to be a 32-bit type and subsequent patches will rely on it being no more than 16 bits so that it can be encoded into a pte annotation. Change 'pkvm_handle_t' to a u16 and add a compile-time check that the maximum handle fits into the reduced type. Reviewed-by: Fuad Tabba <tabba@google.com> Tested-by: Fuad Tabba <tabba@google.com> Tested-by: Mostafa Saleh <smostafa@google.com> Signed-off-by: Will Deacon <will@kernel.org> Link: https://patch.msgid.link/20260330144841.26181-24-will@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>	2026-03-30 16:58:08 +01:00
Will Deacon	0bf5f4d400	KVM: arm64: Introduce __pkvm_reclaim_dying_guest_page() To enable reclaim of pages from a protected VM during teardown, introduce a new hypercall to reclaim a single page from a protected guest that is in the dying state. Since the EL2 code is non-preemptible, the new hypercall deliberately acts on a single page at a time so as to allow EL1 to reschedule frequently during the teardown operation. Reviewed-by: Vincent Donnefort <vdonnefort@google.com> Tested-by: Fuad Tabba <tabba@google.com> Tested-by: Mostafa Saleh <smostafa@google.com> Co-developed-by: Quentin Perret <qperret@google.com> Signed-off-by: Quentin Perret <qperret@google.com> Signed-off-by: Will Deacon <will@kernel.org> Link: https://patch.msgid.link/20260330144841.26181-16-will@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>	2026-03-30 16:58:08 +01:00
Will Deacon	6c58f914eb	KVM: arm64: Split teardown hypercall into two phases In preparation for reclaiming protected guest VM pages from the host during teardown, split the current 'pkvm_teardown_vm' hypercall into separate 'start' and 'finalise' calls. The 'pkvm_start_teardown_vm' hypercall puts the VM into a new 'is_dying' state, which is a point of no return past which no vCPU of the pVM is allowed to run any more. Once in this new state, 'pkvm_finalize_teardown_vm' can be used to reclaim meta-data and page-table pages from the VM. A subsequent patch will add support for reclaiming the individual guest memory pages. Reviewed-by: Fuad Tabba <tabba@google.com> Tested-by: Fuad Tabba <tabba@google.com> Tested-by: Mostafa Saleh <smostafa@google.com> Co-developed-by: Quentin Perret <qperret@google.com> Signed-off-by: Quentin Perret <qperret@google.com> Signed-off-by: Will Deacon <will@kernel.org> Link: https://patch.msgid.link/20260330144841.26181-12-will@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>	2026-03-30 16:58:07 +01:00
Fuad Tabba	02471a78a0	KVM: arm64: Remove redundant kern_hyp_va() in unpin_host_sve_state() The `sve_state` pointer in `hyp_vcpu->vcpu.arch` is initialized as a hypervisor virtual address during vCPU initialization in `pkvm_vcpu_init_sve()`. `unpin_host_sve_state()` calls `kern_hyp_va()` on this address. Since `kern_hyp_va()` is idempotent, it's not a bug. However, it is unnecessary and potentially confusing. Remove the redundant conversion. Signed-off-by: Fuad Tabba <tabba@google.com> Link: https://patch.msgid.link/20260213143815.1732675-5-tabba@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>	2026-02-13 14:54:48 +00:00
Fuad Tabba	7e7c2cf002	KVM: arm64: Fix ID register initialization for non-protected pKVM guests In protected mode, the hypervisor maintains a separate instance of the `kvm` structure for each VM. For non-protected VMs, this structure is initialized from the host's `kvm` state. Currently, `pkvm_init_features_from_host()` copies the `KVM_ARCH_FLAG_ID_REGS_INITIALIZED` flag from the host without the underlying `id_regs` data being initialized. This results in the hypervisor seeing the flag as set while the ID registers remain zeroed. Consequently, `kvm_has_feat()` checks at EL2 fail (return 0) for non-protected VMs. This breaks logic that relies on feature detection, such as `ctxt_has_tcrx()` for TCR2_EL1 support. As a result, certain system registers (e.g., TCR2_EL1, PIR_EL1, POR_EL1) are not saved/restored during the world switch, which could lead to state corruption. Fix this by explicitly copying the ID registers from the host `kvm` to the hypervisor `kvm` for non-protected VMs during initialization, since we trust the host with its non-protected guests' features. Also ensure `KVM_ARCH_FLAG_ID_REGS_INITIALIZED` is cleared initially in `pkvm_init_features_from_host` so that `vm_copy_id_regs` can properly initialize them and set the flag once done. Fixes: `41d6028e28` ("KVM: arm64: Convert the SVE guest vcpu flag to a vm flag") Signed-off-by: Fuad Tabba <tabba@google.com> Link: https://patch.msgid.link/20260213143815.1732675-4-tabba@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>	2026-02-13 14:54:48 +00:00
Marc Zyngier	183ac2b2ad	Merge branch kvm-arm64/pkvm-no-mte into kvmarm-master/next * kvm-arm64/pkvm-no-mte: : . : pKVM updates preventing the host from using MTE-related : system registers when the feature is disabled from the kernel : command-line (arm64.nomte), courtesy of Fuad Tabba. : : From the cover letter: : : "If MTE is supported by the hardware (and is enabled at EL3), it remains : available to lower exception levels by default. Disabling it in the host : kernel (e.g., via 'arm64.nomte') only stops the kernel from advertising : the feature; it does not physically disable MTE in the hardware. : : The ability to disable MTE in the host kernel is used by some systems, : such as Android, so that the physical memory otherwise used as tag : storage can be used for other things (i.e. treated just like the rest of : memory). In this scenario, a malicious host could still access tags in : pages donated to a guest using MTE instructions (e.g., STG and LDG), : bypassing the kernel's configuration." : . KVM: arm64: Use kvm_has_mte() in pKVM trap initialization KVM: arm64: Inject UNDEF when accessing MTE sysregs with MTE disabled KVM: arm64: Trap MTE access and discovery when MTE is disabled KVM: arm64: Remove dead code resetting HCR_EL2 for pKVM Signed-off-by: Marc Zyngier <maz@kernel.org>	2026-02-05 09:16:31 +00:00
Fuad Tabba	230b080623	KVM: arm64: Use kvm_has_mte() in pKVM trap initialization When initializing HCR traps in protected mode, use kvm_has_mte() to check for MTE support rather than kvm_has_feat(kvm, ID_AA64PFR1_EL1, MTE, IMP). kvm_has_mte() provides a more comprehensive check: - kvm_has_feat() only checks if MTE is in the guest's ID register view (i.e., what we advertise to the guest) - kvm_has_mte() checks both system_supports_mte() AND whether KVM_ARCH_FLAG_MTE_ENABLED is set for this VM instance Signed-off-by: Fuad Tabba <tabba@google.com> Link: https://patch.msgid.link/20260122112218.531948-5-tabba@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>	2026-01-23 11:28:48 +00:00
Marc Zyngier	c983b3e276	Merge branch kvm-arm64/pkvm-features-6.20 into kvmarm-master/next * kvm-arm64/pkvm-features-6.20: : . : pKVM guest feature trapping fixes, courtesy of Fuad Tabba. : . KVM: arm64: Prevent host from managing timer offsets for protected VMs KVM: arm64: Check whether a VM IOCTL is allowed in pKVM KVM: arm64: Track KVM IOCTLs and their associated KVM caps KVM: arm64: Do not allow KVM_CAP_ARM_MTE for any guest in pKVM KVM: arm64: Include VM type when checking VM capabilities in pKVM KVM: arm64: Introduce helper to calculate fault IPA offset KVM: arm64: Fix MTE flag initialization for protected VMs KVM: arm64: Fix Trace Buffer trap polarity for protected VMs KVM: arm64: Fix Trace Buffer trapping for protected VMs Signed-off-by: Marc Zyngier <maz@kernel.org>	2026-01-23 10:04:47 +00:00
Fuad Tabba	43a21a0f0c	KVM: arm64: Include VM type when checking VM capabilities in pKVM Certain features and capabilities are restricted in protected mode. Most of these features are restricted only for protected VMs, but some are restricted for ALL VMs in protected mode. Extend the pKVM capability check to pass the VM (kvm), and use that when determining supported features. Signed-off-by: Fuad Tabba <tabba@google.com> Link: https://patch.msgid.link/20251211104710.151771-6-tabba@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>	2026-01-15 15:43:15 +00:00
Fuad Tabba	ebbcaece84	KVM: arm64: Fix MTE flag initialization for protected VMs The function pkvm_init_features_from_host() initializes guest features, propagating them from the host. The logic to propagate KVM_ARCH_FLAG_MTE_ENABLED (Memory Tagging Extension) has a couple of issues. First, the check was in the common path, before the divergence for protected and non-protected VMs. For non-protected VMs, this was unnecessary, as 'kvm->arch.flags' is completely overwritten by host_arch_flags immediately after, which already contains the MTE flag. For protected VMs, this was setting the flag even if the feature is not allowed. Second, the check was reading 'host_kvm->arch.flags' instead of using the local 'host_arch_flags', which is read once from the host flags. Fix these by moving the MTE flag check inside the protected-VM-only path, checking if the feature is allowed, and changing it to use the correct host_arch_flags local variable. This ensures non-protected VMs get the flag via the bulk copy, and protected VMs get it via an explicit check. Fixes: `b7f345fbc3` ("KVM: arm64: Fix FEAT_MTE in pKVM") Reviewed-by: Ben Horgan <ben.horgan@arm.com> Signed-off-by: Fuad Tabba <tabba@google.com> Link: https://patch.msgid.link/20251211104710.151771-4-tabba@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>	2026-01-15 15:43:15 +00:00
Fuad Tabba	e913c7ce9e	KVM: arm64: Fix Trace Buffer trap polarity for protected VMs The E2TB bits in MDCR_EL2 control trapping of Trace Buffer system register accesses. These accesses are trapped to EL2 when the bits are clear. The trap initialization logic for protected VMs in pvm_init_traps_mdcr() had the polarity inverted. When a guest did not support the Trace Buffer feature, the code was setting E2TB. This incorrectly disabled the trap, potentially allowing a protected guest to access registers for a feature it was not given. Fix this by inverting the operation. Fixes: `f50758260b` ("KVM: arm64: Group setting traps for protected VMs by control register") Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Fuad Tabba <tabba@google.com> Link: https://patch.msgid.link/20251211104710.151771-3-tabba@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>	2026-01-15 15:42:18 +00:00
Fuad Tabba	288eb55483	KVM: arm64: Fix Trace Buffer trapping for protected VMs For protected VMs in pKVM, the hypervisor should trap accesses to trace buffer system registers if Trace Buffer isn't supported by the VM. However, the current code only traps if Trace Buffer External Mode isn't supported. Fix this by checking for FEAT_TRBE (Trace Buffer) rather than FEAT_TRBE_EXT. Fixes: `9d52612690` ("KVM: arm64: Trap external trace for protected VMs") Reported-by: Suzuki K Poulose <suzuki.poulose@arm.com> Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Fuad Tabba <tabba@google.com> Link: https://patch.msgid.link/20251211104710.151771-2-tabba@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>	2026-01-15 15:42:18 +00:00
Alexandru Elisei	145cc42fe1	KVM: arm64: Copy FGT traps to unprotected pKVM VCPU on VCPU load Commit `fb10ddf35c` ("KVM: arm64: Compute per-vCPU FGTs at vcpu_load()") introduced per-VCPU FGT traps. For an unprotected pKVM VCPU, the untrusted host FGT configuration is copied in pkvm_vcpu_init_traps(), which is called from __pkvm_init_vcpu(). __pkvm_init_vcpu() is called once per VCPU (when the VCPU is first run) which means that the uninitialized, zero, values for the FGT registers end up being used for the entire lifetime of the VCPU. This causes both unwanted traps (for the inverse polarity trap bits) and the guest being allowed to access registers it shouldn't. Fix it by copying the FGT traps for unprotected pKVM VCPUs when the untrusted host loads the VCPU. Fixes: `fb10ddf35c` ("KVM: arm64: Compute per-vCPU FGTs at vcpu_load()") Acked-by: Will Deacon <will@kernel.org> Tested-by: Fuad Tabba <tabba@google.com> Reviewed-by: Fuad Tabba <tabba@google.com> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com> Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://msgid.link/20251216103053.47224-2-alexandru.elisei@arm.com Signed-off-by: Oliver Upton <oupton@kernel.org>	2026-01-08 12:56:17 -08:00
Marc Zyngier	567ebfedb5	KVM: arm64: vgic-v3: Fix GICv3 trapping in protected mode As we are about to start trapping a bunch of extra things, augment the pKVM trap description with all the registers trapped by ICH_HCR_EL2.TC, making them legal instead of resulting in a UNDEF injection in the guest. While we're at it, ensure that pKVM captures the vgic model so that it can be checked by the emulation code. Tested-by: Fuad Tabba <tabba@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Tested-by: Mark Brown <broonie@kernel.org> Link: https://msgid.link/20251120172540.2267180-6-maz@kernel.org Signed-off-by: Oliver Upton <oupton@kernel.org>	2025-11-24 14:29:11 -08:00
Oliver Upton	fb10ddf35c	KVM: arm64: Compute per-vCPU FGTs at vcpu_load() To date KVM has used the fine-grained traps for the sake of UNDEF enforcement (so-called FGUs), meaning the constituent parts could be computed on a per-VM basis and folded into the effective value when programmed. Prepare for traps changing based on the vCPU context by computing the whole mess of them at vcpu_load(). Aggressively inline all the helpers to preserve the build-time checks that were there before. Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Reviewed-by: Joey Gouly <joey.gouly@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-10-13 14:44:37 +01:00
Fuad Tabba	256b4668cd	KVM: arm64: Introduce separate hypercalls for pKVM VM reservation and initialization The existing __pkvm_init_vm hypercall performs both the reservation of a VM table entry and the initialization of the hypervisor VM state in a single operation. This design prevents the host from obtaining a VM handle from the hypervisor until all preparation for the creation and the initialization of the VM is done, which is on the first vCPU run operation. To support more flexible VM lifecycle management, the host needs the ability to reserve a handle early, before the first vCPU run. Refactor the hypercall interface to enable this, splitting the single hypercall into a two-stage process: - __pkvm_reserve_vm: A new hypercall that allocates a slot in the hypervisor's vm_table, marks it as reserved, and returns a unique handle to the host. - __pkvm_unreserve_vm: A corresponding cleanup hypercall to safely release the reservation if the host fails to proceed with full initialization. - __pkvm_init_vm: The existing hypercall is modified to no longer allocate a slot. It now expects a pre-reserved handle and commits the donated VM memory to that slot. For now, the host-side code in __pkvm_create_hyp_vm calls the new reserve and init hypercalls back-to-back to maintain existing behavior. This paves the way for subsequent patches to separate the reservation and initialization steps in the VM's lifecycle. Signed-off-by: Fuad Tabba <tabba@google.com> Tested-by: Mark Brown <broonie@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-09-15 10:46:55 +01:00
Fuad Tabba	814fd6beac	KVM: arm64: Consolidate pKVM hypervisor VM initialization logic The insert_vm_table_entry() function was performing tasks beyond its primary responsibility. In addition to inserting a VM pointer into the vm_table, it was also initializing several fields within 'struct pkvm_hyp_vm', such as the VMID and stage-2 MMU pointers. This mixing of concerns made the code harder to follow. As another preparatory step towards allowing a VM table entry to be reserved before the VM is fully created, this logic must be cleaned up. By separating table insertion from state initialization, we can control the timing of the initialization step more precisely in subsequent patches. Refactor the code to consolidate all initialization logic into init_pkvm_hyp_vm(): - Move the initialization of the handle, VMID, and MMU fields from insert_vm_table_entry() to init_pkvm_hyp_vm(). - Simplify insert_vm_table_entry() to perform only one action: placing the provided pkvm_hyp_vm pointer into the vm_table. - Update the calling sequence in __pkvm_init_vm() to first allocate an entry in the VM table, initialize the VM, and then insert the VM into the VM table. This is all protected by the vm_table_lock for now. Subsequent patches will adjust the sequence and not hold the vm_table_lock while initializing the VM at the hypervisor (init_pkvm_hyp_vm()). Signed-off-by: Fuad Tabba <tabba@google.com> Tested-by: Mark Brown <broonie@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-09-15 10:46:55 +01:00
Fuad Tabba	1abc1ad529	KVM: arm64: Separate allocation and insertion of pKVM VM table entries The current insert_vm_table_entry() function performs two actions at once: it finds a free slot in the pKVM VM table and populates it with the pkvm_hyp_vm pointer. Refactor this function as a preparatory step for future work that will require reserving a VM slot and its corresponding handle earlier in the VM lifecycle, before the pkvm_hyp_vm structure is initialized and ready to be inserted. Split the function into a two-phase process: - A new allocate_vm_table_entry() function finds an empty slot, marks it as reserved with a RESERVED_ENTRY placeholder, and returns a handle derived from the slot's index. - The insert_vm_table_entry() function is repurposed to take the handle, validate that the corresponding slot is in the reserved state, and then populate it with the pkvm_hyp_vm pointer. Signed-off-by: Fuad Tabba <tabba@google.com> Tested-by: Mark Brown <broonie@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-09-15 10:46:55 +01:00
Fuad Tabba	3c45b67625	KVM: arm64: Decouple hyp VM creation state from its handle Currently, the presence of a pKVM handle (pkvm.handle != 0) is used to determine if the corresponding hypervisor (EL2) VM has been created and initialized. This couples the handle's lifecycle with the VM's creation state. This coupling will become problematic with upcoming changes that will allocate the pKVM handle earlier in the VM's life, before the VM is instantiated at the hypervisor. To prepare for this and make the state tracking explicit, decouple the two concepts. Introduce a new boolean flag, 'pkvm.is_created', to track whether the hypervisor-side VM has been created and initialized. A new helper, pkvm_hyp_vm_is_created(), is added to check this flag. All call sites that previously checked for the handle's existence are converted to use the new, explicit check. The 'is_created' flag is set to true upon successful creation in the hypervisor (EL2) and cleared upon destruction. Signed-off-by: Fuad Tabba <tabba@google.com> Tested-by: Mark Brown <broonie@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-09-15 10:46:55 +01:00
Fuad Tabba	070362648f	KVM: arm64: Clarify comments to distinguish pKVM mode from protected VMs The hypervisor code for protected KVM contains comments that are imprecise and at times flat-out wrong. They often refer to a "protected VM" in contexts where the code or data structure applies to _any_ VM managed by the hypervisor when pKVM is enabled. For instance, the 'vm_table' holds handles for all VMs known to the hypervisor, not exclusively for those that are configured as protected. This inaccurate terminology can make the code scope harder to understand for future (and current) developers. Clarify the comments throughout the pKVM hypervisor code to make a clear distinction between the pKVM feature itself (i.e., "protected mode") and the VMs that are specifically configured to be protected. This involves replacing ambiguous uses of "protected VM" with more accurate phrasing. No functional change intended. Signed-off-by: Fuad Tabba <tabba@google.com> Tested-by: Mark Brown <broonie@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-09-15 10:46:55 +01:00
Fuad Tabba	58dfb66b1e	KVM: arm64: Rename pkvm.enabled to pkvm.is_protected The 'pkvm.enabled' field in struct kvm_protected_vm is confusingly named. Its purpose is to indicate whether a VM is a _protected_ VM under pKVM, and not whether the VM itself is enabled or running. For a non-protected VM, the VM can be fully active and running, yet this field would be false. This ambiguity can lead to incorrect assumptions about the VM's operational state and makes the code harder to reason about. Rename the field to 'is_protected' to make it unambiguous that the flag tracks the protected status of the VM. No functional change intended. Reviewed-by: Kunwu Chan <kunwu.chan@linux.dev> Signed-off-by: Fuad Tabba <tabba@google.com> Reviewed-by: Kunwu Chan <chentao@kylinos.cn> Tested-by: Mark Brown <broonie@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-09-15 10:46:55 +01:00
Fuad Tabba	5db1bef933	KVM: arm64: Track SVE state in the hypervisor vcpu structure When dealing with a guest with SVE enabled, make sure the host SVE state is pinned at EL2 S1, and that the hypervisor vCPU state is correctly initialised (and then unpinned on teardown). Co-authored-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Fuad Tabba <tabba@google.com> Signed-off-by: Quentin Perret <qperret@google.com> Link: https://lore.kernel.org/r/20250416152648.2982950-2-qperret@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-04-28 09:23:46 +01:00
Oliver Upton	ca19dd4323	Merge branch 'kvm-arm64/pkvm-6.15' into kvmarm/next * kvm-arm64/pkvm-6.15: : pKVM updates for 6.15 : : - SecPageTable stats for stage-2 table pages allocated by the protected : hypervisor (Vincent Donnefort) : : - HCRX_EL2 trap + vCPU initialization fixes for pKVM (Fuad Tabba) KVM: arm64: Create each pKVM hyp vcpu after its corresponding host vcpu KVM: arm64: Factor out pKVM hyp vcpu creation to separate function KVM: arm64: Initialize HCRX_EL2 traps in pKVM KVM: arm64: Factor out setting HCRX_EL2 traps into separate function KVM: arm64: Count pKVM stage-2 usage in secondary pagetable stats KVM: arm64: Distinct pKVM teardown memcache for stage-2 KVM: arm64: Add flags to kvm_hyp_memcache Signed-off-by: Oliver Upton <oliver.upton@linux.dev>	2025-03-19 14:54:40 -07:00
Fuad Tabba	1eab115486	KVM: arm64: Create each pKVM hyp vcpu after its corresponding host vcpu Instead of creating and initializing _all_ hyp vcpus in pKVM when the first host vcpu runs for the first time, initialize _each_ hyp vcpu in conjunction with its corresponding host vcpu. Some of the host vcpu state (e.g., system registers and traps values) is not initialized until the first time the host vcpu is run. Therefore, initializing a hyp vcpu before its corresponding host vcpu has run for the first time might not view the complete host state of these vcpus. Additionally, this behavior is in line with non-protected modes. Acked-by: Will Deacon <will@kernel.org> Reviewed-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Fuad Tabba <tabba@google.com> Link: https://lore.kernel.org/r/20250314111832.4137161-5-tabba@google.com Signed-off-by: Oliver Upton <oliver.upton@linux.dev>	2025-03-14 16:06:03 -07:00
Fuad Tabba	066daa8d3b	KVM: arm64: Initialize HCRX_EL2 traps in pKVM Initialize and set the traps controlled by the HCRX_EL2 in pKVM. Reviewed-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Fuad Tabba <tabba@google.com> Link: https://lore.kernel.org/r/20250314111832.4137161-3-tabba@google.com Signed-off-by: Oliver Upton <oliver.upton@linux.dev>	2025-03-14 16:00:49 -07:00
Vincent Donnefort	8c0d7d14c5	KVM: arm64: Distinct pKVM teardown memcache for stage-2 In order to account for memory dedicated to the stage-2 page-tables, use a separate memcache when tearing down the VM. Meanwhile rename reclaim_guest_pages to reflect the fact it only reclaims page-table pages. Signed-off-by: Vincent Donnefort <vdonnefort@google.com> Acked-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20250313114038.1502357-3-vdonnefort@google.com Signed-off-by: Oliver Upton <oliver.upton@linux.dev>	2025-03-14 00:56:29 -07:00
Oliver Upton	03e1b89d05	KVM: arm64: Copy MIDR_EL1 into hyp VM when it is writable KVM recently added a capability that allows userspace to override the 'implementation ID' registers presented to the VM. MIDR_EL1 is a special example, where the hypervisor can directly set the value when read from EL1 using VPIDR_EL2. Copy the VM-wide value for MIDR_EL1 into the hyp VM for non-protected guests when the capability is enabled so VPIDR_EL2 gets set up correctly. Reported-by: Mark Brown <broonie@kernel.org> Closes: https://lore.kernel.org/kvmarm/ac594b9c-4bbb-46c8-9391-e7a68ce4de5b@sirena.org.uk/ Fixes: `3adaee7830` ("KVM: arm64: Allow userspace to change the implementation ID registers") Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20250305230825.484091-3-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>	2025-03-05 16:56:57 -08:00
Oliver Upton	9d91227364	KVM: arm64: Copy guest CTR_EL0 into hyp VM Since commit `2843cae266` ("KVM: arm64: Treat CTR_EL0 as a VM feature ID register") KVM has allowed userspace to configure the VM-wide view of CTR_EL0, falling back to trap-n-emulate if the value doesn't match hardware. It appears that this has worked by chance in protected-mode for some time, and on systems with FEAT_EVT protected-mode unconditionally sets TID4 (i.e. TID2 traps sans CTR_EL0). Forward the guest CTR_EL0 value through to the hyp VM and align the TID2/TID4 configuration with the non-protected setup. Fixes: `2843cae266` ("KVM: arm64: Treat CTR_EL0 as a VM feature ID register") Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20250305230825.484091-2-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>	2025-03-05 16:55:41 -08:00
Marc Zyngier	e880b16efb	Merge branch kvm-arm64/pkvm-fixed-features-6.14 into kvmarm-master/next * kvm-arm64/pkvm-fixed-features-6.14: (24 commits) : . : Complete rework of the pKVM handling of features, catching up : with how the rest of the code deals with it these days. : Patches courtesy of Fuad Tabba. From the cover letter: : : "This patch series uses the vm's feature id registers to track the : supported features, a framework similar to nested virt to set the : trap values, and removes the need to store cptr_el2 per vcpu in : favor of setting its value when traps are activated, as VHE mode : does." : : This branch drags the arm64/for-next/cpufeature branch to solve : ugly conflicts in -next. : . KVM: arm64: Fix FEAT_MTE in pKVM KVM: arm64: Use kvm_vcpu_has_feature() directly for struct kvm KVM: arm64: Convert the SVE guest vcpu flag to a vm flag KVM: arm64: Remove PtrAuth guest vcpu flag KVM: arm64: Fix the value of the CPTR_EL2 RES1 bitmask for nVHE KVM: arm64: Refactor kvm_reset_cptr_el2() KVM: arm64: Calculate cptr_el2 traps on activating traps KVM: arm64: Remove redundant setting of HCR_EL2 trap bit KVM: arm64: Remove fixed_config.h header KVM: arm64: Rework specifying restricted features for protected VMs KVM: arm64: Set protected VM traps based on its view of feature registers KVM: arm64: Fix RAS trapping in pKVM for protected VMs KVM: arm64: Initialize feature id registers for protected VMs KVM: arm64: Use KVM extension checks for allowed protected VM capabilities KVM: arm64: Remove KVM_ARM_VCPU_POWER_OFF from protected VMs allowed features in pKVM KVM: arm64: Move checking protected vcpu features to a separate function KVM: arm64: Group setting traps for protected VMs by control register KVM: arm64: Consolidate allowed and restricted VM feature checks arm64/sysreg: Get rid of CPACR_ELx SysregFields arm64/sysreg: Convert *_EL12 accessors to Mapping ... Signed-off-by: Marc Zyngier <maz@kernel.org> # Conflicts: # arch/arm64/kvm/fpsimd.c # arch/arm64/kvm/hyp/nvhe/pkvm.c	2025-01-12 10:40:10 +00:00
Vladimir Murzin	b7f345fbc3	KVM: arm64: Fix FEAT_MTE in pKVM Make sure we do not trap access to Allocation Tags. Fixes: `b56680de9c` ("KVM: arm64: Initialize trap register values in hyp in pKVM") Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com> Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Reviewed-by: Fuad Tabba <tabba@google.com> Link: https://lore.kernel.org/r/20250106112421.65355-1-vladimir.murzin@arm.com Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-01-08 10:24:32 +00:00
Fuad Tabba	41d6028e28	KVM: arm64: Convert the SVE guest vcpu flag to a vm flag The vcpu flag GUEST_HAS_SVE is per-vcpu, but it is based on what is now a per-vm feature. Make the flag per-vm. Signed-off-by: Fuad Tabba <tabba@google.com> Link: https://lore.kernel.org/r/20241216105057.579031-17-tabba@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>	2024-12-20 13:54:09 +00:00
Fuad Tabba	c5c1763596	KVM: arm64: Remove PtrAuth guest vcpu flag The vcpu flag GUEST_HAS_PTRAUTH is always associated with the vcpu PtrAuth features, which are defined per vm rather than per vcpu. Remove the flag, and replace it with checks for the features instead. Signed-off-by: Fuad Tabba <tabba@google.com> Link: https://lore.kernel.org/r/20241216105057.579031-16-tabba@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>	2024-12-20 13:54:06 +00:00
Fuad Tabba	2fd5b4b0e7	KVM: arm64: Calculate cptr_el2 traps on activating traps Similar to VHE, calculate the value of cptr_el2 from scratch on activate traps. This removes the need to store cptr_el2 in every vcpu structure. Moreover, some traps, such as whether the guest owns the fp registers, need to be set on every vcpu run. Reported-by: James Clark <james.clark@linaro.org> Fixes: `5294afdbf4` ("KVM: arm64: Exclude FP ownership from kvm_vcpu_arch") Signed-off-by: Fuad Tabba <tabba@google.com> Link: https://lore.kernel.org/r/20241216105057.579031-13-tabba@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>	2024-12-20 13:53:57 +00:00
Fuad Tabba	092e7b2c3b	KVM: arm64: Remove redundant setting of HCR_EL2 trap bit In hVHE mode, HCR_E2H should be set for both protected and non-protected VMs. Since commit `b56680de9c` ("KVM: arm64: Initialize trap register values in hyp in pKVM"), this has been fixed, and the setting of the flag here is redundant. Signed-off-by: Fuad Tabba <tabba@google.com> Link: https://lore.kernel.org/r/20241216105057.579031-12-tabba@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>	2024-12-20 13:53:55 +00:00
Fuad Tabba	81403c8d04	KVM: arm64: Remove fixed_config.h header The few remaining items needed in fixed_config.h are better suited for pkvm.h. Move them there and delete it. No functional change intended. Signed-off-by: Fuad Tabba <tabba@google.com> Link: https://lore.kernel.org/r/20241216105057.579031-11-tabba@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>	2024-12-20 13:53:53 +00:00
Fuad Tabba	0401f7e76d	KVM: arm64: Set protected VM traps based on its view of feature registers Now that the VM's feature id registers are initialized with the values of the supported features, use those values to determine which traps to set using kvm_has_feature(). Signed-off-by: Fuad Tabba <tabba@google.com> Link: https://lore.kernel.org/r/20241216105057.579031-9-tabba@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>	2024-12-20 13:53:00 +00:00
Fuad Tabba	9df9186f8d	KVM: arm64: Fix RAS trapping in pKVM for protected VMs Trap RAS in pKVM if not supported at all for protected VMs. The RAS version doesn't matter in this case. Fixes: `2a0c343386` ("KVM: arm64: Initialize trap registers for protected VMs") Signed-off-by: Fuad Tabba <tabba@google.com> Link: https://lore.kernel.org/r/20241216105057.579031-8-tabba@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>	2024-12-20 13:52:58 +00:00
Fuad Tabba	7ba5b8f804	KVM: arm64: Initialize feature id registers for protected VMs The hypervisor maintains the state of protected VMs. Initialize the values for feature ID registers for protected VMs, to be used when setting traps and when advertising features to protected VMs. Signed-off-by: Fuad Tabba <tabba@google.com> Link: https://lore.kernel.org/r/20241216105057.579031-7-tabba@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>	2024-12-20 13:52:50 +00:00
Fuad Tabba	a3163dca48	KVM: arm64: Use KVM extension checks for allowed protected VM capabilities Use KVM extension checks as the source for determining which capabilities are allowed for protected VMs. KVM extension checks is the natural place for this, since it is also the interface exposed to users. Signed-off-by: Fuad Tabba <tabba@google.com> Link: https://lore.kernel.org/r/20241216105057.579031-6-tabba@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>	2024-12-20 13:45:25 +00:00
Fuad Tabba	27f5cf8ad5	KVM: arm64: Remove KVM_ARM_VCPU_POWER_OFF from protected VMs allowed features in pKVM The hypervisor is responsible for the power state of protected VMs in pKVM. Therefore, remove KVM_ARM_VCPU_POWER_OFF from the list of allowed features for protected VMs. Signed-off-by: Fuad Tabba <tabba@google.com> Link: https://lore.kernel.org/r/20241216105057.579031-5-tabba@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>	2024-12-20 13:45:25 +00:00
Fuad Tabba	1fea164ccf	KVM: arm64: Move checking protected vcpu features to a separate function At the moment, checks for supported vcpu features for protected VMs are build-time bugs. In the following patch, they will become runtime checks based on the vcpu's feature registers. Therefore, consolidate them into one function that returns an error if it encounters an unsupported feature. Signed-off-by: Fuad Tabba <tabba@google.com> Link: https://lore.kernel.org/r/20241216105057.579031-4-tabba@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>	2024-12-20 13:45:25 +00:00

91 Commits