Commit Graph

91 Commits

Author SHA1 Message Date
Fuad Tabba
73b9c1e5da KVM: arm64: Fix pin leak and publication ordering in __pkvm_init_vcpu()
Two bugs exist in the vCPU initialisation path:

1. If a check fails after hyp_pin_shared_mem() succeeds, the cleanup
   path jumps to 'unlock' without calling unpin_host_vcpu() or
   unpin_host_sve_state(), permanently leaking pin references on the
   host vCPU and SVE state pages.

   Extract a register_hyp_vcpu() helper that performs the checks and
   the store. When register_hyp_vcpu() returns an error, call
   unpin_host_vcpu() and unpin_host_sve_state() inline before falling
   through to the existing 'unlock' label.

2. register_hyp_vcpu() publishes the new vCPU pointer into
   'hyp_vm->vcpus[]' with a bare store, allowing a concurrent caller
   of pkvm_load_hyp_vcpu() to observe a partially initialised vCPU
   object.

   Ensure the store uses smp_store_release() and the load uses
   smp_load_acquire(). While 'vm_table_lock' currently serialises the
   store and the load, these barriers ensure the reader sees the fully
   initialised 'hyp_vcpu' object even if there were a lockless path or
   if the lock's own ordering guarantees were insufficient for nested
   object initialization.

Fixes: 49af6ddb8e5c ("KVM: arm64: Add infrastructure to create and track pKVM instances at EL2")
Reported-by: Ben Simner <ben.simner@cl.cam.ac.uk>
Co-developed-by: Will Deacon <willdeacon@google.com>
Signed-off-by: Will Deacon <willdeacon@google.com>
Signed-off-by: Fuad Tabba <tabba@google.com>
Link: https://patch.msgid.link/20260424084908.370776-6-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
Cc: stable@vger.kernel.org
2026-04-24 12:03:57 +01:00
Will Deacon
bc20692f52 KVM: arm64: Don't hold 'vm_table_lock' across guest page reclaim
Now that the teardown of a VM cannot be finalised as long as a reference
is held on the VM, rework __pkvm_reclaim_dying_guest_page() to hold a
reference to the dying VM rather than take the global 'vm_table_lock'
during the reclaim operation.

Signed-off-by: Will Deacon <will@kernel.org>
Link: https://patch.msgid.link/20260331155056.28220-4-will@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-04-01 14:29:06 +01:00
Will Deacon
2400696883 KVM: arm64: Allow get_pkvm_hyp_vm() to take a reference to a dying VM
Now that completion of the teardown path requires a refcount of zero for
the target VM, we can allow get_pkvm_hyp_vm() to take a reference on a
dying VM, which is necessary to unshare pages with a non-protected VM
during the teardown process itself.

Note that vCPUs belonging to a dying VM cannot be loaded and pages can
only be reclaimed from a protected VM (via
__pkvm_reclaim_dying_guest_page()) if the target VM is in the dying
state.

Signed-off-by: Will Deacon <will@kernel.org>
Link: https://patch.msgid.link/20260331155056.28220-3-will@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-04-01 14:29:06 +01:00
Will Deacon
760299a1d8 KVM: arm64: Prevent teardown finalisation of referenced 'hyp_vm'
Destroying a 'hyp_vm' with an elevated referenced count in
__pkvm_finalize_teardown_vm() is only going to lead to tears.

In preparation for allowing limited references to be acquired on dying
VMs during the teardown process, factor out the handle-to-vm logic for
the teardown path and reuse it for both the 'start' and 'finalise'
stages of the teardown process.

Signed-off-by: Will Deacon <will@kernel.org>
Link: https://patch.msgid.link/20260331155056.28220-2-will@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-04-01 14:29:06 +01:00
Will Deacon
8972a99160 KVM: arm64: Register 'selftest_vm' in the VM table
In preparation for extending the pKVM page ownership selftests to cover
forceful reclaim of donated pages, rework the creation of the
'selftest_vm' so that it is registered in the VM table while the tests
are running.

Tested-by: Fuad Tabba <tabba@google.com>
Tested-by: Mostafa Saleh <smostafa@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
Link: https://patch.msgid.link/20260330144841.26181-35-will@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-03-30 16:58:09 +01:00
Will Deacon
246c976c37 KVM: arm64: Implement the MEM_UNSHARE hypercall for protected VMs
Implement the ARM_SMCCC_KVM_FUNC_MEM_UNSHARE hypercall to allow
protected VMs to unshare memory that was previously shared with the host
using the ARM_SMCCC_KVM_FUNC_MEM_SHARE hypercall.

Reviewed-by: Vincent Donnefort <vdonnefort@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Tested-by: Mostafa Saleh <smostafa@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
Link: https://patch.msgid.link/20260330144841.26181-31-will@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-03-30 16:58:09 +01:00
Will Deacon
03313efed5 KVM: arm64: Implement the MEM_SHARE hypercall for protected VMs
Implement the ARM_SMCCC_KVM_FUNC_MEM_SHARE hypercall to allow protected
VMs to share memory (e.g. the swiotlb bounce buffers) back to the host.

Reviewed-by: Vincent Donnefort <vdonnefort@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Tested-by: Mostafa Saleh <smostafa@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
Link: https://patch.msgid.link/20260330144841.26181-30-will@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-03-30 16:58:09 +01:00
Will Deacon
94c5250515 KVM: arm64: Add hvc handler at EL2 for hypercalls from protected VMs
Add a hypercall handler at EL2 for hypercalls originating from protected
VMs. For now, this implements only the FEATURES and MEMINFO calls, but
subsequent patches will implement the SHARE and UNSHARE functions
necessary for virtio.

Unhandled hypercalls (including PSCI) are passed back to the host.

Reviewed-by: Vincent Donnefort <vdonnefort@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Tested-by: Mostafa Saleh <smostafa@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
Link: https://patch.msgid.link/20260330144841.26181-29-will@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-03-30 16:58:09 +01:00
Will Deacon
56080f53a6 KVM: arm64: Introduce hypercall to force reclaim of a protected page
Introduce a new hypercall, __pkvm_force_reclaim_guest_page(), to allow
the host to forcefully reclaim a physical page that was previous donated
to a protected guest. This results in the page being zeroed and the
previous guest mapping being poisoned so that new pages cannot be
subsequently donated at the same IPA.

Tested-by: Fuad Tabba <tabba@google.com>
Tested-by: Mostafa Saleh <smostafa@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
Link: https://patch.msgid.link/20260330144841.26181-26-will@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-03-30 16:58:09 +01:00
Will Deacon
44887977ab KVM: arm64: Change 'pkvm_handle_t' to u16
'pkvm_handle_t' doesn't need to be a 32-bit type and subsequent patches
will rely on it being no more than 16 bits so that it can be encoded
into a pte annotation.

Change 'pkvm_handle_t' to a u16 and add a compile-type check that the
maximum handle fits into the reduced type.

Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Tested-by: Mostafa Saleh <smostafa@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
Link: https://patch.msgid.link/20260330144841.26181-24-will@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-03-30 16:58:08 +01:00
Will Deacon
0bf5f4d400 KVM: arm64: Introduce __pkvm_reclaim_dying_guest_page()
To enable reclaim of pages from a protected VM during teardown,
introduce a new hypercall to reclaim a single page from a protected
guest that is in the dying state.

Since the EL2 code is non-preemptible, the new hypercall deliberately
acts on a single page at a time so as to allow EL1 to reschedule
frequently during the teardown operation.

Reviewed-by: Vincent Donnefort <vdonnefort@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Tested-by: Mostafa Saleh <smostafa@google.com>
Co-developed-by: Quentin Perret <qperret@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
Link: https://patch.msgid.link/20260330144841.26181-16-will@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-03-30 16:58:08 +01:00
Will Deacon
6c58f914eb KVM: arm64: Split teardown hypercall into two phases
In preparation for reclaiming protected guest VM pages from the host
during teardown, split the current 'pkvm_teardown_vm' hypercall into
separate 'start' and 'finalise' calls.

The 'pkvm_start_teardown_vm' hypercall puts the VM into a new 'is_dying'
state, which is a point of no return past which no vCPU of the pVM is
allowed to run any more.  Once in this new state,
'pkvm_finalize_teardown_vm' can be used to reclaim meta-data and
page-table pages from the VM. A subsequent patch will add support for
reclaiming the individual guest memory pages.

Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Tested-by: Mostafa Saleh <smostafa@google.com>
Co-developed-by: Quentin Perret <qperret@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
Link: https://patch.msgid.link/20260330144841.26181-12-will@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-03-30 16:58:07 +01:00
Fuad Tabba
02471a78a0 KVM: arm64: Remove redundant kern_hyp_va() in unpin_host_sve_state()
The `sve_state` pointer in `hyp_vcpu->vcpu.arch` is initialized as a
hypervisor virtual address during vCPU initialization in
`pkvm_vcpu_init_sve()`.

`unpin_host_sve_state()` calls `kern_hyp_va()` on this address. Since
`kern_hyp_va()` is idempotent, it's not a bug. However, it is
unnecessary and potentially confusing. Remove the redundant conversion.

Signed-off-by: Fuad Tabba <tabba@google.com>
Link: https://patch.msgid.link/20260213143815.1732675-5-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-02-13 14:54:48 +00:00
Fuad Tabba
7e7c2cf002 KVM: arm64: Fix ID register initialization for non-protected pKVM guests
In protected mode, the hypervisor maintains a separate instance of
the `kvm` structure for each VM. For non-protected VMs, this structure is
initialized from the host's `kvm` state.

Currently, `pkvm_init_features_from_host()` copies the
`KVM_ARCH_FLAG_ID_REGS_INITIALIZED` flag from the host without the
underlying `id_regs` data being initialized. This results in the
hypervisor seeing the flag as set while the ID registers remain zeroed.

Consequently, `kvm_has_feat()` checks at EL2 fail (return 0) for
non-protected VMs. This breaks logic that relies on feature detection,
such as `ctxt_has_tcrx()` for TCR2_EL1 support. As a result, certain
system registers (e.g., TCR2_EL1, PIR_EL1, POR_EL1) are not
saved/restored during the world switch, which could lead to state
corruption.

Fix this by explicitly copying the ID registers from the host `kvm` to
the hypervisor `kvm` for non-protected VMs during initialization, since
we trust the host with its non-protected guests' features. Also ensure
`KVM_ARCH_FLAG_ID_REGS_INITIALIZED` is cleared initially in
`pkvm_init_features_from_host` so that `vm_copy_id_regs` can properly
initialize them and set the flag once done.

Fixes: 41d6028e28 ("KVM: arm64: Convert the SVE guest vcpu flag to a vm flag")
Signed-off-by: Fuad Tabba <tabba@google.com>
Link: https://patch.msgid.link/20260213143815.1732675-4-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-02-13 14:54:48 +00:00
Marc Zyngier
183ac2b2ad Merge branch kvm-arm64/pkvm-no-mte into kvmarm-master/next
* kvm-arm64/pkvm-no-mte:
  : .
  : pKVM updates preventing the host from using MTE-related system
  : sysrem registers when the feature is disabled from the kernel
  : command-line (arm64.nomte), courtesy of Fuad Taba.
  :
  : From the cover letter:
  :
  : "If MTE is supported by the hardware (and is enabled at EL3), it remains
  : available to lower exception levels by default. Disabling it in the host
  : kernel (e.g., via 'arm64.nomte') only stops the kernel from advertising
  : the feature; it does not physically disable MTE in the hardware.
  :
  : The ability to disable MTE in the host kernel is used by some systems,
  : such as Android, so that the physical memory otherwise used as tag
  : storage can be used for other things (i.e. treated just like the rest of
  : memory). In this scenario, a malicious host could still access tags in
  : pages donated to a guest using MTE instructions (e.g., STG and LDG),
  : bypassing the kernel's configuration."
  : .
  KVM: arm64: Use kvm_has_mte() in pKVM trap initialization
  KVM: arm64: Inject UNDEF when accessing MTE sysregs with MTE disabled
  KVM: arm64: Trap MTE access and discovery when MTE is disabled
  KVM: arm64: Remove dead code resetting HCR_EL2 for pKVM

Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-02-05 09:16:31 +00:00
Fuad Tabba
230b080623 KVM: arm64: Use kvm_has_mte() in pKVM trap initialization
When initializing HCR traps in protected mode, use kvm_has_mte() to
check for MTE support rather than kvm_has_feat(kvm, ID_AA64PFR1_EL1,
MTE, IMP).

kvm_has_mte() provides a more comprehensive check:
 - kvm_has_feat() only checks if MTE is in the guest's ID register view
   (i.e., what we advertise to the guest)
 - kvm_has_mte() checks both system_supports_mte() AND whether
   KVM_ARCH_FLAG_MTE_ENABLED is set for this VM instance

Signed-off-by: Fuad Tabba <tabba@google.com>
Link: https://patch.msgid.link/20260122112218.531948-5-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-01-23 11:28:48 +00:00
Marc Zyngier
c983b3e276 Merge branch kvm-arm64/pkvm-features-6.20 into kvmarm-master/next
* kvm-arm64/pkvm-features-6.20:
  : .
  : pKVM guest feature trapping fixes, courtesy of Fuad Tabba.
  : .
  KVM: arm64: Prevent host from managing timer offsets for protected VMs
  KVM: arm64: Check whether a VM IOCTL is allowed in pKVM
  KVM: arm64: Track KVM IOCTLs and their associated KVM caps
  KVM: arm64: Do not allow KVM_CAP_ARM_MTE for any guest in pKVM
  KVM: arm64: Include VM type when checking VM capabilities in pKVM
  KVM: arm64: Introduce helper to calculate fault IPA offset
  KVM: arm64: Fix MTE flag initialization for protected VMs
  KVM: arm64: Fix Trace Buffer trap polarity for protected VMs
  KVM: arm64: Fix Trace Buffer trapping for protected VMs

Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-01-23 10:04:47 +00:00
Fuad Tabba
43a21a0f0c KVM: arm64: Include VM type when checking VM capabilities in pKVM
Certain features and capabilities are restricted in protected mode. Most
of these features are restricted only for protected VMs, but some
are restricted for ALL VMs in protected mode.

Extend the pKVM capability check to pass the VM (kvm), and use that when
determining supported features.

Signed-off-by: Fuad Tabba <tabba@google.com>
Link: https://patch.msgid.link/20251211104710.151771-6-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-01-15 15:43:15 +00:00
Fuad Tabba
ebbcaece84 KVM: arm64: Fix MTE flag initialization for protected VMs
The function pkvm_init_features_from_host() initializes guest
features, propagating them from the host. The logic to propagate
KVM_ARCH_FLAG_MTE_ENABLED (Memory Tagging Extension)
has a couple of issues.

First, the check was in the common path, before the divergence for
protected and non-protected VMs. For non-protected VMs, this was
unnecessary, as 'kvm->arch.flags' is completely overwritten by
host_arch_flags immediately after, which already contains the MTE flag.
For protected VMs, this was setting the flag even if the feature is not
allowed.

Second, the check was reading 'host_kvm->arch.flags' instead of using
the local 'host_arch_flags', which is read once from the host flags.

Fix these by moving the MTE flag check inside the protected-VM-only
path, checking if the feature is allowed, and changing it to use the
correct host_arch_flags local variable. This ensures non-protected VMs
get the flag via the bulk copy, and protected VMs get it via an explicit
check.

Fixes: b7f345fbc3 ("KVM: arm64: Fix FEAT_MTE in pKVM")
Reviewed-by: Ben Horgan <ben.horgan@arm.com>
Signed-off-by: Fuad Tabba <tabba@google.com>
Link: https://patch.msgid.link/20251211104710.151771-4-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-01-15 15:43:15 +00:00
Fuad Tabba
e913c7ce9e KVM: arm64: Fix Trace Buffer trap polarity for protected VMs
The E2TB bits in MDCR_EL2 control trapping of Trace Buffer system
register accesses. These accesses are trapped to EL2 when the bits are
clear.

The trap initialization logic for protected VMs in pvm_init_traps_mdcr()
had the polarity inverted. When a guest did not support the Trace Buffer
feature, the code was setting E2TB. This incorrectly disabled the trap,
potentially allowing a protected guest to access registers for a feature
it was not given.

Fix this by inverting the operation.

Fixes: f50758260b ("KVM: arm64: Group setting traps for protected VMs by control register")
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Fuad Tabba <tabba@google.com>
Link: https://patch.msgid.link/20251211104710.151771-3-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-01-15 15:42:18 +00:00
Fuad Tabba
288eb55483 KVM: arm64: Fix Trace Buffer trapping for protected VMs
For protected VMs in pKVM, the hypervisor should trap accesses to trace
buffer system registers if Trace Buffer isn't supported by the VM.
However, the current code only traps if Trace Buffer External Mode isn't
supported.

Fix this by checking for FEAT_TRBE (Trace Buffer) rather than
FEAT_TRBE_EXT.

Fixes: 9d52612690 ("KVM: arm64: Trap external trace for protected VMs")
Reported-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Fuad Tabba <tabba@google.com>
Link: https://patch.msgid.link/20251211104710.151771-2-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-01-15 15:42:18 +00:00
Alexandru Elisei
145cc42fe1 KVM: arm64: Copy FGT traps to unprotected pKVM VCPU on VCPU load
Commit fb10ddf35c ("KVM: arm64: Compute per-vCPU FGTs at vcpu_load()")
introduced per-VCPU FGT traps. For an unprotected pKVM VCPU, the untrusted
host FGT configuration is copied in pkvm_vcpu_init_traps(), which is called
from __pkvm_init_vcpu(). __pkvm_init_vcpu() is called once per VCPU (when
the VCPU is first run) which means that the uninitialized, zero, values for
the FGT registers end up being used for the entire lifetime of the VCPU.
This causes both unwanted traps (for the inverse polarity trap bits) and
the guest being allowed to access registers it shouldn't.

Fix it by copying the FGT traps for unprotected pKVM VCPUs when the
untrusted host loads the VCPU.

Fixes: fb10ddf35c ("KVM: arm64: Compute per-vCPU FGTs at vcpu_load()")
Acked-by: Will Deacon <will@kernel.org>
Tested-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://msgid.link/20251216103053.47224-2-alexandru.elisei@arm.com
Signed-off-by: Oliver Upton <oupton@kernel.org>
2026-01-08 12:56:17 -08:00
Marc Zyngier
567ebfedb5 KVM: arm64: vgic-v3: Fix GICv3 trapping in protected mode
As we are about to start trapping a bunch of extra things, augment
the pKVM trap description with all the registers trapped by ICH_HCR_EL2.TC,
making them legal instead of resulting in a UNDEF injection in the guest.

While we're at it, ensure that pKVM captures the vgic model so that it
can be checked by the emulation code.

Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Tested-by: Mark Brown <broonie@kernel.org>
Link: https://msgid.link/20251120172540.2267180-6-maz@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
2025-11-24 14:29:11 -08:00
Oliver Upton
fb10ddf35c KVM: arm64: Compute per-vCPU FGTs at vcpu_load()
To date KVM has used the fine-grained traps for the sake of UNDEF
enforcement (so-called FGUs), meaning the constituent parts could be
computed on a per-VM basis and folded into the effective value when
programmed.

Prepare for traps changing based on the vCPU context by computing the
whole mess of them at vcpu_load(). Aggressively inline all the helpers
to preserve the build-time checks that were there before.

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Reviewed-by: Joey Gouly <joey.gouly@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-10-13 14:44:37 +01:00
Fuad Tabba
256b4668cd KVM: arm64: Introduce separate hypercalls for pKVM VM reservation and initialization
The existing __pkvm_init_vm hypercall performs both the reservation of a
VM table entry and the initialization of the hypervisor VM state in a
single operation. This design prevents the host from obtaining a VM
handle from the hypervisor until all preparation for the creation and
the initialization of the VM is done, which is on the first vCPU run
operation.

To support more flexible VM lifecycle management, the host needs the
ability to reserve a handle early, before the first vCPU run.

Refactor the hypercall interface to enable this, splitting the single
hypercall into a two-stage process:

- __pkvm_reserve_vm: A new hypercall that allocates a slot in the
  hypervisor's vm_table, marks it as reserved, and returns a unique
  handle to the host.

- __pkvm_unreserve_vm: A corresponding cleanup hypercall to safely
  release the reservation if the host fails to proceed with full
  initialization.

- __pkvm_init_vm: The existing hypercall is modified to no longer
  allocate a slot. It now expects a pre-reserved handle and commits the
  donated VM memory to that slot.

For now, the host-side code in __pkvm_create_hyp_vm calls the new
reserve and init hypercalls back-to-back to maintain existing behavior.
This paves the way for subsequent patches to separate the reservation
and initialization steps in the VM's lifecycle.

Signed-off-by: Fuad Tabba <tabba@google.com>
Tested-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-09-15 10:46:55 +01:00
Fuad Tabba
814fd6beac KVM: arm64: Consolidate pKVM hypervisor VM initialization logic
The insert_vm_table_entry() function was performing tasks beyond its
primary responsibility. In addition to inserting a VM pointer into the
vm_table, it was also initializing several fields within 'struct
pkvm_hyp_vm', such as the VMID and stage-2 MMU pointers. This mixing of
concerns made the code harder to follow.

As another preparatory step towards allowing a VM table entry to be
reserved before the VM is fully created, this logic must be cleaned up.
By separating table insertion from state initialization, we can control
the timing of the initialization step more precisely in subsequent
patches.

Refactor the code to consolidate all initialization logic into
init_pkvm_hyp_vm():

- Move the initialization of the handle, VMID, and MMU fields from
  insert_vm_table_entry() to init_pkvm_hyp_vm().

- Simplify insert_vm_table_entry() to perform only one action: placing
  the provided pkvm_hyp_vm pointer into the vm_table.

- Update the calling sequence in __pkvm_init_vm() to first allocate an
  entry in the VM table, initialize the VM, and then insert the VM into
  the VM table. This is all protected by the vm_table_lock for now.
  Subsequent patches will adjust the sequence and not hold the
  vm_table_lock while initializing the VM at the hypervisor
  (init_pkvm_hyp_vm()).

Signed-off-by: Fuad Tabba <tabba@google.com>
Tested-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-09-15 10:46:55 +01:00
Fuad Tabba
1abc1ad529 KVM: arm64: Separate allocation and insertion of pKVM VM table entries
The current insert_vm_table_entry() function performs two actions at
once: it finds a free slot in the pKVM VM table and populates it with
the pkvm_hyp_vm pointer.

Refactor this function as a preparatory step for future work that will
require reserving a VM slot and its corresponding handle earlier in the
VM lifecycle, before the pkvm_hyp_vm structure is initialized and ready
to be inserted.

Split the function into a two-phase process:

- A new allocate_vm_table_entry() function finds an empty slot, marks it
  as reserved with a RESERVED_ENTRY placeholder, and returns a handle
  derived from the slot's index.

- The insert_vm_table_entry() function is repurposed to take the handle,
  validate that the corresponding slot is in the reserved state, and
  then populate it with the pkvm_hyp_vm pointer.

Signed-off-by: Fuad Tabba <tabba@google.com>
Tested-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-09-15 10:46:55 +01:00
Fuad Tabba
3c45b67625 KVM: arm64: Decouple hyp VM creation state from its handle
Currently, the presence of a pKVM handle (pkvm.handle != 0) is used to
determine if the corresponding hypervisor (EL2) VM has been created and
initialized. This couples the handle's lifecycle with the VM's creation
state.

This coupling will become problematic with upcoming changes that will
allocate the pKVM handle earlier in the VM's life, before the VM is
instantiated at the hypervisor.

To prepare for this and make the state tracking explicit, decouple the
two concepts. Introduce a new boolean flag, 'pkvm.is_created', to track
whether the hypervisor-side VM has been created and initialized.

A new helper, pkvm_hyp_vm_is_created(), is added to check this flag. All
call sites that previously checked for the handle's existence are
converted to use the new, explicit check. The 'is_created' flag is set
to true upon successful creation in the hypervisor (EL2) and cleared
upon destruction.

Signed-off-by: Fuad Tabba <tabba@google.com>
Tested-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-09-15 10:46:55 +01:00
Fuad Tabba
070362648f KVM: arm64: Clarify comments to distinguish pKVM mode from protected VMs
The hypervisor code for protected KVM contains comments that are
imprecise and at times flat-out wrong. They often refer to a "protected
VM" in contexts where the code or data structure applies to _any_ VM
managed by the hypervisor when pKVM is enabled.

For instance, the 'vm_table' holds handles for all VMs known to the
hypervisor, not exclusively for those that are configured as protected.
This inaccurate terminology can make the code scope harder to understand
for future (and current) developers.

Clarify the comments throughout the pKVM hypervisor code to make a clear
distinction between the pKVM feature itself (i.e., "protected mode") and
the VMs that are specifically configured to be protected. This involves
replacing ambiguous uses of "protected VM" with more accurate phrasing.

No functional change intended.

Signed-off-by: Fuad Tabba <tabba@google.com>
Tested-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-09-15 10:46:55 +01:00
Fuad Tabba
58dfb66b1e KVM: arm64: Rename pkvm.enabled to pkvm.is_protected
The 'pkvm.enabled' field in struct kvm_protected_vm is confusingly
named. Its purpose is to indicate whether a VM is a _protected_ VM under
pKVM, and not whether the VM itself is enabled or running.

For a non-protected VM, the VM can be fully active and running, yet this
field would be false. This ambiguity can lead to incorrect assumptions
about the VM's operational state and makes the code harder to reason
about.

Rename the field to 'is_protected' to make it unambiguous that the flag
tracks the protected status of the VM.

No functional change intended.

Reviewed-by: Kunwu Chan <kunwu.chan@linux.dev>
Signed-off-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Kunwu Chan <chentao@kylinos.cn>
Tested-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-09-15 10:46:55 +01:00
Fuad Tabba
5db1bef933 KVM: arm64: Track SVE state in the hypervisor vcpu structure
When dealing with a guest with SVE enabled, make sure the host SVE
state is pinned at EL2 S1, and that the hypervisor vCPU state is
correctly initialised (and then unpinned on teardown).

Co-authored-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20250416152648.2982950-2-qperret@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-04-28 09:23:46 +01:00
Oliver Upton
ca19dd4323 Merge branch 'kvm-arm64/pkvm-6.15' into kvmarm/next
* kvm-arm64/pkvm-6.15:
  : pKVM updates for 6.15
  :
  :  - SecPageTable stats for stage-2 table pages allocated by the protected
  :    hypervisor (Vincent Donnefort)
  :
  :  - HCRX_EL2 trap + vCPU initialization fixes for pKVM (Fuad Tabba)
  KVM: arm64: Create each pKVM hyp vcpu after its corresponding host vcpu
  KVM: arm64: Factor out pKVM hyp vcpu creation to separate function
  KVM: arm64: Initialize HCRX_EL2 traps in pKVM
  KVM: arm64: Factor out setting HCRX_EL2 traps into separate function
  KVM: arm64: Count pKVM stage-2 usage in secondary pagetable stats
  KVM: arm64: Distinct pKVM teardown memcache for stage-2
  KVM: arm64: Add flags to kvm_hyp_memcache

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-03-19 14:54:40 -07:00
Fuad Tabba
1eab115486 KVM: arm64: Create each pKVM hyp vcpu after its corresponding host vcpu
Instead of creating and initializing _all_ hyp vcpus in pKVM when
the first host vcpu runs for the first time, initialize _each_
hyp vcpu in conjunction with its corresponding host vcpu.

Some of the host vcpu state (e.g., system registers and traps
values) is not initialized until the first time the host vcpu is
run. Therefore, initializing a hyp vcpu before its corresponding
host vcpu has run for the first time might not view the complete
host state of these vcpus.

Additionally, this behavior is inline with non-protected modes.

Acked-by: Will Deacon <will@kernel.org>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Fuad Tabba <tabba@google.com>
Link: https://lore.kernel.org/r/20250314111832.4137161-5-tabba@google.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-03-14 16:06:03 -07:00
Fuad Tabba
066daa8d3b KVM: arm64: Initialize HCRX_EL2 traps in pKVM
Initialize and set the traps controlled by the HCRX_EL2 in pKVM.

Reviewed-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Fuad Tabba <tabba@google.com>
Link: https://lore.kernel.org/r/20250314111832.4137161-3-tabba@google.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-03-14 16:00:49 -07:00
Vincent Donnefort
8c0d7d14c5 KVM: arm64: Distinct pKVM teardown memcache for stage-2
In order to account for memory dedicated to the stage-2 page-tables, use
a separated memcache when tearing down the VM. Meanwhile rename
reclaim_guest_pages to reflect the fact it only reclaim page-table
pages.

Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250313114038.1502357-3-vdonnefort@google.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-03-14 00:56:29 -07:00
Oliver Upton
03e1b89d05 KVM: arm64: Copy MIDR_EL1 into hyp VM when it is writable
KVM recently added a capability that allows userspace to override the
'implementation ID' registers presented to the VM. MIDR_EL1 is a special
example, where the hypervisor can directly set the value when read from
EL1 using VPIDR_EL2.

Copy the VM-wide value for MIDR_EL1 into the hyp VM for non-protected
guests when the capability is enabled so VPIDR_EL2 gets set up
correctly.

Reported-by: Mark Brown <broonie@kernel.org>
Closes: https://lore.kernel.org/kvmarm/ac594b9c-4bbb-46c8-9391-e7a68ce4de5b@sirena.org.uk/
Fixes: 3adaee7830 ("KVM: arm64: Allow userspace to change the implementation ID registers")
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250305230825.484091-3-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-03-05 16:56:57 -08:00
Oliver Upton
9d91227364 KVM: arm64: Copy guest CTR_EL0 into hyp VM
Since commit 2843cae266 ("KVM: arm64: Treat CTR_EL0 as a VM feature
ID register") KVM has allowed userspace to configure the VM-wide view of
CTR_EL0, falling back to trap-n-emulate if the value doesn't match
hardware. It appears that this has worked by chance in protected-mode
for some time, and on systems with FEAT_EVT protected-mode
unconditionally sets TID4 (i.e. TID2 traps sans CTR_EL0).

Forward the guest CTR_EL0 value through to the hyp VM and align the
TID2/TID4 configuration with the non-protected setup.

Fixes: 2843cae266 ("KVM: arm64: Treat CTR_EL0 as a VM feature ID register")
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250305230825.484091-2-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-03-05 16:55:41 -08:00
Marc Zyngier
e880b16efb Merge branch kvm-arm64/pkvm-fixed-features-6.14 into kvmarm-master/next
* kvm-arm64/pkvm-fixed-features-6.14: (24 commits)
  : .
  : Complete rework of the pKVM handling of features, catching up
  : with the rest of the code deals with it these days.
  : Patches courtesy of Fuad Tabba. From the cover letter:
  :
  : "This patch series uses the vm's feature id registers to track the
  : supported features, a framework similar to nested virt to set the
  : trap values, and removes the need to store cptr_el2 per vcpu in
  : favor of setting its value when traps are activated, as VHE mode
  : does."
  :
  : This branch drags the arm64/for-next/cpufeature branch to solve
  : ugly conflicts in -next.
  : .
  KVM: arm64: Fix FEAT_MTE in pKVM
  KVM: arm64: Use kvm_vcpu_has_feature() directly for struct kvm
  KVM: arm64: Convert the SVE guest vcpu flag to a vm flag
  KVM: arm64: Remove PtrAuth guest vcpu flag
  KVM: arm64: Fix the value of the CPTR_EL2 RES1 bitmask for nVHE
  KVM: arm64: Refactor kvm_reset_cptr_el2()
  KVM: arm64: Calculate cptr_el2 traps on activating traps
  KVM: arm64: Remove redundant setting of HCR_EL2 trap bit
  KVM: arm64: Remove fixed_config.h header
  KVM: arm64: Rework specifying restricted features for protected VMs
  KVM: arm64: Set protected VM traps based on its view of feature registers
  KVM: arm64: Fix RAS trapping in pKVM for protected VMs
  KVM: arm64: Initialize feature id registers for protected VMs
  KVM: arm64: Use KVM extension checks for allowed protected VM capabilities
  KVM: arm64: Remove KVM_ARM_VCPU_POWER_OFF from protected VMs allowed features in pKVM
  KVM: arm64: Move checking protected vcpu features to a separate function
  KVM: arm64: Group setting traps for protected VMs by control register
  KVM: arm64: Consolidate allowed and restricted VM feature checks
  arm64/sysreg: Get rid of CPACR_ELx SysregFields
  arm64/sysreg: Convert *_EL12 accessors to Mapping
  ...

Signed-off-by: Marc Zyngier <maz@kernel.org>

# Conflicts:
#	arch/arm64/kvm/fpsimd.c
#	arch/arm64/kvm/hyp/nvhe/pkvm.c
2025-01-12 10:40:10 +00:00
Vladimir Murzin
b7f345fbc3 KVM: arm64: Fix FEAT_MTE in pKVM
Make sure we do not trap access to Allocation Tags.

Fixes: b56680de9c ("KVM: arm64: Initialize trap register values in hyp in pKVM")
Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com>
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Reviewed-by: Fuad Tabba <tabba@google.com>
Link: https://lore.kernel.org/r/20250106112421.65355-1-vladimir.murzin@arm.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-01-08 10:24:32 +00:00
Fuad Tabba
41d6028e28 KVM: arm64: Convert the SVE guest vcpu flag to a vm flag
The vcpu flag GUEST_HAS_SVE is per-vcpu, but it is based on what
is now a per-vm feature. Make the flag per-vm.

Signed-off-by: Fuad Tabba <tabba@google.com>
Link: https://lore.kernel.org/r/20241216105057.579031-17-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-12-20 13:54:09 +00:00
Fuad Tabba
c5c1763596 KVM: arm64: Remove PtrAuth guest vcpu flag
The vcpu flag GUEST_HAS_PTRAUTH is always associated with the
vcpu PtrAuth features, which are defined per vm rather than per
vcpu.

Remove the flag, and replace it with checks for the features
instead.

Signed-off-by: Fuad Tabba <tabba@google.com>
Link: https://lore.kernel.org/r/20241216105057.579031-16-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-12-20 13:54:06 +00:00
Fuad Tabba
2fd5b4b0e7 KVM: arm64: Calculate cptr_el2 traps on activating traps
Similar to VHE, calculate the value of cptr_el2 from scratch on
activate traps. This removes the need to store cptr_el2 in every
vcpu structure. Moreover, some traps, such as whether the guest
owns the fp registers, need to be set on every vcpu run.

Reported-by: James Clark <james.clark@linaro.org>
Fixes: 5294afdbf4 ("KVM: arm64: Exclude FP ownership from kvm_vcpu_arch")
Signed-off-by: Fuad Tabba <tabba@google.com>
Link: https://lore.kernel.org/r/20241216105057.579031-13-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-12-20 13:53:57 +00:00
Fuad Tabba
092e7b2c3b KVM: arm64: Remove redundant setting of HCR_EL2 trap bit
In hVHE mode, HCR_E2H should be set for both protected and
non-protected VMs. Since commit b56680de9c ("KVM: arm64:
Initialize trap register values in hyp in pKVM"), this has been
fixed, and the setting of the flag here is redundant.

Signed-off-by: Fuad Tabba <tabba@google.com>
Link: https://lore.kernel.org/r/20241216105057.579031-12-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-12-20 13:53:55 +00:00
Fuad Tabba
81403c8d04 KVM: arm64: Remove fixed_config.h header
The few remaining items needed in fixed_config.h are better
suited for pkvm.h. Move them there and delete it.

No functional change intended.

Signed-off-by: Fuad Tabba <tabba@google.com>
Link: https://lore.kernel.org/r/20241216105057.579031-11-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-12-20 13:53:53 +00:00
Fuad Tabba
0401f7e76d KVM: arm64: Set protected VM traps based on its view of feature registers
Now that the VM's feature id registers are initialized with the
values of the supported features, use those values to determine
which traps to set using kvm_has_feature().

Signed-off-by: Fuad Tabba <tabba@google.com>
Link: https://lore.kernel.org/r/20241216105057.579031-9-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-12-20 13:53:00 +00:00
Fuad Tabba
9df9186f8d KVM: arm64: Fix RAS trapping in pKVM for protected VMs
Trap RAS in pKVM if not supported at all for protected VMs. The
RAS version doesn't matter in this case.

Fixes: 2a0c343386 ("KVM: arm64: Initialize trap registers for protected VMs")
Signed-off-by: Fuad Tabba <tabba@google.com>
Link: https://lore.kernel.org/r/20241216105057.579031-8-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-12-20 13:52:58 +00:00
Fuad Tabba
7ba5b8f804 KVM: arm64: Initialize feature id registers for protected VMs
The hypervisor maintains the state of protected VMs. Initialize
the values for feature ID registers for protected VMs, to be used
when setting traps and when advertising features to protected
VMs.

Signed-off-by: Fuad Tabba <tabba@google.com>
Link: https://lore.kernel.org/r/20241216105057.579031-7-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-12-20 13:52:50 +00:00
Fuad Tabba
a3163dca48 KVM: arm64: Use KVM extension checks for allowed protected VM capabilities
Use KVM extension checks as the source for determining which
capabilities are allowed for protected VMs. KVM extension checks
is the natural place for this, since it is also the interface
exposed to users.

Signed-off-by: Fuad Tabba <tabba@google.com>
Link: https://lore.kernel.org/r/20241216105057.579031-6-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-12-20 13:45:25 +00:00
Fuad Tabba
27f5cf8ad5 KVM: arm64: Remove KVM_ARM_VCPU_POWER_OFF from protected VMs allowed features in pKVM
The hypervisor is responsible for the power state of protected
VMs in pKVM. Therefore, remove KVM_ARM_VCPU_POWER_OFF from the
list of allowed features for protected VMs.

Signed-off-by: Fuad Tabba <tabba@google.com>
Link: https://lore.kernel.org/r/20241216105057.579031-5-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-12-20 13:45:25 +00:00
Fuad Tabba
1fea164ccf KVM: arm64: Move checking protected vcpu features to a separate function
At the moment, checks for supported vcpu features for protected
VMs are build-time bugs. In the following patch, they will become
runtime checks based on the vcpu's features registers. Therefore,
consolidate them into one function that would return an error if
it encounters an unsupported feature.

Signed-off-by: Fuad Tabba <tabba@google.com>
Link: https://lore.kernel.org/r/20241216105057.579031-4-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-12-20 13:45:25 +00:00