linux

mirror of https://github.com/torvalds/linux.git synced 2026-06-03 20:14:06 +02:00

Author	SHA1	Message	Date
Sascha Bischoff	d51c978b7d	KVM: arm64: gic-v5: Communicate userspace-driveable PPIs via a UAPI GICv5 systems will likely not support the full set of PPIs. The presence of any virtual PPI is tied to the presence of the physical PPI. Therefore, the available PPIs will be limited by the physical host. Userspace cannot drive any PPIs that are not implemented. Moreover, it is not desirable to expose all PPIs to the guest in the first place, even if they are supported in hardware. Some devices, such as the arch timer, are implemented in KVM, and hence those PPIs shouldn't be driven by userspace, either. Provided a new UAPI: KVM_DEV_ARM_VGIC_GRP_CTRL => KVM_DEV_ARM_VGIC_USERPSPACE_PPIs This allows userspace to query which PPIs it is able to drive via KVM_IRQ_LINE. Additionally, introduce a check in kvm_vm_ioctl_irq_line() to reject any PPIs not in the userspace mask. Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Link: https://patch.msgid.link/20260319154937.3619520-40-sascha.bischoff@arm.com Signed-off-by: Marc Zyngier <maz@kernel.org>	2026-03-19 18:21:29 +00:00
Sascha Bischoff	9491c63b6c	KVM: arm64: gic-v5: Enlighten arch timer for GICv5 Now that GICv5 has arrived, the arch timer requires some TLC to address some of the key differences introduced with GICv5. For PPIs on GICv5, the queue_irq_unlock irq_op is used as AP lists are not required at all for GICv5. The arch timer also introduces an irq_op - get_input_level. Extend the arch-timer-provided irq_ops to include the PPI op for vgic_v5 guests. When possible, DVI (Direct Virtual Interrupt) is set for PPIs when using a vgic_v5, which directly inject the pending state into the guest. This means that the host never sees the interrupt for the guest for these interrupts. This has three impacts. * First of all, the kvm_cpu_has_pending_timer check is updated to explicitly check if the timers are expected to fire. * Secondly, for mapped timers (which use DVI) they must be masked on the host prior to entering a GICv5 guest, and unmasked on the return path. This is handled in set_timer_irq_phys_masked. * Thirdly, it makes zero sense to attempt to inject state for a DVI'd interrupt. Track which timers are direct, and skip the call to kvm_vgic_inject_irq() for these. The final, but rather important, change is that the architected PPIs for the timers are made mandatory for a GICv5 guest. Attempts to set them to anything else are actively rejected. Once a vgic_v5 is initialised, the arch timer PPIs are also explicitly reinitialised to ensure the correct GICv5-compatible PPIs are used - this also adds in the GICv5 PPI type to the intid. Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Link: https://patch.msgid.link/20260319154937.3619520-32-sascha.bischoff@arm.com Signed-off-by: Marc Zyngier <maz@kernel.org>	2026-03-19 18:21:28 +00:00
Sascha Bischoff	f4d37c7c35	KVM: arm64: gic-v5: Create and initialise vgic_v5 Update kvm_vgic_create to create a vgic_v5 device. When creating a vgic, FEAT_GCIE in the ID_AA64PFR2 is only exposed to vgic_v5-based guests, and is hidden otherwise. GIC in ~ID_AA64PFR0_EL1 is never exposed for a vgic_v5 guest. When initialising a vgic_v5, skip kvm_vgic_dist_init as GICv5 doesn't support one. The current vgic_v5 implementation only supports PPIs, so no SPIs are initialised either. The current vgic_v5 support doesn't extend to nested guests. Therefore, the init of vgic_v5 for a nested guest is failed in vgic_v5_init. As the current vgic_v5 doesn't require any resources to be mapped, vgic_v5_map_resources is simply used to check that the vgic has indeed been initialised. Again, this will change as more GICv5 support is merged in. Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Link: https://patch.msgid.link/20260319154937.3619520-29-sascha.bischoff@arm.com Signed-off-by: Marc Zyngier <maz@kernel.org>	2026-03-19 18:21:28 +00:00
Sascha Bischoff	4a5444d239	KVM: arm64: Introduce set_direct_injection irq_op GICv5 adds support for directly injected PPIs. The mechanism for setting this up is GICv5 specific, so rather than adding GICv5-specific code to the common vgic code, we introduce a new irq_op. This new irq_op is intended to be used to enable or disable direct injection for interrupts that support it. As it is an irq_op, it has no effect unless explicitly populated in the irq_ops structure for a particular interrupt. The usage is demonstracted in the subsequent change. Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com> Link: https://patch.msgid.link/20260319154937.3619520-26-sascha.bischoff@arm.com Signed-off-by: Marc Zyngier <maz@kernel.org>	2026-03-19 18:21:28 +00:00
Sascha Bischoff	4a9a32d353	KVM: arm64: gic: Introduce queue_irq_unlock to irq_ops There are times when the default behaviour of vgic_queue_irq_unlock() is undesirable. This is because some GICs, such a GICv5 which is the main driver for this change, handle the majority of the interrupt lifecycle in hardware. In this case, there is no need for a per-VCPU AP list as the interrupt can be made pending directly. This is done either via the ICH_PPI_x_EL2 registers for PPIs, or with the VDPEND system instruction for SPIs and LPIs. The vgic_queue_irq_unlock() function is made overridable using a new function pointer in struct irq_ops. vgic_queue_irq_unlock() is overridden if the function pointer is non-null. This new irq_op is unused in this change - it is purely providing the infrastructure itself. The subsequent PPI injection changes provide a demonstration of the usage of the queue_irq_unlock irq_op. Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Link: https://patch.msgid.link/20260319154937.3619520-20-sascha.bischoff@arm.com Signed-off-by: Marc Zyngier <maz@kernel.org>	2026-03-19 18:21:28 +00:00
Sascha Bischoff	8f1fbe2fd2	KVM: arm64: gic-v5: Finalize GICv5 PPIs and generate mask We only want to expose a subset of the PPIs to a guest. If a PPI does not have an owner, it is not being actively driven by a device. The SW_PPI is a special case, as it is likely for userspace to wish to inject that. Therefore, just prior to running the guest for the first time, we need to finalize the PPIs. A mask is generated which, when combined with trapping a guest's PPI accesses, allows for the guest's view of the PPI to be filtered. This mask is global to the VM as all VCPUs PPI configurations must match. In addition, the PPI HMR is calculated. Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Link: https://patch.msgid.link/20260319154937.3619520-19-sascha.bischoff@arm.com Signed-off-by: Marc Zyngier <maz@kernel.org>	2026-03-19 18:21:28 +00:00
Sascha Bischoff	9b8e3d4ca0	KVM: arm64: gic-v5: Implement GICv5 load/put and save/restore This change introduces GICv5 load/put. Additionally, it plumbs in save/restore for: * PPIs (ICH_PPI_x_EL2 regs) * ICH_VMCR_EL2 * ICH_APR_EL2 * ICC_ICSR_EL1 A GICv5-specific enable bit is added to struct vgic_vmcr as this differs from previous GICs. On GICv5-native systems, the VMCR only contains the enable bit (driven by the guest via ICC_CR0_EL1.EN) and the priority mask (PCR). A struct gicv5_vpe is also introduced. This currently only contains a single field - bool resident - which is used to track if a VPE is currently running or not, and is used to avoid a case of double load or double put on the WFI path for a vCPU. This struct will be extended as additional GICv5 support is merged, specifically for VPE doorbells. Co-authored-by: Timothy Hayes <timothy.hayes@arm.com> Signed-off-by: Timothy Hayes <timothy.hayes@arm.com> Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Link: https://patch.msgid.link/20260319154937.3619520-18-sascha.bischoff@arm.com Signed-off-by: Marc Zyngier <maz@kernel.org>	2026-03-19 18:21:28 +00:00
Sascha Bischoff	af325e87af	KVM: arm64: gic-v5: Add vgic-v5 save/restore hyp interface Introduce the following hyp functions to save/restore GICv5 state: * __vgic_v5_save_apr() * __vgic_v5_restore_vmcr_apr() * __vgic_v5_save_ppi_state() - no hypercall required * __vgic_v5_restore_ppi_state() - no hypercall required * __vgic_v5_save_state() - no hypercall required * __vgic_v5_restore_state() - no hypercall required Note that the functions tagged as not requiring hypercalls are always called directly from the same context. They are either called via the vgic_save_state()/vgic_restore_state() path when running with VHE, or via __hyp_vgic_save_state()/__hyp_vgic_restore_state() otherwise. This mimics how vgic_v3_save_state()/vgic_v3_restore_state() are implemented. Overall, the state of the following registers is saved/restored: * ICC_ICSR_EL1 * ICH_APR_EL2 * ICH_PPI_ACTIVERx_EL2 * ICH_PPI_DVIRx_EL2 * ICH_PPI_ENABLERx_EL2 * ICH_PPI_PENDRx_EL2 * ICH_PPI_PRIORITYRx_EL2 * ICH_VMCR_EL2 All of these are saved/restored to/from the KVM vgic_v5 CPUIF shadow state, with the exception of the PPI active, pending, and enable state. The pending state is saved and restored from kvm_host_data as any changes here need to be tracked and propagated back to the vgic_irq shadow structures (coming in a future commit). Therefore, an entry and an exit copy is required. The active and enable state is restored from the vgic_v5 CPUIF, but is saved to kvm_host_data. Again, this needs to by synced back into the shadow data structures. The ICSR must be save/restored as this register is shared between host and guest. Therefore, to avoid leaking host state to the guest, this must be saved and restored. Moreover, as this can by used by the host at any time, it must be save/restored eagerly. Note: the host state is not preserved as the host should only use this register when preemption is disabled. As with GICv3, the VMCR is eagerly saved as this is required when checking if interrupts can be injected or not, and therefore impacts things such as WFI. As part of restoring the ICH_VMCR_EL2 and ICH_APR_EL2, GICv3-compat mode is also disabled by setting the ICH_VCTLR_EL2.V3 bit to 0. The correspoinding GICv3-compat mode enable is part of the VMCR & APR restore for a GICv3 guest as it only takes effect when actually running a guest. Co-authored-by: Timothy Hayes <timothy.hayes@arm.com> Signed-off-by: Timothy Hayes <timothy.hayes@arm.com> Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com> Link: https://patch.msgid.link/20260319154937.3619520-17-sascha.bischoff@arm.com Signed-off-by: Marc Zyngier <maz@kernel.org>	2026-03-19 18:21:28 +00:00
Sascha Bischoff	a258a383b9	KVM: arm64: gic-v5: Sanitize ID_AA64PFR2_EL1.GCIE Add in a sanitization function for ID_AA64PFR2_EL1, preserving the already-present behaviour for the FPMR, MTEFAR, and MTESTOREONLY fields. Add sanitisation for the GCIE field, which is set to IMP if the host supports a GICv5 guest and NI, otherwise. Extend the sanitisation that takes place in kvm_vgic_create() to zero the ID_AA64PFR2.GCIE field when a non-GICv5 GIC is created. More importantly, move this sanitisation to a separate function, kvm_vgic_finalize_sysregs(), and call it from kvm_finalize_sys_regs(). We are required to finalize the GIC and GCIE fields a second time in kvm_finalize_sys_regs() due to how QEMU blindly reads out then verbatim restores the system register state. This avoids the issue where both the GCIE and GIC features are marked as present (an architecturally invalid combination), and hence guests fall over. See the comment in kvm_finalize_sys_regs() for more details. Overall, the following happens: * Before an irqchip is created, FEAT_GCIE is presented if the host supports GICv5-based guests. * Once an irqchip is created, all other supported irqchips are hidden from the guest; system register state reflects the guest's irqchip. * Userspace is allowed to set invalid irqchip feature combinations in the system registers, but... * ...invalid combinations are removed a second time prior to the first run of the guest, and things hopefully just work. All of this extra work is required to make sure that "legacy" GICv3 guests based on QEMU transparently work on compatible GICv5 hosts without modification. Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Link: https://patch.msgid.link/20260319154937.3619520-13-sascha.bischoff@arm.com Signed-off-by: Marc Zyngier <maz@kernel.org>	2026-03-19 18:21:27 +00:00
Sascha Bischoff	f656807150	KVM: arm64: gic-v5: Detect implemented PPIs on boot As part of booting the system and initialising KVM, create and populate a mask of the implemented PPIs. This mask allows future PPI operations (such as save/restore or state, or syncing back into the shadow state) to only consider PPIs that are actually implemented on the host. The set of implemented virtual PPIs matches the set of implemented physical PPIs for a GICv5 host. Therefore, this mask represents all PPIs that could ever by used by a GICv5-based guest on a specific host, albeit pre-filtered by what we support in KVM (see next paragraph). Only architected PPIs are currently supported in KVM with GICv5. Moreover, as KVM only supports a subset of all possible PPIS (Timers, PMU, GICv5 SW_PPI) the PPI mask only includes these PPIs, if present. The timers are always assumed to be present; if we have KVM we have EL2, which means that we have the EL1 & EL2 Timer PPIs. If we have a PMU (v3), then the PMUIRQ is present. The GICv5 SW_PPI is always assumed to be present. Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Link: https://patch.msgid.link/20260319154937.3619520-12-sascha.bischoff@arm.com Signed-off-by: Marc Zyngier <maz@kernel.org>	2026-03-19 18:21:27 +00:00
Sascha Bischoff	eb8bce08ec	KVM: arm64: gic: Introduce interrupt type helpers GICv5 has moved from using interrupt ranges for different interrupt types to using some of the upper bits of the interrupt ID to denote the interrupt type. This is not compatible with older GICs (which rely on ranges of interrupts to determine the type), and hence a set of helpers is introduced. These helpers take a struct kvm*, and use the vgic model to determine how to interpret the interrupt ID. Helpers are introduced for PPIs, SPIs, and LPIs. Additionally, a helper is introduced to determine if an interrupt is private - SGIs and PPIs for older GICs, and PPIs only for GICv5. Additionally, vgic_is_v5() is introduced (which unsurpisingly returns true when running a GICv5 guest), and the existing vgic_is_v3() check is moved from vgic.h to arm_vgic.h (to live alongside the vgic_is_v5() one), and has been converted into a macro. The helpers are plumbed into the core vgic code, as well as the Arch Timer and PMU code. There should be no functional changes as part of this change. Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com> Reviewed-by: Joey Gouly <joey.gouly@arm.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Link: https://patch.msgid.link/20260319154937.3619520-10-sascha.bischoff@arm.com Signed-off-by: Marc Zyngier <maz@kernel.org>	2026-03-19 18:21:27 +00:00
Sascha Bischoff	663594aafb	KVM: arm64: vgic: Split out mapping IRQs and setting irq_ops Prior to this change, the act of mapping a virtual IRQ to a physical one also set the irq_ops. Unmapping then reset the irq_ops to NULL. So far, this has been fine and hasn't caused any major issues. Now, however, as GICv5 support is being added to KVM, it has become apparent that conflating mapping/unmapping IRQs and setting/clearing irq_ops can cause issues. The reason is that the upcoming GICv5 support introduces a set of default irq_ops for PPIs, and removing this when unmapping will cause things to break rather horribly. Split out the mapping/unmapping of IRQs from the setting/clearing of irq_ops. The arch timer code is updated to set the irq_ops following a successful map. The irq_ops are intentionally not removed again on an unmap as the only irq_op introduced by the arch timer only takes effect if the hw bit in struct vgic_irq is set. Therefore, it is safe to leave this in place, and it avoids additional complexity when GICv5 support is introduced. Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com> Link: https://patch.msgid.link/20260319154937.3619520-6-sascha.bischoff@arm.com Signed-off-by: Marc Zyngier <maz@kernel.org>	2026-03-19 18:21:27 +00:00
Fuad Tabba	fb21cb0856	KVM: arm64: Use standard seq_file iterator for vgic-debug debugfs The current implementation uses `vgic_state_iter` in `struct vgic_dist` to track the sequence position. This effectively makes the iterator shared across all open file descriptors for the VM. This approach has significant drawbacks: - It enforces mutual exclusion, preventing concurrent reads of the debugfs file (returning -EBUSY). - It relies on storing transient iterator state in the long-lived VM structure (`vgic_dist`). Refactor the implementation to use the standard `seq_file` iterator. Instead of storing state in `kvm_arch`, rely on the `pos` argument passed to the `start` and `next` callbacks, which tracks the logical index specific to the file descriptor. This change enables concurrent access and eliminates the `vgic_state_iter` field from `struct vgic_dist`. Signed-off-by: Fuad Tabba <tabba@google.com> Link: https://patch.msgid.link/20260202085721.3954942-4-tabba@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>	2026-02-02 10:59:25 +00:00
Fuad Tabba	5ab2496970	KVM: arm64: Reimplement vgic-debug XArray iteration The vgic-debug interface implementation uses XArray marks (`LPI_XA_MARK_DEBUG_ITER`) to "snapshot" LPIs at the start of iteration. This modifies global state for a read-only operation and complicates reference counting, leading to leaks if iteration is aborted or fails. Reimplement the iterator to use dynamic iteration logic: - Remove `lpi_idx` from `struct vgic_state_iter`. - Replace the XArray marking mechanism with dynamic iteration using `xa_find_after(..., XA_PRESENT)`. - Wrap XArray traversals in `rcu_read_lock()`/`rcu_read_unlock()` to ensure safety against concurrent modifications (e.g., LPI unmapping). - Handle potential races where an LPI is removed during iteration by gracefully skipping it in `show()`, rather than warning. - Remove the unused `LPI_XA_MARK_DEBUG_ITER` definition. This simplifies the lifecycle management of the iterator and prevents resource leaks associated with the marking mechanism, and paves the way for using a standard seq_file iterator. Signed-off-by: Fuad Tabba <tabba@google.com> Link: https://patch.msgid.link/20260202085721.3954942-3-tabba@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>	2026-02-02 10:59:25 +00:00
Marc Zyngier	255de897e7	KVM: arm64: GICv2: Handle deactivation via GICV_DIR traps Add the plumbing of GICv2 interrupt deactivation via GICV_DIR. This requires adding a new device so that we can easily decode the DIR address. The deactivation itself is very similar to the GICv3 version. Tested-by: Fuad Tabba <tabba@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Tested-by: Mark Brown <broonie@kernel.org> Link: https://msgid.link/20251120172540.2267180-39-maz@kernel.org Signed-off-by: Oliver Upton <oupton@kernel.org>	2025-11-24 14:29:14 -08:00
Marc Zyngier	1c3b3cadcd	KVM: arm64: GICv3: Add SPI tracking to handle asymmetric deactivation SPIs are specially annpying, as they can be activated on a CPU and deactivated on another. WHich means that when an SPI is in flight anywhere, all CPUs need to have their TDIR trap bit set. This translates into broadcasting an IPI across all CPUs to make sure they set their trap bit, The number of in-flight SPIs is kept in an atomic variable so that CPUs can turn the trap bit off as soon as possible. Tested-by: Fuad Tabba <tabba@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Tested-by: Mark Brown <broonie@kernel.org> Link: https://msgid.link/20251120172540.2267180-32-maz@kernel.org Signed-off-by: Oliver Upton <oupton@kernel.org>	2025-11-24 14:29:14 -08:00
Marc Zyngier	cd4f6ee99b	KVM: arm64: GICv3: Handle deactivation via ICV_DIR_EL1 traps Deactivation via ICV_DIR_EL1 is both relatively straightforward (we have the interrupt that needs deactivation) and really awkward. The main issue is that the interrupt may either be in an LR on another CPU, or ourside of any LR. In the former case, we process the deactivation is if ot was a write to GICD_CACTIVERn, which is already implemented as a big hammer IPI'ing all vcpus. In the latter case, we just perform a normal deactivation, similar to what we do for EOImode==0. Another annoying aspect is that we need to tell the CPU owning the interrupt that its ap_list needs laudering. We use a brand new vcpu request to that effect. Note that this doesn't address deactivation via the GICV MMIO view, which will be taken care of in a later change. Tested-by: Fuad Tabba <tabba@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Tested-by: Mark Brown <broonie@kernel.org> Link: https://msgid.link/20251120172540.2267180-29-maz@kernel.org Signed-off-by: Oliver Upton <oupton@kernel.org>	2025-11-24 14:29:13 -08:00
Marc Zyngier	879a7fd4fd	KVM: arm64: Add tracking of vgic_irq being present in a LR We currently cannot identify whether an interrupt is queued into a LR. It wasn't needed until now, but that's about to change. Add yet another flag to track that state. Tested-by: Fuad Tabba <tabba@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Tested-by: Mark Brown <broonie@kernel.org> Link: https://msgid.link/20251120172540.2267180-9-maz@kernel.org Signed-off-by: Oliver Upton <oupton@kernel.org>	2025-11-24 14:29:12 -08:00
Marc Zyngier	a4413a7c31	KVM: arm64: Repack struct vgic_irq fields struct vgic_irq has grown over the years, in a rather bad way. Repack it using bitfields so that the individual flags, and move things around a bit so that it a bit smaller. Tested-by: Fuad Tabba <tabba@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Tested-by: Mark Brown <broonie@kernel.org> Link: https://msgid.link/20251120172540.2267180-8-maz@kernel.org Signed-off-by: Oliver Upton <oupton@kernel.org>	2025-11-24 14:29:11 -08:00
Marc Zyngier	fa8f11e8e1	irqchip/gic: Expose CPU interface VA to KVM Future changes will require KVM to be able to perform deactivations by writing to the physical CPU interface. Add the corresponding VA to the kvm_info structure, and let KVM stash it. Tested-by: Fuad Tabba <tabba@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Tested-by: Mark Brown <broonie@kernel.org> Link: https://msgid.link/20251120172540.2267180-3-maz@kernel.org Signed-off-by: Oliver Upton <oupton@kernel.org>	2025-11-24 14:29:11 -08:00
Paolo Bonzini	924ebaefce	KVM/arm64 updates for 6.18 - Add support for FF-A 1.2 as the secure memory conduit for pKVM, allowing more registers to be used as part of the message payload. - Change the way pKVM allocates its VM handles, making sure that the privileged hypervisor is never tricked into using uninitialised data. - Speed up MMIO range registration by avoiding unnecessary RCU synchronisation, which results in VMs starting much quicker. - Add the dump of the instruction stream when panic-ing in the EL2 payload, just like the rest of the kernel has always done. This will hopefully help debugging non-VHE setups. - Add 52bit PA support to the stage-1 page-table walker, and make use of it to populate the fault level reported to the guest on failing to translate a stage-1 walk. - Add NV support to the GICv3-on-GICv5 emulation code, ensuring feature parity for guests, irrespective of the host platform. - Fix some really ugly architecture problems when dealing with debug in a nested VM. This has some bad performance impacts, but is at least correct. - Add enough infrastructure to be able to disable EL2 features and give effective values to the EL2 control registers. This then allows a bunch of features to be turned off, which helps cross-host migration. - Large rework of the selftest infrastructure to allow most tests to transparently run at EL2. This is the first step towards enabling NV testing. - Various fixes and improvements all over the map, including one BE fix, just in time for the removal of the feature. -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEn9UcU+C1Yxj9lZw9I9DQutE9ekMFAmjVhSsACgkQI9DQutE9 ekOSIg//YdbA/zo17GrLnJnlGdUnS0e3/357n3e5Lypx1UFRDTmNacpVPw4VG/jt eVpQn7AgYwyvKfCq46eD+hBBqNv1XTn4DeXttv7CmVqhCRythsEvDkTBWSt7oYUZ xfYXCMKNhqUElH4AbYYx3y7nb2E9/KVGr+NBn6Vf5c14OZ3MGVc/fyp4jM1ih5dR kcV2onAYohlIGvFEyZMBtJ+jkYkqfIbfxfqCL0RAET0aEBFcmM1aXybWZj47hlLM f2j+E6cFQ0ZzUt+3pFhT75wo43lHGtIFDjVd60uishyU+NXTVvqRmXDTRU4k546W 18HHX1yijbzuXIatVhVRo2hIq3jKU37T9wtj46BejbDHRdAPENEyN/Qopm7rNS+X mCwOT7He6KR+H4rU6nFaTcsS7bNRCvIbZP9i9zb6NElbvXu5QnM8BUQsYFCDUa/n xtbtQlckbo/7zeoUsBDrGj2XmCf0d45FTHb7fdWOYEmMSmJhXYpUKdM4JcLyhKoQ DD0ox2S+pt2lwNw3XOSABdES0KJxCvDASAMIgn2h2sGpY8FxsBcVW/BufXopdafG UeInxWaILp2iCDM4tH2GLjKqlvMAOwcA+mAEZToXypxlJAnYA6J1pXCF8WEaM6+D BGrLli8Zd8JRs87byq6K7tp8oLNzZdliJp73j5jfOHTJA4MnvFI= =iAL3 -----END PGP SIGNATURE----- Merge tag 'kvmarm-6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 updates for 6.18 - Add support for FF-A 1.2 as the secure memory conduit for pKVM, allowing more registers to be used as part of the message payload. - Change the way pKVM allocates its VM handles, making sure that the privileged hypervisor is never tricked into using uninitialised data. - Speed up MMIO range registration by avoiding unnecessary RCU synchronisation, which results in VMs starting much quicker. - Add the dump of the instruction stream when panic-ing in the EL2 payload, just like the rest of the kernel has always done. This will hopefully help debugging non-VHE setups. - Add 52bit PA support to the stage-1 page-table walker, and make use of it to populate the fault level reported to the guest on failing to translate a stage-1 walk. - Add NV support to the GICv3-on-GICv5 emulation code, ensuring feature parity for guests, irrespective of the host platform. - Fix some really ugly architecture problems when dealing with debug in a nested VM. This has some bad performance impacts, but is at least correct. - Add enough infrastructure to be able to disable EL2 features and give effective values to the EL2 control registers. This then allows a bunch of features to be turned off, which helps cross-host migration. - Large rework of the selftest infrastructure to allow most tests to transparently run at EL2. This is the first step towards enabling NV testing. - Various fixes and improvements all over the map, including one BE fix, just in time for the removal of the feature.	2025-09-30 13:23:28 -04:00
Marc Zyngier	d9476fd356	Merge branch kvm-arm64/gic-v5-nv into kvmarm-master/next * kvm-arm64/gic-v5-nv: : . : Add NV support to GICv5 in GICv3 emulation mode, ensuring that the v3 : guest support is identical to that of a pure v3 platform. : : Patches courtesy of Sascha Bischoff (20250828105925.3865158-1-sascha.bischoff@arm.com) : . irqchip/gic-v5: Drop has_gcie_v3_compat from gic_kvm_info KVM: arm64: Use ARM64_HAS_GICV5_LEGACY for GICv5 probing arm64: cpucaps: Add GICv5 Legacy vCPU interface (GCIE_LEGACY) capability KVM: arm64: Enable nested for GICv5 host with FEAT_GCIE_LEGACY KVM: arm64: Don't access ICC_SRE_EL2 if GICv3 doesn't support v2 compatibility Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-09-20 12:26:05 +01:00
Marc Zyngier	9664d5810e	KVM: arm64: Don't access ICC_SRE_EL2 if GICv3 doesn't support v2 compatibility We currently access ICC_SRE_EL2 at each load/put on VHE, and on each entry/exit on nVHE. Both are quite onerous on NV, as this register always traps. We do this to make sure the EL1 guest doesn't flip between v2 and v3 behind our back. But all modern implementations have dropped v2, and this is just overhead. At the same time, the GICv5 spec has been fixed to allow access to ICC_SRE_EL2 in legacy mode. Use this opportunity to replace the GICv5 checks for v2 compat checks, with an ad-hoc static key. Co-developed-by: Sascha Bischoff <sascha.bischoff@arm.com> Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com> Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-09-17 17:40:42 +01:00
Keir Fraser	8810c6e7cc	KVM: arm64: vgic-init: Remove vgic_ready() macro It is now used only within kvm_vgic_map_resources(). vgic_dist::ready is already written directly by this function, so it is clearer to bypass the macro for reads as well. Signed-off-by: Keir Fraser <keirf@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-09-15 10:55:22 +01:00
Oliver Upton	d54594accf	KVM: arm64: vgic-v3: Erase LPIs from xarray outside of raw spinlocks syzkaller has caught us red-handed once more, this time nesting regular spinlocks behind raw spinlocks: ============================= [ BUG: Invalid wait context ] 6.16.0-rc3-syzkaller-g7b8346bd9fce #0 Not tainted ----------------------------- syz.0.29/3743 is trying to lock: a3ff80008e2e9e18 (&xa->xa_lock#20){....}-{3:3}, at: vgic_put_irq+0xb4/0x190 arch/arm64/kvm/vgic/vgic.c:137 other info that might help us debug this: context-{5:5} 3 locks held by syz.0.29/3743: #0: a3ff80008e2e90a8 (&kvm->slots_lock){+.+.}-{4:4}, at: kvm_vgic_destroy+0x50/0x624 arch/arm64/kvm/vgic/vgic-init.c:499 #1: a3ff80008e2e9fa0 (&kvm->arch.config_lock){+.+.}-{4:4}, at: kvm_vgic_destroy+0x5c/0x624 arch/arm64/kvm/vgic/vgic-init.c:500 #2: 58f0000021be1428 (&vgic_cpu->ap_list_lock){....}-{2:2}, at: vgic_flush_pending_lpis+0x3c/0x31c arch/arm64/kvm/vgic/vgic.c:150 stack backtrace: CPU: 0 UID: 0 PID: 3743 Comm: syz.0.29 Not tainted 6.16.0-rc3-syzkaller-g7b8346bd9fce #0 PREEMPT Hardware name: linux,dummy-virt (DT) Call trace: show_stack+0x2c/0x3c arch/arm64/kernel/stacktrace.c:466 (C) __dump_stack+0x30/0x40 lib/dump_stack.c:94 dump_stack_lvl+0xd8/0x12c lib/dump_stack.c:120 dump_stack+0x1c/0x28 lib/dump_stack.c:129 print_lock_invalid_wait_context kernel/locking/lockdep.c:4833 [inline] check_wait_context kernel/locking/lockdep.c:4905 [inline] __lock_acquire+0x978/0x299c kernel/locking/lockdep.c:5190 lock_acquire+0x14c/0x2e0 kernel/locking/lockdep.c:5871 __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline] _raw_spin_lock_irqsave+0x5c/0x7c kernel/locking/spinlock.c:162 vgic_put_irq+0xb4/0x190 arch/arm64/kvm/vgic/vgic.c:137 vgic_flush_pending_lpis+0x24c/0x31c arch/arm64/kvm/vgic/vgic.c:158 __kvm_vgic_vcpu_destroy+0x44/0x500 arch/arm64/kvm/vgic/vgic-init.c:455 kvm_vgic_destroy+0x100/0x624 arch/arm64/kvm/vgic/vgic-init.c:505 kvm_arch_destroy_vm+0x80/0x138 arch/arm64/kvm/arm.c:244 kvm_destroy_vm virt/kvm/kvm_main.c:1308 [inline] kvm_put_kvm+0x800/0xff8 virt/kvm/kvm_main.c:1344 kvm_vm_release+0x58/0x78 virt/kvm/kvm_main.c:1367 __fput+0x4ac/0x980 fs/file_table.c:465 ____fput+0x20/0x58 fs/file_table.c:493 task_work_run+0x1bc/0x254 kernel/task_work.c:227 resume_user_mode_work include/linux/resume_user_mode.h:50 [inline] do_notify_resume+0x1b4/0x270 arch/arm64/kernel/entry-common.c:151 exit_to_user_mode_prepare arch/arm64/kernel/entry-common.c:169 [inline] exit_to_user_mode arch/arm64/kernel/entry-common.c:178 [inline] el0_svc+0xb4/0x160 arch/arm64/kernel/entry-common.c:768 el0t_64_sync_handler+0x78/0x108 arch/arm64/kernel/entry-common.c:786 el0t_64_sync+0x198/0x19c arch/arm64/kernel/entry.S:600 This is of course no good, but is at odds with how LPI refcounts are managed. Solve the locking mess by deferring the release of unreferenced LPIs after the ap_list_lock is released. Mark these to-be-released LPIs specially to avoid racing with vgic_put_irq() and causing a double-free. Since references can only be taken on LPIs with a nonzero refcount, extending the lifetime of freed LPIs is still safe. Reviewed-by: Marc Zyngier <maz@kernel.org> Reported-by: syzbot+cef594105ac7e60c6d93@syzkaller.appspotmail.com Closes: https://lore.kernel.org/kvmarm/68acd0d9.a00a0220.33401d.048b.GAE@google.com/ Link: https://lore.kernel.org/r/20250905100531.282980-5-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>	2025-09-10 02:56:20 -07:00
Oliver Upton	3a08a6ca7c	KVM: arm64: vgic-v3: Use bare refcount for VGIC LPIs KVM's use of krefs to manage LPIs isn't adding much, especially considering vgic_irq_release() is a noop due to the lack of sufficient context. Switch to using a regular refcount in anticipation of adding a meaningful release concept for LPIs. Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20250905100531.282980-3-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>	2025-09-10 02:56:19 -07:00
Oliver Upton	7d6ca84aa9	KVM: arm64: vgic: Drop stale comment on IRQ active state While LPIs lack an active state, KVM unconditionally folds the active state from the LR into the vgic_irq struct meaning this field cannot be 'creatively' reused for something else. Drop the misleading comment to reflect this. Link: https://lore.kernel.org/r/20250905100531.282980-2-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>	2025-09-10 02:56:19 -07:00
Paolo Bonzini	314b40b3b6	KVM/arm64 changes for 6.17, round #1 - Host driver for GICv5, the next generation interrupt controller for arm64, including support for interrupt routing, MSIs, interrupt translation and wired interrupts. - Use FEAT_GCIE_LEGACY on GICv5 systems to virtualize GICv3 VMs on GICv5 hardware, leveraging the legacy VGIC interface. - Userspace control of the 'nASSGIcap' GICv3 feature, allowing userspace to disable support for SGIs w/o an active state on hardware that previously advertised it unconditionally. - Map supporting endpoints with cacheable memory attributes on systems with FEAT_S2FWB and DIC where KVM no longer needs to perform cache maintenance on the address range. - Nested support for FEAT_RAS and FEAT_DoubleFault2, allowing the guest hypervisor to inject external aborts into an L2 VM and take traps of masked external aborts to the hypervisor. - Convert more system register sanitization to the config-driven implementation. - Fixes to the visibility of EL2 registers, namely making VGICv3 system registers accessible through the VGIC device instead of the ONE_REG vCPU ioctls. - Various cleanups and minor fixes. -----BEGIN PGP SIGNATURE----- iI0EABYIADUWIQSNXHjWXuzMZutrKNKivnWIJHzdFgUCaIezbRccb2xpdmVyLnVw dG9uQGxpbnV4LmRldgAKCRCivnWIJHzdFr/eAQDY5NIG5cR6ZcAWnPQLmGWpz2ou pq4Jhn9E/mGR3n5L1AEAsJpfLLpOsmnLBdwfbjmW59gGsa8k3i5tjWEOJ6yzAwk= =r+sp -----END PGP SIGNATURE----- Merge tag 'kvmarm-6.17' of https://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 changes for 6.17, round #1 - Host driver for GICv5, the next generation interrupt controller for arm64, including support for interrupt routing, MSIs, interrupt translation and wired interrupts. - Use FEAT_GCIE_LEGACY on GICv5 systems to virtualize GICv3 VMs on GICv5 hardware, leveraging the legacy VGIC interface. - Userspace control of the 'nASSGIcap' GICv3 feature, allowing userspace to disable support for SGIs w/o an active state on hardware that previously advertised it unconditionally. - Map supporting endpoints with cacheable memory attributes on systems with FEAT_S2FWB and DIC where KVM no longer needs to perform cache maintenance on the address range. - Nested support for FEAT_RAS and FEAT_DoubleFault2, allowing the guest hypervisor to inject external aborts into an L2 VM and take traps of masked external aborts to the hypervisor. - Convert more system register sanitization to the config-driven implementation. - Fixes to the visibility of EL2 registers, namely making VGICv3 system registers accessible through the VGIC device instead of the ONE_REG vCPU ioctls. - Various cleanups and minor fixes.	2025-07-29 12:27:40 -04:00
Oliver Upton	0d46e324c0	Merge branch 'kvm-arm64/vgic-v4-ctl' into kvmarm/next * kvm-arm64/vgic-v4-ctl: : Userspace control of nASSGIcap, courtesy of Raghavendra Rao Ananta : : Allow userspace to decide if support for SGIs without an active state is : advertised to the guest, allowing VMs from GICv3-only hardware to be : migrated to to GICv4.1 capable machines. Documentation: KVM: arm64: Describe VGICv3 registers writable pre-init KVM: arm64: selftests: Add test for nASSGIcap attribute KVM: arm64: vgic-v3: Allow userspace to write GICD_TYPER2.nASSGIcap KVM: arm64: vgic-v3: Allow access to GICD_IIDR prior to initialization KVM: arm64: vgic-v3: Consolidate MAINT_IRQ handling KVM: arm64: Disambiguate support for vSGIs v. vLPIs Signed-off-by: Oliver Upton <oliver.upton@linux.dev>	2025-07-28 08:11:38 -07:00
Raghavendra Rao Ananta	c652887a92	KVM: arm64: vgic-v3: Allow userspace to write GICD_TYPER2.nASSGIcap KVM unconditionally advertises GICD_TYPER2.nASSGIcap (which internally implies vSGIs) on GICv4.1 systems. Allow userspace to change whether a VM supports the feature. Only allow changes prior to VGIC initialization as at that point vPEs need to be allocated for the VM. For convenience, bundle support for vLPIs and vSGIs behind this feature, allowing userspace to control vPE allocation for VMs in environments that may be constrained on vPE IDs. Signed-off-by: Raghavendra Rao Ananta <rananta@google.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20250724062805.2658919-5-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>	2025-07-26 08:45:52 -07:00
Sascha Bischoff	c017e49ed1	KVM: arm64: gic-v5: Support GICv3 compat Add support for GICv3 compat mode (FEAT_GCIE_LEGACY) which allows a GICv5 host to run GICv3-based VMs. This change enables the VHE/nVHE/hVHE/protected modes, but does not support nested virtualization. A lazy-disable approach is taken for compat mode; it is enabled on the vgic_v3_load path but not disabled on the vgic_v3_put path. A non-GICv3 VM, i.e., one based on GICv5, is responsible for disabling compat mode on the corresponding vgic_v5_load path. Currently, GICv5 is not supported, and hence compat mode is not disabled again once it is enabled, and this function is intentionally omitted from the code. Co-authored-by: Timothy Hayes <timothy.hayes@arm.com> Signed-off-by: Timothy Hayes <timothy.hayes@arm.com> Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com> Link: https://lore.kernel.org/r/20250627100847.1022515-5-sascha.bischoff@arm.com Signed-off-by: Oliver Upton <oliver.upton@linux.dev>	2025-07-08 14:41:06 -07:00
Sean Christopherson	b33252b9d1	KVM: Don't WARN if updating IRQ bypass route fails Don't bother WARNing if updating an IRTE route fails now that vendor code provides much more precise WARNs. The generic WARN doesn't provide enough information to actually debug the problem, and has obviously done nothing to surface the myriad bugs in KVM x86's implementation. Drop all of the associated return code plumbing that existed just so that common KVM could WARN. Link: https://lore.kernel.org/r/20250611224604.313496-34-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-06-23 09:50:32 -07:00
Oliver Upton	05b9405f2f	KVM: arm64: Resolve vLPI by host IRQ in vgic_v4_unset_forwarding() The virtual mapping and "GSI" routing of a particular vLPI is subject to change in response to the guest / userspace. This can be pretty annoying to deal with when KVM needs to track the physical state that's managed for vLPI direct injection. Make vgic_v4_unset_forwarding() resilient by using the host IRQ to resolve the vgic IRQ. Since this uses the LPI xarray directly, finding the ITS by doorbell address + grabbing it's its_lock is no longer necessary. Note that matching the right ITS / ITE is already handled in vgic_v4_set_forwarding(), and unless there's a bug in KVM's VGIC ITS emulation the virtual mapping that should remain stable for the lifetime of the vLPI mapping. Tested-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me> Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20250523194722.4066715-4-oliver.upton@linux.dev Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-05-30 09:11:29 +01:00
Marc Zyngier	201c8d40dd	KVM: arm64: nv: Add Maintenance Interrupt emulation Emulating the vGIC means emulating the dreaded Maintenance Interrupt. This is a two-pronged problem: - while running L2, getting an MI translates into an MI injected in the L1 based on the state of the HW. - while running L1, we must accurately reflect the state of the MI line, based on the in-memory state. The MI INTID is added to the distributor, as expected on any virtualisation-capable implementation, and further patches will allow its configuration. Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20250225172930.1850838-11-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>	2025-03-03 14:57:10 -08:00
Marc Zyngier	146a050f2d	KVM: arm64: nv: Nested GICv3 emulation When entering a nested VM, we set up the hypervisor control interface based on what the guest hypervisor has set. Especially, we investigate each list register written by the guest hypervisor whether HW bit is set. If so, we translate hw irq number from the guest's point of view to the real hardware irq number if there is a mapping. Co-developed-by: Jintack Lim <jintack@cs.columbia.edu> Signed-off-by: Jintack Lim <jintack@cs.columbia.edu> [Christoffer: Redesigned execution flow around vcpu load/put] Co-developed-by: Christoffer Dall <christoffer.dall@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@arm.com> [maz: Rewritten to support GICv3 instead of GICv2, NV2 support] Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20250225172930.1850838-9-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>	2025-03-03 14:57:04 -08:00
Marc Zyngier	96c2f03311	KVM: arm64: nv: Plumb handling of GICv3 EL2 accesses Wire the handling of all GICv3 EL2 registers, and provide emulation for all the non memory-backed registers (ICC_SRE_EL2, ICH_VTR_EL2, ICH_MISR_EL2, ICH_ELRSR_EL2, and ICH_EISR_EL2). Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20250225172930.1850838-7-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>	2025-03-03 14:55:10 -08:00
Marc Zyngier	e7619f2a2f	KVM: arm64: vgic: Kill VGIC_MAX_PRIVATE definition VGIC_MAX_PRIVATE is a pretty useless definition, and is better replaced with VGIC_NR_PRIVATE_IRQS. Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20241117165757.247686-4-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>	2024-11-20 17:21:08 -08:00
Marc Zyngier	e28157060c	Merge branch kvm-arm64/misc-6.10 into kvmarm-master/next * kvm-arm64/misc-6.10: : . : Misc fixes and updates targeting 6.10 : : - Improve boot-time diagnostics when the sysreg tables : are not correctly sorted : : - Allow FFA_MSG_SEND_DIRECT_REQ in the FFA proxy : : - Fix duplicate XNX field in the ID_AA64MMFR1_EL1 : writeable mask : : - Allocate PPIs and SGIs outside of the vcpu structure, allowing : for smaller EL2 mapping and some flexibility in implementing : more or less than 32 private IRQs. : : - Use bitmap_gather() instead of its open-coded equivalent : : - Make protected mode use hVHE if available : : - Purge stale mpidr_data if a vcpu is created after the MPIDR : map has been created : . KVM: arm64: Destroy mpidr_data for 'late' vCPU creation KVM: arm64: Use hVHE in pKVM by default on CPUs with VHE support KVM: arm64: Fix hvhe/nvhe early alias parsing KVM: arm64: Convert kvm_mpidr_index() to bitmap_gather() KVM: arm64: vgic: Allocate private interrupts on demand KVM: arm64: Remove duplicated AA64MMFR1_EL1 XNX KVM: arm64: Remove FFA_MSG_SEND_DIRECT_REQ from the denylist KVM: arm64: Improve out-of-order sysreg table diagnostics Signed-off-by: Marc Zyngier <maz@kernel.org>	2024-05-08 16:41:50 +01:00
Marc Zyngier	8540bd1b99	Merge branch kvm-arm64/pkvm-6.10 into kvmarm-master/next * kvm-arm64/pkvm-6.10: (25 commits) : . : At last, a bunch of pKVM patches, courtesy of Fuad Tabba. : From the cover letter: : : "This series is a bit of a bombay-mix of patches we've been : carrying. There's no one overarching theme, but they do improve : the code by fixing existing bugs in pKVM, refactoring code to : make it more readable and easier to re-use for pKVM, or adding : functionality to the existing pKVM code upstream." : . KVM: arm64: Force injection of a data abort on NISV MMIO exit KVM: arm64: Restrict supported capabilities for protected VMs KVM: arm64: Refactor setting the return value in kvm_vm_ioctl_enable_cap() KVM: arm64: Document the KVM/arm64-specific calls in hypercalls.rst KVM: arm64: Rename firmware pseudo-register documentation file KVM: arm64: Reformat/beautify PTP hypercall documentation KVM: arm64: Clarify rationale for ZCR_EL1 value restored on guest exit KVM: arm64: Introduce and use predicates that check for protected VMs KVM: arm64: Add is_pkvm_initialized() helper KVM: arm64: Simplify vgic-v3 hypercalls KVM: arm64: Move setting the page as dirty out of the critical section KVM: arm64: Change kvm_handle_mmio_return() return polarity KVM: arm64: Fix comment for __pkvm_vcpu_init_traps() KVM: arm64: Prevent kmemleak from accessing .hyp.data KVM: arm64: Do not map the host fpsimd state to hyp in pKVM KVM: arm64: Rename __tlb_switch_to_{guest,host}() in VHE KVM: arm64: Support TLB invalidation in guest context KVM: arm64: Avoid BBM when changing only s/w bits in Stage-2 PTE KVM: arm64: Check for PTE validity when checking for executable/cacheable KVM: arm64: Avoid BUG-ing from the host abort path ... Signed-off-by: Marc Zyngier <maz@kernel.org>	2024-05-03 11:39:52 +01:00
Marc Zyngier	03b3d00a70	KVM: arm64: vgic: Allocate private interrupts on demand Private interrupts are currently part of the CPU interface structure that is part of each and every vcpu we create. Currently, we have 32 of them per vcpu, resulting in a per-vcpu array that is just shy of 4kB. On its own, that's no big deal, but it gets in the way of other things: - each vcpu gets mapped at EL2 on nVHE/hVHE configurations. This requires memory that is physically contiguous. However, the EL2 code has no purpose looking at the interrupt structures and could do without them being mapped. - supporting features such as EPPIs, which extend the number of private interrupts past the 32 limit would make the array even larger, even for VMs that do not use the EPPI feature. Address these issues by moving the private interrupt array outside of the vcpu, and replace it with a simple pointer. We take this opportunity to make it obvious what gets initialised when, as that path was remarkably opaque, and tighten the locking. Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20240502154545.3012089-1-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>	2024-05-03 11:33:50 +01:00
Marc Zyngier	948e1a53c2	KVM: arm64: Simplify vgic-v3 hypercalls Consolidate the GICv3 VMCR accessor hypercalls into the APR save/restore hypercalls so that all of the EL2 GICv3 state is covered by a single pair of hypercalls. Signed-off-by: Fuad Tabba <tabba@google.com> Acked-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20240423150538.2103045-17-tabba@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>	2024-05-01 16:48:14 +01:00
Oliver Upton	481c9ee846	KVM: arm64: vgic-its: Get rid of the lpi_list_lock The last genuine use case for the lpi_list_lock was the global LPI translation cache, which has been removed in favor of a per-ITS xarray. Remove a layer from the locking puzzle by getting rid of it. vgic_add_lpi() still has a critical section that needs to protect against the insertion of other LPIs; change it to take the LPI xarray's xa_lock to retain this property. Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20240422200158.2606761-13-oliver.upton@linux.dev Signed-off-by: Marc Zyngier <maz@kernel.org>	2024-04-25 13:19:56 +01:00
Oliver Upton	ec39bbfd55	KVM: arm64: vgic-its: Rip out the global translation cache The MSI injection fast path has been transitioned away from the global translation cache. Rip it out. Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20240422200158.2606761-12-oliver.upton@linux.dev Signed-off-by: Marc Zyngier <maz@kernel.org>	2024-04-25 13:19:56 +01:00
Oliver Upton	8201d1028c	KVM: arm64: vgic-its: Maintain a translation cache per ITS Within the context of a single ITS, it is possible to use an xarray to cache the device ID & event ID translation to a particular irq descriptor. Take advantage of this to build a translation cache capable of fitting all valid translations for a given ITS. Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20240422200158.2606761-9-oliver.upton@linux.dev Signed-off-by: Marc Zyngier <maz@kernel.org>	2024-04-25 13:19:55 +01:00
Oliver Upton	30a0ce9c49	KVM: arm64: vgic-its: Get rid of vgic_copy_lpi_list() The last user has been transitioned to walking the LPI xarray directly. Cut the wart off, and get rid of the now unneeded lpi_count while doing so. Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20240422200158.2606761-7-oliver.upton@linux.dev Signed-off-by: Marc Zyngier <maz@kernel.org>	2024-04-25 13:19:55 +01:00
Oliver Upton	85d3ccc8b7	KVM: arm64: vgic-debug: Use an xarray mark for debug iterator The vgic debug iterator is the final user of vgic_copy_lpi_list(), but is a bit more complicated to transition to something else. Use a mark in the LPI xarray to record the indices 'known' to the debug iterator. Protect against the LPIs from being freed by associating an additional reference with the xarray mark. Rework iter_next() to let the xarray walk 'drive' the iteration after visiting all of the SGIs, PPIs, and SPIs. Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20240422200158.2606761-6-oliver.upton@linux.dev Signed-off-by: Marc Zyngier <maz@kernel.org>	2024-04-25 13:19:55 +01:00
Oliver Upton	a5c7f011cb	KVM: arm64: vgic: Free LPI vgic_irq structs in an RCU-safe manner Free the vgic_irq structs in an RCU-safe manner to allow reads of the LPI configuration data to happen in parallel with the release of LPIs. Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20240221054253.3848076-8-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>	2024-02-23 21:46:02 +00:00
Oliver Upton	05f4d4f5d4	KVM: arm64: vgic: Use atomics to count LPIs Switch to using atomics for LPI accounting, allowing vgic_irq references to be dropped in parallel. Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20240221054253.3848076-7-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>	2024-02-23 21:46:02 +00:00
Oliver Upton	9880835af7	KVM: arm64: vgic: Get rid of the LPI linked-list All readers of LPI configuration have been transitioned to use the LPI xarray. Get rid of the linked-list altogether. Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20240221054253.3848076-6-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>	2024-02-23 21:46:02 +00:00
Oliver Upton	1d6f83f60f	KVM: arm64: vgic: Store LPIs in an xarray Using a linked-list for LPIs is less than ideal as it of course requires iterative searches to find a particular entry. An xarray is a better data structure for this use case, as it provides faster searches and can still handle a potentially sparse range of INTID allocations. Start by storing LPIs in an xarray, punting usage of the xarray to a subsequent change. The observant among you will notice that we added yet another lock to the chain of locking order rules; document the ordering of the xa_lock. Don't worry, we'll get rid of the lpi_list_lock one day... Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20240221054253.3848076-2-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>	2024-02-23 21:46:01 +00:00

1 2 3 4 5

204 Commits