KVM x86 fixes for 6.16-rcN

- Reject SEV{-ES} intra-host migration if one or more vCPUs are actively
  being created so as not to create a non-SEV{-ES} vCPU in an SEV{-ES} VM.

- Use a pre-allocated, per-vCPU buffer for handling de-sparsified vCPU masks
  when emulating Hyper-V hypercalls to fix a "stack frame too large" issue.

- Allow out-of-range/invalid Xen event channel ports when configuring IRQ
  routing to avoid dictating a specific ioctl() ordering to userspace.

- Conditionally reschedule when setting memory attributes to avoid soft
  lockups when userspace converts huge swaths of memory to/from private.

- Add back MWAIT as a required feature for the MONITOR/MWAIT selftest.

- Add a missing field in struct sev_data_snp_launch_start that resulted in
  the guest-visible workarounds field being filled at the wrong offset.

- Skip non-canonical address when processing Hyper-V PV TLB flushes to avoid
  VM-Fail on INVVPID.

Merge tag 'kvm-x86-fixes-6.16-rcN' of https://github.com/kvm-x86/linux into HEAD

- Advertise supported TDX TDVMCALLs to userspace.
Paolo Bonzini 2025-07-08 10:49:19 -04:00
commit 5383fc057a
13 changed files with 116 additions and 22 deletions

View File

@@ -7196,6 +7196,10 @@ The valid value for 'flags' is:
 u64 leaf;
 u64 r11, r12, r13, r14;
 } get_tdvmcall_info;
+struct {
+u64 ret;
+u64 vector;
+} setup_event_notify;
 };
 } tdx;
@@ -7210,21 +7214,24 @@ number from register R11. The remaining field of the union provide the
 inputs and outputs of the TDVMCALL. Currently the following values of
 ``nr`` are defined:
-* ``TDVMCALL_GET_QUOTE``: the guest has requested to generate a TD-Quote
-signed by a service hosting TD-Quoting Enclave operating on the host.
-Parameters and return value are in the ``get_quote`` field of the union.
-The ``gpa`` field and ``size`` specify the guest physical address
-(without the shared bit set) and the size of a shared-memory buffer, in
-which the TDX guest passes a TD Report. The ``ret`` field represents
-the return value of the GetQuote request. When the request has been
-queued successfully, the TDX guest can poll the status field in the
-shared-memory area to check whether the Quote generation is completed or
-not. When completed, the generated Quote is returned via the same buffer.
+* ``TDVMCALL_GET_QUOTE``: the guest has requested to generate a TD-Quote
+signed by a service hosting TD-Quoting Enclave operating on the host.
+Parameters and return value are in the ``get_quote`` field of the union.
+The ``gpa`` field and ``size`` specify the guest physical address
+(without the shared bit set) and the size of a shared-memory buffer, in
+which the TDX guest passes a TD Report. The ``ret`` field represents
+the return value of the GetQuote request. When the request has been
+queued successfully, the TDX guest can poll the status field in the
+shared-memory area to check whether the Quote generation is completed or
+not. When completed, the generated Quote is returned via the same buffer.
-* ``TDVMCALL_GET_TD_VM_CALL_INFO``: the guest has requested the support
-status of TDVMCALLs. The output values for the given leaf should be
-placed in fields from ``r11`` to ``r14`` of the ``get_tdvmcall_info``
-field of the union.
+* ``TDVMCALL_GET_TD_VM_CALL_INFO``: the guest has requested the support
+status of TDVMCALLs. The output values for the given leaf should be
+placed in fields from ``r11`` to ``r14`` of the ``get_tdvmcall_info``
+field of the union.
+* ``TDVMCALL_SETUP_EVENT_NOTIFY_INTERRUPT``: the guest has requested to
+set up a notification interrupt for vector ``vector``.
 KVM may add support for more values in the future that may cause a userspace
 exit, even without calls to ``KVM_ENABLE_CAP`` or similar. In this case,
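
For orientation, here is a minimal sketch of how a VMM might service the ``TDVMCALL_SETUP_EVENT_NOTIFY_INTERRUPT`` exit documented in this hunk. It is not taken from any real VMM: `tdx_exit_mock` is a hypothetical stand-in for the `setup_event_notify` part of `struct kvm_run`, `vmm_state` and `handle_setup_event_notify()` are invented names, and the success status value of 0 is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Leaf number as defined in this commit; the status value is an assumption. */
#define TDVMCALL_SETUP_EVENT_NOTIFY_INTERRUPT 0x10004ULL
#define MOCK_TDVMCALL_STATUS_SUCCESS          0x0ULL

/* Hypothetical stand-in for the relevant part of kvm_run.tdx. */
struct tdx_exit_mock {
	uint64_t flags;
	uint64_t nr;
	struct {
		uint64_t ret;
		uint64_t vector;
	} setup_event_notify;
};

/* Hypothetical per-VM state in which the VMM remembers the vector. */
struct vmm_state {
	uint8_t event_notify_vector;
	int event_notify_valid;
};

/* Returns 1 if the exit was handled here, 0 if it is some other leaf. */
static int handle_setup_event_notify(struct vmm_state *vmm,
				     struct tdx_exit_mock *run)
{
	if (run->nr != TDVMCALL_SETUP_EVENT_NOTIFY_INTERRUPT)
		return 0;

	/* The kernel side in this commit rejects vectors outside 32..255. */
	assert(run->setup_event_notify.vector >= 32 &&
	       run->setup_event_notify.vector <= 255);

	vmm->event_notify_vector = (uint8_t)run->setup_event_notify.vector;
	vmm->event_notify_valid = 1;

	/* Report success to the guest by overwriting the preset status. */
	run->setup_event_notify.ret = MOCK_TDVMCALL_STATUS_SUCCESS;
	return 1;
}
```

A VMM that leaves `ret` untouched would hand the kernel's preset status back to the guest, so overwriting it is what signals that the request was accepted.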

View File

@@ -79,7 +79,20 @@ to be configured to the TDX guest.
 struct kvm_tdx_capabilities {
 __u64 supported_attrs;
 __u64 supported_xfam;
-__u64 reserved[254];
+/* TDG.VP.VMCALL hypercalls executed in kernel and forwarded to
+* userspace, respectively
+*/
+__u64 kernel_tdvmcallinfo_1_r11;
+__u64 user_tdvmcallinfo_1_r11;
+/* TDG.VP.VMCALL instruction executions subfunctions executed in kernel
+* and forwarded to userspace, respectively
+*/
+__u64 kernel_tdvmcallinfo_1_r12;
+__u64 user_tdvmcallinfo_1_r12;
+__u64 reserved[250];
 /* Configurable CPUID bits for userspace */
 struct kvm_cpuid2 cpuid;
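
The four new fields are carved out of the reserved area, so the position of `cpuid` should not move. The sketch below checks that arithmetic with two hypothetical mirror structs (`caps_old`, `caps_new`); the trailing `cpuid_marker` member is an invented placeholder for `struct kvm_cpuid2 cpuid` so that no KVM headers are needed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the layout before this change. */
struct caps_old {
	uint64_t supported_attrs;
	uint64_t supported_xfam;
	uint64_t reserved[254];
	uint64_t cpuid_marker;	/* placeholder for struct kvm_cpuid2 cpuid */
};

/* Hypothetical mirror of the layout after this change. */
struct caps_new {
	uint64_t supported_attrs;
	uint64_t supported_xfam;
	uint64_t kernel_tdvmcallinfo_1_r11;
	uint64_t user_tdvmcallinfo_1_r11;
	uint64_t kernel_tdvmcallinfo_1_r12;
	uint64_t user_tdvmcallinfo_1_r12;
	uint64_t reserved[250];
	uint64_t cpuid_marker;	/* placeholder for struct kvm_cpuid2 cpuid */
};

/* 2 + 254 == 2 + 4 + 250, so everything after 'reserved' stays in place. */
static_assert(sizeof(struct caps_old) == sizeof(struct caps_new),
	      "struct size must not change");
static_assert(offsetof(struct caps_old, cpuid_marker) ==
	      offsetof(struct caps_new, cpuid_marker),
	      "cpuid must stay at the same offset");
```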

View File

@@ -700,8 +700,13 @@ struct kvm_vcpu_hv {
 struct kvm_vcpu_hv_tlb_flush_fifo tlb_flush_fifo[HV_NR_TLB_FLUSH_FIFOS];
-/* Preallocated buffer for handling hypercalls passing sparse vCPU set */
+/*
+* Preallocated buffers for handling hypercalls that pass sparse vCPU
+* sets (for high vCPU counts, they're too large to comfortably fit on
+* the stack).
+*/
 u64 sparse_banks[HV_MAX_SPARSE_VCPU_BANKS];
+DECLARE_BITMAP(vcpu_mask, KVM_MAX_VCPUS);
 struct hv_vp_assist_page vp_assist_page;
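
To put a rough number on "too large to comfortably fit on the stack", the sketch below computes the size of the vCPU bitmap under assumed limits. `MOCK_KVM_MAX_VCPUS` (4096) and `MOCK_HV_MAX_SPARSE_VCPU_BANKS` (64) are illustrative values chosen here, not read from this tree, and `hv_scratch_mock` is a hypothetical stand-in for the two buffers in `struct kvm_vcpu_hv`.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed limits for illustration only; the real values come from kernel
 * configuration and headers. */
#define MOCK_KVM_MAX_VCPUS            4096
#define MOCK_HV_MAX_SPARSE_VCPU_BANKS 64

/* Userspace equivalents of the kernel's bitmap sizing helpers. */
#define MOCK_BITS_PER_LONG    (sizeof(unsigned long) * CHAR_BIT)
#define MOCK_BITS_TO_LONGS(n) (((n) + MOCK_BITS_PER_LONG - 1) / MOCK_BITS_PER_LONG)

/* Hypothetical stand-in for the per-vCPU scratch buffers. */
struct hv_scratch_mock {
	uint64_t sparse_banks[MOCK_HV_MAX_SPARSE_VCPU_BANKS];
	unsigned long vcpu_mask[MOCK_BITS_TO_LONGS(MOCK_KVM_MAX_VCPUS)];
};

/* Bytes that no longer live in the hypercall handler's stack frame. */
static size_t vcpu_mask_bytes(void)
{
	return sizeof(((struct hv_scratch_mock *)0)->vcpu_mask);
}

/* One bit per possible vCPU, whatever the width of unsigned long. */
static_assert(sizeof(((struct hv_scratch_mock *)0)->vcpu_mask) ==
	      MOCK_KVM_MAX_VCPUS / CHAR_BIT, "one bit per vCPU");
```

With these assumed limits the mask alone is 512 bytes, which is a large share of a typical per-function stack frame warning threshold.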

View File

@@ -72,6 +72,7 @@
 #define TDVMCALL_MAP_GPA 0x10001
 #define TDVMCALL_GET_QUOTE 0x10002
 #define TDVMCALL_REPORT_FATAL_ERROR 0x10003
+#define TDVMCALL_SETUP_EVENT_NOTIFY_INTERRUPT 0x10004ULL
 /*
 * TDG.VP.VMCALL Status Codes (returned in R10)

View File

@@ -965,7 +965,13 @@ struct kvm_tdx_cmd {
 struct kvm_tdx_capabilities {
 __u64 supported_attrs;
 __u64 supported_xfam;
-__u64 reserved[254];
+__u64 kernel_tdvmcallinfo_1_r11;
+__u64 user_tdvmcallinfo_1_r11;
+__u64 kernel_tdvmcallinfo_1_r12;
+__u64 user_tdvmcallinfo_1_r12;
+__u64 reserved[250];
 /* Configurable CPUID bits for userspace */
 struct kvm_cpuid2 cpuid;

View File

@@ -1979,6 +1979,9 @@ int kvm_hv_vcpu_flush_tlb(struct kvm_vcpu *vcpu)
 if (entries[i] == KVM_HV_TLB_FLUSHALL_ENTRY)
 goto out_flush_all;
+if (is_noncanonical_invlpg_address(entries[i], vcpu))
+continue;
 /*
 * Lower 12 bits of 'address' encode the number of additional
 * pages to flush.
@@ -2001,11 +2004,11 @@ int kvm_hv_vcpu_flush_tlb(struct kvm_vcpu *vcpu)
 static u64 kvm_hv_flush_tlb(struct kvm_vcpu *vcpu, struct kvm_hv_hcall *hc)
 {
 struct kvm_vcpu_hv *hv_vcpu = to_hv_vcpu(vcpu);
+unsigned long *vcpu_mask = hv_vcpu->vcpu_mask;
 u64 *sparse_banks = hv_vcpu->sparse_banks;
 struct kvm *kvm = vcpu->kvm;
 struct hv_tlb_flush_ex flush_ex;
 struct hv_tlb_flush flush;
-DECLARE_BITMAP(vcpu_mask, KVM_MAX_VCPUS);
 struct kvm_vcpu_hv_tlb_flush_fifo *tlb_flush_fifo;
 /*
 * Normally, there can be no more than 'KVM_HV_TLB_FLUSH_FIFO_SIZE'
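
The canonical-address filter added to `kvm_hv_vcpu_flush_tlb()` in this file can be pictured with the userspace sketch below. It does not reproduce `is_noncanonical_invlpg_address()`: `is_noncanonical()` and `flush_entries()` are hypothetical helpers, the virtual-address width is passed in explicitly as an assumption, and `MOCK_FLUSHALL_ENTRY` is an assumed sentinel value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed sentinel meaning "flush everything". */
#define MOCK_FLUSHALL_ENTRY UINT64_MAX

/*
 * An address is canonical when bits 63..(va_bits - 1) are all zeros or all
 * ones, i.e. the address is the sign extension of its low va_bits bits.
 */
static bool is_noncanonical(uint64_t addr, unsigned int va_bits)
{
	assert(va_bits >= 2 && va_bits <= 64);

	uint64_t upper = addr >> (va_bits - 1);
	uint64_t all_ones = (UINT64_C(1) << (65 - va_bits)) - 1;

	return upper != 0 && upper != all_ones;
}

/*
 * Returns the number of entries that would be flushed individually, or -1
 * if a full flush is requested. Non-canonical entries are skipped, since no
 * translation can exist for them.
 */
static int flush_entries(const uint64_t *entries, int n, unsigned int va_bits)
{
	int flushed = 0;

	for (int i = 0; i < n; i++) {
		if (entries[i] == MOCK_FLUSHALL_ENTRY)
			return -1;
		if (is_noncanonical(entries[i], va_bits))
			continue;
		flushed++;
	}
	return flushed;
}
```

The sentinel test comes first, mirroring the order in the hunk: the sentinel itself is an all-ones value and would otherwise pass the canonical check as an ordinary address.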

View File

@@ -1971,6 +1971,10 @@ static int sev_check_source_vcpus(struct kvm *dst, struct kvm *src)
 struct kvm_vcpu *src_vcpu;
 unsigned long i;
+if (src->created_vcpus != atomic_read(&src->online_vcpus) ||
+dst->created_vcpus != atomic_read(&dst->online_vcpus))
+return -EBUSY;
 if (!sev_es_guest(src))
 return 0;
@@ -4445,8 +4449,12 @@ static void sev_es_init_vmcb(struct vcpu_svm *svm)
 * the VMSA will be NULL if this vCPU is the destination for intrahost
 * migration, and will be copied later.
 */
-if (svm->sev_es.vmsa && !svm->sev_es.snp_has_guest_vmsa)
-svm->vmcb->control.vmsa_pa = __pa(svm->sev_es.vmsa);
+if (!svm->sev_es.snp_has_guest_vmsa) {
+if (svm->sev_es.vmsa)
+svm->vmcb->control.vmsa_pa = __pa(svm->sev_es.vmsa);
+else
+svm->vmcb->control.vmsa_pa = INVALID_PAGE;
+}
 if (cpu_feature_enabled(X86_FEATURE_ALLOWED_SEV_FEATURES))
 svm->vmcb->control.allowed_sev_features = sev->vmsa_features |
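
The new `-EBUSY` check in `sev_check_source_vcpus()` compares two counters per VM. A minimal userspace sketch of that idea follows, assuming (as the check implies) that `created_vcpus` is bumped when vCPU creation starts and `online_vcpus` only once it completes. `vm_mock`, `vcpu_creation_in_flight()` and `check_migration_allowed()` are hypothetical names.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the two counters in struct kvm. */
struct vm_mock {
	int created_vcpus;		/* assumed: incremented when creation starts */
	atomic_int online_vcpus;	/* assumed: incremented when creation completes */
};

/* A mismatch means at least one vCPU is still being created. */
static bool vcpu_creation_in_flight(struct vm_mock *vm)
{
	return vm->created_vcpus != atomic_load(&vm->online_vcpus);
}

/* Refuse migration while either VM has a half-created vCPU. */
static int check_migration_allowed(struct vm_mock *dst, struct vm_mock *src)
{
	if (vcpu_creation_in_flight(src) || vcpu_creation_in_flight(dst))
		return -EBUSY;
	return 0;
}
```

Returning `-EBUSY` rather than waiting leaves the retry policy to the caller, which matches the error code used in the hunk.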

View File

@@ -173,6 +173,9 @@ static void td_init_cpuid_entry2(struct kvm_cpuid_entry2 *entry, unsigned char i
 tdx_clear_unsupported_cpuid(entry);
 }
+#define TDVMCALLINFO_GET_QUOTE BIT(0)
+#define TDVMCALLINFO_SETUP_EVENT_NOTIFY_INTERRUPT BIT(1)
 static int init_kvm_tdx_caps(const struct tdx_sys_info_td_conf *td_conf,
 struct kvm_tdx_capabilities *caps)
 {
@@ -188,6 +191,10 @@ static int init_kvm_tdx_caps(const struct tdx_sys_info_td_conf *td_conf,
 caps->cpuid.nent = td_conf->num_cpuid_config;
+caps->user_tdvmcallinfo_1_r11 =
+TDVMCALLINFO_GET_QUOTE |
+TDVMCALLINFO_SETUP_EVENT_NOTIFY_INTERRUPT;
 for (i = 0; i < td_conf->num_cpuid_config; i++)
 td_init_cpuid_entry2(&caps->cpuid.entries[i], i);
@@ -1530,6 +1537,27 @@ static int tdx_get_quote(struct kvm_vcpu *vcpu)
 return 0;
 }
+static int tdx_setup_event_notify_interrupt(struct kvm_vcpu *vcpu)
+{
+struct vcpu_tdx *tdx = to_tdx(vcpu);
+u64 vector = tdx->vp_enter_args.r12;
+if (vector < 32 || vector > 255) {
+tdvmcall_set_return_code(vcpu, TDVMCALL_STATUS_INVALID_OPERAND);
+return 1;
+}
+vcpu->run->exit_reason = KVM_EXIT_TDX;
+vcpu->run->tdx.flags = 0;
+vcpu->run->tdx.nr = TDVMCALL_SETUP_EVENT_NOTIFY_INTERRUPT;
+vcpu->run->tdx.setup_event_notify.ret = TDVMCALL_STATUS_SUBFUNC_UNSUPPORTED;
+vcpu->run->tdx.setup_event_notify.vector = vector;
+vcpu->arch.complete_userspace_io = tdx_complete_simple;
+return 0;
+}
 static int handle_tdvmcall(struct kvm_vcpu *vcpu)
 {
 switch (tdvmcall_leaf(vcpu)) {
@@ -1541,6 +1569,8 @@ static int handle_tdvmcall(struct kvm_vcpu *vcpu)
 return tdx_get_td_vm_call_info(vcpu);
 case TDVMCALL_GET_QUOTE:
 return tdx_get_quote(vcpu);
+case TDVMCALL_SETUP_EVENT_NOTIFY_INTERRUPT:
+return tdx_setup_event_notify_interrupt(vcpu);
 default:
 break;
 }
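
On the consumer side, the bits that `init_kvm_tdx_caps()` places in `user_tdvmcallinfo_1_r11` could be tested roughly as below. This is a sketch only: `tdx_caps_mock`, `init_caps_mock()` and `forwarded_to_userspace()` are hypothetical, the bit positions are copied from the `TDVMCALLINFO_*` defines in this file, and zeroing the kernel-side mask is a simplification made here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit positions as defined in this commit (BIT(0) and BIT(1)). */
#define TDVMCALLINFO_GET_QUOTE                    (UINT64_C(1) << 0)
#define TDVMCALLINFO_SETUP_EVENT_NOTIFY_INTERRUPT (UINT64_C(1) << 1)

/* Hypothetical stand-in for two of the new kvm_tdx_capabilities fields. */
struct tdx_caps_mock {
	uint64_t kernel_tdvmcallinfo_1_r11;
	uint64_t user_tdvmcallinfo_1_r11;
};

/* Mirrors the assignment made in init_kvm_tdx_caps(). */
static void init_caps_mock(struct tdx_caps_mock *caps)
{
	caps->kernel_tdvmcallinfo_1_r11 = 0;	/* simplification for the sketch */
	caps->user_tdvmcallinfo_1_r11 = TDVMCALLINFO_GET_QUOTE |
					TDVMCALLINFO_SETUP_EVENT_NOTIFY_INTERRUPT;
}

/* True if the TDVMCALL identified by 'bit' is forwarded to userspace. */
static bool forwarded_to_userspace(const struct tdx_caps_mock *caps, uint64_t bit)
{
	assert(bit != 0);
	return (caps->user_tdvmcallinfo_1_r11 & bit) == bit;
}
```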

View File

@@ -1971,8 +1971,19 @@ int kvm_xen_setup_evtchn(struct kvm *kvm,
 {
 struct kvm_vcpu *vcpu;
-if (ue->u.xen_evtchn.port >= max_evtchn_port(kvm))
-return -EINVAL;
+/*
+* Don't check for the port being within range of max_evtchn_port().
+* Userspace can configure what ever targets it likes; events just won't
+* be delivered if/while the target is invalid, just like userspace can
+* configure MSIs which target non-existent APICs.
+*
+* This allow on Live Migration and Live Update, the IRQ routing table
+* can be restored *independently* of other things like creating vCPUs,
+* without imposing an ordering dependency on userspace. In this
+* particular case, the problematic ordering would be with setting the
+* Xen 'long mode' flag, which changes max_evtchn_port() to allow 4096
+* instead of 1024 event channels.
+*/
 /* We only support 2 level event channels for now */
 if (ue->u.xen_evtchn.priority != KVM_IRQ_ROUTING_XEN_EVTCHN_PRIO_2LEVEL)
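
The ordering problem described in the comment above can be illustrated with a small model in which the port limit is checked at delivery time instead of at setup time. All names here (`xen_vm_mock`, `evtchn_route_mock`, `setup_route()`, `can_deliver()`) are hypothetical; only the 1024 and 4096 limits come from the comment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical VM state: only the 'long mode' flag matters here. */
struct xen_vm_mock {
	bool long_mode;
};

/* Hypothetical routing entry holding the configured port. */
struct evtchn_route_mock {
	uint32_t port;
};

/* Limits taken from the comment: 4096 in long mode, otherwise 1024. */
static uint32_t max_evtchn_port_mock(const struct xen_vm_mock *vm)
{
	return vm->long_mode ? 4096 : 1024;
}

/* Setup accepts any port, so it cannot fail on ordering grounds. */
static int setup_route(struct evtchn_route_mock *route, uint32_t port)
{
	route->port = port;
	return 0;
}

/* The range check happens when an event is about to be delivered. */
static bool can_deliver(const struct xen_vm_mock *vm,
			const struct evtchn_route_mock *route)
{
	return route->port < max_evtchn_port_mock(vm);
}
```

In this model a route for port 2000 can be restored before the long mode flag is set; events are simply not deliverable until the flag makes the port valid.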

View File

@@ -594,6 +594,7 @@ struct sev_data_snp_addr {
 * @imi_en: launch flow is launching an IMI (Incoming Migration Image) for the
 * purpose of guest-assisted migration.
 * @rsvd: reserved
+* @desired_tsc_khz: hypervisor desired mean TSC freq in kHz of the guest
 * @gosvw: guest OS-visible workarounds, as defined by hypervisor
 */
 struct sev_data_snp_launch_start {
@@ -603,6 +604,7 @@ struct sev_data_snp_launch_start {
 u32 ma_en:1; /* In */
 u32 imi_en:1; /* In */
 u32 rsvd:30;
+u32 desired_tsc_khz; /* In */
 u8 gosvw[16]; /* In */
 } __packed;
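
The effect of the missing field on the offset of `gosvw` can be checked with a small model of the tail of the structure. This is a sketch under stated assumptions: `launch_tail_before` and `launch_tail_after` are hypothetical, they model only the fields shown in this hunk, the three bitfields are collapsed into one 32-bit `flags` word, and `__attribute__((packed))` is a GCC/Clang extension standing in for `__packed`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Tail of the command buffer as laid out before the fix (hypothetical model). */
struct launch_tail_before {
	uint32_t flags;			/* stands in for ma_en:1, imi_en:1, rsvd:30 */
	uint8_t gosvw[16];
} __attribute__((packed));

/* Tail of the command buffer with the missing field added. */
struct launch_tail_after {
	uint32_t flags;			/* stands in for ma_en:1, imi_en:1, rsvd:30 */
	uint32_t desired_tsc_khz;
	uint8_t gosvw[16];
} __attribute__((packed));

/* Without desired_tsc_khz, gosvw lands four bytes too early. */
static_assert(offsetof(struct launch_tail_after, gosvw) -
	      offsetof(struct launch_tail_before, gosvw) == sizeof(uint32_t),
	      "gosvw moves by the size of the added field");
```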

View File

@@ -467,6 +467,10 @@ struct kvm_run {
 __u64 leaf;
 __u64 r11, r12, r13, r14;
 } get_tdvmcall_info;
+struct {
+__u64 ret;
+__u64 vector;
+} setup_event_notify;
 };
 } tdx;
 /* Fix the size of the union. */

View File

@@ -74,6 +74,7 @@ int main(int argc, char *argv[])
 int testcase;
 char test[80];
+TEST_REQUIRE(this_cpu_has(X86_FEATURE_MWAIT));
 TEST_REQUIRE(kvm_has_cap(KVM_CAP_DISABLE_QUIRKS2));
 ksft_print_header();

View File

@@ -2572,6 +2572,8 @@ static int kvm_vm_set_mem_attributes(struct kvm *kvm, gfn_t start, gfn_t end,
 r = xa_reserve(&kvm->mem_attr_array, i, GFP_KERNEL_ACCOUNT);
 if (r)
 goto out_unlock;
+cond_resched();
 }
 kvm_handle_gfn_range(kvm, &pre_set_range);
@@ -2580,6 +2582,7 @@ static int kvm_vm_set_mem_attributes(struct kvm *kvm, gfn_t start, gfn_t end,
 r = xa_err(xa_store(&kvm->mem_attr_array, i, entry,
 GFP_KERNEL_ACCOUNT));
 KVM_BUG_ON(r, kvm);
+cond_resched();
 }
 kvm_handle_gfn_range(kvm, &post_set_range);