- Restore CONFIG_PKVM_DISABLE_STAGE2_ON_PANIC to its former glory by
   making sure the config symbol is correctly spelled out in the code
 
 - Don't reset the AArch32 view of the PMU counters to zero when the
   guest is writing to them
 
 - Fix an assorted collection of memory leaks in the newly added tracing
   code
 
 - Fix the capping of ZCR_EL2 which could be used in an unsanitised way
   by an L2 guest
 
 x86:
 
 - Include the kernel's linux/mman.h in KVM selftests to ensure MADV_COLLAPSE
   is defined, as older libc versions may not provide it.
 
 - Include execinfo.h if and only if KVM selftests are building against glibc,
   and provide a test_dump_stack() for non-glibc builds.
 
 - Silence an annoying RCU splat on (even non-KVM-related) panics.  The splat
   is technically legit, but in practice not an issue.  To have a race, you
   would need to unload the KVM modules at exactly the time a panic happens;
   and speaking of incredibly rare races, taking the locks risks introducing
   a deadlock if the module unload code took the lock on a CPU that has been
   halted.  Which seems possibly more likely than the RCU grace period issue,
   so just shut it up.  This code used to be in KVM but is now outside it;
   but the x86 maintainers haven't picked it up, so here we are.
 
 - Rate-limit global clock updates once again (but without delayed work), as
   KVM was subtly relying on the old rate-limiting for NPT correction to guard
   against "update storms" when running without a master clock on systems with
   overcommitted CPUs.
 
 - Fix a brown paper bag goof where KVM checked if ERAPS is "dirty" instead of
   marking it dirty when emulating INVPCID.
 
 - Flush the TLB when transitioning from xAVIC => x2AVIC to ensure the CPU TLB
   doesn't contain AVIC-tagged entries for the APIC base GPA.
 
 - The top 10 commits fix buffer overflow (and potential TOC/TOU) flaws in the
   page state change protocol for encrypted VMs.  AI models find it quite
   easily given it was reported three times, but aren't as good at writing
   a comprehensive fix.  There's more to clean up in the area, which will
   come in 7.2.
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCgAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmoZ2qQUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroPPFQgAhDwSk+VVnn4vuerijZh6eo3Tz4EQ
 af0Ccng1uDuTuz9HzkF/ffR4z3tBMYtUhVtUiPu5xrUabzmIW7T0roNvsCwzVZor
 ekZt3Y8FgwSgF+nxbBQQXBPvv+tOHpoIhfbirftWE9tRRFivfK1Z1duRGwsv7Seb
 0eK+iB1huJLjXqIZQtSLEY44LSoQbDIt/StkkYFLUr10oOvTRCFiu2wPA2gZrK56
 KTVrCg7rtn135wh8TVA72u+pIszylIPFTQ1HbbzzBoQ8/Opp0olFL3q0HeAwkx6D
 q0EJiNMP0QD8NDC7Q8efAit4wI0pXE4Y6ScHQJTm3p+hB6KXc9o7LKbCmA==
 =6jit
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm fixes from Paolo Bonzini:
 "arm64:

   - Restore CONFIG_PKVM_DISABLE_STAGE2_ON_PANIC to its former glory by
     making sure the config symbol is correctly spelled out in the code

   - Don't reset the AArch32 view of the PMU counters to zero when the
     guest is writing to them

   - Fix an assorted collection of memory leaks in the newly added
     tracing code

   - Fix the capping of ZCR_EL2 which could be used in an unsanitised
     way by an L2 guest

  x86:

   - Include the kernel's linux/mman.h in KVM selftests to ensure
     MADV_COLLAPSE is defined, as older libc versions may not provide
     it.

   - Include execinfo.h if and only if KVM selftests are building
     against glibc, and provide a test_dump_stack() for non-glibc
     builds.

   - Silence an annoying RCU splat on (even non-KVM-related) panics.

     The splat is technically legit, but in practice not an issue. To
     have a race, you would need to unload the KVM modules at exactly
     the time a panic happens; and speaking of incredibly rare races,
     taking the locks risks introducing a deadlock if the module unload
     code took the lock on a CPU that has been halted. Which seems
     possibly more likely than the RCU grace period issue, so just shut
     it up. This code used to be in KVM but is now outside it; but the
     x86 maintainers haven't picked it up, so here we are.

   - Rate-limit global clock updates once again (but without delayed
     work), as KVM was subtly relying on the old rate-limiting for NPT
     correction to guard against "update storms" when running without a
     master clock on systems with overcommitted CPUs.

   - Fix a brown paper bag goof where KVM checked if ERAPS is "dirty"
     instead of marking it dirty when emulating INVPCID.

   - Flush the TLB when transitioning from xAVIC => x2AVIC to ensure the
     CPU TLB doesn't contain AVIC-tagged entries for the APIC base GPA.

   - The top 10 commits fix buffer overflow (and potential TOC/TOU)
     flaws in the page state change protocol for encrypted VMs. AI
     models find it quite easily given it was reported three times, but
     aren't as good at writing a comprehensive fix. There's more to
     clean up in the area, which will come in 7.2"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (22 commits)
  KVM: SEV: Use READ_ONCE() when reading entries/indices from PSC buffer
  KVM: SEV: Check PSC request indices against the actual size of the buffer
  KVM: SEV: Don't explicitly pass PSC buffer to snp_begin_psc()
  KVM: SEV: WARN if KVM attempts to setup scratch area with min_len==0
  KVM: SEV: Compute the correct max length of the in-GHCB scratch area
  KVM: SEV: Use the size of the PSC header as the minimum size for PSC requests
  KVM: SEV: Ignore Port I/O requests of length '0'
  KVM: SEV: Reject MMIO requests larger than 8 bytes with GHCB v2+
  KVM: SEV: Ignore MMIO requests of length '0'
  KVM: SEV: Require in-GHCB scratch area if GHCB v2+ is in use
  KVM: arm64: Correctly cap ZCR_EL2 provided by a guest hypervisor
  KVM: arm64: Fix memory leak in hyp_trace_unload()
  KVM: arm64: Fix rollback in hyp_trace_buffer_share_hyp()
  KVM: arm64: Fix meta-page unsharing in pKVM hyp tracing
  KVM: arm64: PMU: Preserve AArch32 counter low bits
  KVM: SVM: Flush the current TLB when transitioning from xAVIC => x2AVIC
  KVM: x86: Fix ERAPS RAP clear on INVPCID single-context invalidation
  KVM: arm64: Fix CONFIG_PKVM_DISABLE_STAGE2_ON_PANIC
  KVM: selftests: Guard execinfo.h inclusion for non-glibc builds
  KVM: x86: Rate-limit global clock updates on vCPU load
  ...
This commit is contained in:
Linus Torvalds 2026-05-29 13:47:55 -07:00
commit d0ee290071
22 changed files with 172 additions and 67 deletions

View File

@ -511,7 +511,6 @@ enum vcpu_sysreg {
ACTLR_EL2, /* Auxiliary Control Register (EL2) */
CPTR_EL2, /* Architectural Feature Trap Register (EL2) */
HACR_EL2, /* Hypervisor Auxiliary Control Register */
ZCR_EL2, /* SVE Control Register (EL2) */
TTBR0_EL2, /* Translation Table Base Register 0 (EL2) */
TTBR1_EL2, /* Translation Table Base Register 1 (EL2) */
TCR_EL2, /* Translation Control Register (EL2) */
@ -543,6 +542,7 @@ enum vcpu_sysreg {
SCTLR2_EL2, /* System Control Register 2 (EL2) */
MDCR_EL2, /* Monitor Debug Configuration Register (EL2) */
CNTHCTL_EL2, /* Counter-timer Hypervisor Control register */
ZCR_EL2, /* SVE Control Register (EL2) */
/* Any VNCR-capable reg goes after this point */
MARKER(__VNCR_START__),

View File

@ -462,11 +462,13 @@ static inline bool kvm_hyp_handle_mops(struct kvm_vcpu *vcpu, u64 *exit_code)
static inline void __hyp_sve_restore_guest(struct kvm_vcpu *vcpu)
{
u64 zcr_el2 = vcpu_sve_max_vq(vcpu) - 1;
/*
* The vCPU's saved SVE state layout always matches the max VL of the
* vCPU. Start off with the max VL so we can load the SVE state.
*/
sve_cond_update_zcr_vq(vcpu_sve_max_vq(vcpu) - 1, SYS_ZCR_EL2);
sve_cond_update_zcr_vq(zcr_el2, SYS_ZCR_EL2);
__sve_restore_state(vcpu_sve_pffr(vcpu),
&vcpu->arch.ctxt.fp_regs.fpsr,
true);
@ -476,8 +478,10 @@ static inline void __hyp_sve_restore_guest(struct kvm_vcpu *vcpu)
* nested guest, as the guest hypervisor could select a smaller VL. Slap
* that into hardware before wrapping up.
*/
if (is_nested_ctxt(vcpu))
sve_cond_update_zcr_vq(__vcpu_sys_reg(vcpu, ZCR_EL2), SYS_ZCR_EL2);
if (is_nested_ctxt(vcpu)) {
zcr_el2 = min(zcr_el2, __vcpu_sys_reg(vcpu, ZCR_EL2));
sve_cond_update_zcr_vq(zcr_el2, SYS_ZCR_EL2);
}
write_sysreg_el1(__vcpu_sys_reg(vcpu, vcpu_sve_zcr_elx(vcpu)), SYS_ZCR);
}
@ -501,11 +505,11 @@ static inline void fpsimd_lazy_switch_to_guest(struct kvm_vcpu *vcpu)
return;
if (vcpu_has_sve(vcpu)) {
zcr_el2 = vcpu_sve_max_vq(vcpu) - 1;
/* A guest hypervisor may restrict the effective max VL. */
if (is_nested_ctxt(vcpu))
zcr_el2 = __vcpu_sys_reg(vcpu, ZCR_EL2);
else
zcr_el2 = vcpu_sve_max_vq(vcpu) - 1;
zcr_el2 = min(zcr_el2, __vcpu_sys_reg(vcpu, ZCR_EL2));
write_sysreg_el2(zcr_el2, SYS_ZCR);

View File

@ -120,7 +120,7 @@ SYM_FUNC_START(__hyp_do_panic)
mov x29, x0
#ifdef PKVM_DISABLE_STAGE2_ON_PANIC
#ifdef CONFIG_PKVM_DISABLE_STAGE2_ON_PANIC
/* Ensure host stage-2 is disabled */
mrs x0, hcr_el2
bic x0, x0, #HCR_VM

View File

@ -189,7 +189,7 @@ static void hyp_trace_buffer_unshare_hyp(struct hyp_trace_buffer *trace_buffer,
if (cpu > last_cpu)
break;
__share_page(rb_desc->meta_va);
__unshare_page(rb_desc->meta_va);
for (p = 0; p < rb_desc->nr_page_va; p++)
__unshare_page(rb_desc->page_va[p]);
}
@ -212,14 +212,15 @@ static int hyp_trace_buffer_share_hyp(struct hyp_trace_buffer *trace_buffer)
}
if (ret) {
for (p--; p >= 0; p--)
while (--p >= 0)
__unshare_page(rb_desc->page_va[p]);
__unshare_page(rb_desc->meta_va);
break;
}
}
if (ret)
hyp_trace_buffer_unshare_hyp(trace_buffer, cpu--);
hyp_trace_buffer_unshare_hyp(trace_buffer, --cpu);
return ret;
}
@ -248,6 +249,7 @@ static struct trace_buffer_desc *hyp_trace_load(unsigned long size, void *priv)
goto err_free_desc;
trace_buffer->desc = desc;
trace_buffer->desc_size = desc_size;
ret = hyp_trace_buffer_alloc_bpages_backing(trace_buffer, size);
if (ret)
@ -297,6 +299,7 @@ static void hyp_trace_unload(struct trace_buffer_desc *desc, void *priv)
hyp_trace_buffer_free_bpages_backing(trace_buffer);
free_pages_exact(trace_buffer->desc, trace_buffer->desc_size);
trace_buffer->desc = NULL;
trace_buffer->desc_size = 0;
}
static int hyp_trace_enable_tracing(bool enable, void *priv)

View File

@ -1834,6 +1834,11 @@ int kvm_init_nv_sysregs(struct kvm_vcpu *vcpu)
resx.res1 = VNCR_EL2_RES1;
set_sysreg_masks(kvm, VNCR_EL2, resx);
/* ZCR_EL2 - bits 8:4 are RAZ/WI so treat them as RES0 */
resx.res0 = ZCR_ELx_RES0 | GENMASK_ULL(8, 4);
resx.res1 = ZCR_ELx_RES1;
set_sysreg_masks(kvm, ZCR_EL2, resx);
out:
for (enum vcpu_sysreg sr = __SANITISED_REG_START__; sr < NR_SYS_REGS; sr++)
__vcpu_rmw_sys_reg(vcpu, sr, |=, 0);

View File

@ -174,8 +174,8 @@ static void kvm_pmu_set_pmc_value(struct kvm_pmc *pmc, u64 val, bool force)
* action is to use PMCR.P, which will reset them to
* 0 (the only use of the 'force' parameter).
*/
val = __vcpu_sys_reg(vcpu, reg) & GENMASK(63, 32);
val |= lower_32_bits(val);
val = (__vcpu_sys_reg(vcpu, reg) & GENMASK(63, 32)) |
lower_32_bits(val);
}
__vcpu_assign_sys_reg(vcpu, reg, val);

View File

@ -2862,21 +2862,16 @@ static bool access_zcr_el2(struct kvm_vcpu *vcpu,
struct sys_reg_params *p,
const struct sys_reg_desc *r)
{
unsigned int vq;
if (guest_hyp_sve_traps_enabled(vcpu)) {
kvm_inject_nested_sve_trap(vcpu);
return false;
}
if (!p->is_write) {
if (!p->is_write)
p->regval = __vcpu_sys_reg(vcpu, ZCR_EL2);
return true;
}
else
__vcpu_assign_sys_reg(vcpu, ZCR_EL2, p->regval);
vq = SYS_FIELD_GET(ZCR_ELx, LEN, p->regval) + 1;
vq = min(vq, vcpu_sve_max_vq(vcpu));
__vcpu_assign_sys_reg(vcpu, ZCR_EL2, vq - 1);
return true;
}

View File

@ -1504,6 +1504,7 @@ struct kvm_arch {
bool use_master_clock;
u64 master_kernel_ns;
u64 master_cycle_now;
struct ratelimit_state kvmclock_update_rs;
#ifdef CONFIG_KVM_HYPERV
struct kvm_hv hyperv;

View File

@ -206,6 +206,35 @@ static void avic_activate_vmcb(struct vcpu_svm *svm)
svm_clr_intercept(svm, INTERCEPT_CR8_WRITE);
/*
* Flush the TLB when enabling (x2)AVIC and when transitioning between
* xAVIC and x2AVIC, as the CPU may have inserted a TLB entry for the
* "wrong" mapping.
*
* KVM uses a per-VM "scratch" page to back the APIC memslot, because
* KVM also uses per-VM page tables *and* maintains the page table (NPT
* or shadow page) mappings for said memslot even if one or more vCPUs
* have their local APIC hardware-disabled or are in x2APIC mode, i.e.
* even if one or more vCPUs' APIC MMIO BAR is effectively disabled.
*
* If xAVIC is fully enabled, hardware ignores the physical address in
* KVM's page tables, i.e. in the leaf SPTE for the APIC memslot, and
* instead redirects the access to the AVIC backing page, i.e. to the
* vCPU's virtual APIC page. If xAVIC is not enabled (APIC is either
* hardware-disabled or in x2APIC mode), then guest accesses will use
* the page table mapping verbatim, i.e. will access the per-VM scratch
* page, as normal memory.
*
* In both cases, the CPU is allowed to cache TLB entries for the APIC
* base GPA. So, KVM needs to flush the TLB when enabling xAVIC, as
* accesses need to be redirected to the virtual APIC page, but the TLB
* may contain entries pointing at the scratch page. KVM also needs to
* flush the TLB when enabling x2AVIC, as accesses need to go to the
* scratch page, but the TLB may contain entries tagged as xAVIC, i.e.
* entries pointing to the vCPU's virtual APIC page.
*/
kvm_make_request(KVM_REQ_TLB_FLUSH_CURRENT, &svm->vcpu);
/*
* Note: KVM supports hybrid-AVIC mode, where KVM emulates x2APIC MSR
* accesses, while interrupt injection to a running vCPU can be
@ -219,12 +248,6 @@ static void avic_activate_vmcb(struct vcpu_svm *svm)
/* Disabling MSR intercept for x2APIC registers */
avic_set_x2apic_msr_interception(svm, false);
} else {
/*
* Flush the TLB, the guest may have inserted a non-APIC
* mapping into the TLB while AVIC was disabled.
*/
kvm_make_request(KVM_REQ_TLB_FLUSH_CURRENT, &svm->vcpu);
/* Enabling MSR intercept for x2APIC registers */
avic_set_x2apic_msr_interception(svm, true);
}

View File

@ -3662,23 +3662,26 @@ int pre_sev_run(struct vcpu_svm *svm, int cpu)
}
#define GHCB_SCRATCH_AREA_LIMIT (16ULL * PAGE_SIZE)
static int setup_vmgexit_scratch(struct vcpu_svm *svm, bool sync, u64 len)
static int setup_vmgexit_scratch(struct vcpu_svm *svm, bool sync, u64 min_len)
{
struct vmcb_control_area *control = &svm->vmcb->control;
u64 ghcb_scratch_beg, ghcb_scratch_end;
u64 scratch_gpa_beg, scratch_gpa_end;
void *scratch_va;
if (WARN_ON_ONCE(!min_len))
goto e_scratch;
scratch_gpa_beg = svm->sev_es.sw_scratch;
if (!scratch_gpa_beg) {
pr_err("vmgexit: scratch gpa not provided\n");
goto e_scratch;
}
scratch_gpa_end = scratch_gpa_beg + len;
scratch_gpa_end = scratch_gpa_beg + min_len;
if (scratch_gpa_end < scratch_gpa_beg) {
pr_err("vmgexit: scratch length (%#llx) not valid for scratch address (%#llx)\n",
len, scratch_gpa_beg);
min_len, scratch_gpa_beg);
goto e_scratch;
}
@ -3702,21 +3705,27 @@ static int setup_vmgexit_scratch(struct vcpu_svm *svm, bool sync, u64 len)
scratch_va = (void *)svm->sev_es.ghcb;
scratch_va += (scratch_gpa_beg - control->ghcb_gpa);
svm->sev_es.ghcb_sa_len = ghcb_scratch_end - scratch_gpa_beg;
} else {
/* GHCB v2 requires the scratch area to be within the GHCB. */
if (to_kvm_sev_info(svm->vcpu.kvm)->ghcb_version >= 2)
goto e_scratch;
/*
* The guest memory must be read into a kernel buffer, so
* limit the size
*/
if (len > GHCB_SCRATCH_AREA_LIMIT) {
if (min_len > GHCB_SCRATCH_AREA_LIMIT) {
pr_err("vmgexit: scratch area exceeds KVM limits (%#llx requested, %#llx limit)\n",
len, GHCB_SCRATCH_AREA_LIMIT);
min_len, GHCB_SCRATCH_AREA_LIMIT);
goto e_scratch;
}
scratch_va = kvzalloc(len, GFP_KERNEL_ACCOUNT);
scratch_va = kvzalloc(min_len, GFP_KERNEL_ACCOUNT);
if (!scratch_va)
return -ENOMEM;
if (kvm_read_guest(svm->vcpu.kvm, scratch_gpa_beg, scratch_va, len)) {
if (kvm_read_guest(svm->vcpu.kvm, scratch_gpa_beg, scratch_va, min_len)) {
/* Unable to copy scratch area from guest */
pr_err("vmgexit: kvm_read_guest for scratch area failed\n");
@ -3732,11 +3741,10 @@ static int setup_vmgexit_scratch(struct vcpu_svm *svm, bool sync, u64 len)
*/
svm->sev_es.ghcb_sa_sync = sync;
svm->sev_es.ghcb_sa_free = true;
svm->sev_es.ghcb_sa_len = min_len;
}
svm->sev_es.ghcb_sa = scratch_va;
svm->sev_es.ghcb_sa_len = len;
return 0;
e_scratch:
@ -3833,7 +3841,7 @@ struct psc_buffer {
struct psc_entry entries[];
} __packed;
static int snp_begin_psc(struct vcpu_svm *svm, struct psc_buffer *psc);
static int snp_begin_psc(struct vcpu_svm *svm);
static void snp_complete_psc(struct vcpu_svm *svm, u64 psc_ret)
{
@ -3864,9 +3872,9 @@ static void __snp_complete_one_psc(struct vcpu_svm *svm)
*/
for (idx = svm->sev_es.psc_idx; svm->sev_es.psc_inflight;
svm->sev_es.psc_inflight--, idx++) {
struct psc_entry *entry = &entries[idx];
struct psc_entry entry = READ_ONCE(entries[idx]);
entry->cur_page = entry->pagesize ? 512 : 1;
entries[idx].cur_page = entry.pagesize ? 512 : 1;
}
hdr->cur_entry = idx;
@ -3875,7 +3883,6 @@ static void __snp_complete_one_psc(struct vcpu_svm *svm)
static int snp_complete_one_psc(struct kvm_vcpu *vcpu)
{
struct vcpu_svm *svm = to_svm(vcpu);
struct psc_buffer *psc = svm->sev_es.ghcb_sa;
if (vcpu->run->hypercall.ret) {
snp_complete_psc(svm, VMGEXIT_PSC_ERROR_GENERIC);
@ -3885,16 +3892,18 @@ static int snp_complete_one_psc(struct kvm_vcpu *vcpu)
__snp_complete_one_psc(svm);
/* Handle the next range (if any). */
return snp_begin_psc(svm, psc);
return snp_begin_psc(svm);
}
static int snp_begin_psc(struct vcpu_svm *svm, struct psc_buffer *psc)
static int snp_begin_psc(struct vcpu_svm *svm)
{
struct vcpu_sev_es_state *sev_es = &svm->sev_es;
struct psc_buffer *psc = sev_es->ghcb_sa;
struct psc_entry *entries = psc->entries;
struct kvm_vcpu *vcpu = &svm->vcpu;
struct psc_hdr *hdr = &psc->hdr;
struct psc_entry entry_start;
u16 idx, idx_start, idx_end;
u16 idx, idx_start, idx_end, max_nr_entries;
int npages;
bool huge;
u64 gfn;
@ -3904,6 +3913,19 @@ static int snp_begin_psc(struct vcpu_svm *svm, struct psc_buffer *psc)
return 1;
}
/*
* GHCB v2 requires the scratch area to reside within the GHCB itself,
* and PSC requests are only supported for GHCB v2+. Thus it should be
* impossible to exceed the max PSC entry count (which is derived from
* the size of the shared GHCB buffer).
*/
max_nr_entries = (sev_es->ghcb_sa_len - sizeof(struct psc_hdr)) /
sizeof(struct psc_entry);
if (WARN_ON_ONCE(max_nr_entries > VMGEXIT_PSC_MAX_COUNT)) {
snp_complete_psc(svm, VMGEXIT_PSC_ERROR_GENERIC);
return 1;
}
next_range:
/* There should be no other PSCs in-flight at this point. */
if (WARN_ON_ONCE(svm->sev_es.psc_inflight)) {
@ -3916,17 +3938,17 @@ static int snp_begin_psc(struct vcpu_svm *svm, struct psc_buffer *psc)
* validation, so take care to only use validated copies of values used
* for things like array indexing.
*/
idx_start = hdr->cur_entry;
idx_end = hdr->end_entry;
idx_start = READ_ONCE(hdr->cur_entry);
idx_end = READ_ONCE(hdr->end_entry);
if (idx_end >= VMGEXIT_PSC_MAX_COUNT) {
if (idx_end >= max_nr_entries) {
snp_complete_psc(svm, VMGEXIT_PSC_ERROR_INVALID_HDR);
return 1;
}
/* Find the start of the next range which needs processing. */
for (idx = idx_start; idx <= idx_end; idx++, hdr->cur_entry++) {
entry_start = entries[idx];
entry_start = READ_ONCE(entries[idx]);
gfn = entry_start.gfn;
huge = entry_start.pagesize;
@ -3970,7 +3992,7 @@ static int snp_begin_psc(struct vcpu_svm *svm, struct psc_buffer *psc)
* KVM_HC_MAP_GPA_RANGE exit.
*/
while (++idx <= idx_end) {
struct psc_entry entry = entries[idx];
struct psc_entry entry = READ_ONCE(entries[idx]);
if (entry.operation != entry_start.operation ||
entry.gfn != entry_start.gfn + npages ||
@ -4493,13 +4515,22 @@ int sev_handle_vmgexit(struct kvm_vcpu *vcpu)
case SVM_VMGEXIT_MMIO_READ:
case SVM_VMGEXIT_MMIO_WRITE: {
bool is_write = control->exit_code == SVM_VMGEXIT_MMIO_WRITE;
u64 len = control->exit_info_2;
ret = setup_vmgexit_scratch(svm, !is_write, control->exit_info_2);
if (!len)
return 1;
if (to_kvm_sev_info(vcpu->kvm)->ghcb_version >= 2 && len > 8) {
svm_vmgexit_bad_input(svm, GHCB_ERR_INVALID_INPUT);
return 1;
}
ret = setup_vmgexit_scratch(svm, !is_write, len);
if (ret)
break;
ret = kvm_sev_es_mmio(vcpu, is_write, control->exit_info_1,
control->exit_info_2, svm->sev_es.ghcb_sa);
ret = kvm_sev_es_mmio(vcpu, is_write, control->exit_info_1, len,
svm->sev_es.ghcb_sa);
break;
}
case SVM_VMGEXIT_NMI_COMPLETE:
@ -4546,11 +4577,11 @@ int sev_handle_vmgexit(struct kvm_vcpu *vcpu)
vcpu->run->system_event.data[0] = control->ghcb_gpa;
break;
case SVM_VMGEXIT_PSC:
ret = setup_vmgexit_scratch(svm, true, control->exit_info_2);
ret = setup_vmgexit_scratch(svm, true, sizeof(struct psc_hdr));
if (ret)
break;
ret = snp_begin_psc(svm, svm->sev_es.ghcb_sa);
ret = snp_begin_psc(svm);
break;
case SVM_VMGEXIT_AP_CREATION:
ret = sev_snp_ap_creation(svm);
@ -4572,6 +4603,11 @@ int sev_handle_vmgexit(struct kvm_vcpu *vcpu)
control->exit_info_1, control->exit_info_2);
ret = -EINVAL;
break;
case SVM_EXIT_IOIO:
if (!((control->exit_info_1 & SVM_IOIO_SIZE_MASK) >> SVM_IOIO_SIZE_SHIFT))
return 1;
fallthrough;
default:
ret = svm_invoke_exit_handler(vcpu, control->exit_code);
}
@ -4592,6 +4628,9 @@ int sev_es_string_io(struct vcpu_svm *svm, int size, unsigned int port, int in)
if (unlikely(check_mul_overflow(count, size, &bytes)))
return -EINVAL;
if (!bytes)
return 1;
r = setup_vmgexit_scratch(svm, in, bytes);
if (r)
return r;

View File

@ -5227,8 +5227,13 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
* On a host with synchronized TSC, there is no need to update
* kvmclock on vcpu->cpu migration
*/
if (!vcpu->kvm->arch.use_master_clock || vcpu->cpu == -1)
kvm_make_request(KVM_REQ_GLOBAL_CLOCK_UPDATE, vcpu);
if (!vcpu->kvm->arch.use_master_clock || vcpu->cpu == -1) {
if (__ratelimit(&vcpu->kvm->arch.kvmclock_update_rs))
kvm_make_request(KVM_REQ_GLOBAL_CLOCK_UPDATE, vcpu);
else
kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu);
}
if (vcpu->cpu != cpu)
kvm_make_request(KVM_REQ_MIGRATE_TIMER, vcpu);
vcpu->cpu = cpu;
@ -13366,6 +13371,8 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
raw_spin_lock_init(&kvm->arch.tsc_write_lock);
mutex_init(&kvm->arch.apic_map_lock);
seqcount_raw_spinlock_init(&kvm->arch.pvclock_sc, &kvm->arch.tsc_write_lock);
ratelimit_state_init(&kvm->arch.kvmclock_update_rs, HZ, 10);
ratelimit_set_flags(&kvm->arch.kvmclock_update_rs, RATELIMIT_MSG_ON_RELEASE);
kvm->arch.kvmclock_offset = -get_kvmclock_base_ns();
raw_spin_lock_irqsave(&kvm->arch.tsc_write_lock, flags);
@ -14323,7 +14330,7 @@ int kvm_handle_invpcid(struct kvm_vcpu *vcpu, unsigned long type, gva_t gva)
* the RAP (Return Address Predicator).
*/
if (guest_cpu_cap_has(vcpu, X86_FEATURE_ERAPS))
kvm_register_is_dirty(vcpu, VCPU_EXREG_ERAPS);
kvm_register_mark_dirty(vcpu, VCPU_EXREG_ERAPS);
kvm_invalidate_pcid(vcpu, operand.pcid);
return kvm_skip_emulated_instruction(vcpu);

View File

@ -49,7 +49,20 @@ static void x86_virt_invoke_kvm_emergency_callback(void)
{
cpu_emergency_virt_cb *kvm_callback;
kvm_callback = rcu_dereference(kvm_emergency_callback);
/*
* RCU may not be watching the crashing CPU here, so rcu_dereference()
* triggers a suspicious-RCU-usage splat. In principle, a concurrent
* KVM module unload could race with this read; see commit 2baa33a8ddd6
* ("KVM: x86: Leave user-return notifier registered on reboot/shutdown")
* which notes that nothing prevents module unload during panic/reboot.
*
* However, taking a lock here would be riskier than the current race:
* the system is going down via NMI shootdown, and any lock could be
* held by an already-stopped CPU. Use rcu_dereference_raw() to silence
* the lockdep splat and accept the comically small remaining race;
* panic context inherently cannot guarantee complete correctness.
*/
kvm_callback = rcu_dereference_raw(kvm_emergency_callback);
if (kvm_callback)
kvm_callback();
}

View File

@ -41,10 +41,10 @@
#include <inttypes.h>
#include <limits.h>
#include <pthread.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/stat.h>
#include "kvm_syscalls.h"
#include "kvm_util.h"
#include "test_util.h"
#include "memstress.h"

View File

@ -14,10 +14,10 @@
#include <linux/bitmap.h>
#include <linux/falloc.h>
#include <linux/sizes.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/stat.h>
#include "kvm_syscalls.h"
#include "kvm_util.h"
#include "numaif.h"
#include "test_util.h"

View File

@ -2,8 +2,18 @@
#ifndef SELFTEST_KVM_SYSCALLS_H
#define SELFTEST_KVM_SYSCALLS_H
/*
* Include both the kernel and libc versions of mman.h. The kernel provides
* the most up-to-date flags and definitions, while libc provides the syscall
* wrappers tests expect.
*/
#include <linux/mman.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <test_util.h>
#define MAP_ARGS0(m,...)
#define MAP_ARGS1(m,t,a,...) m(t,a)
#define MAP_ARGS2(m,t,a,...) m(t,a), MAP_ARGS1(m,__VA_ARGS__)

View File

@ -19,9 +19,9 @@
#include <errno.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/mman.h>
#include "kselftest.h"
#include <linux/mman.h>
#include <linux/types.h>
#define msecs_to_usecs(msec) ((msec) * 1000ULL)

View File

@ -6,11 +6,14 @@
*/
#include "test_util.h"
#include <execinfo.h>
#include <sys/syscall.h>
#include "kselftest.h"
#ifdef __GLIBC__
#include <execinfo.h>
/* Dumps the current stack trace to stderr. */
static void __attribute__((noinline)) test_dump_stack(void);
static void test_dump_stack(void)
@ -57,6 +60,9 @@ static void test_dump_stack(void)
system(cmd);
#pragma GCC diagnostic pop
}
#else
static void test_dump_stack(void) {}
#endif
static pid_t _gettid(void)
{

View File

@ -5,13 +5,13 @@
* Copyright (C) 2018, Google LLC.
*/
#include "test_util.h"
#include "kvm_syscalls.h"
#include "kvm_util.h"
#include "processor.h"
#include "ucall_common.h"
#include <assert.h>
#include <sched.h>
#include <sys/mman.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/stat.h>

View File

@ -15,7 +15,6 @@
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <time.h>
#include <unistd.h>
@ -23,6 +22,7 @@
#include <linux/sizes.h>
#include <test_util.h>
#include <kvm_syscalls.h>
#include <kvm_util.h>
#include <processor.h>
#include <ucall_common.h>

View File

@ -4,11 +4,10 @@
*
* Copyright (C) 2024, Red Hat, Inc.
*/
#include <sys/mman.h>
#include <linux/fs.h>
#include "test_util.h"
#include "kvm_syscalls.h"
#include "kvm_util.h"
#include "kselftest.h"
#include "ucall_common.h"

View File

@ -4,8 +4,8 @@
*
* Copyright IBM Corp. 2021
*/
#include <sys/mman.h>
#include "test_util.h"
#include "kvm_syscalls.h"
#include "kvm_util.h"
#include "kselftest.h"
#include "ucall_common.h"

View File

@ -8,11 +8,11 @@
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <linux/compiler.h>
#include <test_util.h>
#include <kvm_syscalls.h>
#include <kvm_util.h>
#include <processor.h>