Commit Graph

51352 Commits

Author SHA1 Message Date
Linus Torvalds
d71358127c Fix incorrect hardware errors reported on Zen3 CPUs, such
as bogus L3 cache deferred errors. (Yazen Ghannam)
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmnbR60RHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1hksA/9E29kFCKEc+MpZ634k1Gb+fsUG0mPvUKa
 DMDqopOAlpW2TgcVkjEecPMwXIVVn2jC68pMGpW9BRkI5VxRbVq2UkmzvYsPGb4S
 lGO83ng4cxu+WHmWST4+7Mm/8XUZNxA3iunFJb/B1rxfA6R6s94VulcQ5PckE5ws
 8X8QdueFOYr/ZgscHpan4lXzWXzJnqvhypUr+UYeRQtweScgI9/HA19+zzfO4+Er
 TzdcperNeBTS0TaXjZTdDtCJ9rQD4iQvokfUqItEkfBLbv136BM9dWkJp3mbTlv9
 RUqbdPlXuFxxEUzJTpg2Y6qmo9n11nrt75UvE9WGuNPRQASqRvzIOfSfW7cMccJ0
 PyH34KovrjUATUNHCuhvk8KnIRjy+sToT/wGkKD8hXeszG7dsH2W3ED85wqGvpbs
 m6tb0JLYsLJ3eg40I+wwMmm3eMGJZNsxmstk7j7pBEqMPCh3X9vojlALfM+oiHwH
 Fw4/jwukKQsqWR8cupOkM1EaIlEC3vsjdVXrshVok+sZ6s1bJ5iBn0zO82YC5pHJ
 U4u5mY5g1c/mcZ1Y7Mt9+ZYeWvq8yD3jjgmvyJEsJN2FgsiOnmWLBPQNCR2faflm
 QlDEiW6GwBhrt5esaTT50+5Jm8e8gb1dsu9GpOgMb4Mfc+jWIX1h65NVOBItHkuA
 fjxy9HLW+p8=
 =vLvD
 -----END PGP SIGNATURE-----

Merge tag 'ras-urgent-2026-04-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 MCE fix from Ingo Molnar:
 "Fix incorrect hardware errors reported on Zen3 CPUs, such as bogus
  L3 cache deferred errors (Yazen Ghannam)"

* tag 'ras-urgent-2026-04-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mce/amd: Filter bogus hardware errors on Zen3 clients
2026-04-12 08:27:09 -07:00
Linus Torvalds
c919577eee Four Intel uncore PMU driver fixes by Zide Chen.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmnbRnARHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1iZ6Q//ZJVnNVBHFctOYSlnu+KBcdeAKtVZmvhi
 8BHetkzkO2ZPZzEFy3JRWfYcwy3toCS1JXccGUET6eI8gcpKYMUwGpzLzvoWy6p7
 CkWt4ABKDBFtY+pg48nffEiEAfIdMdLTyGs20boQZiAw3wx5K3vEafTr7eS3vzgX
 Tkqhu33nNxZkOabKVY5Vz5rnj6ZLC9zcsYnf4Bc79PR8Fe8f4a+LdcjWGNKw2Svh
 o694XNXDGz7iJgvO/5alVDRAC3p8tTZHt67d75sO3RbPCD619XN7kd2G3QkaTwn2
 OJZ8axx/gaLCnVKdk1FAmi87zGeM2Lf/dutpMnLhJcnqR5IdaVd5nqYHfChCoUvH
 BmWJnpTPXRJbrg+RSsDjVk43YexoVps7+7lhjo21vIyAwxLPffQoPIWTwCGSl0gO
 QcWSTHc9NBEQsrMRZJIOBaLGVf9lu7DiHs4gJIllmHp4VULSYmqS/vxNgbWf1/Z0
 05Zj5Mypz6HHBQv9+HYB0nCNgKQG3N7VrU71nDIFP6Ah0in77UZmLKr75jMQ0ZSn
 zAjyKzWU+OBhooTpoVDLr7ov+zDVNf4R/ItL5Q4GfCcBTmts0vXoSStKEh7w5XDw
 3JwhJLkwQjLFmZrwXEfcrZ8unJh/mkRhR2Ycuv5lqkla3AKBrSPCkNpRfw2kQ/Ro
 H4T2vVHuOu0=
 =x3sc
 -----END PGP SIGNATURE-----

Merge tag 'perf-urgent-2026-04-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull perf fixes from Ingo Molnar:
 "Four Intel uncore PMU driver fixes by Zide Chen"

* tag 'perf-urgent-2026-04-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86/intel/uncore: Remove extra double quote mark
  perf/x86/intel/uncore: Fix die ID init and look up bugs
  perf/x86/intel/uncore: Skip discovery table for offline dies
  perf/x86/intel/uncore: Fix iounmap() leak on global_init failure
2026-04-12 08:17:52 -07:00
Linus Torvalds
086aca1030 s390:
* KVM: s390: vsie: Fix races with partial gmap invalidations
 
 x86:
 * KVM: x86: Use __DECLARE_FLEX_ARRAY() for UAPI structures with VLAs
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCgAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmnaOrgUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroPTOwf8DW6BXYgHdDOWiiQQETD+D/JWbceG
 9fKMdNjt48It3hKWc9oJ2eZU2avRHf7d8hAIUIhOeiUbeVf4QrLUfQXzP9j/9P+T
 vRpMlDf5Ampv3m8LxTBGESgwrlRHtWDGUFsE+CcVAIWEQfCsXnbwkeo3L9aCLTgA
 ekrnHqsx+Oh/n2+siEp0Nz0n8gT0hCtbqAqJlVcuHpJvzRzeDvcnukvHxjIydR65
 uIFY5dahzheGqbPhplGKKAdPCHD+/S6QB+ShqKrT92zeZvhPZrt1XHV4Bt/sKSkP
 9uAeuJ+JtbZvMG7n8fCg5ebwqJrw15uddZcV8l3qIuxHzyZ/XzKYhQm4oA==
 =7zjP
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm fixes from Paolo Bonzini:
 "s390:
   - vsie: Fix races with partial gmap invalidations

  x86:
   - Use __DECLARE_FLEX_ARRAY() for UAPI structures with VLAs"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: s390: vsie: Fix races with partial gmap invalidations
  KVM: x86: Use __DECLARE_FLEX_ARRAY() for UAPI structures with VLAs
2026-04-11 11:45:20 -07:00
Paolo Bonzini
0e9b0e0124 KVM x86 fixes for 7.1
Declare flexible arrays in uAPI structures using __DECLARE_FLEX_ARRAY() so
 that KVM's uAPI headers can be included in C++ projects.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEKTobbabEP7vbhhN9OlYIJqCjN/0FAmnZHqsACgkQOlYIJqCj
 N/331BAApsvBOvcKxeHM598wAsZTtBIUiMWrm/hybB2zhXxbcC9BDPN5NrYP9eJX
 khlLm9YRDI2Hvk8QNuwPXV/mHU5U0HNJ48BUToL6H5x6792dnRCbL046rYCyRZbi
 bUjMcUjTWtv7g+UEoYOsMpmYQlTlf4krCbw2ixn6/4c2Ab76TRmrISU+tknoal+b
 KGXEJWhsEiueUD8xpjR84P0h3d6x+EHc2oyDk/k+aFZAcbjWFz/8aLjOSVc00V36
 DqYKbMGO/22CNkWSLk9Dr6mitn6HmG151HNAvUvHlPMFQLrP9jpk13u1IHZsR7H4
 4yykj4tm5+02775IFqfPLNZ4Ipk70WO50ndl3plh7G7187ckYsfl+oQu2c4oIveB
 sPfnEHqteGKw+GLOM9Xu4MVCAvy7FFGlgGIkZpULA7cLNyIayfmKuZTF7/UEh9Wy
 fL2UAypzJjCIgLFyoio4CHqLJ4vuUneyHyoJczS/Wd9kuL0kHrLEw771gLGjwslB
 Nk0900qPlxEWx47G6MadzSh+JexjT9KrSCgaACNWKJpzQMkcw6gwDSIGq9F/f0Ac
 Zl7XFbQ2KKm/Z8CWCTg5XdxI6zz0NCzlcYoepk7CfHgs0xZjWqu5AReLcQnffx2/
 c5YKoOffkicokffWQju64kjE/VYpdAITHONa1r7hv73/bfYfUGg=
 =VJsI
 -----END PGP SIGNATURE-----

Merge tag 'kvm-x86-fixes-7.1' of https://github.com/kvm-x86/linux into HEAD

KVM x86 fixes for 7.1

Declare flexible arrays in uAPI structures using __DECLARE_FLEX_ARRAY() so
that KVM's uAPI headers can be included in C++ projects.
2026-04-11 14:10:44 +02:00
Linus Torvalds
52f657e34d x86: shadow stacks: proper error handling for mmap lock
김영민 reports that shstk_pop_sigframe() doesn't check for errors from
mmap_read_lock_killable(), which is a silly oversight, and also shows
that we haven't marked those functions with "__must_check", which would
have immediately caught it.

So let's fix both issues.

Reported-by: 김영민 <osori@hspace.io>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Dave Hansen <dave.hansen@intel.com>
Acked-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2026-04-08 13:18:57 -07:00
Zide Chen
16bcbe6738 perf/x86/intel/uncore: Remove extra double quote mark
The third argument in INTEL_UNCORE_FR_EVENT_DESC() is subject to
__stringify(), and the extra double quote marks can result in the
expansion "3.814697266e-6" in the sysfs knobs, instead of
3.814697266e-6.

This is incorrect, though it may still work for perf, e.g.
perf stat -e uncore_iio_free_running_0/bw_in_port0/

Fixes: d8987048f6 ("perf/x86/intel/uncore: Support IIO free-running counters on DMR")
Closes: https://lore.kernel.org/all/20251231224233.113839-1-zide.chen@intel.com/
Reported-by: Chun-Tse Shao <ctshao@google.com>
Signed-off-by: Zide Chen <zide.chen@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Chun-Tse Shao <ctshao@google.com>
Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Link: https://patch.msgid.link/20260313174050.171704-5-zide.chen@intel.com
2026-04-07 09:05:30 +02:00
Zide Chen
a16d1ec4dd perf/x86/intel/uncore: Fix die ID init and look up bugs
In snbep_pci2phy_map_init(), in the nr_node_ids > 8 path,
uncore_device_to_die() may return -1 when all CPUs associated
with the UBOX device are offline.

Remove the WARN_ON_ONCE(die_id == -1) check for two reasons:

- The current code breaks out of the loop. This is incorrect because
  pci_get_device() does not guarantee iteration in domain or bus order,
  so additional UBOX devices may be skipped during the scan.

- Returning -EINVAL is incorrect, since marking offline buses with
  die_id == -1 is expected and should not be treated as an error.

Separately, when NUMA is disabled on a NUMA-capable platform,
pcibus_to_node() returns NUMA_NO_NODE, causing uncore_device_to_die()
to return -1 for all PCI devices.  As a result,
spr_update_device_location(), used on Intel SPR and EMR, ignores the
corresponding PMON units and does not add them to the RB tree.

Fix this by using uncore_pcibus_to_dieid(), which retrieves topology
from the UBOX GIDNIDMAP register and works regardless of whether NUMA
is enabled in Linux.  This requires snbep_pci2phy_map_init() to be
added in spr_uncore_pci_init().

Keep uncore_device_to_die() only for the nr_node_ids > 8 case, where
NUMA is expected to be enabled.

Fixes: 9a7832ce3d ("perf/x86/intel/uncore: With > 8 nodes, get pci bus die id from NUMA info")
Fixes: 65248a9a9e ("perf/x86/uncore: Add a quirk for UPI on SPR")
Signed-off-by: Zide Chen <zide.chen@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Tested-by: Steve Wahl <steve.wahl@hpe.com>
Link: https://patch.msgid.link/20260313174050.171704-4-zide.chen@intel.com
2026-04-07 09:05:29 +02:00
Zide Chen
7b568e9eba perf/x86/intel/uncore: Skip discovery table for offline dies
This warning can be triggered if NUMA is disabled and the system
boots with fewer CPUs than the number of CPUs in die 0.

WARNING: CPU: 9 PID: 7257 at uncore.c:1157 uncore_pci_pmu_register+0x136/0x160 [intel_uncore]

Currently, the discovery table continues to be parsed even if all CPUs
in the associated die are offline.  This can lead to an array overflow
at "pmu->boxes[die] = box" in uncore_pci_pmu_register(), which may
trigger the warning above or cause other issues.

Fixes: edae1f06c2 ("perf/x86/intel/uncore: Parse uncore discovery tables")
Reported-by: Steve Wahl <steve.wahl@hpe.com>
Signed-off-by: Zide Chen <zide.chen@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Tested-by: Steve Wahl <steve.wahl@hpe.com>
Link: https://patch.msgid.link/20260313174050.171704-3-zide.chen@intel.com
2026-04-07 09:05:29 +02:00
Zide Chen
e2a39d1a88 perf/x86/intel/uncore: Fix iounmap() leak on global_init failure
Kernel test robot reported:

Unverified Error/Warning (likely false positive, kindly check if
interested):
    arch/x86/events/intel/uncore_discovery.c:293:2-8:
    ERROR: missing iounmap; ioremap on line 288 and execution via
    conditional on line 292

If domain->global_init() fails in __parse_discovery_table(), the
ioremap'ed MMIO region is not released before returning, resulting
in an MMIO mapping leak.

Fixes: b575fc0e33 ("perf/x86/intel/uncore: Add domain global init callback")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Zide Chen <zide.chen@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Link: https://patch.msgid.link/20260313174050.171704-2-zide.chen@intel.com
2026-04-07 09:05:29 +02:00
Linus Torvalds
10b76a429a Miscellaneous x86 fixes:
- Fix kexec crash on KCOV-instrumented kernels (Aleksandr Nogikh)
 
  - Fix Geode platform driver on-stack property data
    use-after-return bug (Dmitry Torokhov)
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmnSMJIRHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1i0WA/8CG2pPsVNJdYeAQmRNfwPZ5NVvb+wUTcf
 5mLXHrP63eRcLwHq9uIin8Zq6LwqBGQPeZAIIqfOvgEC9KgBdyzH7FzrnoPF+gmo
 pgRhjypqP6iGt2+4Y6Qa9fW1NAb4zIuQqzP55MhIzEpncfzqnh5qrv5E3qTQvEs7
 KbUp0F4ovxhhaxoXO88crf3uPEgi7nywF5Z88sCHNkk/F9FxT0dzo3LjkldnOWAU
 f1Cwl1VpjmkOcBKA1tJ6u/48b6cnMxS3etMG6KXxYR1D04+XxN0BH7gjc7D0BclL
 CP7CXvB4kh4cyBYJxQRng5/18gFbY1vq5yFpFjNjmtBk3bABEw73f88TWmxerPrV
 4RZKoUkqcBikWVfuaWupJVcDeCV5fm0FVquiUZ/SfAObCCi5be2VjZSUOcTpZTp6
 k5ikgyZpsg6THVjgxGH51ac+RfF+o33NN1iYTfTcvUnNQwuiv1ObWvyIi1NEkKHw
 LRyRMAG4dkRdUwaQvtNSeCptNS0JlmE3ozwGYV7y6HTNwN1aVdW2uO22YEutiYal
 95Id4AEZDZUMlDyekSgqgCufRwma0kYbJotvgc15OZfJFzl9il9llO3iESCx6Rrk
 Ve9qhn+Jyb+JO2iPKsEm2zNtzFtvas8EE4T/6AfmZEKGGL05nO0Sk8KHG3AsTTb6
 Dx29Q7orrog=
 =PIxS
 -----END PGP SIGNATURE-----

Merge tag 'x86-urgent-2026-04-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Ingo Molnar:

 - Fix kexec crash on KCOV-instrumented kernels (Aleksandr Nogikh)

 - Fix Geode platform driver on-stack property data use-after-return
   bug (Dmitry Torokhov)

* tag 'x86-urgent-2026-04-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/platform/geode: Fix on-stack property data use-after-return bug
  x86/kexec: Disable KCOV instrumentation after load_segments()
2026-04-05 13:53:07 -07:00
Yazen Ghannam
0422b07bc4 x86/mce/amd: Filter bogus hardware errors on Zen3 clients
Users have been observing multiple L3 cache deferred errors after recent
kernel rework of deferred error handling.¹ ⁴

The errors are bogus due to inconsistent status values. Also, user verified
that bogus MCA_DESTAT values are present on the system even with an older
kernel.²

The errors seem to be garbage values present in the MCA_DESTAT of some L3
cache banks. These were implicitly ignored before the recent kernel rework
because these do not generate a deferred error interrupt.

A later revision of the rework patch was merged for v6.19. This naturally
filtered out most of the bogus error logs. However, a few signatures still
remain.³

Minimize the scope of the filter to the reported CPU
family/model/stepping and only for errors which don't have the Enabled
bit in the MCi status MSR.

¹ https://lore.kernel.org/20250915010010.3547-1-spasswolf@web.de
² https://lore.kernel.org/6e1eda7dd55f6fa30405edf7b0f75695cf55b237.camel@web.de
³ https://lore.kernel.org/21ba47fa8893b33b94370c2a42e5084cf0d2e975.camel@web.dehttps://lore.kernel.org/r/CAKFB093B2k3sKsGJ_QNX1jVQsaXVFyy=wNwpzCGLOXa_vSDwXw@mail.gmail.com

  [ bp: Generalize the condition according to which errors are bogus. ]

Fixes: 7cb735d7c0 ("x86/mce: Unify AMD DFR handler with MCA Polling")
Closes: https://lore.kernel.org/20250915010010.3547-1-spasswolf@web.de
Reported-by: Bert Karwatzki <spasswolf@web.de>
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Tested-By: Bert Karwatzki <spasswolf@web.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/20250915010010.3547-1-spasswolf@web.de
2026-04-05 12:42:22 +02:00
Ian Rogers
dbde07f062 perf/x86: Fix potential bad container_of in intel_pmu_hw_config
Auto counter reload may have a group of events with software events
present within it. The software event PMU isn't the x86_hybrid_pmu and
a container_of operation in intel_pmu_set_acr_caused_constr (via the
hybrid helper) could cause out of bound memory reads. Avoid this by
guarding the call to intel_pmu_set_acr_caused_constr with an
is_x86_event check.

Fixes: ec980e4fac ("perf/x86/intel: Support auto counter reload")
Signed-off-by: Ian Rogers <irogers@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Thomas Falcon <thomas.falcon@intel.com>
Link: https://patch.msgid.link/20260312194305.1834035-1-irogers@google.com
2026-04-02 13:49:16 +02:00
Dmitry Torokhov
b981e9e94c x86/platform/geode: Fix on-stack property data use-after-return bug
The PROPERTY_ENTRY_GPIO macro (and by extension PROPERTY_ENTRY_REF)
creates a temporary software_node_ref_args structure on the stack
when used in a runtime assignment. This results in the property
pointing to data that is invalid once the function returns.

Fix this by ensuring the GPIO reference data is not stored on stack and
using PROPERTY_ENTRY_REF_ARRAY_LEN() to point directly to the persistent
reference data.

Fixes: 298c9babad ("x86/platform/geode: switch GPIO buttons and LEDs to software properties")
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Rafael J. Wysocki <rafael@kernel.org>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Daniel Scally <djrscally@gmail.com>
Cc: Danilo Krummrich <dakr@kernel.org>
Cc: Hans de Goede <hansg@kernel.org>
Cc: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Cc: Sakari Ailus <sakari.ailus@linux.intel.com>
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/20260329-property-gpio-fix-v2-1-3cca5ba136d8@gmail.com
2026-03-31 09:55:26 +02:00
Aleksandr Nogikh
917e3ad332 x86/kexec: Disable KCOV instrumentation after load_segments()
The load_segments() function changes segment registers, invalidating GS base
(which KCOV relies on for per-cpu data). When CONFIG_KCOV is enabled, any
subsequent instrumented C code call (e.g. native_gdt_invalidate()) begins
crashing the kernel in an endless loop.

To reproduce the problem, it's sufficient to do kexec on a KCOV-instrumented
kernel:

  $ kexec -l /boot/otherKernel
  $ kexec -e

The real-world context for this problem is enabling crash dump collection in
syzkaller. For this, the tool loads a panic kernel before fuzzing and then
calls makedumpfile after the panic. This workflow requires both CONFIG_KEXEC
and CONFIG_KCOV to be enabled simultaneously.

Adding safeguards directly to the KCOV fast-path (__sanitizer_cov_trace_pc())
is also undesirable as it would introduce an extra performance overhead.

Disabling instrumentation for the individual functions would be too fragile,
so disable KCOV instrumentation for the entire machine_kexec_64.c and
physaddr.c. If coverage-guided fuzzing ever needs these components in the
future, other approaches should be considered.

The problem is not relevant for 32 bit kernels as CONFIG_KCOV is not supported
there.

  [ bp: Space out comment for better readability. ]

Fixes: 0d345996e4 ("x86/kernel: increase kcov coverage under arch/x86/kernel folder")
Signed-off-by: Aleksandr Nogikh <nogikh@google.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/20260325154825.551191-1-nogikh@google.com
2026-03-30 14:15:25 +02:00
Linus Torvalds
ac354b5cb0 s390:
* Lots of small and not-so-small fixes for the newly rewritten gmap,
   mostly affecting the handling of nested guests.
 
 x86:
 
 * Fix an issue with shadow paging, which causes KVM to install an MMIO PTE
   in the shadow page tables without first zapping a non-MMIO SPTE if KVM
   didn't see the write that modified the shadowed guest PTE.  While commit
   a54aa15c6b was right about it being impossible to miss such a write
   if it was coming from the guest, it failed to account for writes to
   guest memory that are outside the scope of KVM: if userspace modifies
   the guest PTE, and then the guest hits a relevant page fault, KVM will
   get confused.
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCgAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmnH3j8UHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroOWRQf7BD1dgyO9Id+Y/QQJPzZ0z/zGbNWT
 jLDTpapxSB960AybvmkOl0pgr7AJrNN+iWQ5cbod/41NKEdJn++ME++NFQlt15oH
 gZAMdVr72qklyVFOq3BZhQRskleGo35A/YYznKf+re4tdvL5fynyYTLDwVkDR4NU
 tCwHCg+B6bVSNOLjxMm5eOpDXoboGiwohFYay7IclsXibjDlKyFaj9mZPJW1E6qy
 SUp+nuseUTf8RFFscNTsW6XRPa/Y7RctPBNQuGSiw3rxFXsq+VyD6Y/AOklbdeyz
 8u+25gdKm65sdXFmLWIN1Ogec0DcKMgdNpFrgEj+9PPWyHDHikqksv/vRw==
 =/YA7
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm fixes from Paolo Bonzini:
 "s390:

   - Lots of small and not-so-small fixes for the newly rewritten gmap,
     mostly affecting the handling of nested guests.

  x86:

   - Fix an issue with shadow paging, which causes KVM to install an
     MMIO PTE in the shadow page tables without first zapping a non-MMIO
     SPTE if KVM didn't see the write that modified the shadowed guest
     PTE.

     While commit a54aa15c6b ("KVM: x86/mmu: Handle MMIO SPTEs
     directly in mmu_set_spte()") was right about it being impossible to
     miss such a write if it was coming from the guest, it failed to
     account for writes to guest memory that are outside the scope of
     KVM: if userspace modifies the guest PTE, and then the guest hits a
     relevant page fault, KVM will get confused"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: x86/mmu: Only WARN in direct MMUs when overwriting shadow-present SPTE
  KVM: x86/mmu: Drop/zap existing present SPTE even when creating an MMIO SPTE
  KVM: s390: Fix KVM_S390_VCPU_FAULT ioctl
  KVM: s390: vsie: Fix guest page tables protection
  KVM: s390: vsie: Fix unshadowing while shadowing
  KVM: s390: vsie: Fix refcount overflow for shadow gmaps
  KVM: s390: vsie: Fix nested guest memory shadowing
  KVM: s390: Correctly handle guest mappings without struct page
  KVM: s390: Fix gmap_link()
  KVM: s390: vsie: Fix check for pre-existing shadow mapping
  KVM: s390: Remove non-atomic dat_crstep_xchg()
  KVM: s390: vsie: Fix dat_split_ste()
2026-03-29 11:58:47 -07:00
Linus Torvalds
f242ac4a09 Miscellaneous x86 fixes:
- Fix an early boot crash in AMD SEV-SNP guests, caused by incorrect
    FSGSBASE init ordering. (Nikunj A Dadhania)
 
  - Remove X86_CR4_FRED from the CR4 pinned bits mask, to fix a race
    window during the bootup of SEV-{ES,SNP} or TDX guests, which
    can crash them if they trigger exceptions in that window.
    (Borislav Petkov)
 
  - Fix early boot failures on SEV-ES/SNP guests, due to incorrect
    early GHCB access. (Nikunj A Dadhania)
 
  - Add clarifying comment to the CRn pinning logic, to avoid
    future confusion & bugs. (Peter Zijlstra)
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmnIudcRHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1jDaA/9GAF9AwwF8Gle0/AbQFjgIG2lA4G1D00q
 bfToKQeCAqBmlro7SP/4h9Kr70/xtnPCRP/BwTWEHHY/2Z5fLJOXcra5FWWPjjA1
 HZE6HHKFDXmPV2BfbdE5ezX8pSwBdGZ/vIykZ+q/q3/P7P/1lmxfFA+rEvbkykFH
 CQwAA3fVQuwQUiBY5eRN+0lwoIKRX3gnREtDflkgoqhhntoJ/hpnoytQAQX5Sb5j
 +MNpy4iIDFYLqRz3GHepcMDUB9W9e+9YvWd1qxsdgxyys5zVW9l7Z1qm3SHXd2uw
 OWAFu/ySKjlage4f8hPUbZHwtllzxM6HfyHcVoxjLnP/sa/9g9kJPhCGz1R1CoyM
 s2/2PM43dKN72IsWThUppa7JUENpaVUqDwOm8mBD07BZVZ7THKS1d4nLqxSyGIaN
 yDcEoH+6sbv0PhqC6GOg3Oi7DW66Q7LPJcf1AsrzW9f7pUBZc+K9KnqvY+U2SewH
 5OS3tzXhTLnRRkvJxLB3EFFSpefOFyEQ6WruZfrbijNVupIZpKpnqrN8xfDvF4p0
 XJ1rw03F1ltV9OO8YDfJfEjav1eJWTZcqGA8ZMnlZTzrB1+oJGw7i4n1rAILJNEF
 77wARixDNiiOsFcvF7dTG7l7HWEx1mWBkn75n+lnMvWqWd1QN4B2z85CIm5PdsnW
 0D0wd6vxUtk=
 =I3Gh
 -----END PGP SIGNATURE-----

Merge tag 'x86-urgent-2026-03-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Ingo Molnar:

 - Fix an early boot crash in AMD SEV-SNP guests, caused by incorrect
   FSGSBASE init ordering (Nikunj A Dadhania)

 - Remove X86_CR4_FRED from the CR4 pinned bits mask, to fix a race
   window during the bootup of SEV-{ES,SNP} or TDX guests, which can
   crash them if they trigger exceptions in that window (Borislav
   Petkov)

 - Fix early boot failures on SEV-ES/SNP guests, due to incorrect early
   GHCB access (Nikunj A Dadhania)

 - Add clarifying comment to the CRn pinning logic, to avoid future
   confusion & bugs (Peter Zijlstra)

* tag 'x86-urgent-2026-03-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/cpu: Add comment clarifying CRn pinning
  x86/fred: Fix early boot failures on SEV-ES/SNP guests
  x86/cpu: Remove X86_CR4_FRED from the CR4 pinned bits mask
  x86/cpu: Enable FSGSBASE early in cpu_init_exception_handling()
2026-03-29 10:04:37 -07:00
Linus Torvalds
56bea42415 EFI fix for v7.0 #3
Fix a potential buffer overrun issue introduced by the preceding fix for
 EFI boot services region reservations on x86
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQQQm/3uucuRGn1Dmh0wbglWLn0tXAUCacFhTgAKCRAwbglWLn0t
 XDRBAP9WdbDUIM/ucLDFlr9hpokd+JSOS/vNgRFZaBhkjjasOAEAvdxOeShZlYWH
 CVaW0D1TZuyvuUlGP6Tqqa8lBeoPVAc=
 =y//h
 -----END PGP SIGNATURE-----

Merge tag 'efi-fixes-for-v7.0-3' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi

Pull EFI fix from Ard Biesheuvel:
 "Fix a potential buffer overrun issue introduced by the previous fix
  for EFI boot services region reservations on x86"

* tag 'efi-fixes-for-v7.0-3' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi:
  x86/efi: efi_unmap_boot_services: fix calculation of ranges_to_free size
2026-03-27 15:55:25 -07:00
Sean Christopherson
df83746075 KVM: x86/mmu: Only WARN in direct MMUs when overwriting shadow-present SPTE
Adjust KVM's sanity check against overwriting a shadow-present SPTE with a
another SPTE with a different target PFN to only apply to direct MMUs,
i.e. only to MMUs without shadowed gPTEs.  While it's impossible for KVM
to overwrite a shadow-present SPTE in response to a guest write, writes
from outside the scope of KVM, e.g. from host userspace, aren't detected
by KVM's write tracking and so can break KVM's shadow paging rules.

  ------------[ cut here ]------------
  pfn != spte_to_pfn(*sptep)
  WARNING: arch/x86/kvm/mmu/mmu.c:3069 at mmu_set_spte+0x1e4/0x440 [kvm], CPU#0: vmx_ept_stale_r/872
  Modules linked in: kvm_intel kvm irqbypass
  CPU: 0 UID: 1000 PID: 872 Comm: vmx_ept_stale_r Not tainted 7.0.0-rc2-eafebd2d2ab0-sink-vm #319 PREEMPT
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
  RIP: 0010:mmu_set_spte+0x1e4/0x440 [kvm]
  Call Trace:
   <TASK>
   ept_page_fault+0x535/0x7f0 [kvm]
   kvm_mmu_do_page_fault+0xee/0x1f0 [kvm]
   kvm_mmu_page_fault+0x8d/0x620 [kvm]
   vmx_handle_exit+0x18c/0x5a0 [kvm_intel]
   kvm_arch_vcpu_ioctl_run+0xc55/0x1c20 [kvm]
   kvm_vcpu_ioctl+0x2d5/0x980 [kvm]
   __x64_sys_ioctl+0x8a/0xd0
   do_syscall_64+0xb5/0x730
   entry_SYSCALL_64_after_hwframe+0x4b/0x53
   </TASK>
  ---[ end trace 0000000000000000 ]---

Fixes: 11d4517511 ("KVM: x86/mmu: Warn if PFN changes on shadow-present SPTE in shadow MMU")
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-03-27 22:33:33 +01:00
Sean Christopherson
aad885e774 KVM: x86/mmu: Drop/zap existing present SPTE even when creating an MMIO SPTE
When installing an emulated MMIO SPTE, do so *after* dropping/zapping the
existing SPTE (if it's shadow-present).  While commit a54aa15c6b was
right about it being impossible to convert a shadow-present SPTE to an
MMIO SPTE due to a _guest_ write, it failed to account for writes to guest
memory that are outside the scope of KVM.

E.g. if host userspace modifies a shadowed gPTE to switch from a memslot
to emulted MMIO and then the guest hits a relevant page fault, KVM will
install the MMIO SPTE without first zapping the shadow-present SPTE.

  ------------[ cut here ]------------
  is_shadow_present_pte(*sptep)
  WARNING: arch/x86/kvm/mmu/mmu.c:484 at mark_mmio_spte+0xb2/0xc0 [kvm], CPU#0: vmx_ept_stale_r/4292
  Modules linked in: kvm_intel kvm irqbypass
  CPU: 0 UID: 1000 PID: 4292 Comm: vmx_ept_stale_r Not tainted 7.0.0-rc2-eafebd2d2ab0-sink-vm #319 PREEMPT
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
  RIP: 0010:mark_mmio_spte+0xb2/0xc0 [kvm]
  Call Trace:
   <TASK>
   mmu_set_spte+0x237/0x440 [kvm]
   ept_page_fault+0x535/0x7f0 [kvm]
   kvm_mmu_do_page_fault+0xee/0x1f0 [kvm]
   kvm_mmu_page_fault+0x8d/0x620 [kvm]
   vmx_handle_exit+0x18c/0x5a0 [kvm_intel]
   kvm_arch_vcpu_ioctl_run+0xc55/0x1c20 [kvm]
   kvm_vcpu_ioctl+0x2d5/0x980 [kvm]
   __x64_sys_ioctl+0x8a/0xd0
   do_syscall_64+0xb5/0x730
   entry_SYSCALL_64_after_hwframe+0x4b/0x53
  RIP: 0033:0x47fa3f
   </TASK>
  ---[ end trace 0000000000000000 ]---

Reported-by: Alexander Bulekov <bkov@amazon.com>
Debugged-by: Alexander Bulekov <bkov@amazon.com>
Suggested-by: Fred Griffoul <fgriffo@amazon.co.uk>
Fixes: a54aa15c6b ("KVM: x86/mmu: Handle MMIO SPTEs directly in mmu_set_spte()")
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-03-27 22:33:33 +01:00
Peter Zijlstra
a3e93cac25 x86/cpu: Add comment clarifying CRn pinning
To avoid future confusion on the purpose and design of the CRn pinning code.

Also note that if the attacker controls page-tables, the CRn bits lose much of
the attraction anyway.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://patch.msgid.link/20260320092521.GG3739106@noisy.programming.kicks-ass.net
2026-03-23 14:25:53 +01:00
Nikunj A Dadhania
3645eb7e39 x86/fred: Fix early boot failures on SEV-ES/SNP guests
FRED-enabled SEV-(ES,SNP) guests fail to boot due to the following issues
in the early boot sequence:

* FRED does not have a #VC exception handler in the dispatch logic

* Early FRED #VC exceptions attempt to use uninitialized per-CPU GHCBs
  instead of boot_ghcb

Add X86_TRAP_VC case to fred_hwexc() with a new exc_vmm_communication()
function that provides the unified entry point FRED requires, dispatching
to existing user/kernel handlers based on privilege level. The function is
already declared via DECLARE_IDTENTRY_VC().

Fix early GHCB access by falling back to boot_ghcb in
__sev_{get,put}_ghcb() when per-CPU GHCBs are not yet initialized.

Fixes: 14619d912b ("x86/fred: FRED entry/exit and dispatch code")
Signed-off-by: Nikunj A Dadhania <nikunj@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Cc: <stable@kernel.org>  # 6.12+
Link: https://patch.msgid.link/20260318075654.1792916-4-nikunj@amd.com
2026-03-23 14:18:18 +01:00
Borislav Petkov (AMD)
411df123c0 x86/cpu: Remove X86_CR4_FRED from the CR4 pinned bits mask
Commit in Fixes added the FRED CR4 bit to the CR4 pinned bits mask so
that whenever something else modifies CR4, that bit remains set. Which
in itself is a perfectly fine idea.

However, there's an issue when during boot FRED is initialized: first on
the BSP and later on the APs. Thus, there's a window in time when
exceptions cannot be handled.

This becomes particularly nasty when running as SEV-{ES,SNP} or TDX
guests which, when they manage to trigger exceptions during that short
window described above, triple fault due to FRED MSRs not being set up
yet.

See Link tag below for a much more detailed explanation of the
situation.

So, as a result, the commit in that Link URL tried to address this
shortcoming by temporarily disabling CR4 pinning when an AP is not
online yet.

However, that is a problem in itself because in this case, an attack on
the kernel needs to only modify the online bit - a single bit in RW
memory - and then disable CR4 pinning and then disable SM*P, leading to
more and worse things to happen to the system.

So, instead, remove the FRED bit from the CR4 pinning mask, thus
obviating the need to temporarily disable CR4 pinning.

If someone manages to disable FRED when poking at CR4, then
idt_invalidate() would make sure the system would crash'n'burn on the
first exception triggered, which is a much better outcome security-wise.

Fixes: ff45746fbf ("x86/cpu: Add X86_CR4_FRED macro")
Suggested-by: Dave Hansen <dave.hansen@linux.intel.com>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: <stable@kernel.org> # 6.12+
Link: https://lore.kernel.org/r/177385987098.1647592.3381141860481415647.tip-bot2@tip-bot2
2026-03-23 14:12:03 +01:00
Nikunj A Dadhania
05243d490b x86/cpu: Enable FSGSBASE early in cpu_init_exception_handling()
Move FSGSBASE enablement from identify_cpu() to cpu_init_exception_handling()
to ensure it is enabled before any exceptions can occur on both boot and
secondary CPUs.

== Background ==

Exception entry code (paranoid_entry()) uses ALTERNATIVE patching based on
X86_FEATURE_FSGSBASE to decide whether to use RDGSBASE/WRGSBASE instructions
or the slower RDMSR/SWAPGS sequence for saving/restoring GSBASE.

On boot CPU, ALTERNATIVE patching happens after enabling FSGSBASE in CR4.
When the feature is available, the code is permanently patched to use
RDGSBASE/WRGSBASE, which require CR4.FSGSBASE=1 to execute without triggering

== Boot Sequence ==

Boot CPU (with CR pinning enabled):
  trap_init()
    cpu_init()                   <- Uses unpatched code (RDMSR/SWAPGS)
      x2apic_setup()
  ...
  arch_cpu_finalize_init()
    identify_boot_cpu()
      identify_cpu()
        cr4_set_bits(X86_CR4_FSGSBASE)  # Enables the feature
	# This becomes part of cr4_pinned_bits
    ...
    alternative_instructions()   <- Patches code to use RDGSBASE/WRGSBASE

Secondary CPUs (with CR pinning enabled):
  start_secondary()
    cr4_init()                   <- Code already patched, CR4.FSGSBASE=1
                                    set implicitly via cr4_pinned_bits

    cpu_init()                   <- exceptions work because FSGSBASE is
                                    already enabled

Secondary CPU (with CR pinning disabled):
  start_secondary()
    cr4_init()                   <- Code already patched, CR4.FSGSBASE=0
    cpu_init()
      x2apic_setup()
        rdmsrq(MSR_IA32_APICBASE)  <- Triggers #VC in SNP guests
          exc_vmm_communication()
            paranoid_entry()       <- Uses RDGSBASE with CR4.FSGSBASE=0
                                      (patched code)
    ...
    ap_starting()
      identify_secondary_cpu()
        identify_cpu()
	  cr4_set_bits(X86_CR4_FSGSBASE)  <- Enables the feature, which is
                                             too late

== CR Pinning ==

Currently, for secondary CPUs, CR4.FSGSBASE is set implicitly through
CR-pinning: the boot CPU sets it during identify_cpu(), it becomes part of
cr4_pinned_bits, and cr4_init() applies those pinned bits to secondary CPUs.
This works but creates an undocumented dependency between cr4_init() and the
pinning mechanism.

== Problem ==

Secondary CPUs boot after alternatives have been applied globally. They
execute already-patched paranoid_entry() code that uses RDGSBASE/WRGSBASE
instructions, which require CR4.FSGSBASE=1. Upcoming changes to CR pinning
behavior will break the implicit dependency, causing secondary CPUs to
generate #UD.

This issue manifests itself on AMD SEV-SNP guests, where the rdmsrq() in
x2apic_setup() triggers a #VC exception early during cpu_init(). The #VC
handler (exc_vmm_communication()) executes the patched paranoid_entry() path.
Without CR4.FSGSBASE enabled, RDGSBASE instructions trigger #UD.

== Fix ==

Enable FSGSBASE explicitly in cpu_init_exception_handling() before loading
exception handlers. This makes the dependency explicit and ensures both
boot and secondary CPUs have FSGSBASE enabled before paranoid_entry()
executes.

Fixes: c82965f9e5 ("x86/entry/64: Handle FSGSBASE enabled paranoid entry/exit")
Reported-by: Borislav Petkov <bp@alien8.de>
Suggested-by: Sohil Mehta <sohil.mehta@intel.com>
Signed-off-by: Nikunj A Dadhania <nikunj@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Sohil Mehta <sohil.mehta@intel.com>
Cc: <stable@kernel.org>
Link: https://patch.msgid.link/20260318075654.1792916-2-nikunj@amd.com
2026-03-23 13:29:50 +01:00
Linus Torvalds
8d8bd2a5aa Miscellaneous x86 fixes:
- Improve Qemu MCE-injection behavior by only using
    AMD SMCA MSRs if the feature bit is set.
 
  - Fix the relative path of gettimeofday.c inclusion
    in vclock_gettime.c
 
  - Fix a boot crash on UV clusters when a socket is
    marked as 'deconfigured' which are mapped to the
    SOCK_EMPTY (0xffff [short -1]) node ID by the UV firmware,
    while Linux APIs expect NUMA_NO_NODE (0xffffffff [int -1]).
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmm/qYwRHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1imUBAAmL2xQCgOZCeLPWFZ28S5EiKf5xvyC9R4
 uIm/WncXYQDJZN0JaTABdjmLiaBQnlyULpmqN47Xuiy8avMO532S92yreFpWR2OB
 5TEE/v9+wcbSQOJBELhch3XNzUu7cNPQ+HbOuhot4rpK/MlyJ8rHaHLYVSVhRBYG
 ynM3iTDvD6ENtyAOGT1Afkxg/sqMOZG8jdwrN1z8BH3AMU8BJ6OgfguqKuEi99Vp
 Lz31a3bR0LcRPZ4ECKH0ModcKgjkqcgnVccOYCDtq8XReciGQo1Zg0diiV6VRn2a
 hAqoSvdd9e1XUGoU8dEi0KwR5YXYJh3uOmjSH56rbceXr5XxPcKRJ4mOOX0GX+KC
 KZl5KdwJo4veHodxWTYFe+67hi84BwKoB4gM36nRhDmzLS/XHoI4RUOjZVXSPVzJ
 dwBBZDKd9tP6aaFruSa+v5sH7EbVartwLuEkQx6SDydS/4htGbESKTxgdqr63gUZ
 jwmFRVaPo3FUWkOhS9GExt6XewhBVaJ9/cHZkcpW0vrm9RBlFoyTX9YLfkF4x9O1
 h3wxXsULyZuYn5/tunIFfDsLpcmgsz85NXnW+0A1URAAWFNRYfVIKxaO7tUUF2OA
 asnOe0NrztvKoZfKzCi7s14HVCPT34aEp3Vf2Am/QRMy5JYLhS3xAwApiZ/hrZ4R
 Kb+4dVfh12Q=
 =uIqH
 -----END PGP SIGNATURE-----

Merge tag 'x86-urgent-2026-03-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Ingo Molnar:

 - Improve Qemu MCE-injection behavior by only using AMD SMCA MSRs if
   the feature bit is set

 - Fix the relative path of gettimeofday.c inclusion in vclock_gettime.c

 - Fix a boot crash on UV clusters when a socket is marked as
   'deconfigured' which are mapped to the SOCK_EMPTY node ID by
   the UV firmware, while Linux APIs expect NUMA_NO_NODE.

   The difference being (0xffff [unsigned short ~0]) vs [int -1]

* tag 'x86-urgent-2026-03-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/platform/uv: Handle deconfigured sockets
  x86/entry/vdso: Fix path of included gettimeofday.c
  x86/mce/amd: Check SMCA feature bit before accessing SMCA MSRs
2026-03-22 10:54:12 -07:00
Linus Torvalds
ebfd9b7af2 Miscellaneous perf fixes:
- Fix a PMU driver crash on AMD EPYC systems, caused by
    a race condition in x86_pmu_enable()
 
  - Fix a possible counter-initialization bug in x86_pmu_enable()
 
  - Fix a counter inheritance bug in inherit_event() and
    __perf_event_read()
 
  - Fix an Intel PMU driver branch constraints handling bug
    found by UBSAN
 
  - Fix the Intel PMU driver's new Off-Module Response (OMR)
    support code for Diamond Rapids / Nova lake, to fix a snoop
    information parsing bug
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmm/ptcRHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1i8yBAAsbAbTzyJOxiE29beCiW3r9V4ELtrSlpG
 syUtZ4Y1cozK1w6/8S2oWHJkuWa+ToO6bn9rKxNemWIrnsxe5B4iKfeNWRFTz24g
 41xTXaxUB9c7vBgv4BDvr/ykkYGQybkn0Bf/U5rufzvIlst9bx7zKVAnIT9Qws37
 UTMY96XGYY5HNzGSZbQkpQ4cs8n72U+00OBHMTWtH8NJT+fRmaM312Q8F6wNKgH2
 YtaAjwb55BU5+hQUz5YN96xQGYaoj0s8UtOk4a3tS/t0F8mOodDVTxzMdHKToQmD
 SbscGvfC+bg5zjoYGFEU+cXoBMkkZlBPqZdLVQAGEy3YZT+JIdmyJhn9FD6HVuVw
 OzyG9VuY+TxOFrFQdMs3Xfa0vZ7AO1c90HB08P4T7nWaMioR1iFAF31MSVEMXuzd
 ZROHplWNIqDeOzmerXgZq4JWy3Bpaam8fH1B5/qN450oAxaym3lCOoCZidJYgy3g
 CVBF/6BO7DlpKiy9lXknRscItwIiRmZ9Xr+sOmOMQRGqQkC6Ykk/Hj2sA1qKPuQ5
 ruRqqaL9cznttAoJR2jZ0Ijyu7usgxB/y066nR1zXKdvEdNcntaic+QybHxbQoYx
 kyYNoR1dg+AWLb5juT68abkP4trZ8EUa7Q29OX1SzTk+0U7M0fO3/rq9gt71HiHR
 WSjRDQPfWrI=
 =jp5H
 -----END PGP SIGNATURE-----

Merge tag 'perf-urgent-2026-03-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull perf fixes from Ingo Molnar:

 - Fix a PMU driver crash on AMD EPYC systems, caused by
   a race condition in x86_pmu_enable()

 - Fix a possible counter-initialization bug in x86_pmu_enable()

 - Fix a counter inheritance bug in inherit_event() and
   __perf_event_read()

 - Fix an Intel PMU driver branch constraints handling bug
   found by UBSAN

 - Fix the Intel PMU driver's new Off-Module Response (OMR)
   support code for Diamond Rapids / Nova lake, to fix a snoop
   information parsing bug

* tag 'perf-urgent-2026-03-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86/intel: Fix OMR snoop information parsing issues
  perf/x86/intel: Add missing branch counters constraint apply
  perf: Make sure to use pmu_ctx->pmu for groups
  x86/perf: Make sure to program the counter value for stopped events on migration
  perf/x86: Move event pointer setup earlier in x86_pmu_enable()
2026-03-22 10:31:51 -07:00
Kyle Meyer
1f6aa5bbf1 x86/platform/uv: Handle deconfigured sockets
When a socket is deconfigured, it's mapped to SOCK_EMPTY (0xffff). This causes
a panic while allocating UV hub info structures.

Fix this by using NUMA_NO_NODE, allowing UV hub info structures to be
allocated on valid nodes.

Fixes: 8a50c58519 ("x86/platform/uv: UV support for sub-NUMA clustering")
Signed-off-by: Kyle Meyer <kyle.meyer@hpe.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Steve Wahl <steve.wahl@hpe.com>
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/ab2BmGL0ehVkkjKk@hpe.com
2026-03-20 19:01:03 +01:00
Vladimir Oltean
63f8b60151 x86/entry/vdso: Fix path of included gettimeofday.c
Commit in Fixes forgot to convert one include path to be relative to the
kernel source directory after adding latter to flags-y.

Fix it.

  [ bp: Rewrite commit message. ]

Fixes: 693c819fed ("x86/entry/vdso: Refactor the vdso build")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20260307174406.1808981-1-vladimir.oltean@nxp.com
2026-03-20 17:56:52 +01:00
Linus Torvalds
c3d13784d5 hyperv-fixes for v7.0-rc5
-----BEGIN PGP SIGNATURE-----
 
 iQFHBAABCgAxFiEEIbPD0id6easf0xsudhRwX5BBoF4FAmm80s8THHdlaS5saXVA
 a2VybmVsLm9yZwAKCRB2FHBfkEGgXuD2B/9lj5tU54YNxRkB7iLjicrX6RfYTTC2
 xynQTHRQhAOEBF3woPR7BkTLVBvRyf10fplRP/nSxpAUAzdulz99rprBGlckEJji
 VlBhvuWz3fQ5XWsAovWaghhcSw9juf9k2x1x2VudwSLKcsgtqft96JQb4JxnuH02
 cNJLeprLuKRwafqF+Wmwj+o4uWEe78Sa7yDg6vai10+e2Q4sE+QrvyXqEJGFDdjj
 W4+aQmeAElvYTxKRGMrUsEaraif/BXNRmihCD96b9v8/w0BIZjPIVGvKE0NA4BKS
 rkPxs8KfD2Gh7dSvBvvnptd1+36XeyXsW+nFmndumwxqY2NKHuuA9fAt
 =Cqrd
 -----END PGP SIGNATURE-----

Merge tag 'hyperv-fixes-signed-20260319' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux

Pull Hyper-V fixes from Wei Liu:

 - Fix ARM64 MSHV support (Anirudh Rayabharam)

 - Fix MSHV driver memory handling issues (Stanislav Kinsburskii)

 - Update maintainers for Hyper-V DRM driver (Saurabh Sengar)

 - Misc clean up in MSHV crashdump code (Ard Biesheuvel, Uros Bizjak)

 - Minor improvements to MSHV code (Mukesh R, Wei Liu)

 - Revert not yet released MSHV scrub partition hypercall (Wei Liu)

* tag 'hyperv-fixes-signed-20260319' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux:
  mshv: Fix error handling in mshv_region_pin
  MAINTAINERS: Update maintainers for Hyper-V DRM driver
  mshv: Fix use-after-free in mshv_map_user_memory error path
  mshv: pass struct mshv_user_mem_region by reference
  x86/hyperv: Use any general-purpose register when saving %cr2 and %cr8
  x86/hyperv: Use current_stack_pointer to avoid asm() in hv_hvcrash_ctxt_save()
  x86/hyperv: Save segment registers directly to memory in hv_hvcrash_ctxt_save()
  x86/hyperv: Use __naked attribute to fix stackless C function
  Revert "mshv: expose the scrub partition hypercall"
  mshv: add arm64 support for doorbell & intercept SINTs
  mshv: refactor synic init and cleanup
  x86/hyperv: print out reserved vectors in hexadecimal
2026-03-20 09:18:22 -07:00
Mike Rapoport (Microsoft)
217c0a5c17 x86/efi: efi_unmap_boot_services: fix calculation of ranges_to_free size
ranges_to_free array should have enough room to store the entire EFI
memmap plus an extra element for NULL entry.
The calculation of this array size wrongly adds 1 to the overall size
instead of adding 1 to the number of elements.

Add parentheses to properly size the array.

Reported-by: Guenter Roeck <linux@roeck-us.net>
Fixes: a4b0bf6a40 ("x86/efi: defer freeing of boot services memory")
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2026-03-20 15:31:10 +01:00
William Roche
201bc182ad x86/mce/amd: Check SMCA feature bit before accessing SMCA MSRs
People do effort to inject MCEs into guests in order to simulate/test
handling of hardware errors. The real use case behind it is testing the
handling of SIGBUS which the memory failure code sends to the process.

If that process is QEMU, instead of killing the whole guest, the MCE can
be injected into the guest kernel so that latter can attempt proper
handling and kill the user *process*  in the guest, instead, which
caused the MCE. The assumption being here that the whole injection flow
can supply enough information that the guest kernel can pinpoint the
right process. But that's a different topic...

Regardless of virtualization or not, access to SMCA-specific registers
like MCA_DESTAT should only be done after having checked the smca
feature bit. And there are AMD machines like Bulldozer (the one before
Zen1) which do support deferred errors but are not SMCA machines.

Therefore, properly check the feature bit before accessing related MSRs.

  [ bp: Rewrite commit message. ]

Fixes: 7cb735d7c0 ("x86/mce: Unify AMD DFR handler with MCA Polling")
Signed-off-by: William Roche <william.roche@oracle.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Yazen Ghannam <yazen.ghannam@amd.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20260218163025.1316501-1-william.roche@oracle.com
2026-03-18 23:02:16 +01:00
Linus Torvalds
11e8c7e947 ARM:
- Correctly handle deeactivation of interrupts that were activated from
   LRs.  Since EOIcount only denotes deactivation of interrupts that
   are not present in an LR, start EOIcount deactivation walk *after*
   the last irq that made it into an LR.
 
 - Avoid calling into the stubs to probe for ICH_VTR_EL2.TDS when
   pKVM is already enabled -- not only thhis isn't possible (pKVM
   will reject the call), but it is also useless: this can only
   happen for a CPU that has already booted once, and the capability
   will not change.
 
 - Fix a couple of low-severity bugs in our S2 fault handling path,
   affecting the recently introduced LS64 handling and the even more
   esoteric handling of hwpoison in a nested context
 
 - Address yet another syzkaller finding in the vgic initialisation,
   where we would end-up destroying an uninitialised vgic with nasty
   consequences
 
 - Address an annoying case of pKVM failing to boot when some of the
   memblock regions that the host is faulting in are not page-aligned
 
 - Inject some sanity in the NV stage-2 walker by checking the limits
   against the advertised PA size, and correctly report the resulting
   faults
 
 PPC:
 
 - Fix a PPC e500 build error due to a long-standing wart that was exposed by
   the recent conversion to kmalloc_obj(); rip out all the ugliness that
   led to the wart.
 
 RISC-V:
 
 - Prevent speculative out-of-bounds access using array_index_nospec()
   in APLIC interrupt handling, ONE_REG regiser access, AIA CSR access,
   float register access, and PMU counter access
 
 - Fix potential use-after-free issues in kvm_riscv_gstage_get_leaf(),
   kvm_riscv_aia_aplic_has_attr(), and kvm_riscv_aia_imsic_has_attr()
 
 - Fix potential null pointer dereference in kvm_riscv_vcpu_aia_rmw_topei()
 
 - Fix off-by-one array access in SBI PMU
 
 - Skip THP support check during dirty logging
 
 - Fix error code returned for Smstateen and Ssaia ONE_REG interface
 
 - Check host Ssaia extension when creating AIA irqchip
 
 x86:
 
 - Fix cases where CPUID mitigation features were incorrectly marked as
   available whenever the kernel used scattered feature words for them.
 
 - Validate _all_ GVAs, rather than just the first GVA, when processing
   a range of GVAs for Hyper-V's TLB flush hypercalls.
 
 - Fix a brown paper bug in add_atomic_switch_msr().
 
 - Use hlist_for_each_entry_srcu() when traversing mask_notifier_list,
   to fix a lockdep warning; KVM doesn't hold RCU, just irq_srcu.
 
 - Ensure AVIC VMCB fields are initialized if the VM has an in-kernel local
   APIC (and AVIC is enabled at the module level).
 
 - Update CR8 write interception when AVIC is (de)activated, to fix a bug
   where the guest can run in perpetuity with the CR8 intercept enabled.
 
 - Add a quirk to skip the consistency check on FREEZE_IN_SMM, i.e. to allow
   L1 hypervisors to set FREEZE_IN_SMM.  This reverts (by default) an
   unintentional tightening of userspace ABI in 6.17, and provides some
   amount of backwards compatibility with hypervisors who want to freeze
   PMCs on VM-Entry.
 
 - Validate the VMCS/VMCB on return to a nested guest from SMM, because
   either userspace or the guest could stash invalid values in memory
   and trigger the processor's consistency checks.
 
 Generic:
 
 - Remove a subtle pseudo-overlay of kvm_stats_desc, which, aside from being
   unnecessary and confusing, triggered compiler warnings due to
   -Wflex-array-member-not-at-end.
 
 - Document that vcpu->mutex is take outside of kvm->slots_lock and
   kvm->slots_arch_lock, which is intentional and desirable despite being
   rather unintuitive.
 
 Selftests:
 
 - Increase the maximum number of NUMA nodes in the guest_memfd selftest to
   64 (from 8).
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCgAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmmy6n8UHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroNX7ggAhWoCG+AE6P3yrp6Mi+nRYpeRGC3q
 q2IiZCn0UoCg6q3c2kgn7b/N2zLJs0Q8FZRCEp2Je+2uvptpmdp/BMEfiIU3n2/a
 61z+Dydbpyc+kUmhJzUJ+aotq5FnMNmAAmqSKoc19GhAx2OQhQmBP/JOZ0P/eqLE
 Is0qNBgr/Zms2ib3GFf/JT+urysL2mX47qe92HTzq1T9EEG0KleID0Jz8vYQI8Fr
 I5N9+lTxagQDi8ytwOM85Cn8K7wh+CQIgzmciHcVErpAvAWkrEjrPlQltpEz2C5B
 aWEcRgw46utEaAiwPQGJRW6TeoKUG0pUR3v6T90nBkjjJ1npm6gPVE6TBA==
 =7nQ9
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm fixes from Paolo Bonzini:
 "Quite a large pull request, partly due to skipping last week and
  therefore having material from ~all submaintainers in this one. About
  a fourth of it is a new selftest, and a couple more changes are large
  in number of files touched (fixing a -Wflex-array-member-not-at-end
  compiler warning) or lines changed (reformatting of a table in the API
  documentation, thanks rST).

  But who am I kidding---it's a lot of commits and there are a lot of
  bugs being fixed here, some of them on the nastier side like the
  RISC-V ones.

  ARM:

   - Correctly handle deactivation of interrupts that were activated
     from LRs. Since EOIcount only denotes deactivation of interrupts
     that are not present in an LR, start EOIcount deactivation walk
     *after* the last irq that made it into an LR

   - Avoid calling into the stubs to probe for ICH_VTR_EL2.TDS when pKVM
     is already enabled -- not only thhis isn't possible (pKVM will
     reject the call), but it is also useless: this can only happen for
     a CPU that has already booted once, and the capability will not
     change

   - Fix a couple of low-severity bugs in our S2 fault handling path,
     affecting the recently introduced LS64 handling and the even more
     esoteric handling of hwpoison in a nested context

   - Address yet another syzkaller finding in the vgic initialisation,
     where we would end-up destroying an uninitialised vgic with nasty
     consequences

   - Address an annoying case of pKVM failing to boot when some of the
     memblock regions that the host is faulting in are not page-aligned

   - Inject some sanity in the NV stage-2 walker by checking the limits
     against the advertised PA size, and correctly report the resulting
     faults

  PPC:

   - Fix a PPC e500 build error due to a long-standing wart that was
     exposed by the recent conversion to kmalloc_obj(); rip out all the
     ugliness that led to the wart

  RISC-V:

   - Prevent speculative out-of-bounds access using array_index_nospec()
     in APLIC interrupt handling, ONE_REG regiser access, AIA CSR
     access, float register access, and PMU counter access

   - Fix potential use-after-free issues in kvm_riscv_gstage_get_leaf(),
     kvm_riscv_aia_aplic_has_attr(), and kvm_riscv_aia_imsic_has_attr()

   - Fix potential null pointer dereference in
     kvm_riscv_vcpu_aia_rmw_topei()

   - Fix off-by-one array access in SBI PMU

   - Skip THP support check during dirty logging

   - Fix error code returned for Smstateen and Ssaia ONE_REG interface

   - Check host Ssaia extension when creating AIA irqchip

  x86:

   - Fix cases where CPUID mitigation features were incorrectly marked
     as available whenever the kernel used scattered feature words for
     them

   - Validate _all_ GVAs, rather than just the first GVA, when
     processing a range of GVAs for Hyper-V's TLB flush hypercalls

   - Fix a brown paper bug in add_atomic_switch_msr()

   - Use hlist_for_each_entry_srcu() when traversing mask_notifier_list,
     to fix a lockdep warning; KVM doesn't hold RCU, just irq_srcu

   - Ensure AVIC VMCB fields are initialized if the VM has an in-kernel
     local APIC (and AVIC is enabled at the module level)

   - Update CR8 write interception when AVIC is (de)activated, to fix a
     bug where the guest can run in perpetuity with the CR8 intercept
     enabled

   - Add a quirk to skip the consistency check on FREEZE_IN_SMM, i.e. to
     allow L1 hypervisors to set FREEZE_IN_SMM. This reverts (by
     default) an unintentional tightening of userspace ABI in 6.17, and
     provides some amount of backwards compatibility with hypervisors
     who want to freeze PMCs on VM-Entry

   - Validate the VMCS/VMCB on return to a nested guest from SMM,
     because either userspace or the guest could stash invalid values in
     memory and trigger the processor's consistency checks

  Generic:

   - Remove a subtle pseudo-overlay of kvm_stats_desc, which, aside from
     being unnecessary and confusing, triggered compiler warnings due to
     -Wflex-array-member-not-at-end

   - Document that vcpu->mutex is take outside of kvm->slots_lock and
     kvm->slots_arch_lock, which is intentional and desirable despite
     being rather unintuitive

  Selftests:

   - Increase the maximum number of NUMA nodes in the guest_memfd
     selftest to 64 (from 8)"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (43 commits)
  KVM: selftests: Verify SEV+ guests can read and write EFER, CR0, CR4, and CR8
  Documentation: kvm: fix formatting of the quirks table
  KVM: x86: clarify leave_smm() return value
  selftests: kvm: add a test that VMX validates controls on RSM
  selftests: kvm: extract common functionality out of smm_test.c
  KVM: SVM: check validity of VMCB controls when returning from SMM
  KVM: VMX: check validity of VMCS controls when returning from SMM
  KVM: SVM: Set/clear CR8 write interception when AVIC is (de)activated
  KVM: SVM: Initialize AVIC VMCB fields if AVIC is enabled with in-kernel APIC
  KVM: x86: Introduce KVM_X86_QUIRK_VMCS12_ALLOW_FREEZE_IN_SMM
  KVM: x86: Fix SRCU list traversal in kvm_fire_mask_notifiers()
  KVM: VMX: Fix a wrong MSR update in add_atomic_switch_msr()
  KVM: x86: hyper-v: Validate all GVAs during PV TLB flush
  KVM: x86: synthesize CPUID bits only if CPU capability is set
  KVM: PPC: e500: Rip out "struct tlbe_ref"
  KVM: PPC: e500: Fix build error due to using kmalloc_obj() with wrong type
  KVM: selftests: Increase 'maxnode' for guest_memfd tests
  KVM: arm64: pkvm: Don't reprobe for ICH_VTR_EL2.TDS on CPU hotplug
  KVM: arm64: vgic: Pick EOIcount deactivations from AP-list tail
  KVM: arm64: Remove the redundant ISB in __kvm_at_s1e2()
  ...
2026-03-15 12:22:10 -07:00
David Woodhouse
2619da73bb KVM: x86: Use __DECLARE_FLEX_ARRAY() for UAPI structures with VLAs
Commit 94dfc73e7c ("treewide: uapi: Replace zero-length arrays with
flexible-array members") broke the userspace API for C++.

These structures ending in VLAs are typically a *header*, which can be
followed by an arbitrary number of entries. Userspace typically creates
a larger structure with some non-zero number of entries, for example in
QEMU's kvm_arch_get_supported_msr_feature():

    struct {
        struct kvm_msrs info;
        struct kvm_msr_entry entries[1];
    } msr_data = {};

While that works in C, it fails in C++ with an error like:
 flexible array member 'kvm_msrs::entries' not at end of 'struct msr_data'

Fix this by using __DECLARE_FLEX_ARRAY() for the VLA, which uses [0]
for C++ compilation.

Fixes: 94dfc73e7c ("treewide: uapi: Replace zero-length arrays with flexible-array members")
Cc: stable@vger.kernel.org
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Link: https://patch.msgid.link/3abaf6aefd6e5efeff3b860ac38421d9dec908db.camel@infradead.org
[sean: tag for stable@]
Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-03-12 10:56:10 -07:00
Dapeng Mi
e7fcc54524 perf/x86/intel: Fix OMR snoop information parsing issues
When omr_source is 0x2, the omr_snoop (bit[6]) and omr_promoted (bit[7])
fields are combined to represent the snoop information. However, the
omr_promoted field was not left-shifted by 1 bit, resulting in incorrect
snoop information.

Besides, the snoop information parsing is not accurate for some OMR
sources, like the snoop information should be SNOOP_NONE for these memory
access (omr_source >= 7) instead of SNOOP_HIT.

Fix these issues.

Closes: https://lore.kernel.org/all/CAP-5=fW4zLWFw1v38zCzB9-cseNSTTCtup=p2SDxZq7dPayVww@mail.gmail.com/
Fixes: d2bdcde962 ("perf/x86/intel: Add support for PEBS memory auxiliary info field in DMR")
Reported-by: Ian Rogers <irogers@google.com>
Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Ian Rogers <irogers@google.com>
Link: https://patch.msgid.link/20260311075201.2951073-1-dapeng1.mi@linux.intel.com
2026-03-12 11:29:16 +01:00
Dapeng Mi
1d07bbd7ea perf/x86/intel: Add missing branch counters constraint apply
When running the command:
'perf record -e "{instructions,instructions:p}" -j any,counter sleep 1',
a "shift-out-of-bounds" warning is reported on CWF.

  UBSAN: shift-out-of-bounds in /kbuild/src/consumer/arch/x86/events/intel/lbr.c:970:15
  shift exponent 64 is too large for 64-bit type 'long long unsigned int'
  ......
  intel_pmu_lbr_counters_reorder.isra.0.cold+0x2a/0xa7
  intel_pmu_lbr_save_brstack+0xc0/0x4c0
  setup_arch_pebs_sample_data+0x114b/0x2400

The warning occurs because the second "instructions:p" event, which
involves branch counters sampling, is incorrectly programmed to fixed
counter 0 instead of the general-purpose (GP) counters 0-3 that support
branch counters sampling. Currently only GP counters 0-3 support branch
counters sampling on CWF, any event involving branch counters sampling
should be programed on GP counters 0-3. Since the counter index of fixed
counter 0 is 32, it leads to the "src" value in below code is right
shifted 64 bits and trigger the "shift-out-of-bounds" warning.

cnt = (src >> (order[j] * LBR_INFO_BR_CNTR_BITS)) & LBR_INFO_BR_CNTR_MASK;

The root cause is the loss of the branch counters constraint for the
new event in the branch counters sampling event group. Since it isn't
yet part of the sibling list. This results in the second
"instructions:p" event being programmed on fixed counter 0 incorrectly
instead of the appropriate GP counters 0-3.

To address this, we apply the missing branch counters constraint for
the last event in the group. Additionally, we introduce a new function,
`intel_set_branch_counter_constr()`, to apply the branch counters
constraint and avoid code duplication.

Fixes: 3374491619 ("perf/x86/intel: Support branch counters logging")
Reported-by: Xudong Hao <xudong.hao@intel.com>
Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20260228053320.140406-2-dapeng1.mi@linux.intel.com
Cc: stable@vger.kernel.org
2026-03-12 11:29:16 +01:00
Peter Zijlstra
f1cac6ac62 x86/perf: Make sure to program the counter value for stopped events on migration
Both Mi Dapeng and Ian Rogers noted that not everything that sets HES_STOPPED
is required to EF_UPDATE. Specifically the 'step 1' loop of rescheduling
explicitly does EF_UPDATE to ensure the counter value is read.

However, then 'step 2' simply leaves the new counter uninitialized when
HES_STOPPED, even though, as noted above, the thing that stopped them might not
be aware it needs to EF_RELOAD -- since it didn't EF_UPDATE on stop.

One such location that is affected is throttling, throttle does pmu->stop(, 0);
and unthrottle does pmu->start(, 0); possibly restarting an uninitialized counter.

Fixes: a4eaf7f146 ("perf: Rework the PMU methods")
Reported-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Reported-by: Ian Rogers <irogers@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Link: https://patch.msgid.link/20260311204035.GX606826@noisy.programming.kicks-ass.net
2026-03-12 11:29:15 +01:00
Breno Leitao
8d5fae6011 perf/x86: Move event pointer setup earlier in x86_pmu_enable()
A production AMD EPYC system crashed with a NULL pointer dereference
in the PMU NMI handler:

  BUG: kernel NULL pointer dereference, address: 0000000000000198
  RIP: x86_perf_event_update+0xc/0xa0
  Call Trace:
   <NMI>
   amd_pmu_v2_handle_irq+0x1a6/0x390
   perf_event_nmi_handler+0x24/0x40

The faulting instruction is `cmpq $0x0, 0x198(%rdi)` with RDI=0,
corresponding to the `if (unlikely(!hwc->event_base))` check in
x86_perf_event_update() where hwc = &event->hw and event is NULL.

drgn inspection of the vmcore on CPU 106 showed a mismatch between
cpuc->active_mask and cpuc->events[]:

  active_mask: 0x1e (bits 1, 2, 3, 4)
  events[1]:   0xff1100136cbd4f38  (valid)
  events[2]:   0x0                 (NULL, but active_mask bit 2 set)
  events[3]:   0xff1100076fd2cf38  (valid)
  events[4]:   0xff1100079e990a90  (valid)

The event that should occupy events[2] was found in event_list[2]
with hw.idx=2 and hw.state=0x0, confirming x86_pmu_start() had run
(which clears hw.state and sets active_mask) but events[2] was
never populated.

Another event (event_list[0]) had hw.state=0x7 (STOPPED|UPTODATE|ARCH),
showing it was stopped when the PMU rescheduled events, confirming the
throttle-then-reschedule sequence occurred.

The root cause is commit 7e772a93eb ("perf/x86: Fix NULL event access
and potential PEBS record loss") which moved the cpuc->events[idx]
assignment out of x86_pmu_start() and into step 2 of x86_pmu_enable(),
after the PERF_HES_ARCH check. This broke any path that calls
pmu->start() without going through x86_pmu_enable() -- specifically
the unthrottle path:

  perf_adjust_freq_unthr_events()
    -> perf_event_unthrottle_group()
      -> perf_event_unthrottle()
        -> event->pmu->start(event, 0)
          -> x86_pmu_start()     // sets active_mask but not events[]

The race sequence is:

  1. A group of perf events overflows, triggering group throttle via
     perf_event_throttle_group(). All events are stopped: active_mask
     bits cleared, events[] preserved (x86_pmu_stop no longer clears
     events[] after commit 7e772a93eb).

  2. While still throttled (PERF_HES_STOPPED), x86_pmu_enable() runs
     due to other scheduling activity. Stopped events that need to
     move counters get PERF_HES_ARCH set and events[old_idx] cleared.
     In step 2 of x86_pmu_enable(), PERF_HES_ARCH causes these events
     to be skipped -- events[new_idx] is never set.

  3. The timer tick unthrottles the group via pmu->start(). Since
     commit 7e772a93eb removed the events[] assignment from
     x86_pmu_start(), active_mask[new_idx] is set but events[new_idx]
     remains NULL.

  4. A PMC overflow NMI fires. The handler iterates active counters,
     finds active_mask[2] set, reads events[2] which is NULL, and
     crashes dereferencing it.

Move the cpuc->events[hwc->idx] assignment in x86_pmu_enable() to
before the PERF_HES_ARCH check, so that events[] is populated even
for events that are not immediately started. This ensures the
unthrottle path via pmu->start() always finds a valid event pointer.

Fixes: 7e772a93eb ("perf/x86: Fix NULL event access and potential PEBS record loss")
Signed-off-by: Breno Leitao <leitao@debian.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/20260310-perf-v2-1-4a3156fce43c@debian.org
2026-03-12 11:29:15 +01:00
Uros Bizjak
afeb96cb18 x86/hyperv: Use any general-purpose register when saving %cr2 and %cr8
hv_hvcrash_ctxt_save() in arch/x86/hyperv/hv_crash.c currently saves %cr2
and %cr8 using %eax ("=a"). This unnecessarily forces a specific register.
Update the inline assembly to use a general-purpose register ("=r") for
both %cr2 and %cr8. This makes the code more flexible for the compiler
while producing the same saved context contents.

No functional changes.

Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Cc: K. Y. Srinivasan <kys@microsoft.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Wei Liu <wei.liu@kernel.org>
Cc: Dexuan Cui <decui@microsoft.com>
Cc: Long Li <longli@microsoft.com>
Cc: Thomas Gleixner <tglx@kernel.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Wei Liu <wei.liu@kernel.org>
2026-03-12 04:25:20 +00:00
Uros Bizjak
2536091d58 x86/hyperv: Use current_stack_pointer to avoid asm() in hv_hvcrash_ctxt_save()
Use current_stack_pointer to avoid asm() when saving %rsp to the
crash context memory in hv_hvcrash_ctxt_save(). The new code is
more readable and results in exactly the same object file.

Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Cc: K. Y. Srinivasan <kys@microsoft.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Wei Liu <wei.liu@kernel.org>
Cc: Dexuan Cui <decui@microsoft.com>
Cc: Long Li <longli@microsoft.com>
Cc: Thomas Gleixner <tglx@kernel.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Wei Liu <wei.liu@kernel.org>
2026-03-12 04:25:19 +00:00
Uros Bizjak
3484127c19 x86/hyperv: Save segment registers directly to memory in hv_hvcrash_ctxt_save()
hv_hvcrash_ctxt_save() in arch/x86/hyperv/hv_crash.c currently saves
segment registers via a general-purpose register (%eax). Update the
code to save segment registers (cs, ss, ds, es, fs, gs) directly to
the crash context memory using movw. This avoids unnecessary use of
a general-purpose register, making the code simpler and more efficient.

The size of the corresponding object file improves as follows:

   text    data     bss     dec     hex filename
   4167     176     200    4543    11bf hv_crash-old.o
   4151     176     200    4527    11af hv_crash-new.o

No functional change occurs to the saved context contents; this is
purely a code-quality improvement.

Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Cc: K. Y. Srinivasan <kys@microsoft.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Wei Liu <wei.liu@kernel.org>
Cc: Dexuan Cui <decui@microsoft.com>
Cc: Long Li <longli@microsoft.com>
Cc: Thomas Gleixner <tglx@kernel.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Wei Liu <wei.liu@kernel.org>
2026-03-12 04:25:19 +00:00
Paolo Bonzini
6b1ca262a9 KVM: x86: clarify leave_smm() return value
The return value of vmx_leave_smm() is unrelated from that of
nested_vmx_enter_non_root_mode().  Check explicitly for success
(which happens to be 0) and return 1 just like everywhere
else in vmx_leave_smm().

Likewise, in svm_leave_smm() return 0/1 instead of the 0/1/-errno
returned by tenter_svm_guest_mode().

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2026-03-11 18:41:12 +01:00
Paolo Bonzini
be5fa8737d KVM: SVM: check validity of VMCB controls when returning from SMM
The VMCB12 is stored in guest memory and can be mangled while in SMM; it
is then reloaded by svm_leave_smm(), but it is not checked again for
validity.

Move the cached vmcb12 control and save consistency checks out of
svm_set_nested_state() and into a helper, and reuse it in
svm_leave_smm().

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2026-03-11 18:41:11 +01:00
Paolo Bonzini
5a30e8aea0 KVM: VMX: check validity of VMCS controls when returning from SMM
The VMCS12 is not available while in SMM.  However, it can be overwritten
if userspace manages to trigger copy_enlightened_to_vmcs12() - for example
via KVM_GET_NESTED_STATE.

Because of this, the VMCS12 has to be checked for validity before it is
used to generate the VMCS02.  Move the check code out of vmx_set_nested_state()
(the other "not a VMLAUNCH/VMRESUME" path that emulates a nested vmentry)
and reuse it in vmx_leave_smm().

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2026-03-11 18:41:11 +01:00
Sean Christopherson
87d0f901a9 KVM: SVM: Set/clear CR8 write interception when AVIC is (de)activated
Explicitly set/clear CR8 write interception when AVIC is (de)activated to
fix a bug where KVM leaves the interception enabled after AVIC is
activated.  E.g. if KVM emulates INIT=>WFS while AVIC is deactivated, CR8
will remain intercepted in perpetuity.

On its own, the dangling CR8 intercept is "just" a performance issue, but
combined with the TPR sync bug fixed by commit d02e48830e ("KVM: SVM:
Sync TPR from LAPIC into VMCB::V_TPR even if AVIC is active"), the danging
intercept is fatal to Windows guests as the TPR seen by hardware gets
wildly out of sync with reality.

Note, VMX isn't affected by the bug as TPR_THRESHOLD is explicitly ignored
when Virtual Interrupt Delivery is enabled, i.e. when APICv is active in
KVM's world.  I.e. there's no need to trigger update_cr8_intercept(), this
is firmly an SVM implementation flaw/detail.

WARN if KVM gets a CR8 write #VMEXIT while AVIC is active, as KVM should
never enter the guest with AVIC enabled and CR8 writes intercepted.

Fixes: 3bbf3565f4 ("svm: Do not intercept CR8 when enable AVIC")
Cc: stable@vger.kernel.org
Cc: Jim Mattson <jmattson@google.com>
Cc: Naveen N Rao (AMD) <naveen@kernel.org>
Cc: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
Reviewed-by: Naveen N Rao (AMD) <naveen@kernel.org>
Reviewed-by: Jim Mattson <jmattson@google.com>
Link: https://patch.msgid.link/20260203190711.458413-3-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
[Squash fix to avic_deactivate_vmcb. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2026-03-11 18:41:11 +01:00
Sean Christopherson
3989a6d036 KVM: SVM: Initialize AVIC VMCB fields if AVIC is enabled with in-kernel APIC
Initialize all per-vCPU AVIC control fields in the VMCB if AVIC is enabled
in KVM and the VM has an in-kernel local APIC, i.e. if it's _possible_ the
vCPU could activate AVIC at any point in its lifecycle.  Configuring the
VMCB if and only if AVIC is active "works" purely because of optimizations
in kvm_create_lapic() to speculatively set apicv_active if AVIC is enabled
*and* to defer updates until the first KVM_RUN.  In quotes because KVM
likely won't do the right thing if kvm_apicv_activated() is false, i.e. if
a vCPU is created while APICv is inhibited at the VM level for whatever
reason.  E.g. if the inhibit is *removed* before KVM_REQ_APICV_UPDATE is
handled in KVM_RUN, then __kvm_vcpu_update_apicv() will elide calls to
vendor code due to seeing "apicv_active == activate".

Cleaning up the initialization code will also allow fixing a bug where KVM
incorrectly leaves CR8 interception enabled when AVIC is activated without
creating a mess with respect to whether AVIC is activated or not.

Cc: stable@vger.kernel.org
Fixes: 67034bb9dd ("KVM: SVM: Add irqchip_split() checks before enabling AVIC")
Fixes: 6c3e4422dd ("svm: Add support for dynamic APICv")
Reviewed-by: Naveen N Rao (AMD) <naveen@kernel.org>
Reviewed-by: Jim Mattson <jmattson@google.com>
Link: https://patch.msgid.link/20260203190711.458413-2-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2026-03-11 18:41:11 +01:00
Jim Mattson
e2ffe85b6d KVM: x86: Introduce KVM_X86_QUIRK_VMCS12_ALLOW_FREEZE_IN_SMM
Add KVM_X86_QUIRK_VMCS12_ALLOW_FREEZE_IN_SMM to allow L1 to set
FREEZE_IN_SMM in vmcs12's GUEST_IA32_DEBUGCTL field, as permitted
prior to commit 6b1dd26544 ("KVM: VMX: Preserve host's
DEBUGCTLMSR_FREEZE_IN_SMM while running the guest").  Enable the quirk
by default for backwards compatibility (like all quirks); userspace
can disable it via KVM_CAP_DISABLE_QUIRKS2 for consistency with the
constraints on WRMSR(IA32_DEBUGCTL).

Note that the quirk only bypasses the consistency check.  The vmcs02 bit is
still owned by the host, and PMCs are not frozen during virtualized SMM.
In particular, if a host administrator decides that PMCs should not be
frozen during physical SMM, then L1 has no say in the matter.

Fixes: 095686e6fc ("KVM: nVMX: Check vmcs12->guest_ia32_debugctl on nested VM-Enter")
Cc: stable@vger.kernel.org
Signed-off-by: Jim Mattson <jmattson@google.com>
Link: https://patch.msgid.link/20260205231537.1278753-1-jmattson@google.com
[sean: tag for stable@, clean-up and fix goofs in the comment and docs]
Signed-off-by: Sean Christopherson <seanjc@google.com>
[Rename quirk. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2026-03-11 18:41:11 +01:00
Li RongQing
b54e4707a6 KVM: x86: Fix SRCU list traversal in kvm_fire_mask_notifiers()
The mask_notifier_list is protected by kvm->irq_srcu, but the traversal
in kvm_fire_mask_notifiers() incorrectly uses hlist_for_each_entry_rcu().
This leads to lockdep warnings because the standard RCU iterator expects
to be under rcu_read_lock(), not SRCU.

Replace the RCU variant with hlist_for_each_entry_srcu() and provide
the proper srcu_read_lock_held() annotation to ensure correct
synchronization and silence lockdep.

Signed-off-by: Li RongQing <lirongqing@baidu.com>
Link: https://patch.msgid.link/20260204091206.2617-1-lirongqing@baidu.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2026-03-11 18:41:11 +01:00
Namhyung Kim
f78e627a01 KVM: VMX: Fix a wrong MSR update in add_atomic_switch_msr()
The previous change had a bug to update a guest MSR with a host value.

Fixes: c3d6a7210a ("KVM: VMX: Dedup code for adding MSR to VMCS's auto list")
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Link: https://patch.msgid.link/20260220220216.389475-1-namhyung@kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2026-03-11 18:41:11 +01:00
Manuel Andreas
a5264387c2 KVM: x86: hyper-v: Validate all GVAs during PV TLB flush
In KVM guests with Hyper-V hypercalls enabled, the hypercalls
HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST and HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST_EX
allow a guest to request invalidation of portions of a virtual TLB.
For this, the hypercall parameter includes a list of GVAs that are supposed
to be invalidated.

Currently, only the base GVA is checked to be canonical. In reality, this
check needs to be performed for the entire range of GVAs, as checking only
the base GVA enables guests running on Intel hardware to trigger a
WARN_ONCE in the host (see Fixes commit below).

Move the check for non-canonical addresses to be performed for every GVA
of the supplied range to avoid the splat, and to be more in line with the
Hyper-V specification, since, although unlikely, a range starting with an
invalid GVA may still contain GVAs that are valid.

Fixes: fa787ac07b ("KVM: x86/hyper-v: Skip non-canonical addresses during PV TLB flush")
Signed-off-by: Manuel Andreas <manuel.andreas@tum.de>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Link: https://patch.msgid.link/00a7a31b-573b-4d92-91f8-7d7e2f88ea48@tum.de
[sean: massage changelog]
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2026-03-11 18:41:11 +01:00
Carlos López
4b3b8a8b0d KVM: x86: synthesize CPUID bits only if CPU capability is set
KVM incorrectly synthesizes CPUID bits for KVM-only leaves, as the
following branch in kvm_cpu_cap_init() is never taken:

    if (leaf < NCAPINTS)
        kvm_cpu_caps[leaf] &= kernel_cpu_caps[leaf];

This means that bits set via SYNTHESIZED_F() for KVM-only leaves are
unconditionally set. This for example can cause issues for SEV-SNP
guests running on Family 19h CPUs, as TSA_SQ_NO and TSA_L1_NO are
always enabled by KVM in 80000021[ECX]. When userspace issues a
SNP_LAUNCH_UPDATE command to update the CPUID page for the guest, SNP
firmware will explicitly reject the command if the page sets sets these
bits on vulnerable CPUs.

To fix this, check in SYNTHESIZED_F() that the corresponding X86
capability is set before adding it to to kvm_cpu_cap_features.

Fixes: 31272abd59 ("KVM: SVM: Advertise TSA CPUID bits to guests")
Link: https://lore.kernel.org/all/20260208164233.30405-1-clopez@suse.de/
Signed-off-by: Carlos López <clopez@suse.de>
Reviewed-by: Nikolay Borisov <nik.borisov@suse.com>
Link: https://patch.msgid.link/20260209153108.70667-2-clopez@suse.de
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2026-03-11 18:41:11 +01:00
Paolo Bonzini
94fe3e6515 KVM generic changes for 7.0
- Remove a subtle pseudo-overlay of kvm_stats_desc, which, aside from being
    unnecessary and confusing, triggered compiler warnings due to
    -Wflex-array-member-not-at-end.
 
  - Document that vcpu->mutex is take outside of kvm->slots_lock and
    kvm->slots_arch_lock, which is intentional and desirable despite being
    rather unintuitive.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEKTobbabEP7vbhhN9OlYIJqCjN/0FAmmp19MACgkQOlYIJqCj
 N/02KA//e7D1DqCcDC46tMyLI+/Q6Wy0F40nXp0tTzJ+gRT5QesEw3jSQdXCRmPV
 yTFLyDaGYD2jqV+EpJLPYBT41oU2FXsjD5NFJRAISD5KPIJbACHvJUxWGYWLvaLU
 iMlwhqZimXKUFAECW2QpwLV8BQenyOEj5dVeKYdPjX6seIEeFlK6JAdteLK0g9gR
 gksE+9QzCFXt0cRfgkaA4UKcA+xWb3ThKMej1AadB6dGF7ezkMvyyQynGLB2N19L
 LZRpOXr70ypyaihC553Msgi4vrpVTPN2BjLrsudGN/IJv6QbdAz5jTU8Lwu9R5QT
 y9LiEPfdMT7WmIBxnH6V7HO5OoN8V2rGJpB/a3KvKO73QjhJJqNyqB6LDPqEbHyw
 AmhQCuQ8Pn1RLKQDXdKll+aI19vi7aOVpq67ii+I9xbzHgg5+uAzKr8hkPAibnVw
 KPGYqgYQa5j3jyRq6jRkAZSkEKZ9PoM8LMiqgnNW1ZrlrDqsPajKaegXODfLuvGf
 yLYtfXbZLMAIAM32YeIH0LrcAT7SEPUFkoh85IB2YOk0mfU1PxqrXOVTPh1GkY2Q
 bKH16T9S4zCfB20V+NYCn+juX4uCNb56b7/jbjI0Ueu/AGv/ITHwRrlhQvXuGSvN
 A65w+LSWlcgRQwLglCPpX308A4DcGCPcY4RvzoirBG+WWNn/Aj4=
 =bD3g
 -----END PGP SIGNATURE-----

Merge tag 'kvm-x86-generic-7.0-rc3' of https://github.com/kvm-x86/linux into HEAD

KVM generic changes for 7.0

 - Remove a subtle pseudo-overlay of kvm_stats_desc, which, aside from being
   unnecessary and confusing, triggered compiler warnings due to
   -Wflex-array-member-not-at-end.

 - Document that vcpu->mutex is take outside of kvm->slots_lock and
   kvm->slots_arch_lock, which is intentional and desirable despite being
   rather unintuitive.
2026-03-11 18:01:55 +01:00