linux

mirror of https://github.com/torvalds/linux.git synced 2026-05-29 17:43:52 +02:00

Author	SHA1	Message	Date
Linus Torvalds	60b8d4d492	- Change the SEV host code handling of when SNP gets enabled in order to allow the machine to claim SNP-related resources only when SNP guests are really going to be launched. The user requests this by loading the ccp module and thus it controls when SNP initialization is done So export an API which module code can call and do the necessary SNP setup only when really needed - Drop an unnecessary write-back and invalidate operation that was being performed too early, since the ccp driver already issues its own at the correct point in the initialization sequence — Drop the hotplug callbacks for enabling SNP on newly onlined CPUs, which were both architecturally unsound (the firmware rejects initialization if any CPU lacks the required configuration) and buggy (the MFDM SYSCFG MSR bit was not being set) - Code refactoring and cleanups to accomplish the above -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmndWHYACgkQEsHwGGHe VUrp5w/+KtaEmeIJ8w5tkZ1haY6/FaG0mPKkBVoIEbt4TR2p7JmdCDS0yVi88+Ze GnsmKbesinhMzH3SFcQWOHu17c5BYyu8HllDhKqvAalwp0BZXSsz0LruJJC9vyqz 6J89iGPhvFDlV3aE8gSAVNu8bzfQwoiAGS4C8QXVnerUCuGMqrM31dAyroqmRw+Y QNeSqm2OW3hkisSZPga8euo7e9iwGuXufhR/mxumrLY3k7mK0U9tDqMA9XaHqfLI TssalBrhdBJkf62Cj5gByVOlVVq6E4ii3xR0Pbs35BkXchdCM+ni89BcdJTrwzcr fcmsc0HXZud5ZzTi4BYPyEUuJbNshj2EuhVTaLbIGL0kPO/MNJ1xrGfAF1NrkqmD ATH5B5uCCwUxTJJc0Fwj4McVpW/TCqXP/QDEpMoN8yQX8yejQBGRcTKVIEzyGrze GziQ+Oem/MLTBz97r8p/CXR6ADDdqIxiO1WF1hDpYoDQqtdE+DeXSscLvSH+f4C+ 6x3EjrfMD8tNrf9JYE/aRdoF72q5GlWSh06RWlQrv2A+OezQCe82yMAPiUqaS1HS GsTJtoeKVDZNW2EfIl4HNzf0PemhD8JlJSRV/euaW6ipl1gDm4YasmYyhn2pL8WI Q0I9Ud2pnDmYFjxkfPmMCsnwkdSPCXlQ7/EeclYswWAKnoDrOAc= =2eQs -----END PGP SIGNATURE----- Merge tag 'x86_sev_for_v7.1_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 SEV updates from Borislav Petkov: - Change the SEV host code handling of when SNP gets enabled in order to allow the machine to claim SNP-related resources only when SNP guests are really going to be launched. The user requests this by loading the ccp module and thus it controls when SNP initialization is done So export an API which module code can call and do the necessary SNP setup only when really needed - Drop an unnecessary write-back and invalidate operation that was being performed too early, since the ccp driver already issues its own at the correct point in the initialization sequence - Drop the hotplug callbacks for enabling SNP on newly onlined CPUs, which were both architecturally unsound (the firmware rejects initialization if any CPU lacks the required configuration) and buggy (the MFDM SYSCFG MSR bit was not being set) - Code refactoring and cleanups to accomplish the above * tag 'x86_sev_for_v7.1_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: crypto/ccp: Update HV_FIXED page states to allow freeing of memory crypto/ccp: Implement SNP x86 shutdown x86/sev, crypto/ccp: Move HSAVE_PA setup to arch/x86/ x86/sev, crypto/ccp: Move SNP init to ccp driver x86/sev: Create snp_shutdown() x86/sev: Create snp_prepare() x86/sev: Create a function to clear/zero the RMP x86/sev: Rename SNP_FEATURES_PRESENT to SNP_FEATURES_IMPL x86/virt/sev: Keep the RMP table bookkeeping area mapped x86/virt/sev: Drop WBINVD before setting MSR_AMD64_SYSCFG_SNP_EN x86/virt/sev: Drop support for SNP hotplug	2026-04-14 15:20:54 -07:00
Linus Torvalds	5d0d362330	Kbuild/Kconfig updates for 7.1 Kbuild changes ============== * tools/build: Reject unexpected values for LLVM= * kbuild: uapi: remove usage of toolchain headers * kbuild: Switch from '-fms-extensions' to '-fms-anonymous-structs' when available (currently: clang >= 23.0.0) * kbuild: Reduce the number of compiler-generated suffixes for clang thin-lto build * kbuild: reduce output spam ("GEN Makefile") when building out of tree * check-uapi: improve portability for testing headers * uapi: also test UAPI headers against C++ compilers * kbuild: vdso_install: drop build ID architecture allow-list * checksyscalls: only run when necessary * Documentation: kbuild: Update the debug information notes in reproducible-builds.rst * kconfig: forbid multiple entries with the same symbol in a choice * kbuild: expand inlining hints with -fdiagnostics-show-inlining-chain Kconfig changes =============== * kconfig: Error out on duplicated kconfig inclusion Cc: Alexander Coffin <alex@cyberialabs.net> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Bill Wendling <morbo@google.com> Cc: David Howells <dhowells@redhat.com> Cc: Dodji Seketeli <dodji@seketeli.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Helge Deller <deller@gmx.de> Cc: John Moon <john@jmoon.dev> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Josh Poimboeuf <jpoimboe@kernel.org> Cc: Justin Stitt <justinstitt@google.com> Cc: Kees Cook <kees@kernel.org> Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Nick Desaulniers <nick.desaulniers+lkml@gmail.com> Cc: Shuah Khan <skhan@linuxfoundation.org> Cc: Song Liu <song@kernel.org> Cc: Thomas Weißschuh <linux@weissschuh.net> Cc: Yonghong Song <yonghong.song@linux.dev> Cc: kernel-team@fb.com Cc: linux-arm-kernel@lists.infradead.org Cc: linux-efi@vger.kernel.org Cc: linux-hexagon@vger.kernel.org Cc: linux-kbuild@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: linux-parisc@vger.kernel.org Cc: linux-s390@vger.kernel.org Cc: linuxppc-dev@lists.ozlabs.org Cc: llvm@lists.linux.dev Cc: loongarch@lists.linux.dev -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEh0E3p4c3JKeBvsLGB1IKcBYmEmkFAmnatXEACgkQB1IKcBYm Eml6ww/9Hja/CTBoF+ZgMXN/9VcQhzNonPXIp8IGarX3+LCPh8RfUEywaOLnvR/U fE6FEIcwDw0M5drS0hEH7t1Xowc6AhDX05lKBj3aGBgn6JqGGQFAfnysQd5z0cwW Y/8+bMm+Y2XQ/xZNa0J92+3evPO04U7+2kCSVD051ZhRdmK4n290u4YsTgoKs7Fm 1SBIr+tsFa1zMOG6r+J4uCLxXNnujQ5XcejnlmdBM0o19f9kttvVkYKuBVdXPHf4 JaTLti22Td8SklDKMmkSRg+Ul/Wh2x8D8tP98VQAJe5B3f4Uk6YAu1BMrbQaX5Rk 5SsGbhBEeOTDc4qCaS8DS+FJQU6T9W9cf/9+tBY510fXxAIonz5cPB06q5xeJWCd IkVB3KpmaVxo2B54Cy4b/fvd1J3VMkmFjBQWMNwkq6cnCG1ZK/b6Jmvh9BQSNctl IYJxWKBjlddrMuvZEMI0CewVq4GmarTLiOpweghDg8OYqya4E6PfOUGnaWMrWT5c 2E8ZMnQSb68yFUaXK+Sy+Pw2Nig/VvxCUxHdaarHi/RmGeoN5dMGfjj/gGZvZrHt NUGt6qe+X62P0ZAUR8p+GpRcU3+p3uLhCyO7dkwqgLVZTnaXy5XtUQ/uyh2G60hv eJlFfrn8QXplvzrxcSTJya6PunoIhuWh2BfKhf0RDymJTPyMbBc= =+wTC -----END PGP SIGNATURE----- Merge tag 'kbuild-7.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kbuild/linux Pull Kbuild/Kconfig updates from Nicolas Schier: "Kbuild: - reject unexpected values for LLVM= - uapi: remove usage of toolchain headers - switch from '-fms-extensions' to '-fms-anonymous-structs' when available (currently: clang >= 23.0.0) - reduce the number of compiler-generated suffixes for clang thin-lto build - reduce output spam ("GEN Makefile") when building out of tree - improve portability for testing headers - also test UAPI headers against C++ compilers - drop build ID architecture allow-list in vdso_install - only run checksyscalls when necessary - update the debug information notes in reproducible-builds.rst - expand inlining hints with -fdiagnostics-show-inlining-chain Kconfig: - forbid multiple entries with the same symbol in a choice - error out on duplicated kconfig inclusion" * tag 'kbuild-7.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kbuild/linux: (35 commits) kbuild: expand inlining hints with -fdiagnostics-show-inlining-chain kconfig: forbid multiple entries with the same symbol in a choice Documentation: kbuild: Update the debug information notes in reproducible-builds.rst checksyscalls: move instance functionality into generic code checksyscalls: only run when necessary checksyscalls: fail on all intermediate errors checksyscalls: move path to reference table to a variable kbuild: vdso_install: drop build ID architecture allow-list kbuild: vdso_install: gracefully handle images without build ID kbuild: vdso_install: hide readelf warnings kbuild: vdso_install: split out the readelf invocation kbuild: uapi: also test UAPI headers against C++ compilers kbuild: uapi: provide a C++ compatible dummy definition of NULL kbuild: uapi: handle UML in architecture-specific exclusion lists kbuild: uapi: move all include path flags together kbuild: uapi: move some compiler arguments out of the command definition check-uapi: use dummy libc includes check-uapi: honor ${CROSS_COMPILE} setting check-uapi: link into shared objects kbuild: reduce output spam when building out of tree ...	2026-04-14 09:18:40 -07:00
Kim Phillips	531397a803	x86/sev: Rename SNP_FEATURES_PRESENT to SNP_FEATURES_IMPL Rename SNP_FEATURES_PRESENT to SNP_FEATURES_IMPL to denote its counterpart relationship with SNP_FEATURES_IMPL_REQ. [ bp: Drop stable@, massage commit message. ] Suggested-by: Borislav Petkov (AMD) <bp@alien8.de> Suggested-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Kim Phillips <kim.phillips@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://patch.msgid.link/20260203222405.4065706-4-kim.phillips@amd.com	2026-03-16 21:08:50 +01:00
Nathan Chancellor	ec4c28276c	kbuild: Consolidate C dialect options Introduce CC_FLAGS_DIALECT to make it easier to update the various places in the tree that rely on the GNU C standard and Microsoft extensions flags atomically. All remaining uses of '-std=gnu11' and '-fms-extensions' are in the tools directory (which has its own build system) and other standalone Makefiles. This will allow the kernel to use a narrower option to enable the Microsoft anonymous tagged structure extension in a simpler manner. Place the CC_FLAGS_DIALECT block after the configuration include (so that a future change can move the selection of the flag to Kconfig) but before the arch/$(SRCARCH)/Makefile include (so that CC_FLAGS_DIALECT is available for use in those Makefiles). Signed-off-by: Nathan Chancellor <nathan@kernel.org> Reviewed-by: Nicolas Schier <nsc@kernel.org> Acked-by: Ard Biesheuvel <ardb@kernel.org> Acked-by: Helge Deller <deller@gmx.de> # parisc Link: https://patch.msgid.link/20260223-fms-anonymous-structs-v1-1-8ee406d3c36c@kernel.org Signed-off-by: Nicolas Schier <nsc@kernel.org>	2026-03-12 12:52:37 +01:00
Linus Torvalds	c23719abc3	Miscellaneous x86 fixes: - Fix SEV guest boot failures in certain circumstances, due to very early code relying on a BSS-zeroed variable that isn't actually zeroed yet an may contain non-zero bootup values. Move the variable into the .data section go gain even earlier zeroing. - Expose & allow the IBPB-on-Entry feature on SNP guests, which was not properly exposed to guests due to initial implementational caution. - Fix O= build failure when CONFIG_EFI_SBAT_FILE is using relative file paths. - 4 commits to fix the various SNC (Sub-NUMA Clustering) topology enumeration bugs/artifacts (sched-domain build errors mostly). SNC enumeration data got more complicated with Granite Rapids X (GNR) and Clearwater Forest X (CWF), which exposed these bugs and made their effects more serious. - Also use the now sane(r) SNC code to fix resctrl SNC detection bugs. - Work around a historic libgcc unwinder bug in the vdso32 sigreturn code (again), which regressed during an overly aggressive recent cleanup of DWARF annotations. Signed-off-by: Ingo Molnar <mingo@kernel.org> -----BEGIN PGP SIGNATURE----- iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmmsy9wRHG1pbmdvQGtl cm5lbC5vcmcACgkQEnMQ0APhK1ieiQ/7B2Rfm5vR5rQLlAv26iEMypIwoCiCMgzA YD3nOMFl6aGhphKryiU0b4MDhAIASN9X6mZloryUKyol1oKP0evkWXSk/0J+k+V9 lS7uIVL+8nPTSl3gQE7ARzJ9jakFN49VzDheZjsjIHC0+n+yvCJU6xSx8IKeiTSW axpX8R33M3Fj+u5anF3m37OdFTgiYxFO0t5VNFgWP4H9367yC/wnHPuDyidAdJ/N B7PL1L3rG3+w/4np81Xwi/rThwgsSWarVLNuMJuGM5wujMr8mQGhuWaeLiPgTx7G wze1iarWvp5uqamGztpy/4WMD1x0yBX9CCSocnwF48Fh1yTww5+uwOZn5e5fZxYr vDhCH6+DB8Rt3Wj+/3RBzHSFe7rNq+f86U84uxTwyOs5eC5sGUuyH15lCt4dP9ZO uQfW0dQRwvUXCGXJxxZdIR0nq/vEJUmQ+DLLL6zkCj24t9ND5IPAkBLVn7P5PO5s qv8dPpldSq57V4comqW8oDAqLL0OeS1qgggxlHzqAdrMmt+IVKWvteRXrkgy1m9Y Bt0EbdghUTZkn9+FcUTorVA/pZHL5sYCiuGQxNbaaLmMWrcX4I3XnEtpzgukHh8e BL1blJWAm/4cuhGXb4RF7AZMQgTU56greOU385Afc1Qz2lzohGO4lqgGOH8L0ZEh KqEX1IS0ZbI= =KlDX -----END PGP SIGNATURE----- Merge tag 'x86-urgent-2026-03-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: - Fix SEV guest boot failures in certain circumstances, due to very early code relying on a BSS-zeroed variable that isn't actually zeroed yet an may contain non-zero bootup values Move the variable into the .data section go gain even earlier zeroing - Expose & allow the IBPB-on-Entry feature on SNP guests, which was not properly exposed to guests due to initial implementational caution - Fix O= build failure when CONFIG_EFI_SBAT_FILE is using relative file paths - Fix the various SNC (Sub-NUMA Clustering) topology enumeration bugs/artifacts (sched-domain build errors mostly). SNC enumeration data got more complicated with Granite Rapids X (GNR) and Clearwater Forest X (CWF), which exposed these bugs and made their effects more serious - Also use the now sane(r) SNC code to fix resctrl SNC detection bugs - Work around a historic libgcc unwinder bug in the vdso32 sigreturn code (again), which regressed during an overly aggressive recent cleanup of DWARF annotations * tag 'x86-urgent-2026-03-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/entry/vdso32: Work around libgcc unwinder bug x86/resctrl: Fix SNC detection x86/topo: Fix SNC topology mess x86/topo: Replace x86_has_numa_in_package x86/topo: Add topology_num_nodes_per_package() x86/numa: Store extra copy of numa_nodes_parsed x86/boot: Handle relative CONFIG_EFI_SBAT_FILE file paths x86/sev: Allow IBPB-on-Entry feature for SNP guests x86/boot/sev: Move SEV decompressor variables into the .data section	2026-03-07 17:12:06 -08:00
Jan Stancek	3d1973a0c7	x86/boot: Handle relative CONFIG_EFI_SBAT_FILE file paths CONFIG_EFI_SBAT_FILE can be a relative path. When compiling using a different output directory (O=) the build currently fails because it can't find the filename set in CONFIG_EFI_SBAT_FILE: arch/x86/boot/compressed/sbat.S: Assembler messages: arch/x86/boot/compressed/sbat.S:6: Error: file not found: kernel.sbat Add $(srctree) as include dir for sbat.o. [ bp: Massage commit message. ] Fixes: `61b57d3539` ("x86/efi: Implement support for embedding SBAT data for x86") Signed-off-by: Jan Stancek <jstancek@redhat.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: <stable@kernel.org> Link: https://patch.msgid.link/f4eda155b0cef91d4d316b4e92f5771cb0aa7187.1772047658.git.jstancek@redhat.com	2026-03-04 11:51:08 +01:00
Kim Phillips	9073428bb2	x86/sev: Allow IBPB-on-Entry feature for SNP guests The SEV-SNP IBPB-on-Entry feature does not require a guest-side implementation. It was added in Zen5 h/w, after the first SNP Zen implementation, and thus was not accounted for when the initial set of SNP features were added to the kernel. In its abundant precaution, commit `8c29f01654` ("x86/sev: Add SEV-SNP guest feature negotiation support") included SEV_STATUS' IBPB-on-Entry bit as a reserved bit, thereby masking guests from using the feature. Allow guests to make use of IBPB-on-Entry when supported by the hypervisor, as the bit is now architecturally defined and safe to expose. Fixes: `8c29f01654` ("x86/sev: Add SEV-SNP guest feature negotiation support") Signed-off-by: Kim Phillips <kim.phillips@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Nikunj A Dadhania <nikunj@amd.com> Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Cc: stable@kernel.org Link: https://patch.msgid.link/20260203222405.4065706-2-kim.phillips@amd.com	2026-03-02 11:08:59 +01:00
Tom Lendacky	4ca191cec1	x86/boot/sev: Move SEV decompressor variables into the .data section As part of the work to remove the dependency on calling into the decompressor code (startup_64()) for a UEFI boot, a call to rmpadjust() was removed from sev_enable() in favor of checking the value of the snp_vmpl variable. When booting through a non-UEFI path and calling startup_64(), the call to sev_enable() is performed before the BSS section is zeroed. With the removal of the rmpadjust() call and the corresponding check of the return code, the snp_vmpl variable is checked. Since the kernel is running at VMPL0, the snp_vmpl variable will not have been set and should be the default value of 0. However, since the call occurs before the BSS is zeroed, the snp_vmpl variable may not actually be zero, which will cause the guest boot to fail. Since the decompressor relocates itself, the BSS would need to be cleared both before and after the relocation, but this would, in effect, cause all of the changes to BSS variables before relocation to be lost after relocation. Instead, move the snp_vmpl variable into the .data section so that it is initialized and the value made safe during relocation. As a pre-caution against future changes, move other SEV-related decompressor variables into the .data section, too. Fixes: `68a501d7fd` ("x86/boot: Drop redundant RMPADJUST in SEV SVSM presence check") Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Reviewed-by: Changyuan Lyu <changyuanl@google.com> Tested-by: Kevin Hui <kevinhui@meta.com> Tested-by: Changyuan Lyu <changyuanl@google.com> Cc: stable@vger.kernel.org Link: https://patch.msgid.link/5648b7de5b0a5d0dfef3785f9582b718678c6448.1770217260.git.thomas.lendacky@amd.com	2026-03-02 11:08:33 +01:00
Nathan Chancellor	8678591b47	kbuild: Split .modinfo out from ELF_DETAILS Commit `3e86e4d74c` ("kbuild: keep .modinfo section in vmlinux.unstripped") added .modinfo to ELF_DETAILS while removing it from COMMON_DISCARDS, as it was needed in vmlinux.unstripped and ELF_DETAILS was present in all architecture specific vmlinux linker scripts. While this shuffle is fine for vmlinux, ELF_DETAILS and COMMON_DISCARDS may be used by other linker scripts, such as the s390 and x86 compressed boot images, which may not expect to have a .modinfo section. In certain circumstances, this could result in a bootloader failing to load the compressed kernel [1]. Commit `ddc6cbef3e` ("s390/boot/vmlinux.lds.S: Ensure bzImage ends with SecureBoot trailer") recently addressed this for the s390 bzImage but the same bug remains for arm, parisc, and x86. The presence of .modinfo in the x86 bzImage was the root cause of the issue worked around with commit `d50f210913` ("kbuild: align modinfo section for Secureboot Authenticode EDK2 compat"). misc.c in arch/x86/boot/compressed includes lib/decompress_unzstd.c, which in turn includes lib/xxhash.c and its MODULE_LICENSE / MODULE_DESCRIPTION macros due to the STATIC definition. Split .modinfo out from ELF_DETAILS into its own macro and handle it in all vmlinux linker scripts. Discard .modinfo in the places where it was previously being discarded from being in COMMON_DISCARDS, as it has never been necessary in those uses. Cc: stable@vger.kernel.org Fixes: `3e86e4d74c` ("kbuild: keep .modinfo section in vmlinux.unstripped") Reported-by: Ed W <lists@wildgooses.com> Closes: https://lore.kernel.org/587f25e0-a80e-46a5-9f01-87cb40cfa377@wildgooses.com/ [1] Tested-by: Ed W <lists@wildgooses.com> # x86_64 Link: https://patch.msgid.link/20260225-separate-modinfo-from-elf-details-v1-1-387ced6baf4b@kernel.org Signed-off-by: Nathan Chancellor <nathan@kernel.org>	2026-02-26 11:50:19 -07:00
Kevin Brodsky	7303ecbfe4	mm: introduce CONFIG_ARCH_HAS_LAZY_MMU_MODE Architectures currently opt in for implementing lazy_mmu helpers by defining __HAVE_ARCH_ENTER_LAZY_MMU_MODE. In preparation for introducing a generic lazy_mmu layer that will require storage in task_struct, let's switch to a cleaner approach: instead of defining a macro, select a CONFIG option. This patch introduces CONFIG_ARCH_HAS_LAZY_MMU_MODE and has each arch select it when it implements lazy_mmu helpers. __HAVE_ARCH_ENTER_LAZY_MMU_MODE is removed and <linux/pgtable.h> relies on the new CONFIG instead. On x86, lazy_mmu helpers are only implemented if PARAVIRT_XXL is selected. This creates some complications in arch/x86/boot/, because a few files manually undefine PARAVIRT* options. As a result <asm/paravirt.h> does not define the lazy_mmu helpers, but this breaks the build as <linux/pgtable.h> only defines them if !CONFIG_ARCH_HAS_LAZY_MMU_MODE. There does not seem to be a clean way out of this - let's just undefine that new CONFIG too. Link: https://lkml.kernel.org/r/20251215150323.2218608-7-kevin.brodsky@arm.com Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com> Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Reviewed-by: Ryan Roberts <ryan.roberts@arm.com> Reviewed-by: Yeoreum Yun <yeoreum.yun@arm.com> Acked-by: Andreas Larsson <andreas@gaisler.com> [sparc] Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Borislav Betkov <bp@alien8.de> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: David Hildenbrand (Red Hat) <david@kernel.org> Cc: David S. Miller <davem@davemloft.net> Cc: David Woodhouse <dwmw2@infradead.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jann Horn <jannh@google.com> Cc: Juegren Gross <jgross@suse.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Thomas Gleinxer <tglx@linutronix.de> Cc: Venkat Rao Bagalkote <venkat88@linux.ibm.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-01-20 19:24:33 -08:00
Linus Torvalds	c76431e3b5	- Use the proper accessors when reading CR3 as part of the page level transitions (5-level to 4-level, the use case being kexec) so that only the physical address in CR3 is picked up and not flags which are above the physical mask shift - Clean up and unify __phys_addr_symbol() definitions -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmkt8McACgkQEsHwGGHe VUpxJBAAg6PaKVNOmceCCcwDb331YLHpd18eeLy7Cdr6ktcdDflo39TiKnwy/BEs 2uENe9OrS52JL98vMhZxPVFL/3yplrMo7jfuamthSEcFuvlxe2wh7NGhxbNl2gOe +9BpYTbHe5wts+W+ij/srcBCzGDIYoYhh7Dbc8wB1dh/jcH2qkEnYvBTGoYtgELF lWt1pWsdHVnUORn9qKNI3iAX47jmkUTBqEgQHyPFcSM6s8WGtIOKib7+UtvNiMTw V0ZMzfsL5k4J6ifwR5PLLaMNXdwQoZeArWbCA6VYhnOEP0MBmgLxFFCCi5z6iGwv ph+YYWm2/kMEOdJDfDlZqjZFcw/QOfk44chGMTqf+G3rFdNrHMdTiovtvzg6vGvG akJK5r2JsAJu8ymuwd3Rke3F3k1SP7QfdYB1Tipu4wvt7iSOQNqIA/xcHjMprHBx MZ6BifOxwXhhihUr9UA0TSQM6fJfnzrKPdzDSh/h5qThSpjbH/qkNlJwNGy/Knm5 5MTftDkDkpkmJDiOhJAOCweMBGNyFQrOH1QYuqURrB+AGo3Iq9HIJ+2fVXtUdIZy AMmvEROjMRgxD2hoBCVa4AF5Gm3cNiGxGn+jEitdLgqVbTi0tWSO+oPOK+uH2Zib 77r8hNmd9hE7ikHSRGhWS+5D3mVWejsDtrs8YyCMuXN/Ft2omRU= =+p5z -----END PGP SIGNATURE----- Merge tag 'x86_mm_for_v6.19_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 mm updates from Borislav Petkov: - Use the proper accessors when reading CR3 as part of the page level transitions (5-level to 4-level, the use case being kexec) so that only the physical address in CR3 is picked up and not flags which are above the physical mask shift - Clean up and unify __phys_addr_symbol() definitions * tag 'x86_mm_for_v6.19_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: efi/libstub: Fix page table access in 5-level to 4-level paging transition x86/boot: Fix page table access in 5-level to 4-level paging transition x86/mm: Unify __phys_addr_symbol()	2025-12-02 13:32:52 -08:00
Linus Torvalds	cb502f0e5e	- Largely cleanups along with a change to save XSS to the GHCB (Guest-Host Communication Block) in SEV-ES guests so that the hypervisor can determine the guest's XSAVES buffer size properly and thus support shadow stacks in AMD confidential guests -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmkttt0ACgkQEsHwGGHe VUoaFg/8CY+UAE1VtnaWhG7hpxCqBLlVtyt3gVhIn6ZCZ5mxtFoEcZI8BnxnFbbM Rpd+5LsMbu4GWfq/AEx/a+IT3rKZIT1RjRrd73JZZYRNpv6/Vnmv/7OizjDbBqhU n3Q1rHJCaVk90oP2sbB4dr9qsYHOx624jz0CrBxUM7GnaCQwtofqa6hK6HMJDu3g OLjJCoaHY0ry779QUjCmMJ1BbOLsy1fGsmuQO8LcE6xWRJv5ueJPcZbH0I0g5UIF NExe03uxaSvrM0ZYdjHpQU590kyPwjzo0Jx8IANQDb0dyY4mIFPdnZwbRBr/OPnZ 205c0EllHZvDZ4nKTYfeJYjXnPWmovHXJATr/BuqW+0GQZYmTbDq2IgvbbnE9gs1 67Sy94ISuxs683hNb9U2cLI7OFHcVDGfESuHhmeJTQsVY+VazL00p6azFP1ONpsn N93GYK+ZFvOeFFssO/gm97jbkKyUH9PS2+TEbhijeQkZ/PYKVbObM89LDLSRrKC7 5mEFCZIKIehKLdSoAc8yTzKRE0piyK/PR6ykzP9A2rEjrSN2HDqvsR+nDr0hQ1/V Uye4a8V0XiHyZsvD+AIYJnARGYdcnUCiezep81WaC55hn0sdqrhlqnycnkd+7fDE MZF9epXCnIAA8IF7I5jCLfwTa1b6ouMhBxpwiMpaL9DGmQghWcM= =GCGn -----END PGP SIGNATURE----- Merge tag 'x86_sev_for_v6.19_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 SEV updates from Borislav Petkov: - Largely cleanups along with a change to save XSS to the GHCB (Guest-Host Communication Block) in SEV-ES guests so that the hypervisor can determine the guest's XSAVES buffer size properly and thus support shadow stacks in AMD confidential guests * tag 'x86_sev_for_v6.19_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/cc: Fix enum spelling to fix kernel-doc warnings x86/boot: Drop unused sev_enable() fallback x86/coco/sev: Convert has_cpuflag() to use cpu_feature_enabled() x86/sev: Include XSS value in GHCB CPUID request x86/boot: Move boot_*msr helpers to asm/shared/msr.h	2025-12-02 13:07:53 -08:00
Linus Torvalds	6c26fbe8c9	Performance events changes for v6.19: Callchain support: - Add support for deferred user-space stack unwinding for perf, enabled on x86. (Peter Zijlstra, Steven Rostedt) - unwind_user/x86: Enable frame pointer unwinding on x86 (Josh Poimboeuf) x86 PMU support and infrastructure: - x86/insn: Simplify for_each_insn_prefix() (Peter Zijlstra) - x86/insn,uprobes,alternative: Unify insn_is_nop() (Peter Zijlstra) Intel PMU driver: - Large series to prepare for and implement architectural PEBS support for Intel platforms such as Clearwater Forest (CWF) and Panther Lake (PTL). (Dapeng Mi, Kan Liang) - Check dynamic constraints (Kan Liang) - Optimize PEBS extended config (Peter Zijlstra) - cstates: Remove PC3 support from LunarLake (Zhang Rui) - cstates: Add Pantherlake support (Zhang Rui) - cstates: Clearwater Forest support (Zide Chen) AMD PMU driver: - x86/amd: Check event before enable to avoid GPF (George Kennedy) Fixes and cleanups: - task_work: Fix NMI race condition (Peter Zijlstra) - perf/x86: Fix NULL event access and potential PEBS record loss (Dapeng Mi) - Misc other fixes and cleanups. (Dapeng Mi, Ingo Molnar, Peter Zijlstra) Signed-off-by: Ingo Molnar <mingo@kernel.org> -----BEGIN PGP SIGNATURE----- iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmktcU0RHG1pbmdvQGtl cm5lbC5vcmcACgkQEnMQ0APhK1gNKw//ThLmbkoGJ0/yLOdEcW8rA/7HB43Oz6j9 k0Vs7zwDBMRFP4zQg2XeF5SH7CWS9p/nI3eMhorgmH77oJCvXJxVtD5991zmlZhf eafOar5ZMVaoMz+tK8WWiENZyuN0bt0mumZmz9svXR3KV1S/q18XZ8bCas0itwnq D0T3Gqi/Z39gJIy7bHNgLoFY2zvI9b2EJNDKlzHk3NJ7UamA4GuMHN0cM2dIzKGK 2L+wXOe2BH9YYzYrz/cdKq7sBMjOvFsCQ/5jh23A2Yu6JI4nJbw0WmexZRK1OWCp GAdMjBuqIShibLRxK746WRO9iut49uTsah4iSG80hXzhpwf7VaegOarost1nLaqm zweIOr3iwJRf273r6IqRuaporVHpQYMj2w2H63z36sQtGtkKHNyxZ50b6bqpwwjU LikLEJ9Bmh3mlvlXsOx2wX6dTb1fUk+cy2ezCDKUHqOLjqy4dM8V+jYhuRO4yxXz mj9aHZKgyuREt8yo/3nLqAzF5Okj9cXp7H6F1hCKWuCoAhNXkrvYcvbg8h6aRxOX 2vGhMYjpElkl/DG6OWCSwuqCt9nVEC/dazW9fKQjh4S0CFOVopaMGSkGcS/xUPub 92J4XMDEJX4RJ6dfspeQr97+1fETXEIWNv4WbKnDjqJlAucU1gnOTprVnAYUjcWw 74320FjGN1E= =/8GE -----END PGP SIGNATURE----- Merge tag 'perf-core-2025-12-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull performance events updates from Ingo Molnar: "Callchain support: - Add support for deferred user-space stack unwinding for perf, enabled on x86. (Peter Zijlstra, Steven Rostedt) - unwind_user/x86: Enable frame pointer unwinding on x86 (Josh Poimboeuf) x86 PMU support and infrastructure: - x86/insn: Simplify for_each_insn_prefix() (Peter Zijlstra) - x86/insn,uprobes,alternative: Unify insn_is_nop() (Peter Zijlstra) Intel PMU driver: - Large series to prepare for and implement architectural PEBS support for Intel platforms such as Clearwater Forest (CWF) and Panther Lake (PTL). (Dapeng Mi, Kan Liang) - Check dynamic constraints (Kan Liang) - Optimize PEBS extended config (Peter Zijlstra) - cstates: - Remove PC3 support from LunarLake (Zhang Rui) - Add Pantherlake support (Zhang Rui) - Clearwater Forest support (Zide Chen) AMD PMU driver: - x86/amd: Check event before enable to avoid GPF (George Kennedy) Fixes and cleanups: - task_work: Fix NMI race condition (Peter Zijlstra) - perf/x86: Fix NULL event access and potential PEBS record loss (Dapeng Mi) - Misc other fixes and cleanups (Dapeng Mi, Ingo Molnar, Peter Zijlstra)" * tag 'perf-core-2025-12-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (38 commits) perf/x86/intel: Fix and clean up intel_pmu_drain_arch_pebs() type use perf/x86/intel: Optimize PEBS extended config perf/x86/intel: Check PEBS dyn_constraints perf/x86/intel: Add a check for dynamic constraints perf/x86/intel: Add counter group support for arch-PEBS perf/x86/intel: Setup PEBS data configuration and enable legacy groups perf/x86/intel: Update dyn_constraint base on PEBS event precise level perf/x86/intel: Allocate arch-PEBS buffer and initialize PEBS_BASE MSR perf/x86/intel: Process arch-PEBS records or record fragments perf/x86/intel/ds: Factor out PEBS group processing code to functions perf/x86/intel/ds: Factor out PEBS record processing code to functions perf/x86/intel: Initialize architectural PEBS perf/x86/intel: Correct large PEBS flag check perf/x86/intel: Replace x86_pmu.drain_pebs calling with static call perf/x86: Fix NULL event access and potential PEBS record loss perf/x86: Remove redundant is_x86_event() prototype entry,unwind/deferred: Fix unwind_reset_info() placement unwind_user/x86: Fix arch=um build perf: Support deferred user unwind unwind_user/x86: Teach FP unwind about start of function ...	2025-12-01 20:42:01 -08:00
Ard Biesheuvel	a3e6907128	x86/boot: Drop unused sev_enable() fallback The misc.h header is not included by the EFI stub, which is the only C caller of sev_enable(). This means the fallback for cases where CONFIG_AMD_MEM_ENCRYPT is not set is never used, so it can be dropped. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Link: https://patch.msgid.link/20250909080631.2867579-6-ardb+git@google.com	2025-11-20 21:12:48 +01:00
Usama Arif	eb22663125	x86/boot: Fix page table access in 5-level to 4-level paging transition When transitioning from 5-level to 4-level paging, the existing code incorrectly accesses page table entries by directly dereferencing CR3 and applying PAGE_MASK. This approach has several issues: - __native_read_cr3() returns the raw CR3 register value, which on x86_64 includes not just the physical address but also flags. Bits above the physical address width of the system i.e. above __PHYSICAL_MASK_SHIFT) are also not masked. - The PGD entry is masked by PAGE_SIZE which doesn't take into account the higher bits such as _PAGE_BIT_NOPTISHADOW. Replace this with proper accessor functions: - native_read_cr3_pa(): Uses CR3_ADDR_MASK to additionally mask metadata out of CR3 (like SME or LAM bits). All remaining bits are real address bits or reserved and must be 0. - mask pgd value with PTE_PFN_MASK instead of PAGE_MASK, accounting for flags above bit 51 (_PAGE_BIT_NOPTISHADOW in particular). Bits below 51, but above the max physical address are reserved and must be 0. Fixes: `e9d0e6330e` ("x86/boot/compressed/64: Prepare new top-level page table for trampoline") Reported-by: Michael van der Westhuizen <rmikey@meta.com> Reported-by: Tobias Fleig <tfleig@meta.com> Co-developed-by: Kiryl Shutsemau <kas@kernel.org> Signed-off-by: Kiryl Shutsemau <kas@kernel.org> Signed-off-by: Usama Arif <usamaarif642@gmail.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Acked-by: Dave Hansen <dave.hansen@linux.intel.com> Link: https://lore.kernel.org/r/a482fd68-ce54-472d-8df1-33d6ac9f6bb5@intel.com	2025-11-05 17:19:11 +01:00
Nathan Chancellor	5ff8ad3909	kbuild: Add '-fms-extensions' to areas with dedicated CFLAGS This is a follow up to commit `c4781dc3d1` ("Kbuild: enable -fms-extensions") but in a separate change due to being substantially different from the initial submission. There are many places within the kernel that use their own CFLAGS instead of the main KBUILD_CFLAGS, meaning code written with the main kernel's use of '-fms-extensions' in mind that may be tangentially included in these areas will result in "error: declaration does not declare anything" messages from the compiler. Add '-fms-extensions' to all these areas to ensure consistency, along with -Wno-microsoft-anon-tag to silence clang's warning about use of the extension that the kernel cares about using. parisc does not build with clang so it does not need this warning flag. LoongArch does not need it either because -W flags from KBUILD_FLAGS are pulled into cflags-vdso. Reported-by: Christian Brauner <brauner@kernel.org> Closes: https://lore.kernel.org/20251030-meerjungfrau-getrocknet-7b46eacc215d@brauner/ Reviewed-by: Christian Brauner <brauner@kernel.org> Acked-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Nathan Chancellor <nathan@kernel.org>	2025-10-30 21:26:28 -04:00
John Allen	9249bcdea0	x86/boot: Move boot_msr helpers to asm/shared/msr.h The boot_{rdmsr,wrmsr}() helpers are just* the barebones MSR access functionality, without any tracing or exception handling glue as it is done in kernel proper. Move these helpers to asm/shared/msr.h and rename to raw_{rdmsr,wrmsr}() to indicate what they are. [ bp: Correct the reason why those helpers exist. I should've caught that in the original patch that added them: `176db62257` ("x86/boot: Introduce helpers for MSR reads/writes" but oh well... - fixup include path delimiters to <> ] Signed-off-by: John Allen <john.allen@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Link: https://patch.msgid.link/all/20250924200852.4452-2-john.allen@amd.com	2025-10-30 16:29:53 +01:00
Peter Zijlstra	45e1dccc06	x86/insn: Simplify for_each_insn_prefix() Use the new-found freedom of allowing variable declarions inside for() to simplify the for_each_insn_prefix() iterator to no longer need an external temporary. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>	2025-10-16 11:13:48 +02:00
Ingo Molnar	0ca77f8d33	Merge branch 'x86/apic' into x86/sev, to resolve conflict Conflicts: arch/x86/include/asm/sev-internal.h Signed-off-by: Ingo Molnar <mingo@kernel.org>	2025-09-05 09:01:42 +02:00
Ard Biesheuvel	c5c30a3736	x86/boot: Move startup code out of __head section Move startup code out of the __head section, now that this no longer has a special significance. Move everything into .text or .init.text as appropriate, so that startup code is not kept around unnecessarily. [ bp: Fold in hunk to fix 32-bit CPU hotplug: Reported-by: kernel test robot <oliver.sang@intel.com> Closes: https://lore.kernel.org/oe-lkp/202509022207.56fd97f4-lkp@intel.com ] Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/20250828102202.1849035-45-ardb+git@google.com	2025-09-03 18:06:04 +02:00
Ard Biesheuvel	e7b88bc005	efistub/x86: Remap inittext read-execute when needed Recent EFI x86 systems are more strict when it comes to mapping boot images, and require that mappings are either read-write or read-execute. Now that the boot code is being cleaned up and refactored, most of it is being moved into .init.text [where it arguably belongs] but that implies that when booting on such strict EFI firmware, we need to take care to map .init.text (and the .altinstr_aux section that follows it) read-execute as well. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/20250828102202.1849035-44-ardb+git@google.com	2025-09-03 18:05:42 +02:00
Ard Biesheuvel	9723dd0c70	x86/sev: Provide PIC aliases for SEV related data objects Provide PIC aliases for data objects that are shared between the SEV startup code and the SEV code that executes later. This is needed so that the confined startup code is permitted to access them. This requires some of these variables to be moved into a source file that is not part of the startup code, as the PIC alias is already implied, and exporting variables in the opposite direction is not supported. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/20250828102202.1849035-36-ardb+git@google.com	2025-09-03 17:59:43 +02:00
Ard Biesheuvel	68a501d7fd	x86/boot: Drop redundant RMPADJUST in SEV SVSM presence check snp_vmpl will be assigned a non-zero value when executing at a VMPL other than 0, and this is inferred from a call to RMPADJUST, which only works when running at VMPL0. This means that testing snp_vmpl is sufficient, and there is no need to perform the same check again. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/20250828102202.1849035-34-ardb+git@google.com	2025-09-03 17:59:09 +02:00
Ard Biesheuvel	c54604fb7f	x86/sev: Use boot SVSM CA for all startup and init code To avoid having to reason about whether or not to use the per-CPU SVSM calling area when running startup and init code on the boot CPU, reuse the boot SVSM calling area as the per-CPU area for the BSP. Thus, remove the need to make the per-CPU variables and associated state in sev_cfg accessible to the startup code once confined. [ bp: Massage commit message. ] Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/20250828102202.1849035-33-ardb+git@google.com	2025-09-03 17:58:26 +02:00
Ard Biesheuvel	00d2556676	x86/sev: Pass SVSM calling area down to early page state change API The early page state change API is mostly only used very early, when only the boot time SVSM calling area is in use. However, this API is also called by the kexec finishing code, which runs very late, and potentially from a different CPU (which uses a different calling area). To avoid pulling the per-CPU SVSM calling area pointers and related SEV state into the startup code, refactor the page state change API so the SVSM calling area virtual and physical addresses can be provided by the caller. No functional change intended. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/20250828102202.1849035-32-ardb+git@google.com	2025-09-03 17:58:22 +02:00
Ard Biesheuvel	d5949ea50c	x86/sev: Share implementation of MSR-based page state change Both the decompressor and the SEV startup code implement the exact same sequence for invoking the MSR based communication protocol to effectuate a page state change. Before tweaking the internal APIs used in both versions, merge them and share them so those tweaks are only needed in a single place. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/20250828102202.1849035-31-ardb+git@google.com	2025-09-03 17:58:19 +02:00
Ard Biesheuvel	a5f03880f0	x86/sev: Avoid global variable to store virtual address of SVSM area The boottime SVSM calling area is used both by the startup code running from a 1:1 mapping, and potentially later on running from the ordinary kernel mapping. This SVSM calling area is statically allocated, and so its physical address doesn't change. However, its virtual address depends on the calling context (1:1 mapping or kernel virtual mapping), and even though the variable that holds the virtual address of this calling area gets updated from 1:1 address to kernel address during the boot, it is hard to reason about why this is guaranteed to be safe. So instead, take the RIP-relative address of the boottime SVSM calling area whenever its virtual address is required, and only use a global variable for the physical address. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Link: https://lore.kernel.org/20250828102202.1849035-30-ardb+git@google.com	2025-09-03 17:58:15 +02:00
Ard Biesheuvel	37dbd78f98	x86/sev: Move GHCB page based HV communication out of startup code Both the decompressor and the core kernel implement an early #VC handler, which only deals with CPUID instructions, and full featured one, which can handle any #VC exception. The former communicates with the hypervisor using the MSR based protocol, whereas the latter uses a shared GHCB page, which is configured a bit later during the boot, when the kernel runs from its ordinary virtual mapping, rather than the 1:1 mapping that the startup code uses. Accessing this shared GHCB page from the core kernel's startup code is problematic, because it involves converting the GHCB address provided by the caller to a physical address. In the startup code, virtual to physical address translations are problematic, given that the virtual address might be a 1:1 mapped address, and such translations should therefore be avoided. This means that exposing startup code dealing with the GHCB to callers that execute from the ordinary kernel virtual mapping should be avoided too. So move all GHCB page based communication out of the startup code, now that all communication occurring before the kernel virtual mapping is up relies on the MSR protocol only. As an exception, add a flag representing the need to apply the coherency fix in order to avoid exporting CPUID* helpers because of the code running too early for the cpu_has infrastructure. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/20250828102202.1849035-29-ardb+git@google.com	2025-09-03 17:55:25 +02:00
Neeraj Upadhyay	27a17e0241	x86/sev: Indicate the SEV-SNP guest supports Secure AVIC Now that Secure AVIC support is complete, make it part of to the SNP present features. Co-developed-by: Kishon Vijay Abraham I <kvijayab@amd.com> Signed-off-by: Kishon Vijay Abraham I <kvijayab@amd.com> Signed-off-by: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Tianyu Lan <tiala@microsoft.com> Link: https://lore.kernel.org/20250828113225.209174-1-Neeraj.Upadhyay@amd.com	2025-09-01 13:20:53 +02:00
Ard Biesheuvel	e349241b97	x86/sev: Run RMPADJUST on SVSM calling area page to test VMPL Determining the VMPL at which the kernel runs involves performing a RMPADJUST operation on an arbitrary page of memory, and observing whether it succeeds. The use of boot_ghcb_page in the core kernel in this case is completely arbitrary, but results in the need to provide a PIC alias for it. So use boot_svsm_ca_page instead, which already needs this alias for other reasons. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Link: https://lore.kernel.org/20250828102202.1849035-28-ardb+git@google.com	2025-08-31 12:40:56 +02:00
Ard Biesheuvel	7cb7b6de9c	x86/sev: Use MSR protocol only for early SVSM PVALIDATE call The early page state change API performs an SVSM call to PVALIDATE each page when running under a SVSM, and this involves either a GHCB page based call or a call based on the MSR protocol. The GHCB page based variant involves VA to PA translation of the GHCB address, and this is best avoided in the startup code, where virtual addresses are ambiguous (1:1 or kernel virtual). As this is the last remaining occurrence of svsm_perform_call_protocol() in the startup code, switch to the MSR protocol exclusively in this particular case, so that the GHCB based plumbing can be moved out of the startup code entirely in a subsequent patch. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Link: https://lore.kernel.org/20250828102202.1849035-27-ardb+git@google.com	2025-08-31 12:40:55 +02:00
Neeraj Upadhyay	30c2b98aa8	x86/apic: Add new driver for Secure AVIC The Secure AVIC feature provides SEV-SNP guests hardware acceleration for performance sensitive APIC accesses while securely managing the guest-owned APIC state through the use of a private APIC backing page. This helps prevent the hypervisor from generating unexpected interrupts for a vCPU or otherwise violate architectural assumptions around the APIC behavior. Add a new x2APIC driver that will serve as the base of the Secure AVIC support. It is initially the same as the x2APIC physical driver (without IPI callbacks), but will be modified as features are implemented. As the new driver does not implement Secure AVIC features yet, if the hypervisor sets the Secure AVIC bit in SEV_STATUS, maintain the existing behavior to enforce the guest termination. [ bp: Massage commit message. ] Co-developed-by: Kishon Vijay Abraham I <kvijayab@amd.com> Signed-off-by: Kishon Vijay Abraham I <kvijayab@amd.com> Signed-off-by: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Tianyu Lan <tiala@microsoft.com> Link: https://lore.kernel.org/20250828070334.208401-2-Neeraj.Upadhyay@amd.com	2025-08-28 17:57:19 +02:00
Vitaly Kuznetsov	61b57d3539	x86/efi: Implement support for embedding SBAT data for x86 Similar to zboot architectures, implement support for embedding SBAT data for x86. Put '.sbat' section in between '.data' and '.text' as the former also covers '.bss' and '.pgtable' and thus must be the last one in the file. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/20250603091951.57775-1-vkuznets@redhat.com	2025-06-21 13:53:44 +02:00
Linus Torvalds	00c010e130	- The 11 patch series "Add folio_mk_pte()" from Matthew Wilcox simplifies the act of creating a pte which addresses the first page in a folio and reduces the amount of plumbing which architecture must implement to provide this. - The 8 patch series "Misc folio patches for 6.16" from Matthew Wilcox is a shower of largely unrelated folio infrastructure changes which clean things up and better prepare us for future work. - The 3 patch series "memory,x86,acpi: hotplug memory alignment advisement" from Gregory Price adds early-init code to prevent x86 from leaving physical memory unused when physical address regions are not aligned to memory block size. - The 2 patch series "mm/compaction: allow more aggressive proactive compaction" from Michal Clapinski provides some tuning of the (sadly, hard-coded (more sadly, not auto-tuned)) thresholds for our invokation of proactive compaction. In a simple test case, the reduction of a guest VM's memory consumption was dramatic. - The 8 patch series "Minor cleanups and improvements to swap freeing code" from Kemeng Shi provides some code cleaups and a small efficiency improvement to this part of our swap handling code. - The 6 patch series "ptrace: introduce PTRACE_SET_SYSCALL_INFO API" from Dmitry Levin adds the ability for a ptracer to modify syscalls arguments. At this time we can alter only "system call information that are used by strace system call tampering, namely, syscall number, syscall arguments, and syscall return value. This series should have been incorporated into mm.git's "non-MM" branch, but I goofed. - The 3 patch series "fs/proc: extend the PAGEMAP_SCAN ioctl to report guard regions" from Andrei Vagin extends the info returned by the PAGEMAP_SCAN ioctl against /proc/pid/pagemap. This permits CRIU to more efficiently get at the info about guard regions. - The 2 patch series "Fix parameter passed to page_mapcount_is_type()" from Gavin Shan implements that fix. No runtime effect is expected because validate_page_before_insert() happens to fix up this error. - The 3 patch series "kernel/events/uprobes: uprobe_write_opcode() rewrite" from David Hildenbrand basically brings uprobe text poking into the current decade. Remove a bunch of hand-rolled implementation in favor of using more current facilities. - The 3 patch series "mm/ptdump: Drop assumption that pxd_val() is u64" from Anshuman Khandual provides enhancements and generalizations to the pte dumping code. This might be needed when 128-bit Page Table Descriptors are enabled for ARM. - The 12 patch series "Always call constructor for kernel page tables" from Kevin Brodsky "ensures that the ctor/dtor is always called for kernel pgtables, as it already is for user pgtables". This permits the addition of more functionality such as "insert hooks to protect page tables". This change does result in various architectures performing unnecesary work, but this is fixed up where it is anticipated to occur. - The 9 patch series "Rust support for mm_struct, vm_area_struct, and mmap" from Alice Ryhl adds plumbing to permit Rust access to core MM structures. - The 3 patch series "fix incorrectly disallowed anonymous VMA merges" from Lorenzo Stoakes takes advantage of some VMA merging opportunities which we've been missing for 15 years. - The 4 patch series "mm/madvise: batch tlb flushes for MADV_DONTNEED and MADV_FREE" from SeongJae Park optimizes process_madvise()'s TLB flushing. Instead of flushing each address range in the provided iovec, we batch the flushing across all the iovec entries. The syscall's cost was approximately halved with a microbenchmark which was designed to load this particular operation. - The 6 patch series "Track node vacancy to reduce worst case allocation counts" from Sidhartha Kumar makes the maple tree smarter about its node preallocation. stress-ng mmap performance increased by single-digit percentages and the amount of unnecessarily preallocated memory was dramaticelly reduced. - The 3 patch series "mm/gup: Minor fix, cleanup and improvements" from Baoquan He removes a few unnecessary things which Baoquan noted when reading the code. - The 3 patch series ""Enhance sysfs handling for memory hotplug in weighted interleave" from Rakie Kim "enhances the weighted interleave policy in the memory management subsystem by improving sysfs handling, fixing memory leaks, and introducing dynamic sysfs updates for memory hotplug support". Fixes things on error paths which we are unlikely to hit. - The 7 patch series "mm/damon: auto-tune DAMOS for NUMA setups including tiered memory" from SeongJae Park introduces new DAMOS quota goal metrics which eliminate the manual tuning which is required when utilizing DAMON for memory tiering. - The 5 patch series "mm/vmalloc.c: code cleanup and improvements" from Baoquan He provides cleanups and small efficiency improvements which Baoquan found via code inspection. - The 2 patch series "vmscan: enforce mems_effective during demotion" from Gregory Price "changes reclaim to respect cpuset.mems_effective during demotion when possible". because "presently, reclaim explicitly ignores cpuset.mems_effective when demoting, which may cause the cpuset settings to violated." "This is useful for isolating workloads on a multi-tenant system from certain classes of memory more consistently." - The 2 patch series ""Clean up split_huge_pmd_locked() and remove unnecessary folio pointers" from Gavin Guo provides minor cleanups and efficiency gains in in the huge page splitting and migrating code. - The 3 patch series "Use kmem_cache for memcg alloc" from Huan Yang creates a slab cache for `struct mem_cgroup', yielding improved memory utilization. - The 4 patch series "add max arg to swappiness in memory.reclaim and lru_gen" from Zhongkun He adds a new "max" argument to the "swappiness=" argument for memory.reclaim MGLRU's lru_gen. This directs proactive reclaim to reclaim from only anon folios rather than file-backed folios. - The 17 patch series "kexec: introduce Kexec HandOver (KHO)" from Mike Rapoport is the first step on the path to permitting the kernel to maintain existing VMs while replacing the host kernel via file-based kexec. At this time only memblock's reserve_mem is preserved. - The 7 patch series "mm: Introduce for_each_valid_pfn()" from David Woodhouse provides and uses a smarter way of looping over a pfn range. By skipping ranges of invalid pfns. - The 2 patch series "sched/numa: Skip VMA scanning on memory pinned to one NUMA node via cpuset.mems" from Libo Chen removes a lot of pointless VMA scanning when a task is pinned a single NUMA mode. Dramatic performance benefits were seen in some real world cases. - The 2 patch series "JFS: Implement migrate_folio for jfs_metapage_aops" from Shivank Garg addresses a warning which occurs during memory compaction when using JFS. - The 4 patch series "move all VMA allocation, freeing and duplication logic to mm" from Lorenzo Stoakes moves some VMA code from kernel/fork.c into the more appropriate mm/vma.c. - The 6 patch series "mm, swap: clean up swap cache mapping helper" from Kairui Song provides code consolidation and cleanups related to the folio_index() function. - The 2 patch series "mm/gup: Cleanup memfd_pin_folios()" from Vishal Moola does that. - The 8 patch series "memcg: Fix test_memcg_min/low test failures" from Waiman Long addresses some bogus failures which are being reported by the test_memcontrol selftest. - The 3 patch series "eliminate mmap() retry merge, add .mmap_prepare hook" from Lorenzo Stoakes commences the deprecation of file_operations.mmap() in favor of the new file_operations.mmap_prepare(). The latter is more restrictive and prevents drivers from messing with things in ways which, amongst other problems, may defeat VMA merging. - The 4 patch series "memcg: decouple memcg and objcg stocks"" from Shakeel Butt decouples the per-cpu memcg charge cache from the objcg's one. This is a step along the way to making memcg and objcg charging NMI-safe, which is a BPF requirement. - The 6 patch series "mm/damon: minor fixups and improvements for code, tests, and documents" from SeongJae Park is "yet another batch of miscellaneous DAMON changes. Fix and improve minor problems in code, tests and documents." - The 7 patch series "memcg: make memcg stats irq safe" from Shakeel Butt converts memcg stats to be irq safe. Another step along the way to making memcg charging and stats updates NMI-safe, a BPF requirement. - The 4 patch series "Let unmap_hugepage_range() and several related functions take folio instead of page" from Fan Ni provides folio conversions in the hugetlb code. -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCaDt5qgAKCRDdBJ7gKXxA ju6XAP9nTiSfRz8Cz1n5LJZpFKEGzLpSihCYyR6P3o1L9oe3mwEAlZ5+XAwk2I5x Qqb/UGMEpilyre1PayQqOnct3aSL9Ao= =tYYm -----END PGP SIGNATURE----- Merge tag 'mm-stable-2025-05-31-14-50' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: - "Add folio_mk_pte()" from Matthew Wilcox simplifies the act of creating a pte which addresses the first page in a folio and reduces the amount of plumbing which architecture must implement to provide this. - "Misc folio patches for 6.16" from Matthew Wilcox is a shower of largely unrelated folio infrastructure changes which clean things up and better prepare us for future work. - "memory,x86,acpi: hotplug memory alignment advisement" from Gregory Price adds early-init code to prevent x86 from leaving physical memory unused when physical address regions are not aligned to memory block size. - "mm/compaction: allow more aggressive proactive compaction" from Michal Clapinski provides some tuning of the (sadly, hard-coded (more sadly, not auto-tuned)) thresholds for our invokation of proactive compaction. In a simple test case, the reduction of a guest VM's memory consumption was dramatic. - "Minor cleanups and improvements to swap freeing code" from Kemeng Shi provides some code cleaups and a small efficiency improvement to this part of our swap handling code. - "ptrace: introduce PTRACE_SET_SYSCALL_INFO API" from Dmitry Levin adds the ability for a ptracer to modify syscalls arguments. At this time we can alter only "system call information that are used by strace system call tampering, namely, syscall number, syscall arguments, and syscall return value. This series should have been incorporated into mm.git's "non-MM" branch, but I goofed. - "fs/proc: extend the PAGEMAP_SCAN ioctl to report guard regions" from Andrei Vagin extends the info returned by the PAGEMAP_SCAN ioctl against /proc/pid/pagemap. This permits CRIU to more efficiently get at the info about guard regions. - "Fix parameter passed to page_mapcount_is_type()" from Gavin Shan implements that fix. No runtime effect is expected because validate_page_before_insert() happens to fix up this error. - "kernel/events/uprobes: uprobe_write_opcode() rewrite" from David Hildenbrand basically brings uprobe text poking into the current decade. Remove a bunch of hand-rolled implementation in favor of using more current facilities. - "mm/ptdump: Drop assumption that pxd_val() is u64" from Anshuman Khandual provides enhancements and generalizations to the pte dumping code. This might be needed when 128-bit Page Table Descriptors are enabled for ARM. - "Always call constructor for kernel page tables" from Kevin Brodsky ensures that the ctor/dtor is always called for kernel pgtables, as it already is for user pgtables. This permits the addition of more functionality such as "insert hooks to protect page tables". This change does result in various architectures performing unnecesary work, but this is fixed up where it is anticipated to occur. - "Rust support for mm_struct, vm_area_struct, and mmap" from Alice Ryhl adds plumbing to permit Rust access to core MM structures. - "fix incorrectly disallowed anonymous VMA merges" from Lorenzo Stoakes takes advantage of some VMA merging opportunities which we've been missing for 15 years. - "mm/madvise: batch tlb flushes for MADV_DONTNEED and MADV_FREE" from SeongJae Park optimizes process_madvise()'s TLB flushing. Instead of flushing each address range in the provided iovec, we batch the flushing across all the iovec entries. The syscall's cost was approximately halved with a microbenchmark which was designed to load this particular operation. - "Track node vacancy to reduce worst case allocation counts" from Sidhartha Kumar makes the maple tree smarter about its node preallocation. stress-ng mmap performance increased by single-digit percentages and the amount of unnecessarily preallocated memory was dramaticelly reduced. - "mm/gup: Minor fix, cleanup and improvements" from Baoquan He removes a few unnecessary things which Baoquan noted when reading the code. - ""Enhance sysfs handling for memory hotplug in weighted interleave" from Rakie Kim "enhances the weighted interleave policy in the memory management subsystem by improving sysfs handling, fixing memory leaks, and introducing dynamic sysfs updates for memory hotplug support". Fixes things on error paths which we are unlikely to hit. - "mm/damon: auto-tune DAMOS for NUMA setups including tiered memory" from SeongJae Park introduces new DAMOS quota goal metrics which eliminate the manual tuning which is required when utilizing DAMON for memory tiering. - "mm/vmalloc.c: code cleanup and improvements" from Baoquan He provides cleanups and small efficiency improvements which Baoquan found via code inspection. - "vmscan: enforce mems_effective during demotion" from Gregory Price changes reclaim to respect cpuset.mems_effective during demotion when possible. because presently, reclaim explicitly ignores cpuset.mems_effective when demoting, which may cause the cpuset settings to violated. This is useful for isolating workloads on a multi-tenant system from certain classes of memory more consistently. - "Clean up split_huge_pmd_locked() and remove unnecessary folio pointers" from Gavin Guo provides minor cleanups and efficiency gains in in the huge page splitting and migrating code. - "Use kmem_cache for memcg alloc" from Huan Yang creates a slab cache for `struct mem_cgroup', yielding improved memory utilization. - "add max arg to swappiness in memory.reclaim and lru_gen" from Zhongkun He adds a new "max" argument to the "swappiness=" argument for memory.reclaim MGLRU's lru_gen. This directs proactive reclaim to reclaim from only anon folios rather than file-backed folios. - "kexec: introduce Kexec HandOver (KHO)" from Mike Rapoport is the first step on the path to permitting the kernel to maintain existing VMs while replacing the host kernel via file-based kexec. At this time only memblock's reserve_mem is preserved. - "mm: Introduce for_each_valid_pfn()" from David Woodhouse provides and uses a smarter way of looping over a pfn range. By skipping ranges of invalid pfns. - "sched/numa: Skip VMA scanning on memory pinned to one NUMA node via cpuset.mems" from Libo Chen removes a lot of pointless VMA scanning when a task is pinned a single NUMA mode. Dramatic performance benefits were seen in some real world cases. - "JFS: Implement migrate_folio for jfs_metapage_aops" from Shivank Garg addresses a warning which occurs during memory compaction when using JFS. - "move all VMA allocation, freeing and duplication logic to mm" from Lorenzo Stoakes moves some VMA code from kernel/fork.c into the more appropriate mm/vma.c. - "mm, swap: clean up swap cache mapping helper" from Kairui Song provides code consolidation and cleanups related to the folio_index() function. - "mm/gup: Cleanup memfd_pin_folios()" from Vishal Moola does that. - "memcg: Fix test_memcg_min/low test failures" from Waiman Long addresses some bogus failures which are being reported by the test_memcontrol selftest. - "eliminate mmap() retry merge, add .mmap_prepare hook" from Lorenzo Stoakes commences the deprecation of file_operations.mmap() in favor of the new file_operations.mmap_prepare(). The latter is more restrictive and prevents drivers from messing with things in ways which, amongst other problems, may defeat VMA merging. - "memcg: decouple memcg and objcg stocks"" from Shakeel Butt decouples the per-cpu memcg charge cache from the objcg's one. This is a step along the way to making memcg and objcg charging NMI-safe, which is a BPF requirement. - "mm/damon: minor fixups and improvements for code, tests, and documents" from SeongJae Park is yet another batch of miscellaneous DAMON changes. Fix and improve minor problems in code, tests and documents. - "memcg: make memcg stats irq safe" from Shakeel Butt converts memcg stats to be irq safe. Another step along the way to making memcg charging and stats updates NMI-safe, a BPF requirement. - "Let unmap_hugepage_range() and several related functions take folio instead of page" from Fan Ni provides folio conversions in the hugetlb code. * tag 'mm-stable-2025-05-31-14-50' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (285 commits) mm: pcp: increase pcp->free_count threshold to trigger free_high mm/hugetlb: convert use of struct page to folio in __unmap_hugepage_range() mm/hugetlb: refactor __unmap_hugepage_range() to take folio instead of page mm/hugetlb: refactor unmap_hugepage_range() to take folio instead of page mm/hugetlb: pass folio instead of page to unmap_ref_private() memcg: objcg stock trylock without irq disabling memcg: no stock lock for cpu hot-unplug memcg: make __mod_memcg_lruvec_state re-entrant safe against irqs memcg: make count_memcg_events re-entrant safe against irqs memcg: make mod_memcg_state re-entrant safe against irqs memcg: move preempt disable to callers of memcg_rstat_updated memcg: memcg_rstat_updated re-entrant safe against irqs mm: khugepaged: decouple SHMEM and file folios' collapse selftests/eventfd: correct test name and improve messages alloc_tag: check mem_profiling_support in alloc_tag_init Docs/damon: update titles and brief introductions to explain DAMOS selftests/damon/_damon_sysfs: read tried regions directories in order mm/damon/tests/core-kunit: add a test for damos_set_filters_default_reject() mm/damon/paddr: remove unused variable, folio_list, in damon_pa_stat() mm/damon/sysfs-schemes: fix wrong comment on damons_sysfs_quota_goal_metric_strs ...	2025-05-31 15:44:16 -07:00
Kirill A. Shutemov	7212b58d6d	x86/mm/64: Make 5-level paging support unconditional Both Intel and AMD CPUs support 5-level paging, which is expected to become more widely adopted in the future. All major x86 Linux distributions have the feature enabled. Remove CONFIG_X86_5LEVEL and related #ifdeffery for it to make it more readable. Suggested-by: Borislav Petkov <bp@alien8.de> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/r/20250516123306.3812286-4-kirill.shutemov@linux.intel.com	2025-05-17 10:38:16 +02:00
Ahmed S. Darwish	968e300068	x86/cpuid: Set <asm/cpuid/api.h> as the main CPUID header The main CPUID header <asm/cpuid.h> was originally a storefront for the headers: <asm/cpuid/api.h> <asm/cpuid/leaf_0x2_api.h> Now that the latter CPUID(0x2) header has been merged into the former, there is no practical difference between <asm/cpuid.h> and <asm/cpuid/api.h>. Migrate all users to the <asm/cpuid/api.h> header, in preparation of the removal of <asm/cpuid.h>. Don't remove <asm/cpuid.h> just yet, in case some new code in -next started using it. Suggested-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Ahmed S. Darwish <darwi@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Andrew Cooper <andrew.cooper3@citrix.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: John Ogness <john.ogness@linutronix.de> Cc: x86-cpuid@lists.linux.dev Link: https://lore.kernel.org/r/20250508150240.172915-3-darwi@linutronix.de	2025-05-15 18:23:55 +02:00
Alexander Graf	a8ebb70447	x86/boot: make sure KASLR does not step over KHO preserved memory During kexec handover (KHO) memory contains data that should be preserved and this data would be consumed by kexec'ed kernel. To make sure that the preserved memory is not overwritten, KHO uses "scratch regions" to bootstrap kexec'ed kernel. These regions are guaranteed to not have any memory that KHO would preserve and are used as the only memory the kernel sees during the early boot. The scratch regions are passed in the setup_data by the first kernel with other KHO parameters. If the setup_data contains the KHO parameters, limit randomization to scratch areas only to make sure preserved memory won't get overwritten. Since all the pointers in setup_data are represented by u64, they require double casting (first to unsigned long and then to the actual pointer type) to compile on 32-bits. This looks goofy out of context, but it is unfortunately the way that this is handled across the tree. There are at least a dozen instances of casting like this. Link: https://lkml.kernel.org/r/20250509074635.3187114-14-changyuanl@google.com Signed-off-by: Alexander Graf <graf@amazon.com> Co-developed-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Co-developed-by: Changyuan Lyu <changyuanl@google.com> Signed-off-by: Changyuan Lyu <changyuanl@google.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Anthony Yznaga <anthony.yznaga@oracle.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Ashish Kalra <ashish.kalra@amd.com> Cc: Ben Herrenschmidt <benh@kernel.crashing.org> Cc: Borislav Betkov <bp@alien8.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Eric Biederman <ebiederm@xmission.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Gowans <jgowans@amazon.com> Cc: Jason Gunthorpe <jgg@nvidia.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Krzysztof Kozlowski <krzk@kernel.org> Cc: Marc Rutland <mark.rutland@arm.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Pasha Tatashin <pasha.tatashin@soleen.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Pratyush Yadav <ptyadav@amazon.de> Cc: Rob Herring <robh@kernel.org> Cc: Saravana Kannan <saravanak@google.com> Cc: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleinxer <tglx@linutronix.de> Cc: Thomas Lendacky <thomas.lendacky@amd.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-05-12 23:50:41 -07:00
Ard Biesheuvel	ed4d95d033	x86/sev: Disentangle #VC handling code from startup code Most of the SEV support code used to reside in a single C source file that was included in two places: the core kernel, and the decompressor. The code that is actually shared with the decompressor was moved into a separate, shared source file under startup/, on the basis that the decompressor also executes from the early 1:1 mapping of memory. However, while the elaborate #VC handling and instruction decoding that it involves is also performed by the decompressor, it does not actually occur in the core kernel at early boot, and therefore, does not need to be part of the confined early startup code. So split off the #VC handling code and move it back into arch/x86/coco where it came from, into another C source file that is included from both the decompressor and the core kernel. Code movement only - no functional change intended. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: David Woodhouse <dwmw@amazon.co.uk> Cc: Dionna Amalie Glaze <dionnaglaze@google.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Kevin Loughlin <kevinloughlin@google.com> Cc: Len Brown <len.brown@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: linux-efi@vger.kernel.org Link: https://lore.kernel.org/r/20250504095230.2932860-31-ardb+git@google.com	2025-05-05 07:07:29 +02:00
Ard Biesheuvel	ae862964cb	x86/sev: Move instruction decoder into separate source file As a first step towards disentangling the SEV #VC handling code -which is shared between the decompressor and the core kernel- from the SEV startup code, move the decompressor's copy of the instruction decoder into a separate source file. Code movement only - no functional change intended. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: David Woodhouse <dwmw@amazon.co.uk> Cc: Dionna Amalie Glaze <dionnaglaze@google.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Kevin Loughlin <kevinloughlin@google.com> Cc: Len Brown <len.brown@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: linux-efi@vger.kernel.org Link: https://lore.kernel.org/r/20250504095230.2932860-30-ardb+git@google.com	2025-05-04 15:53:06 +02:00
Ard Biesheuvel	fae89bbfdd	x86/sev: Make sev_snp_enabled() a static function sev_snp_enabled() is no longer used outside of the source file that defines it, so make it static and drop the extern declarations. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: David Woodhouse <dwmw@amazon.co.uk> Cc: Dionna Amalie Glaze <dionnaglaze@google.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Kevin Loughlin <kevinloughlin@google.com> Cc: Len Brown <len.brown@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: linux-efi@vger.kernel.org Link: https://lore.kernel.org/r/20250504095230.2932860-29-ardb+git@google.com	2025-05-04 15:53:06 +02:00
Ingo Molnar	39ffd86dd7	Merge branch 'x86/urgent' into x86/boot, to pick up fixes Signed-off-by: Ingo Molnar <mingo@kernel.org>	2025-05-04 12:09:02 +02:00
Ard Biesheuvel	8ed12ab131	x86/boot/sev: Support memory acceptance in the EFI stub under SVSM Commit: `d54d610243` ("x86/boot/sev: Avoid shared GHCB page for early memory acceptance") provided a fix for SEV-SNP memory acceptance from the EFI stub when running at VMPL #0. However, that fix was insufficient for SVSM SEV-SNP guests running at VMPL >0, as those rely on a SVSM calling area, which is a shared buffer whose address is programmed into a SEV-SNP MSR, and the SEV init code that sets up this calling area executes much later during the boot. Given that booting via the EFI stub at VMPL >0 implies that the firmware has configured this calling area already, reuse it for performing memory acceptance in the EFI stub. Fixes: `fcd042e864` ("x86/sev: Perform PVALIDATE using the SVSM when not at VMPL0") Tested-by: Tom Lendacky <thomas.lendacky@amd.com> Co-developed-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: <stable@vger.kernel.org> Cc: Dionna Amalie Glaze <dionnaglaze@google.com> Cc: Kevin Loughlin <kevinloughlin@google.com> Cc: linux-efi@vger.kernel.org Link: https://lore.kernel.org/r/20250428174322.2780170-2-ardb+git@google.com	2025-05-04 08:20:27 +02:00
Ard Biesheuvel	a3cbbb4717	x86/boot: Move SEV startup code into startup/ Move the SEV startup code into arch/x86/boot/startup/, where it will reside along with other code that executes extremely early, and therefore needs to be built in a special manner. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: David Woodhouse <dwmw@amazon.co.uk> Cc: Dionna Amalie Glaze <dionnaglaze@google.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Juergen Gross <jgross@suse.com> Cc: Kees Cook <keescook@chromium.org> Cc: Kevin Loughlin <kevinloughlin@google.com> Cc: Len Brown <len.brown@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Tom Lendacky <thomas.lendacky@amd.com> Link: https://lore.kernel.org/r/20250418141253.2601348-12-ardb+git@google.com	2025-04-22 09:12:01 +02:00
Ard Biesheuvel	234cf67fc3	x86/sev: Split off startup code from core code Disentangle the SEV core code and the SEV code that is called during early boot. The latter piece will be moved into startup/ in a subsequent patch. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: David Woodhouse <dwmw@amazon.co.uk> Cc: Dionna Amalie Glaze <dionnaglaze@google.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Juergen Gross <jgross@suse.com> Cc: Kees Cook <keescook@chromium.org> Cc: Kevin Loughlin <kevinloughlin@google.com> Cc: Len Brown <len.brown@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Tom Lendacky <thomas.lendacky@amd.com> Link: https://lore.kernel.org/r/20250418141253.2601348-11-ardb+git@google.com	2025-04-22 09:12:01 +02:00
Ingo Molnar	a1b582a3ff	Merge branch 'x86/urgent' into x86/boot, to merge dependent commit and upstream fixes In particular we need this fix before applying subsequent changes: `d54d610243` ("x86/boot/sev: Avoid shared GHCB page for early memory acceptance") Signed-off-by: Ingo Molnar <mingo@kernel.org>	2025-04-22 09:09:21 +02:00
Ard Biesheuvel	d54d610243	x86/boot/sev: Avoid shared GHCB page for early memory acceptance Communicating with the hypervisor using the shared GHCB page requires clearing the C bit in the mapping of that page. When executing in the context of the EFI boot services, the page tables are owned by the firmware, and this manipulation is not possible. So switch to a different API for accepting memory in SEV-SNP guests, one which is actually supported at the point during boot where the EFI stub may need to accept memory, but the SEV-SNP init code has not executed yet. For simplicity, also switch the memory acceptance carried out by the decompressor when not booting via EFI - this only involves the allocation for the decompressed kernel, and is generally only called after kexec, as normal boot will jump straight into the kernel from the EFI stub. Fixes: `6c32117963` ("x86/sev: Add SNP-specific unaccepted memory support") Tested-by: Tom Lendacky <thomas.lendacky@amd.com> Co-developed-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: <stable@vger.kernel.org> Cc: Dionna Amalie Glaze <dionnaglaze@google.com> Cc: Kevin Loughlin <kevinloughlin@google.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: linux-efi@vger.kernel.org Link: https://lore.kernel.org/r/20250404082921.2767593-8-ardb+git@google.com # discussion thread #1 Link: https://lore.kernel.org/r/20250410132850.3708703-2-ardb+git@google.com # discussion thread #2 Link: https://lore.kernel.org/r/20250417202120.1002102-2-ardb+git@google.com # final submission	2025-04-18 14:30:30 +02:00
Uros Bizjak	0dcc51477b	x86/boot: Remove semicolon from "rep" prefixes Minimum version of binutils required to compile the kernel is 2.25. This version correctly handles the "rep" prefixes, so it is possible to remove the semicolon, which was used to support ancient versions of GNU as. Due to the semicolon, the compiler considers "rep; insn" (or its alternate "rep\n\tinsn" form) as two separate instructions. Removing the semicolon makes asm length calculations more accurate, consequently making scheduling and inlining decisions of the compiler more accurate. Removing the semicolon also enables assembler checks involving "rep" prefixes. Trying to assemble e.g. "rep addl %eax, %ebx" results in: Error: invalid instruction `add' after `rep' Signed-off-by: Uros Bizjak <ubizjak@gmail.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Martin Mares <mj@ucw.cz> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20250418071437.4144391-1-ubizjak@gmail.com	2025-04-18 09:32:57 +02:00
Ard Biesheuvel	221df25fdf	x86/sev: Prepare for splitting off early SEV code Prepare for splitting off parts of the SEV core.c source file into a file that carries code that must tolerate being called from the early 1:1 mapping. This will allow special build-time handling of thise code, to ensure that it gets generated in a way that is compatible with the early execution context. So create a de-facto internal SEV API and put the definitions into sev-internal.h. No attempt is made to allow this header file to be included in arbitrary other sources - this is explicitly not the intent. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Dionna Amalie Glaze <dionnaglaze@google.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Kevin Loughlin <kevinloughlin@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: linux-efi@vger.kernel.org Link: https://lore.kernel.org/r/20250410134117.3713574-20-ardb+git@google.com	2025-04-12 11:13:05 +02:00
Ard Biesheuvel	4cecebf200	x86/boot: Move the early GDT/IDT setup code into startup/ Move the early GDT/IDT setup code that runs long before the kernel virtual mapping is up into arch/x86/boot/startup/, and build it in a way that ensures that the code tolerates being called from the 1:1 mapping of memory. The code itself is left unchanged by this patch. Also tweak the sed symbol matching pattern in the decompressor to match on lower case 't' or 'b', as these will be emitted by Clang for symbols with hidden linkage. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Dionna Amalie Glaze <dionnaglaze@google.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Kevin Loughlin <kevinloughlin@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: linux-efi@vger.kernel.org Link: https://lore.kernel.org/r/20250410134117.3713574-15-ardb+git@google.com	2025-04-12 11:13:04 +02:00
Ard Biesheuvel	5a67da1f49	x86/boot: Move the 5-level paging trampoline into /startup The 5-level paging trampoline is used by both the EFI stub and the traditional decompressor. Move it out of the decompressor sources into the newly minted arch/x86/boot/startup/ sub-directory which will hold startup code that may be shared between the decompressor, the EFI stub and the kernel proper, and needs to tolerate being called during early boot, before the kernel virtual mapping has been created. This will allow the 5-level paging trampoline to be used by EFI boot images such as zboot that omit the traditional decompressor entirely. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: David Woodhouse <dwmw@amazon.co.uk> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/r/20250401133416.1436741-10-ardb+git@google.com	2025-04-06 20:15:14 +02:00

1 2 3 4 5 ...

986 Commits