From ad9c29f3c29197aa25d26a5f258a98e4cb901996 Mon Sep 17 00:00:00 2001
From: "Chang S. Bae"
Date: Fri, 20 Jan 2023 16:18:57 -0800
Subject: [PATCH 1/4] Documentation/x86: Explain the purpose for dynamic features

This summary will help to guide the proper use of the enabling model.

Signed-off-by: Chang S. Bae
Signed-off-by: Dave Hansen
Reviewed-by: Tony Luck
Link: https://lore.kernel.org/all/20230121001900.14900-2-chang.seok.bae%40intel.com
---
 Documentation/x86/xstate.rst | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/Documentation/x86/xstate.rst b/Documentation/x86/xstate.rst
index 5cec7fb558d6..e954e79af4ce 100644
--- a/Documentation/x86/xstate.rst
+++ b/Documentation/x86/xstate.rst
@@ -11,6 +11,22 @@ are enabled by XCR0 as well, but the first use of related instruction is
 trapped by the kernel because by default the required large XSTATE buffers
 are not allocated automatically.
 
+The purpose for dynamic features
+--------------------------------
+
+Legacy userspace libraries often have hard-coded, static sizes for
+alternate signal stacks, often using MINSIGSTKSZ which is typically 2KB.
+That stack must be able to store at *least* the signal frame that the
+kernel sets up before jumping into the signal handler. That signal frame
+must include an XSAVE buffer defined by the CPU.
+
+However, that means that the size of signal stacks is dynamic, not static,
+because different CPUs have differently-sized XSAVE buffers. A compiled-in
+size of 2KB with existing applications is too small for new CPU features
+like AMX. Instead of universally requiring larger stack, with the dynamic
+enabling, the kernel can enforce userspace applications to have
+properly-sized altstacks.
+
 Using dynamically enabled XSTATE features in user space applications
 --------------------------------------------------------------------
 

From a03c376ebaf38394a63a75292329f38a47520c2c Mon Sep 17 00:00:00 2001
From: "Chang S. Bae"
Date: Fri, 20 Jan 2023 16:18:58 -0800
Subject: [PATCH 2/4] x86/arch_prctl: Add AMX feature numbers as ABI constants

Each distinct XSAVE feature has a number assigned to it. Among other
things, the number determines the ordering of features in the XSAVE
buffer and is also used to generate XSAVE bitmasks like the value for
XCR0.

AMX state is dynamically enabled by the architecture-specific prctl().
This prctl() takes one XSAVE feature number as an argument. However, the
feature numbers are not defined in any readily available userspace
headers.

The means that each userspace app trying to use dynamic feature prctl()s
will likely end up defining their own constants for each feature.

Since these feature numbers are a part of the uabi, expose them in the
prctl() uabi header. Save everyone the trouble of looking them up and
defining their own.

[ dhansen: expand changelog a bit ]

Signed-off-by: Chang S. Bae
Signed-off-by: Dave Hansen
Reviewed-by: Tony Luck
Link: https://lore.kernel.org/all/20230121001900.14900-3-chang.seok.bae%40intel.com
---
 arch/x86/include/uapi/asm/prctl.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/x86/include/uapi/asm/prctl.h b/arch/x86/include/uapi/asm/prctl.h
index 500b96e71f18..f298c778f856 100644
--- a/arch/x86/include/uapi/asm/prctl.h
+++ b/arch/x86/include/uapi/asm/prctl.h
@@ -16,6 +16,9 @@
 #define ARCH_GET_XCOMP_GUEST_PERM 0x1024
 #define ARCH_REQ_XCOMP_GUEST_PERM 0x1025
 
+#define ARCH_XCOMP_TILECFG 17
+#define ARCH_XCOMP_TILEDATA 18
+
 #define ARCH_MAP_VDSO_X32 0x2001
 #define ARCH_MAP_VDSO_32 0x2002
 #define ARCH_MAP_VDSO_64 0x2003

From 7f9daaf59e14d62b29b6f4ca743e17bf96ff42ae Mon Sep 17 00:00:00 2001
From: "Chang S. Bae"
Date: Fri, 20 Jan 2023 16:18:59 -0800
Subject: [PATCH 3/4] Documentation/x86: Add the AMX enabling example

Explain steps to enable the dynamic feature with a code example.

Signed-off-by: Chang S. Bae
Signed-off-by: Dave Hansen
Reviewed-by: Thiago Macieira
Reviewed-by: Bagas Sanjaya
Reviewed-by: Tony Luck
Link: https://lore.kernel.org/all/20230121001900.14900-4-chang.seok.bae%40intel.com
---
 Documentation/x86/xstate.rst | 55 ++++++++++++++++++++++++++++++++++++
 1 file changed, 55 insertions(+)

diff --git a/Documentation/x86/xstate.rst b/Documentation/x86/xstate.rst
index e954e79af4ce..23b1c9f3efb2 100644
--- a/Documentation/x86/xstate.rst
+++ b/Documentation/x86/xstate.rst
@@ -80,6 +80,61 @@ the handler allocates a larger xstate buffer for the task so the large state
 can be context switched. In the unlikely cases that the allocation fails,
 the kernel sends SIGSEGV.
 
+AMX TILE_DATA enabling example
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Below is the example of how userspace applications enable
+TILE_DATA dynamically:
+
+  1. The application first needs to query the kernel for AMX
+     support::
+
+        #include <asm/prctl.h>
+        #include <sys/syscall.h>
+        #include <stdio.h>
+        #include <unistd.h>
+
+        #ifndef ARCH_GET_XCOMP_SUPP
+        #define ARCH_GET_XCOMP_SUPP  0x1021
+        #endif
+
+        #ifndef ARCH_XCOMP_TILECFG
+        #define ARCH_XCOMP_TILECFG   17
+        #endif
+
+        #ifndef ARCH_XCOMP_TILEDATA
+        #define ARCH_XCOMP_TILEDATA  18
+        #endif
+
+        #define MASK_XCOMP_TILE      ((1 << ARCH_XCOMP_TILECFG) | \
+                                      (1 << ARCH_XCOMP_TILEDATA))
+
+        unsigned long features;
+        long rc;
+
+        ...
+
+        rc = syscall(SYS_arch_prctl, ARCH_GET_XCOMP_SUPP, &features);
+
+        if (!rc && (features & MASK_XCOMP_TILE) == MASK_XCOMP_TILE)
+            printf("AMX is available.\n");
+
+  2. After that, determining support for AMX, an application must
+     explicitly ask permission to use it::
+
+        #ifndef ARCH_REQ_XCOMP_PERM
+        #define ARCH_REQ_XCOMP_PERM  0x1023
+        #endif
+
+        ...
+
+        rc = syscall(SYS_arch_prctl, ARCH_REQ_XCOMP_PERM, ARCH_XCOMP_TILEDATA);
+
+        if (!rc)
+            printf("AMX is ready for use.\n");
+
+Note this example does not include the sigaltstack preparation.
+
 Dynamic features in signal frames
 ---------------------------------
 

From 5fbff260755750559aa12a30f6fa7f8a863666f1 Mon Sep 17 00:00:00 2001
From: "Chang S. Bae"
Date: Fri, 20 Jan 2023 16:19:00 -0800
Subject: [PATCH 4/4] Documentation/x86: Explain the state component permission for guests

Commit 980fe2fddcff ("x86/fpu: Extend fpu_xstate_prctl() with guest
permissions") extends a couple of arch_prctl(2) options for VCPU threads.
Add description for them.

Signed-off-by: Chang S. Bae
Signed-off-by: Dave Hansen
Reviewed-by: Thiago Macieira
Reviewed-by: Yang Zhong
Reviewed-by: Tony Luck
Link: https://lore.kernel.org/all/20230121001900.14900-5-chang.seok.bae%40intel.com
---
 Documentation/x86/xstate.rst | 29 +++++++++++++++++++++++++++++
 1 file changed, 29 insertions(+)

diff --git a/Documentation/x86/xstate.rst b/Documentation/x86/xstate.rst
index 23b1c9f3efb2..ae5c69e48b11 100644
--- a/Documentation/x86/xstate.rst
+++ b/Documentation/x86/xstate.rst
@@ -143,3 +143,32 @@ entry if the feature is in its initial configuration. This differs from
 non-dynamic features which are always written regardless of their
 configuration. Signal handlers can examine the XSAVE buffer's XSTATE_BV
 field to determine if a features was written.
+
+Dynamic features for virtual machines
+-------------------------------------
+
+The permission for the guest state component needs to be managed separately
+from the host, as they are exclusive to each other. A coupled of options
+are extended to control the guest permission:
+
+-ARCH_GET_XCOMP_GUEST_PERM
+
+ arch_prctl(ARCH_GET_XCOMP_GUEST_PERM, &features);
+
+ ARCH_GET_XCOMP_GUEST_PERM is a variant of ARCH_GET_XCOMP_PERM. So it
+ provides the same semantics and functionality but for the guest
+ components.
+
+-ARCH_REQ_XCOMP_GUEST_PERM
+
+ arch_prctl(ARCH_REQ_XCOMP_GUEST_PERM, feature_nr);
+
+ ARCH_REQ_XCOMP_GUEST_PERM is a variant of ARCH_REQ_XCOMP_PERM. It has the
+ same semantics for the guest permission. While providing a similar
+ functionality, this comes with a constraint. Permission is frozen when the
+ first VCPU is created. Any attempt to change permission after that point
+ is going to be rejected. So, the permission has to be requested before the
+ first VCPU creation.
+
+Note that some VMMs may have already established a set of supported state
+components. These options are not presumed to support any particular VMM.
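
Reviewer note on patch 4/4: the guest-permission flow described above could be exercised from a VMM roughly as in the following C sketch. It is not part of the series. The option values and the TILE_DATA feature number are taken from the prctl.h hunk in patch 2/4; the helper names xcomp_mask_has() and request_guest_tiledata_perm() are hypothetical, and the sketch assumes it runs before the first VCPU is created, since permission is frozen at that point.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <unistd.h>
#include <sys/syscall.h>

/* Values as listed in arch/x86/include/uapi/asm/prctl.h (patch 2/4). */
#ifndef ARCH_GET_XCOMP_GUEST_PERM
#define ARCH_GET_XCOMP_GUEST_PERM 0x1024
#endif
#ifndef ARCH_REQ_XCOMP_GUEST_PERM
#define ARCH_REQ_XCOMP_GUEST_PERM 0x1025
#endif
#ifndef ARCH_XCOMP_TILEDATA
#define ARCH_XCOMP_TILEDATA 18
#endif

/* Hypothetical helper: is feature_nr set in an arch_prctl() feature mask? */
int xcomp_mask_has(unsigned long mask, unsigned int feature_nr)
{
	/* Feature numbers index bits of the mask, so they must fit in it. */
	assert(feature_nr < 8 * sizeof(mask));
	return (int)((mask >> feature_nr) & 1UL);
}

/*
 * Hypothetical helper: request guest permission for TILE_DATA, then read
 * the guest permission mask back to confirm it. Assumed to be called
 * before the first VCPU is created. Returns 0 on success, -1 otherwise.
 */
int request_guest_tiledata_perm(void)
{
#ifdef SYS_arch_prctl
	unsigned long features = 0;

	if (syscall(SYS_arch_prctl, ARCH_REQ_XCOMP_GUEST_PERM,
		    ARCH_XCOMP_TILEDATA))
		return -1;

	if (syscall(SYS_arch_prctl, ARCH_GET_XCOMP_GUEST_PERM, &features))
		return -1;

	return xcomp_mask_has(features, ARCH_XCOMP_TILEDATA) ? 0 : -1;
#else
	/* arch_prctl() is not available on this architecture. */
	return -1;
#endif
}
```

A VMM would presumably call request_guest_tiledata_perm() once during startup and treat a -1 return as "do not expose AMX to the guest". Reading the mask back is optional; it is included here only to show the GET variant next to the REQ variant.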