Commit Graph

432 Commits

Author SHA1 Message Date
Linus Torvalds
370c388319 Crypto library updates for 7.1
- Migrate more hash algorithms from the traditional crypto subsystem
   to lib/crypto/.
 
   Like the algorithms migrated earlier (e.g. SHA-*), this simplifies
   the implementations, improves performance, enables further
   simplifications in calling code, and solves various other issues:
 
     - AES CBC-based MACs (AES-CMAC, AES-XCBC-MAC, and AES-CBC-MAC)
 
         - Support these algorithms in lib/crypto/ using the AES
           library and the existing arm64 assembly code
 
         - Reimplement the traditional crypto API's "cmac(aes)",
           "xcbc(aes)", and "cbcmac(aes)" on top of the library
 
         - Convert mac80211 to use the AES-CMAC library. Note: several
           other subsystems can use it too and will be converted later
 
         - Drop the broken, nonstandard, and likely unused support for
           "xcbc(aes)" with key lengths other than 128 bits
 
         - Enable optimizations by default
 
     - GHASH
 
         - Migrate the standalone GHASH code into lib/crypto/
 
         - Integrate the GHASH code more closely with the very similar
           POLYVAL code, and improve the generic GHASH implementation
           to resist cache-timing attacks and use much less memory
 
         - Reimplement the AES-GCM library and the "gcm" crypto_aead
           template on top of the GHASH library. Remove "ghash" from
           the crypto_shash API, as it's no longer needed
 
         - Enable optimizations by default
 
     - SM3
 
         - Migrate the kernel's existing SM3 code into lib/crypto/, and
           reimplement the traditional crypto API's "sm3" on top of it
 
         - I don't recommend using SM3, but this cleanup is worthwhile
           to organize the code the same way as other algorithms
 
 - Testing improvements
 
     - Add a KUnit test suite for each of the new library APIs
 
     - Migrate the existing ChaCha20Poly1305 test to KUnit
 
     - Make the KUnit all_tests.config enable all crypto library tests
 
     - Move the test kconfig options to the Runtime Testing menu
 
 - Other updates to arch-optimized crypto code
 
     - Optimize SHA-256 for Zhaoxin CPUs using the Padlock Hash Engine
 
     - Remove some MD5 implementations that are no longer worth keeping
 
     - Drop big endian and voluntary preemption support from the arm64
       code, as those configurations are no longer supported on arm64
 
 - Make jitterentropy and samples/tsm-mr use the crypto library APIs
 
 Note: the overall diffstat is neutral, but when the test code is
 excluded it is significantly negative:
 
     Tests:     13 files changed, 1982 insertions(+),  888 deletions(-)
     Non-test: 141 files changed, 2897 insertions(+), 3987 deletions(-)
     All:      154 files changed, 4879 insertions(+), 4875 deletions(-)
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQSacvsUNc7UX4ntmEPzXCl4vpKOKwUCadWPyxQcZWJpZ2dlcnNA
 a2VybmVsLm9yZwAKCRDzXCl4vpKOK8QCAQD0i98miI1mu01RKuEwrBzmn7L/2sUH
 ReYV/dFDtnN0GwD+KMCiNAM2XTVLRKq5t3OxPHpKZ4y+gZwRowAJeFA02Q8=
 =5rip
 -----END PGP SIGNATURE-----

Merge tag 'libcrypto-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux

Pull crypto library updates from Eric Biggers:

 - Migrate more hash algorithms from the traditional crypto subsystem to
   lib/crypto/

   Like the algorithms migrated earlier (e.g. SHA-*), this simplifies
   the implementations, improves performance, enables further
   simplifications in calling code, and solves various other issues:

     - AES CBC-based MACs (AES-CMAC, AES-XCBC-MAC, and AES-CBC-MAC)

         - Support these algorithms in lib/crypto/ using the AES library
           and the existing arm64 assembly code

         - Reimplement the traditional crypto API's "cmac(aes)",
           "xcbc(aes)", and "cbcmac(aes)" on top of the library

         - Convert mac80211 to use the AES-CMAC library. Note: several
           other subsystems can use it too and will be converted later

         - Drop the broken, nonstandard, and likely unused support for
           "xcbc(aes)" with key lengths other than 128 bits

         - Enable optimizations by default

     - GHASH

         - Migrate the standalone GHASH code into lib/crypto/

         - Integrate the GHASH code more closely with the very similar
           POLYVAL code, and improve the generic GHASH implementation to
           resist cache-timing attacks and use much less memory

         - Reimplement the AES-GCM library and the "gcm" crypto_aead
           template on top of the GHASH library. Remove "ghash" from the
           crypto_shash API, as it's no longer needed

         - Enable optimizations by default

     - SM3

         - Migrate the kernel's existing SM3 code into lib/crypto/, and
           reimplement the traditional crypto API's "sm3" on top of it

         - I don't recommend using SM3, but this cleanup is worthwhile
           to organize the code the same way as other algorithms

 - Testing improvements:

     - Add a KUnit test suite for each of the new library APIs

     - Migrate the existing ChaCha20Poly1305 test to KUnit

     - Make the KUnit all_tests.config enable all crypto library tests

     - Move the test kconfig options to the Runtime Testing menu

 - Other updates to arch-optimized crypto code:

     - Optimize SHA-256 for Zhaoxin CPUs using the Padlock Hash Engine

     - Remove some MD5 implementations that are no longer worth keeping

     - Drop big endian and voluntary preemption support from the arm64
       code, as those configurations are no longer supported on arm64

 - Make jitterentropy and samples/tsm-mr use the crypto library APIs

* tag 'libcrypto-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux: (66 commits)
  lib/crypto: arm64: Assume a little-endian kernel
  arm64: fpsimd: Remove obsolete cond_yield macro
  lib/crypto: arm64/sha3: Remove obsolete chunking logic
  lib/crypto: arm64/sha512: Remove obsolete chunking logic
  lib/crypto: arm64/sha256: Remove obsolete chunking logic
  lib/crypto: arm64/sha1: Remove obsolete chunking logic
  lib/crypto: arm64/poly1305: Remove obsolete chunking logic
  lib/crypto: arm64/gf128hash: Remove obsolete chunking logic
  lib/crypto: arm64/chacha: Remove obsolete chunking logic
  lib/crypto: arm64/aes: Remove obsolete chunking logic
  lib/crypto: Include <crypto/utils.h> instead of <crypto/algapi.h>
  lib/crypto: aesgcm: Don't disable IRQs during AES block encryption
  lib/crypto: aescfb: Don't disable IRQs during AES block encryption
  lib/crypto: tests: Migrate ChaCha20Poly1305 self-test to KUnit
  lib/crypto: sparc: Drop optimized MD5 code
  lib/crypto: mips: Drop optimized MD5 code
  lib: Move crypto library tests to Runtime Testing menu
  crypto: sm3 - Remove 'struct sm3_state'
  crypto: sm3 - Remove the original "sm3_block_generic()"
  crypto: sm3 - Remove sm3_base.h
  ...
2026-04-13 17:31:39 -07:00
Eric Biggers
11d6bc70ff lib/crypto: arm64/aes: Remove obsolete chunking logic
Since commit aefbab8e77 ("arm64: fpsimd: Preserve/restore kernel mode
NEON at context switch"), kernel-mode NEON sections have been
preemptible on arm64.  And since commit 7dadeaa6e8 ("sched: Further
restrict the preemption modes"), voluntary preemption is no longer
supported on arm64 either.  Therefore, there's no longer any need to
limit the length of kernel-mode NEON sections on arm64.

Simplify the AES-CBC-MAC code accordingly.

Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20260401000548.133151-2-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2026-04-01 13:02:09 -07:00
Eric Biggers
9f69f52b46 lib/crypto: arm64/sm3: Migrate optimized code into library
Instead of exposing the arm64-optimized SM3 code via arm64-specific
crypto_shash algorithms, instead just implement the sm3_blocks() library
function.  This is much simpler, it makes the SM3 library functions be
arm64-optimized, and it fixes the longstanding issue where the
arm64-optimized SM3 code was disabled by default.  SM3 still remains
available through crypto_shash, but individual architectures no longer
need to handle it.

Tweak the SM3 assembly function prototypes to match what the library
expects, including changing the block count from 'int' to 'size_t'.
sm3_ce_transform() had to be updated to access 'x2' instead of 'w2',
while sm3_neon_transform() already used 'x2'.

Remove the CFI stubs which are no longer needed because the SM3 assembly
functions are no longer ever indirectly called.

Remove the dependency on KERNEL_MODE_NEON.  It was unnecessary, because
KERNEL_MODE_NEON is always enabled on arm64.

Acked-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20260321040935.410034-8-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2026-03-23 17:50:59 -07:00
Eric Biggers
631a84e49e crypto: arm64/aes-gcm - Rename struct ghash_key and make fixed-sized
Rename the 'struct ghash_key' in arch/arm64/crypto/ghash-ce-glue.c to
prevent a naming conflict with the library 'struct ghash_key'.  In
addition, declare the 'h' field with an explicit size, now that there's
no longer any reason for it to be a flexible array.

Update the comments in the assembly file to match the C code.  Note that
some of these were out-of-date.

Acked-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20260319061723.1140720-11-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2026-03-23 16:44:29 -07:00
Eric Biggers
a336c01f5b lib/crypto: arm64/ghash: Migrate optimized code into library
Remove the "ghash-neon" crypto_shash algorithm.  Move the corresponding
assembly code into lib/crypto/, and wire it up to the GHASH library.

This makes the GHASH library be optimized on arm64 (though only with
NEON, not PMULL; for now the goal is just parity with crypto_shash).  It
greatly reduces the amount of arm64-specific glue code that is needed,
and it fixes the issue where this optimization was disabled by default.

To integrate the assembly code correctly with the library, make the
following tweaks:

- Change the type of 'blocks' from int to size_t
- Change the types of 'dg' and 'h' to polyval_elem.  Note that this
  simply reflects the format that the code was already using.
- Remove the 'head' argument, which is no longer needed.
- Remove the CFI stubs, as indirect calls are no longer used.

Acked-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20260319061723.1140720-10-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2026-03-23 16:44:29 -07:00
Eric Biggers
e3f473db02 crypto: arm64/ghash - Move NEON GHASH assembly into its own file
arch/arm64/crypto/ghash-ce-core.S implements pmull_ghash_update_p8(),
which is used only by a crypto_shash implementation of GHASH.  It also
implements other functions, including pmull_ghash_update_p64() and
others, which are used only by a crypto_aead implementation of AES-GCM.

While some code is shared between pmull_ghash_update_p8() and
pmull_ghash_update_p64(), it's not very much.  Since
pmull_ghash_update_p8() will also need to be migrated into lib/crypto/
to achieve parity in the standalone GHASH support, let's move it into a
separate file ghash-neon-core.S.

Acked-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20260319061723.1140720-9-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2026-03-23 15:24:59 -07:00
Cheng-Yang Chou
652a3017c4 crypto: arm64/aes-neonbs - Move key expansion off the stack
aesbs_setkey() and aesbs_cbc_ctr_setkey() allocate struct crypto_aes_ctx
on the stack. On arm64, the kernel-mode NEON context is also stored on
the stack, causing the combined frame size to exceed 1024 bytes and
triggering -Wframe-larger-than= warnings.

Allocate struct crypto_aes_ctx on the heap instead and use
kfree_sensitive() to ensure the key material is zeroed on free.
Use a goto-based cleanup path to ensure kfree_sensitive() is always
called.

Signed-off-by: Cheng-Yang Chou <yphbchou0911@gmail.com>
Fixes: 4fa617cc68 ("arm64/fpsimd: Allocate kernel mode FP/SIMD buffers on the stack")
Link: https://lore.kernel.org/r/20260306064254.2079274-1-yphbchou0911@gmail.com
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2026-03-09 13:46:11 -07:00
Eric Biggers
58286738b1 lib/crypto: arm64/aes: Migrate optimized CBC-based MACs into library
Instead of exposing the arm64-optimized CMAC, XCBC-MAC, and CBC-MAC code
via arm64-specific crypto_shash algorithms, instead just implement the
aes_cbcmac_blocks_arch() library function.  This is much simpler, it
makes the corresponding library functions be arm64-optimized, and it
fixes the longstanding issue where this optimized code was disabled by
default.  The corresponding algorithms still remain available through
crypto_shash, but individual architectures no longer need to handle it.

Note that to be compatible with the library using 'size_t' lengths, the
type of the return value and 'blocks' parameter to the assembly
functions had to be changed to 'size_t', and the assembly code had to be
updated accordingly to use the corresponding 64-bit registers.

Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20260218213501.136844-6-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2026-03-09 13:27:20 -07:00
Eric Biggers
4b90840320 lib/crypto: arm64/aes: Move assembly code for AES modes into libaes
To migrate the support for CBC-based MACs into libaes, the corresponding
arm64 assembly code needs to be moved there.  However, the arm64 AES
assembly code groups many AES modes together; individual modes aren't
easily separable.  (This isn't unique to arm64; other architectures
organize their AES modes similarly.)

Since the other AES modes will be migrated into the library eventually
too, just move the full assembly files for the AES modes into the
library.  (This is similar to what I already did for PowerPC and SPARC.)

Specifically: move the assembly files aes-ce.S, aes-modes.S, and
aes-neon.S and their build rules; declare the assembly functions in
<crypto/aes.h>; and export the assembly functions from libaes.

Note that the exports and public declarations of the assembly functions
are temporary.  They exist only to keep arch/arm64/crypto/ working until
the AES modes are fully moved into the library.

Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20260218213501.136844-5-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2026-03-09 13:27:20 -07:00
Eric Biggers
f8f08d7cc4 crypto: arm64/aes - Fix 32-bit aes_mac_update() arg treated as 64-bit
Since the 'enc_after' argument to neon_aes_mac_update() and
ce_aes_mac_update() has type 'int', it needs to be accessed using the
corresponding 32-bit register, not the 64-bit register.  The upper half
of the corresponding 64-bit register may contain garbage.

Fixes: 4860620da7 ("crypto: arm64/aes - add NEON/Crypto Extensions CBCMAC/CMAC/XCBC driver")
Cc: stable@vger.kernel.org
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20260218213501.136844-4-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2026-03-09 13:27:20 -07:00
Eric Biggers
370960c153 crypto: arm64/ghash - Use new AES library API
Switch from the old AES library functions (which use struct
crypto_aes_ctx) to the new ones (which use struct aes_enckey).  This
eliminates the unnecessary computation and caching of the decryption
round keys.  The new AES en/decryption functions are also much faster
and use AES instructions when supported by the CPU.

Note that in addition to the change in the key preparation function and
the key struct type itself, the change in the type of the key struct
results in aes_encrypt() (which is temporarily a type-generic macro)
calling the new encryption function rather than the old one.

Acked-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20260112192035.10427-25-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2026-01-15 14:09:08 -08:00
Eric Biggers
2b1ef7aeeb lib/crypto: arm64/aes: Migrate optimized code into library
Move the ARM64 optimized AES key expansion and single-block AES
en/decryption code into lib/crypto/, wire it up to the AES library API,
and remove the superseded crypto_cipher algorithms.

The result is that both the AES library and crypto_cipher APIs are now
optimized for ARM64, whereas previously only crypto_cipher was (and the
optimizations weren't enabled by default, which this fixes as well).

Note: to see the diff from arch/arm64/crypto/aes-ce-glue.c to
lib/crypto/arm64/aes.h, view this commit with 'git show -M10'.

Acked-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20260112192035.10427-12-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2026-01-12 11:39:58 -08:00
Eric Biggers
a248447427 crypto: aes - Replace aes-generic with wrapper around lib
Now that the AES library's performance has been improved, replace
aes_generic.c with a new file aes.c which wraps the AES library.

In preparation for making the AES library actually utilize the kernel's
existing architecture-optimized AES code including AES instructions, set
the driver name to "aes-lib" instead of "aes-generic".  This mirrors
what's been done for the hash algorithms.  Update testmgr.c accordingly.

Since this removes the crypto_aes_set_key() helper function, add
temporary replacements for it to arch/arm/crypto/aes-cipher-glue.c and
arch/arm64/crypto/aes-cipher-glue.c.  This is temporary, as that code
will be migrated into lib/crypto/ in later commits.

Acked-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20260112192035.10427-10-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2026-01-12 11:39:58 -08:00
Eric Biggers
5be8dcc1d0 crypto: arm64/aes - Select CRYPTO_LIB_SHA256 from correct places
The call to sha256() occurs in code that is built when either
CRYPTO_AES_ARM64_CE_BLK or CRYPTO_AES_ARM64_NEON_BLK.  The option
CRYPTO_AES_ARM64 is unrelated, notwithstanding its documentation.  I'll
be removing CRYPTO_AES_ARM64 soon anyway, but before doing that, fix
where CRYPTO_LIB_SHA256 is selected from.

Fixes: 01834444d9 ("crypto: arm64/aes - use SHA-256 library instead of crypto_shash")
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20260112192035.10427-7-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2026-01-12 11:39:58 -08:00
Eric Biggers
1cb2fcb61c crypto: arm64/aes - Switch to aes_enc_tab[] and aes_dec_tab[]
Instead of crypto_ft_tab and crypto_it_tab from aes_generic.c, use
aes_enc_tab and aes_dec_tab from lib/crypto/aes.c.  These contain the
same data in the first 1024 bytes (which is the part that this code
uses), so the result is the same.  This will allow aes_generic.c to
eventually be removed.

Acked-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20260112192035.10427-6-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2026-01-12 11:39:58 -08:00
Eric Biggers
b4a8528d17 lib/crypto: arm64/nh: Migrate optimized code into library
Migrate the arm64 NEON implementation of NH into lib/crypto/.  This
makes the nh() function be optimized on arm64 kernels.

Note: this temporarily makes the adiantum template not utilize the arm64
optimized NH code.  This is resolved in a later commit that converts the
adiantum template to use nh() instead of "nhpoly1305".

Link: https://lore.kernel.org/r/20251211011846.8179-5-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2026-01-12 11:07:50 -08:00
Eric Biggers
f6a458746f crypto: arm64/ghash - Fix incorrect output from ghash-neon
Commit 9a7c987fb9 ("crypto: arm64/ghash - Use API partial block
handling") made ghash_finup() pass the wrong buffer to
ghash_do_simd_update().  As a result, ghash-neon now produces incorrect
outputs when the message length isn't divisible by 16 bytes.  Fix this.

(I didn't notice this earlier because this code is reached only on CPUs
that support NEON but not PMULL.  I haven't yet found a way to get
qemu-system-aarch64 to emulate that configuration.)

Fixes: 9a7c987fb9 ("crypto: arm64/ghash - Use API partial block handling")
Cc: stable@vger.kernel.org
Reported-by: Diederik de Haas <diederik@cknow-tech.com>
Closes: https://lore.kernel.org/linux-crypto/DETXT7QI62KE.F3CGH2VWX1SC@cknow-tech.com/
Tested-by: Diederik de Haas <diederik@cknow-tech.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Link: https://lore.kernel.org/r/20251209223417.112294-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-12-10 09:46:26 -08:00
Ard Biesheuvel
6f7d948192 crypto/arm64: sm4/xts - Merge ksimd scopes to reduce stack bloat
Merge the two ksimd scopes in the implementation of SM4-XTS to prevent
stack bloat in cases where the compiler fails to combine the stack slots
for the kernel mode FP/SIMD buffers.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/20251203163803.157541-6-ardb@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-12-09 15:10:21 -08:00
Ard Biesheuvel
a9a8b1a383 crypto/arm64: aes/xts - Use single ksimd scope to reduce stack bloat
The ciphertext stealing logic in the AES-XTS implementation creates a
separate ksimd scope to call into the FP/SIMD core routines, and in some
cases (CONFIG_KASAN_STACK is one, but there might be others), the 528
byte kernel mode FP/SIMD buffer that is allocated inside this scope is
not shared with the preceding ksimd scope, resulting in unnecessary
stack bloat.

Considering that

a) the XTS ciphertext stealing logic is never called for block
   encryption use cases, and XTS is rarely used for anything else,

b) in the vast majority of cases, the entire input block is processed
   during the first iteration of the loop,

we can combine both ksimd scopes into a single one with no practical
impact on how often/how long FP/SIMD is en/disabled, allowing us to
reuse the same stack slot for both FP/SIMD routine calls.

Fixes: ba3c1b3b5a ("crypto/arm64: aes-blk - Switch to 'ksimd' scoped guard API")
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/20251203163803.157541-5-ardb@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-12-09 15:10:21 -08:00
Eric Biggers
5dc8d27752 Shared tag/branch for arm64 FP/SIMD changes going through libcrypto
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQQQm/3uucuRGn1Dmh0wbglWLn0tXAUCaRRMDQAKCRAwbglWLn0t
 XFHxAQDbFpxGYEfGk+x8YdbThNLhPgzc0kKazpG24YGiQxInyAD/b18m4mm8z/Ph
 JzFq1lYWQk6RQkebtjnzdUGK0c1drQw=
 =qDhO
 -----END PGP SIGNATURE-----

Merge tag 'arm64-fpsimd-on-stack-for-v6.19' into libcrypto-fpsimd-on-stack

Pull fpsimd-on-stack changes from Ard Biesheuvel:

  "Shared tag/branch for arm64 FP/SIMD changes going through libcrypto"

Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-11-12 10:15:07 -08:00
Ard Biesheuvel
03bc4768fb crypto/arm64: sm4 - Switch to 'ksimd' scoped guard API
Switch to the more abstract 'scoped_ksimd()' API, which will be modified
in a future patch to transparently allocate a kernel mode FP/SIMD state
buffer on the stack, so that kernel mode FP/SIMD code remains
preemptible in principle, but without the memory overhead that adds 528
bytes to the size of struct task_struct.

Reviewed-by: Eric Biggers <ebiggers@kernel.org>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2025-11-12 09:52:02 +01:00
Ard Biesheuvel
ab9615b501 crypto/arm64: sm3 - Switch to 'ksimd' scoped guard API
Switch to the more abstract 'scoped_ksimd()' API, which will be modified
in a future patch to transparently allocate a kernel mode FP/SIMD state
buffer on the stack, so that kernel mode FP/SIMD code remains
preemptible in principle, but without the memory overhead that adds 528
bytes to the size of struct task_struct.

Reviewed-by: Eric Biggers <ebiggers@kernel.org>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2025-11-12 09:52:02 +01:00
Ard Biesheuvel
a6b4084455 crypto/arm64: sha3 - Switch to 'ksimd' scoped guard API
Switch to the more abstract 'scoped_ksimd()' API, which will be modified
in a future patch to transparently allocate a kernel mode FP/SIMD state
buffer on the stack, so that kernel mode FP/SIMD code remains
preemptible in principe, but without the memory overhead that adds 528
bytes to the size of struct task_struct.

Reviewed-by: Eric Biggers <ebiggers@kernel.org>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2025-11-12 09:52:02 +01:00
Ard Biesheuvel
931ceb5785 crypto/arm64: polyval - Switch to 'ksimd' scoped guard API
Switch to the more abstract 'scoped_ksimd()' API, which will be modified
in a future patch to transparently allocate a kernel mode FP/SIMD state
buffer on the stack, so that kernel mode FP/SIMD code remains
preemptible in principe, but without the memory overhead that adds 528
bytes to the size of struct task_struct.

Reviewed-by: Eric Biggers <ebiggers@kernel.org>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2025-11-12 09:52:02 +01:00
Ard Biesheuvel
72cb51233b crypto/arm64: nhpoly1305 - Switch to 'ksimd' scoped guard API
Switch to the more abstract 'scoped_ksimd()' API, which will be modified
in a future patch to transparently allocate a kernel mode FP/SIMD state
buffer on the stack, so that kernel mode FP/SIMD code remains
preemptible in principe, but without the memory overhead that adds 528
bytes to the size of struct task_struct.

Reviewed-by: Eric Biggers <ebiggers@kernel.org>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2025-11-12 09:52:02 +01:00
Ard Biesheuvel
87c9b04e71 crypto/arm64: aes-gcm - Switch to 'ksimd' scoped guard API
Switch to the more abstract 'scoped_ksimd()' API, which will be modified
in a future patch to transparently allocate a kernel mode FP/SIMD state
buffer on the stack, so that kernel mode FP/SIMD code remains
preemptible in principe, but without the memory overhead that adds 528
bytes to the size of struct task_struct.

Reviewed-by: Eric Biggers <ebiggers@kernel.org>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2025-11-12 09:52:02 +01:00
Ard Biesheuvel
ba3c1b3b5a crypto/arm64: aes-blk - Switch to 'ksimd' scoped guard API
Switch to the more abstract 'scoped_ksimd()' API, which will be modified
in a future patch to transparently allocate a kernel mode FP/SIMD state
buffer on the stack, so that kernel mode FP/SIMD code remains
preemptible in principe, but without the memory overhead that adds 528
bytes to the size of struct task_struct.

Reviewed-by: Eric Biggers <ebiggers@kernel.org>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2025-11-12 09:52:01 +01:00
Ard Biesheuvel
b044c7e4c7 crypto/arm64: aes-ccm - Switch to 'ksimd' scoped guard API
Switch to the more abstract 'scoped_ksimd()' API, which will be modified
in a future patch to transparently allocate a kernel mode FP/SIMD state
buffer on the stack, so that kernel mode FP/SIMD code remains
preemptible in principe, but without the memory overhead that adds 528
bytes to the size of struct task_struct.

Reviewed-by: Eric Biggers <ebiggers@kernel.org>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2025-11-12 09:52:01 +01:00
Ard Biesheuvel
c13aebfeee crypto/arm64: sm4-ce-gcm - Avoid pointless yield of the NEON unit
Kernel mode NEON sections are now preemptible on arm64, and so there is
no need to yield it when calling APIs that may sleep.

Also, move the calls to kernel_neon_end() to the same scope as
kernel_neon_begin(). This is needed for a subsequent change where a
stack buffer is allocated transparently and passed to
kernel_neon_begin().

While at it, simplify the logic.

Reviewed-by: Eric Biggers <ebiggers@kernel.org>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2025-11-12 09:52:01 +01:00
Ard Biesheuvel
9520ef3771 crypto/arm64: sm4-ce-ccm - Avoid pointless yield of the NEON unit
Kernel mode NEON sections are now preemptible on arm64, and so there is
no need to yield it when calling APIs that may sleep.

Also, move the calls to kernel_neon_end() to the same scope as
kernel_neon_begin(). This is needed for a subsequent change where a
stack buffer is allocated transparently and passed to
kernel_neon_begin().

Reviewed-by: Eric Biggers <ebiggers@kernel.org>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2025-11-12 09:52:01 +01:00
Ard Biesheuvel
e9426f3e6b crypto/arm64: aes-ce-ccm - Avoid pointless yield of the NEON unit
Kernel mode NEON sections are now preemptible on arm64, and so there is
no need to yield it explicitly in order to prevent scheduling latency
spikes.

Reviewed-by: Eric Biggers <ebiggers@kernel.org>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2025-11-12 09:52:01 +01:00
Eric Biggers
37919e239e lib/crypto: arm64/polyval: Migrate optimized code into library
Migrate the arm64 implementation of POLYVAL into lib/crypto/, wiring it
up to the POLYVAL library interface.  This makes the POLYVAL library be
properly optimized on arm64.

This drops the arm64 optimizations of polyval in the crypto_shash API.
That's fine, since polyval will be removed from crypto_shash entirely
since it is unneeded there.  But even if it comes back, the crypto_shash
API could just be implemented on top of the library API, as usual.

Adjust the names and prototypes of the assembly functions to align more
closely with the rest of the library code.

Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20251109234726.638437-5-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-11-11 11:03:38 -08:00
Eric Biggers
1e29a75057 lib/crypto: arm64/sha3: Migrate optimized code into library
Instead of exposing the arm64-optimized SHA-3 code via arm64-specific
crypto_shash algorithms, instead just implement the sha3_absorb_blocks()
and sha3_keccakf() library functions.  This is much simpler, it makes
the SHA-3 library functions be arm64-optimized, and it fixes the
longstanding issue where the arm64-optimized SHA-3 code was disabled by
default.  SHA-3 still remains available through crypto_shash, but
individual architectures no longer need to handle it.

Note: to see the diff from arch/arm64/crypto/sha3-ce-glue.c to
lib/crypto/arm64/sha3.h, view this commit with 'git show -M10'.

Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20251026055032.1413733-10-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-11-05 20:02:35 -08:00
Eric Biggers
be755eb2b0 crypto: arm64/sha3 - Update sha3_ce_transform() to prepare for library
- Use size_t lengths, to match the library.

- Pass the block size instead of digest size, and add support for the
  block size that SHAKE128 uses.  This allows the code to be used with
  SHAKE128 and SHAKE256, which don't have the concept of a digest size.
  SHAKE256 has the same block size as SHA3-256, but SHAKE128 has a
  unique block size.  Thus, there are now 5 supported block sizes.

Don't bother changing the "glue" code arm64_sha3_update() too much, as
it gets deleted when the SHA-3 code is migrated into lib/crypto/ anyway.

Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20251026055032.1413733-9-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-11-05 20:02:35 -08:00
David Howells
4141211903 crypto: arm64/sha3 - Rename conflicting function
Rename the arm64 sha3_update() function to have an "arm64_" prefix to
avoid a name conflict with the upcoming SHA-3 library.

Note: this code will be superseded later.  This commit simply keeps the
kernel building for the initial introduction of the library.

[EB: dropped unnecessary rename of sha3_finup(), and improved commit
     message]

Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20251026055032.1413733-3-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-11-03 09:10:58 -08:00
Eric Biggers
01834444d9 crypto: arm64/aes - use SHA-256 library instead of crypto_shash
In essiv_cbc_set_key(), just use the SHA-256 library instead of
crypto_shash.  This is simpler and also slightly faster.

Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-08-30 15:43:25 +08:00
Eric Biggers
00d549bb89 lib/crypto: arm64/sha1: Migrate optimized code into library
Instead of exposing the arm64-optimized SHA-1 code via arm64-specific
crypto_shash algorithms, instead just implement the sha1_blocks()
library function.  This is much simpler, it makes the SHA-1 library
functions be arm64-optimized, and it fixes the longstanding issue where
the arm64-optimized SHA-1 code was disabled by default.  SHA-1 still
remains available through crypto_shash, but individual architectures no
longer need to handle it.

Remove support for SHA-1 finalization from assembly code, since the
library does not yet support architecture-specific overrides of the
finalization.  (Support for that has been omitted for now, for
simplicity and because usually it isn't performance-critical.)

To match sha1_blocks(), change the type of the nblocks parameter and the
return value of __sha1_ce_transform() from int to size_t.  Update the
assembly code accordingly.

Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20250712232329.818226-9-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-07-14 11:11:48 -07:00
Eric Biggers
60e3f1e9b7 lib/crypto: arm64/sha512: Migrate optimized SHA-512 code to library
Instead of exposing the arm64-optimized SHA-512 code via arm64-specific
crypto_shash algorithms, instead just implement the sha512_blocks()
library function.  This is much simpler, it makes the SHA-512 (and
SHA-384) library functions be arm64-optimized, and it fixes the
longstanding issue where the arm64-optimized SHA-512 code was disabled
by default.  SHA-512 still remains available through crypto_shash, but
individual architectures no longer need to handle it.

To match sha512_blocks(), change the type of the nblocks parameter of
the assembly functions from int or 'unsigned int' to size_t.  Update the
ARMv8 CE assembly function accordingly.  The scalar assembly function
actually already treated it as size_t.

Acked-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20250630160320.2888-9-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-06-30 09:26:19 -07:00
Eric Biggers
e0fca17755 crypto: sha512 - Rename conflicting symbols
Rename existing functions and structs in architecture-optimized SHA-512
code that had names conflicting with the upcoming library interface
which will be added to <crypto/sha2.h>: sha384_init, sha512_init,
sha512_update, sha384, and sha512.

Note: all affected code will be superseded by later commits that migrate
the arch-optimized SHA-512 code into the library.  This commit simply
keeps the kernel building for the initial introduction of the library.

Acked-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20250630160320.2888-2-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-06-30 09:26:19 -07:00
Herbert Xu
adcb9e32e5 crypto: arm64/sha256 - Add simd block function
Add CRYPTO_ARCH_HAVE_LIB_SHA256_SIMD and a SIMD block function
so that the caller can decide whether to use SIMD.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-05-05 18:20:45 +08:00
Eric Biggers
6e36be511d crypto: arm64/sha256 - implement library instead of shash
Instead of providing crypto_shash algorithms for the arch-optimized
SHA-256 code, instead implement the SHA-256 library.  This is much
simpler, it makes the SHA-256 library functions be arch-optimized, and
it fixes the longstanding issue where the arch-optimized SHA-256 was
disabled by default.  SHA-256 still remains available through
crypto_shash, but individual architectures no longer need to handle it.

Remove support for SHA-256 finalization from the ARMv8 CE assembly code,
since the library does not yet support architecture-specific overrides
of the finalization.  (Support for that has been omitted for now, for
simplicity and because usually it isn't performance-critical.)

To match sha256_blocks_arch(), change the type of the nblocks parameter
of the assembly functions from int or 'unsigned int' to size_t.  Update
the ARMv8 CE assembly function accordingly.  The scalar and NEON
assembly functions actually already treated it as size_t.

While renaming the assembly files, also fix the naming quirks where
"sha2" meant sha256, and "sha512" meant both sha256 and sha512.

Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-05-05 18:20:43 +08:00
Eric Biggers
642cfc0680 crypto: arm64/sha256 - remove obsolete chunking logic
Since kernel-mode NEON sections are now preemptible on arm64, there is
no longer any need to limit the length of them.

Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-05-05 18:20:43 +08:00
Herbert Xu
d5a582a782 crypto: arm64/polyval - Use API partial block handling
Use the Crypto API partial block handling.

Also remove the unnecessary SIMD fallback path.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-04-28 19:40:54 +08:00
Eric Biggers
cc16e228a2 crypto: arm64 - move library functions to arch/arm64/lib/crypto/
Continue disentangling the crypto library functions from the generic
crypto infrastructure by moving the arm64 ChaCha and Poly1305 library
functions into a new directory arch/arm64/lib/crypto/ that does not
depend on CRYPTO.  This mirrors the distinction between crypto/ and
lib/crypto/.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-04-28 19:40:53 +08:00
Eric Biggers
e2df5fb770 crypto: arm64 - drop redundant dependencies on ARM64
arch/arm64/crypto/Kconfig is sourced only when CONFIG_ARM64=y, so there
is no need for the symbols defined inside it to depend on ARM64.

Acked-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-04-28 19:40:53 +08:00
Herbert Xu
432f98cf56 crypto: arm64/sha1 - Set finalize for short finup
Always set sctx->finalize before calling finup as it may not have
been set previously on a short final.

Reported-by: Corentin LABBE <clabbe.montjoie@gmail.com>
Fixes: b97d31100e ("crypto: arm64/sha1 - Use API partial block handling")
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Tested-by: Corentin LABBE <clabbe.montjoie@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-04-26 19:12:24 +08:00
Herbert Xu
ef17008481 crypto: arm64/sm4 - Use API partial block handling
Use the Crypto API partial block handling.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-04-23 15:52:47 +08:00
Herbert Xu
4dc3c40c4d crypto: arm64/aes - Use API partial block handling
Use the Crypto API partial block handling.

Also remove the unnecessary SIMD fallback path.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-04-23 15:52:47 +08:00
Herbert Xu
045f17b444 crypto: arm64/sm3-neon - Use API partial block handling
Use the Crypto API partial block handling.

Also remove the unnecessary SIMD fallback path.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-04-23 15:52:47 +08:00
Herbert Xu
6ba8c5f5a4 crypto: arm64/sm3-ce - Use API partial block handling
Use the Crypto API partial block handling.

Also remove the unnecessary SIMD fallback path.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-04-23 15:52:47 +08:00