947 Commits

Author SHA1 Message Date
Linus Torvalds
aec2f682d4 This update includes the following changes:
API:
 
 - Replace crypto_get_default_rng with crypto_stdrng_get_bytes.
 - Remove simd skcipher support.
 - Allow algorithm types to be disabled when CRYPTO_SELFTESTS is off.
 
 Algorithms:
 
 - Remove CPU-based des/3des acceleration.
 - Add test vectors for authenc(hmac(md5),cbc(aes)).
 - Add test vectors for authenc(hmac(md5),cbc(des)).
 - Add test vectors for authenc(hmac(md5),rfc3686(ctr(aes))).
 - Add test vectors for authenc(hmac(sha1),rfc3686(ctr(aes))).
 - Add test vectors for authenc(hmac(sha224),rfc3686(ctr(aes))).
 - Add test vectors for authenc(hmac(sha256),rfc3686(ctr(aes))).
 - Add test vectors for authenc(hmac(sha384),rfc3686(ctr(aes))).
 - Add test vectors for authenc(hmac(sha512),rfc3686(ctr(aes))).
 - Replace spin lock with mutex in jitterentropy.
 
 Drivers:
 
 - Add authenc algorithms to safexcel.
 - Add support for zstd in qat.
 - Add wireless mode support for QAT GEN6.
 - Add anti-rollback support for QAT GEN6.
 - Add support for ctr(aes), gcm(aes), and ccm(aes) in dthev2.

Merge tag 'v7.1-p1' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6

Pull crypto update from Herbert Xu:
 "API:
   - Replace crypto_get_default_rng with crypto_stdrng_get_bytes
   - Remove simd skcipher support
   - Allow algorithm types to be disabled when CRYPTO_SELFTESTS is off

  Algorithms:
   - Remove CPU-based des/3des acceleration
   - Add test vectors for authenc(hmac(md5),cbc({aes,des})) and
     authenc(hmac({md5,sha1,sha224,sha256,sha384,sha512}),rfc3686(ctr(aes)))
   - Replace spin lock with mutex in jitterentropy

  Drivers:
   - Add authenc algorithms to safexcel
   - Add support for zstd in qat
   - Add wireless mode support for QAT GEN6
   - Add anti-rollback support for QAT GEN6
   - Add support for ctr(aes), gcm(aes), and ccm(aes) in dthev2"

* tag 'v7.1-p1' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (129 commits)
  crypto: af_alg - use sock_kmemdup in alg_setkey_by_key_serial
  crypto: vmx - remove CRYPTO_DEV_VMX from Kconfig
  crypto: omap - convert reqctx buffer to fixed-size array
  crypto: atmel-sha204a - add Thorsten Blum as maintainer
  crypto: atmel-ecc - add Thorsten Blum as maintainer
  crypto: qat - fix IRQ cleanup on 6xxx probe failure
  crypto: geniv - Remove unused spinlock from struct aead_geniv_ctx
  crypto: qce - simplify qce_xts_swapiv()
  crypto: hisilicon - Fix dma_unmap_single() direction
  crypto: talitos - rename first/last to first_desc/last_desc
  crypto: talitos - fix SEC1 32k ahash request limitation
  crypto: jitterentropy - replace long-held spinlock with mutex
  crypto: hisilicon - remove unused and non-public APIs for qm and sec
  crypto: hisilicon/qm - drop redundant variable initialization
  crypto: hisilicon/qm - remove else after return
  crypto: hisilicon/qm - add const qualifier to info_name in struct qm_cmd_dump_item
  crypto: hisilicon - fix the format string type error
  crypto: ccree - fix a memory leak in cc_mac_digest()
  crypto: qat - add support for zstd
  crypto: qat - use swab32 macro
  ...
2026-04-15 15:22:26 -07:00
Eric Biggers
9a73869cb5 crypto: x86 - Remove des and des3_ede code
Since DES and Triple DES are obsolete, there is very little point in
maintaining architecture-optimized code for them.  Remove it.

Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2026-04-03 08:56:12 +08:00
Eric Biggers
17ba6108d3 lib/crypto: x86/sm3: Migrate optimized code into library
Instead of exposing the x86-optimized SM3 code via an x86-specific
crypto_shash algorithm, just implement the sm3_blocks() library
function.  This is much simpler, it makes the SM3 library functions be
x86-optimized, and it fixes the longstanding issue where the
x86-optimized SM3 code was disabled by default.  SM3 still remains
available through crypto_shash, but individual architectures no longer
need to handle it.

Tweak the prototype of sm3_transform_avx() to match what the library
expects, including changing the block count to size_t.  Note that the
assembly code actually already treated this argument as size_t.

Acked-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20260321040935.410034-10-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2026-03-23 17:50:59 -07:00
Eric Biggers
ea0c746ffa lib/crypto: aesgcm: Use GHASH library API
Make the AES-GCM library use the GHASH library instead of directly
calling gf128mul_lle().  This allows the architecture-optimized GHASH
implementations to be used, or the improved generic implementation if no
architecture-optimized implementation is usable.

Note: this means that <crypto/gcm.h> no longer needs to include
<crypto/gf128mul.h>.  Remove that inclusion, and include
<crypto/gf128mul.h> explicitly from arch/x86/crypto/aesni-intel_glue.c
which previously was relying on the transitive inclusion.

Acked-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20260319061723.1140720-20-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2026-03-23 16:44:30 -07:00
Eric Biggers
3e79c8ec49 lib/crypto: x86/ghash: Migrate optimized code into library
Remove the "ghash-pclmulqdqni" crypto_shash algorithm.  Move the
corresponding assembly code into lib/crypto/, and wire it up to the
GHASH library.

This makes the GHASH library be optimized with x86's carryless
multiplication instructions.  It also greatly reduces the amount of
x86-specific glue code that is needed, and it fixes the issue where this
GHASH optimization was disabled by default.

Rename and adjust the prototypes of the assembly functions to make them
fit better with the library.  Remove the byte-swaps (pshufb
instructions) that are no longer necessary because the library keeps the
accumulator in POLYVAL format rather than GHASH format.

Rename clmul_ghash_mul() to polyval_mul_pclmul() to reflect that it
really does a POLYVAL style multiplication.  Wire it up to both
ghash_mul_arch() and polyval_mul_arch().

Acked-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20260319061723.1140720-15-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2026-03-23 16:44:29 -07:00
Eric Biggers
104a9526e1 crypto: x86/aes-gcm - Use new AES library API
Switch from the old AES library functions (which use struct
crypto_aes_ctx) to the new ones (which use struct aes_enckey).  This
eliminates the unnecessary computation and caching of the decryption
round keys.  The new AES en/decryption functions are also much faster
and use AES instructions when supported by the CPU.

Since this changes the format of the AES-GCM key structures that are
used by the AES-GCM assembly code, the offsets in the assembly code had
to be updated to match.  Note that the new key structures are smaller,
since the decryption round keys are no longer unnecessarily included.

Acked-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20260112192035.10427-26-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2026-01-15 14:09:08 -08:00
Eric Biggers
9c941c94bc crypto: x86/aes - Remove the superseded AES-NI crypto_cipher
Remove the "aes-aesni" crypto_cipher algorithm and the code specific to
its implementation.  It is no longer necessary because the AES library
is now optimized with x86 AES-NI, and crypto/aes.c exposes the AES
library via the crypto_cipher API.

Acked-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20260112192035.10427-19-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2026-01-15 14:09:07 -08:00
Eric Biggers
a229d83235 lib/crypto: x86/nh: Migrate optimized code into library
Migrate the x86_64 implementations of NH into lib/crypto/.  This makes
the nh() function be optimized on x86_64 kernels.

Note: this temporarily makes the adiantum template not utilize the
x86_64 optimized NH code.  This is resolved in a later commit that
converts the adiantum template to use nh() instead of "nhpoly1305".

Link: https://lore.kernel.org/r/20251211011846.8179-6-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2026-01-12 11:07:50 -08:00
Linus Torvalds
a619fe35ab This update includes the following changes:
API:
 
 - Rewrite memcpy_sglist from scratch.
 - Add on-stack AEAD request allocation.
 - Fix partial block processing in ahash.
 
 Algorithms:
 
 - Remove ansi_cprng.
 - Remove tcrypt tests for poly1305.
 - Fix EINPROGRESS processing in authenc.
 - Fix double-free in zstd.
 
 Drivers:
 
 - Use drbg ctr helper when reseeding xilinx-trng.
 - Add support for PCI device 0x115A to ccp.
 - Add support of paes in caam.
 - Add support for aes-xts in dthev2.
 
 Others:
 
 - Use likely in rhashtable lookup.
 - Fix lockdep false-positive in padata by removing a helper.

Merge tag 'v6.19-p1' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6

Pull crypto updates from Herbert Xu:
 "API:
   - Rewrite memcpy_sglist from scratch
   - Add on-stack AEAD request allocation
   - Fix partial block processing in ahash

  Algorithms:
   - Remove ansi_cprng
   - Remove tcrypt tests for poly1305
   - Fix EINPROGRESS processing in authenc
   - Fix double-free in zstd

  Drivers:
   - Use drbg ctr helper when reseeding xilinx-trng
   - Add support for PCI device 0x115A to ccp
   - Add support of paes in caam
   - Add support for aes-xts in dthev2

  Others:
   - Use likely in rhashtable lookup
   - Fix lockdep false-positive in padata by removing a helper"

* tag 'v6.19-p1' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (71 commits)
  crypto: zstd - fix double-free in per-CPU stream cleanup
  crypto: ahash - Zero positive err value in ahash_update_finish
  crypto: ahash - Fix crypto_ahash_import with partial block data
  crypto: lib/mpi - use min() instead of min_t()
  crypto: ccp - use min() instead of min_t()
  hwrng: core - use min3() instead of nested min_t()
  crypto: aesni - ctr_crypt() use min() instead of min_t()
  crypto: drbg - Delete unused ctx from struct sdesc
  crypto: testmgr - Add missing DES weak and semi-weak key tests
  Revert "crypto: scatterwalk - Move skcipher walk and use it for memcpy_sglist"
  crypto: scatterwalk - Fix memcpy_sglist() to always succeed
  crypto: iaa - Request to add Kanchana P Sridhar to Maintainers.
  crypto: tcrypt - Remove unused poly1305 support
  crypto: ansi_cprng - Remove unused ansi_cprng algorithm
  crypto: asymmetric_keys - fix uninitialized pointers with free attribute
  KEYS: Avoid -Wflex-array-member-not-at-end warning
  crypto: ccree - Correctly handle return of sg_nents_for_len
  crypto: starfive - Correctly handle return of sg_nents_for_len
  crypto: iaa - Fix incorrect return value in save_iaa_wq()
  crypto: zstd - Remove unnecessary size_t cast
  ...
2025-12-03 11:28:38 -08:00
Linus Torvalds
8f4c9978de AES-GCM optimizations for 6.19
More optimizations and cleanups for the x86_64 AES-GCM code:
 
 - Add a VAES+AVX2 optimized implementation of AES-GCM. This is very
   helpful on CPUs that have VAES but not AVX512, such as AMD Zen 3.
 
 - Make the VAES+AVX512 optimized implementation of AES-GCM handle
   large amounts of associated data efficiently.
 
 - Remove the "avx10_256" implementation of AES-GCM. It's superseded by
   the VAES+AVX2 optimized implementation.
 
 - Rename the "avx10_512" implementation to "avx512".
 
 Overall, this fills in a gap where AES-GCM wasn't fully optimized on
 some recent CPUs. It also drops code that won't be as useful as
 initially expected due to AVX10/256 being dropped from the AVX10 spec.

Merge tag 'aes-gcm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux

Pull AES-GCM optimizations from Eric Biggers:
 "More optimizations and cleanups for the x86_64 AES-GCM code:

   - Add a VAES+AVX2 optimized implementation of AES-GCM. This is very
     helpful on CPUs that have VAES but not AVX512, such as AMD Zen 3.

   - Make the VAES+AVX512 optimized implementation of AES-GCM handle
     large amounts of associated data efficiently.

   - Remove the "avx10_256" implementation of AES-GCM. It's superseded
     by the VAES+AVX2 optimized implementation.

   - Rename the "avx10_512" implementation to "avx512"

  Overall, this fills in a gap where AES-GCM wasn't fully optimized on
  some recent CPUs. It also drops code that won't be as useful as
  initially expected due to AVX10/256 being dropped from the AVX10 spec"

* tag 'aes-gcm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux:
  crypto: x86/aes-gcm-vaes-avx2 - initialize full %rax return register
  crypto: x86/aes-gcm - optimize long AAD processing with AVX512
  crypto: x86/aes-gcm - optimize AVX512 precomputation of H^2 from H^1
  crypto: x86/aes-gcm - revise some comments in AVX512 code
  crypto: x86/aes-gcm - reorder AVX512 precompute and aad_update functions
  crypto: x86/aes-gcm - clean up AVX512 code to assume 512-bit vectors
  crypto: x86/aes-gcm - rename avx10 and avx10_512 to avx512
  crypto: x86/aes-gcm - remove VAES+AVX10/256 optimized code
  crypto: x86/aes-gcm - add VAES+AVX2 optimized code
2025-12-02 18:24:35 -08:00
David Laight
6c5d5b6dc5 crypto: aesni - ctr_crypt() use min() instead of min_t()
min_t(unsigned int, a, b) casts an 'unsigned long' to 'unsigned int'.
Use min(a, b) instead as it promotes any 'unsigned int' to 'unsigned long'
and so cannot discard significant bits.

In this case the 'unsigned long' value is small enough that the result
is ok.

Detected by an extra check added to min_t().

Signed-off-by: David Laight <david.laight.linux@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-11-24 17:43:40 +08:00
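
The truncation hazard described in commit 6c5d5b6dc5 above can be illustrated with a small sketch. This is not kernel code: it emulates the C semantics in Python, assuming a 32-bit 'unsigned int' and a 64-bit 'unsigned long'. The helper names (min_t_u32, min_promoting) and the sample values are hypothetical.

```python
U32_MASK = 0xFFFFFFFF  # assumption: 'unsigned int' is 32 bits wide


def min_t_u32(a, b):
    """Emulate min_t(unsigned int, a, b): cast both operands to 32 bits, then compare."""
    return min(a & U32_MASK, b & U32_MASK)


def min_promoting(a, b):
    """Emulate min(a, b) on mixed unsigned types: compare at the wider type's width."""
    return min(a, b)


remaining = (1 << 32) + 16  # hypothetical 'unsigned long' byte count above 4 GiB
chunk = 4096                # hypothetical 'unsigned int' limit

print(min_t_u32(remaining, chunk))      # 16: the cast discarded bit 32
print(min_promoting(remaining, chunk))  # 4096: no significant bits lost
```

For values that fit in 32 bits both helpers agree, which matches the commit's remark that the result in ctr_crypt() was already correct.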
Eric Biggers
4d8da35579 lib/crypto: x86/polyval: Migrate optimized code into library
Migrate the x86_64 implementation of POLYVAL into lib/crypto/, wiring it
up to the POLYVAL library interface.  This makes the POLYVAL library be
properly optimized on x86_64.

This drops the x86_64 optimizations of polyval in the crypto_shash API.
That's fine, since polyval will be removed from crypto_shash entirely
since it is unneeded there.  But even if it comes back, the crypto_shash
API could just be implemented on top of the library API, as usual.

Adjust the names and prototypes of the assembly functions to align more
closely with the rest of the library code.

Also replace a movaps instruction with movups to remove the assumption
that the key struct is 16-byte aligned.  Users can still align the key
if they want (and at least in this case, movups is just as fast as
movaps), but it's inconvenient to require it.

Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20251109234726.638437-6-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-11-11 11:03:38 -08:00
Eric Biggers
0e253e250e crypto: x86/aes-gcm-vaes-avx2 - initialize full %rax return register
Update aes_gcm_dec_final_vaes_avx2() to be consistent with
aes_gcm_dec_final_aesni() and aes_gcm_dec_final_vaes_avx512() by
initializing the full %rax return register instead of just %al.
Technically this is unnecessary, since these functions return bool.  But
I think it's worth being extra careful with the result of the tag
comparison and also keeping the different implementations consistent.

Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20251102015256.171536-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-11-03 09:07:57 -08:00
Eric Biggers
05794985b1 crypto: x86/aes-gcm - optimize long AAD processing with AVX512
Improve the performance of aes_gcm_aad_update_vaes_avx512() on large AAD
(additional authenticated data) lengths by 4-8 times by making it use up
to 512-bit vectors and a 4-vector-wide loop.  Previously, it used only
256-bit vectors and a 1-vector-wide loop.

Originally, I assumed that the case of large AADLEN was unimportant.
Later, when reviewing the users of BoringSSL's AES-GCM code, I found
that some callers use BoringSSL's AES-GCM API to just compute GMAC,
authenticating lots of data but not en/decrypting any.  Thus, I included
a similar optimization in the BoringSSL port of this code.  I believe
it's wise to include this optimization in the kernel port too for
similar reasons, and to align it more closely with the BoringSSL port.

Another reason this function originally used 256-bit vectors was so that
separate *_avx10_256 and *_avx10_512 versions of it wouldn't be needed.
However, that's no longer applicable.

To avoid a slight performance regression in the common case of AADLEN <=
16, also add a fast path for that case which uses 128-bit vectors.  In
fact, this case actually gets slightly faster too, since it saves a
couple instructions over the original 256-bit code.

Acked-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20251002023117.37504-9-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-10-26 20:37:41 -07:00
Eric Biggers
5ab1ff2e0f crypto: x86/aes-gcm - optimize AVX512 precomputation of H^2 from H^1
Squaring in GF(2^128) requires fewer instructions than a generic
multiplication in GF(2^128).  Take advantage of this when computing H^2
from H^1 in aes_gcm_precompute_vaes_avx512().

Note that aes_gcm_precompute_vaes_avx2() already uses this optimization.

Acked-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20251002023117.37504-8-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-10-26 20:37:41 -07:00
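
Why squaring is cheaper than a generic multiplication, as commit 5ab1ff2e0f above states, can be sketched as follows. In a field of characteristic 2 the cross terms of (a_hi*x^64 + a_lo)^2 cancel, so a schoolbook square needs two 64x64 carryless partial products where a schoolbook multiply needs four. The sketch below is illustrative only: it uses the natural bit order with the polynomial x^128 + x^7 + x^2 + x + 1, whereas the real assembly uses PCLMULQDQ-family instructions and GHASH/POLYVAL bit conventions. All function names here are hypothetical.

```python
MASK64 = (1 << 64) - 1
# Assumption for this sketch: natural bit order, modulus x^128 + x^7 + x^2 + x + 1.
POLY = (1 << 128) | 0x87


def clmul(a, b):
    """Carryless (XOR-based) multiplication of two non-negative integers."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


def reduce_poly(x):
    """Reduce a polynomial over GF(2) modulo POLY, clearing bits 128 and above."""
    for bit in range(x.bit_length() - 1, 127, -1):
        if (x >> bit) & 1:
            x ^= POLY << (bit - 128)
    return x


def gf_mul(a, b):
    """Generic multiplication: four 64x64 carryless partial products."""
    a_lo, a_hi = a & MASK64, a >> 64
    b_lo, b_hi = b & MASK64, b >> 64
    lo = clmul(a_lo, b_lo)
    mid = clmul(a_lo, b_hi) ^ clmul(a_hi, b_lo)
    hi = clmul(a_hi, b_hi)
    return reduce_poly(lo ^ (mid << 64) ^ (hi << 128))


def gf_square(a):
    """Squaring: the two middle products are equal and cancel, leaving two."""
    a_lo, a_hi = a & MASK64, a >> 64
    return reduce_poly(clmul(a_lo, a_lo) ^ (clmul(a_hi, a_hi) << 128))


h = (0x0123456789ABCDEF << 64) | 0xFEDCBA9876543210  # arbitrary sample value
print(gf_square(h) == gf_mul(h, h))
```

The reduction step is the same in both paths; the saving comes entirely from the skipped cross products.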
Eric Biggers
e0abd0053f crypto: x86/aes-gcm - revise some comments in AVX512 code
- Fix some references to field names in struct aes_gcm_key_vaes_avx512.

- Remove the mention of the counter having to start at 2.  The assembly
  code doesn't actually assume that it does.

Note that these changes improve consistency with aes-gcm-vaes-avx2.S.

Acked-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20251002023117.37504-7-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-10-26 20:37:41 -07:00
Eric Biggers
5213aefa9e crypto: x86/aes-gcm - reorder AVX512 precompute and aad_update functions
Now that the _aes_gcm_precompute macro is instantiated only once,
replace it directly with a function definition.

Also, move aes_gcm_aad_update_vaes_avx512() to a different location in
the file so that it's consistent with aes-gcm-vaes-avx2.S and also the
BoringSSL port of this code.

No functional changes.

Acked-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20251002023117.37504-6-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-10-26 20:37:41 -07:00
Eric Biggers
4b582e0fb3 crypto: x86/aes-gcm - clean up AVX512 code to assume 512-bit vectors
aes-gcm-vaes-avx512.S (originally aes-gcm-avx10-x86_64.S) was designed
to support multiple maximum vector lengths, while still utilizing AVX512
/ AVX10 features such as the increased number of vector registers.
However, the support for multiple maximum vector lengths turned out to
not be useful.  Support for maximum vector lengths other than 512 bits
was removed from the AVX10 specification, which leaves "avoiding
overly-eager downclocking" as the only remaining use case for limiting
AVX512 / AVX10 code to 256-bit vectors.  But this issue has gone away in
new CPUs, and the separate VAES+AVX2 code which I ended up having to
write anyway provides nearly as good 256-bit support.

Therefore, clean up this code to not be written in terms of a generic
vector length, but rather just assume 512-bit vectors.

Acked-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20251002023117.37504-5-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-10-26 20:37:41 -07:00
Eric Biggers
12beec21c5 crypto: x86/aes-gcm - rename avx10 and avx10_512 to avx512
With the "avx10_256" code removed and the AVX10 specification having
been changed to basically just be a re-packaged AVX512, the "avx10_512"
name no longer makes sense.  Replace it with "avx512".

While doing this, also add the "vaes_" prefix in places that didn't
already have it.  The result is that the two VAES optimized
implementations are consistently called vaes_avx2 and vaes_avx512.
(Also drop the "-x86_64" part of the assembly filename, to keep it from
getting too long.  There's no 32-bit version of this code, and the fact
that it's 64-bit is unremarkable; it's the norm for new code.)

Note: although aes_gcm_aad_update_vaes_avx512() (previously called
aes_gcm_aad_update_vaes_avx10()) uses at most 256-bit vectors, it still
depends on the AVX512 CPU feature.  So its new name is still accurate.
Also, a later commit will make it sometimes use 512-bit vectors anyway.

Acked-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20251002023117.37504-4-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-10-26 20:37:40 -07:00
Eric Biggers
f65e908606 crypto: x86/aes-gcm - remove VAES+AVX10/256 optimized code
Remove the VAES+AVX10/256 optimized implementation of AES-GCM.

It's no longer expected to be useful for future CPUs, since Intel
changed the AVX10 specification to require 512-bit vectors.

In addition, it's no longer very useful to serve as the 256-bit fallback
for older Intel CPUs (Ice Lake and Tiger Lake) that downclock too
eagerly when 512-bit vectors are used.  This is because I ended up
writing another 256-bit implementation anyway, using VAES+AVX2.  The
VAES+AVX2 implementation is almost as fast as the VAES+AVX10/256 one, as
shown by the following tables.  So, let's just use it instead.

Table 1: AES-256-GCM encryption throughput change,
         CPU vs. message length in bytes:

                      | 16384 |  4096 |  4095 |  1420 |   512 |   500 |
----------------------+-------+-------+-------+-------+-------+-------+
Intel Ice Lake Server |   -2% |   -1% |    0% |   -2% |   -2% |    3% |

                      |   300 |   200 |    64 |    63 |    16 |
----------------------+-------+-------+-------+-------+-------+
Intel Ice Lake Server |    1% |    0% |    4% |    2% |   -6% |

Table 2: AES-256-GCM decryption throughput change,
         CPU vs. message length in bytes:

                      | 16384 |  4096 |  4095 |  1420 |   512 |   500 |
----------------------+-------+-------+-------+-------+-------+-------+
Intel Ice Lake Server |   -1% |   -1% |    1% |   -2% |    0% |    2% |

                      |   300 |   200 |    64 |    63 |    16 |
----------------------+-------+-------+-------+-------+-------+
Intel Ice Lake Server |   -1% |    4% |    1% |    0% |   -5% |

Acked-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20251002023117.37504-3-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-10-26 20:37:40 -07:00
Eric Biggers
fae3b96ba6 crypto: x86/aes-gcm - add VAES+AVX2 optimized code
Add an implementation of AES-GCM that uses 256-bit vectors and the
following CPU features: Vector AES (VAES), Vector Carryless
Multiplication (VPCLMULQDQ), and AVX2.

It doesn't require AVX512.  So unlike the existing VAES+AVX512 code, it
works on CPUs that support VAES but not AVX512, specifically:

    - AMD Zen 3, both client and server
    - Intel Alder Lake, Raptor Lake, Meteor Lake, Arrow Lake, and Lunar
      Lake.  (These are client CPUs.)
    - Intel Sierra Forest.  (This is a server CPU.)

On these CPUs, this VAES+AVX2 code is much faster than the existing
AES-NI code.  The AES-NI code uses only 128-bit vectors.

These CPUs are widely deployed, making VAES+AVX2 code worthwhile even
though hopefully future x86_64 CPUs will uniformly support AVX512.

This implementation will also serve as the fallback 256-bit
implementation for older Intel CPUs (Ice Lake and Tiger Lake) that
support AVX512 but downclock too eagerly when 512-bit vectors are used.
Currently, the VAES+AVX10/256 implementation serves that purpose.  A
later commit will remove that and just use the VAES+AVX2 one.  (Note
that AES-XTS and AES-CTR already successfully use this approach.)

I originally wrote this AES-GCM implementation for BoringSSL.  It's been
in BoringSSL for a while now, including in Chromium.  This is a port of
it to the Linux kernel.  The main changes in the Linux version include:

- Port from "perlasm" to a standard .S file.
- Align all assembly functions with what aesni-intel_glue.c expects,
  including adding support for lengths not a multiple of 16 bytes.
- Rework the en/decryption of the final 1 to 127 bytes.

This commit increases AES-256-GCM throughput on AMD Milan (Zen 3) by up
to 74%, as shown by the following tables:

Table 1: AES-256-GCM encryption throughput change,
         CPU vs. message length in bytes:

                      | 16384 |  4096 |  4095 |  1420 |   512 |   500 |
----------------------+-------+-------+-------+-------+-------+-------+
AMD Milan (Zen 3)     |   67% |   59% |   61% |   39% |   23% |   27% |

                      |   300 |   200 |    64 |    63 |    16 |
----------------------+-------+-------+-------+-------+-------+
AMD Milan (Zen 3)     |   14% |   12% |    7% |    7% |    0% |

Table 2: AES-256-GCM decryption throughput change,
         CPU vs. message length in bytes:

                      | 16384 |  4096 |  4095 |  1420 |   512 |   500 |
----------------------+-------+-------+-------+-------+-------+-------+
AMD Milan (Zen 3)     |   74% |   65% |   65% |   44% |   23% |   26% |

                      |   300 |   200 |    64 |    63 |    16 |
----------------------+-------+-------+-------+-------+-------+
AMD Milan (Zen 3)     |   12% |   11% |    3% |    2% |   -3% |

Acked-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20251002023117.37504-2-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-10-26 20:37:40 -07:00
Linus Torvalds
2f0a750453 - Simplify inline asm flag output operands now that the minimum compiler
version supports the =@ccCOND syntax
 
 - Remove a bunch of AS_* Kconfig symbols which detect assembler support for
   various instruction mnemonics now that the minimum assembler version
   supports them all
 
 - The usual cleanups all over the place

Merge tag 'x86_cleanups_for_v6.18_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 cleanups from Borislav Petkov:

 - Simplify inline asm flag output operands now that the minimum
   compiler version supports the =@ccCOND syntax

 - Remove a bunch of AS_* Kconfig symbols which detect assembler support
   for various instruction mnemonics now that the minimum assembler
   version supports them all

 - The usual cleanups all over the place

* tag 'x86_cleanups_for_v6.18_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/asm: Remove code depending on __GCC_ASM_FLAG_OUTPUTS__
  x86/sgx: Use ENCLS mnemonic in <kernel/cpu/sgx/encls.h>
  x86/mtrr: Remove license boilerplate text with bad FSF address
  x86/asm: Use RDPKRU and WRPKRU mnemonics in <asm/special_insns.h>
  x86/idle: Use MONITORX and MWAITX mnemonics in <asm/mwait.h>
  x86/entry/fred: Push __KERNEL_CS directly
  x86/kconfig: Remove CONFIG_AS_AVX512
  crypto: x86 - Remove CONFIG_AS_VPCLMULQDQ
  crypto: X86 - Remove CONFIG_AS_VAES
  crypto: x86 - Remove CONFIG_AS_GFNI
  x86/kconfig: Drop unused and needless config X86_64_SMP
2025-10-11 10:51:14 -07:00
Eric Biggers
68546e5632 lib/crypto: curve25519: Consolidate into single module
Reorganize the Curve25519 library code:

- Build a single libcurve25519 module, instead of up to three modules:
  libcurve25519, libcurve25519-generic, and an arch-specific module.

- Move the arch-specific Curve25519 code from arch/$(SRCARCH)/crypto/ to
  lib/crypto/$(SRCARCH)/.  Centralize the build rules into
  lib/crypto/Makefile and lib/crypto/Kconfig.

- Include the arch-specific code directly in lib/crypto/curve25519.c via
  a header, rather than using a separate .c file.

- Eliminate the entanglement with CRYPTO.  CRYPTO_LIB_CURVE25519 no
  longer selects CRYPTO, and the arch-specific Curve25519 code no longer
  depends on CRYPTO.

This brings Curve25519 in line with the latest conventions for
lib/crypto/, used by other algorithms.  The exception is that I kept the
generic code in separate translation units for now.  (Some of the
function names collide between the x86 and generic Curve25519 code.  And
the Curve25519 functions are very long anyway, so inlining doesn't
matter as much for Curve25519 as it does for some other algorithms.)

Link: https://lore.kernel.org/r/20250906213523.84915-11-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-09-06 16:32:43 -07:00
Eric Biggers
de3ea8e1c5 crypto: x86/curve25519 - Remove unused kpp support
Curve25519 is used only via the library API, not the crypto_kpp API.  In
preparation for removing the unused crypto_kpp API for Curve25519,
remove the unused "curve25519-x86" kpp algorithm.

Note that the underlying x86_64 optimized Curve25519 code remains fully
supported and accessible via the library API.

It's also worth noting that even if the kpp support for Curve25519 comes
back later, there is no need for arch-specific kpp glue code like this,
as a single kpp algorithm that wraps the library API is sufficient.

Link: https://lore.kernel.org/r/20250906213523.84915-5-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-09-06 14:45:49 -07:00
Uros Bizjak
e084e9f815 crypto: x86 - Remove CONFIG_AS_VPCLMULQDQ
Current minimum required version of binutils is 2.30, which supports VPCLMULQDQ
instruction mnemonics.

Remove check for assembler support of VPCLMULQDQ instructions and all relevant
macros for conditional compilation.

No functional change intended.

Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Link: https://lore.kernel.org/20250819085855.333380-3-ubizjak@gmail.com
2025-08-21 14:32:41 +02:00
Uros Bizjak
4593311290 crypto: X86 - Remove CONFIG_AS_VAES
Current minimum required version of binutils is 2.30, which supports VAES
instruction mnemonics.

Remove check for assembler support of VAES instructions and all relevant macros
for conditional compilation.

No functional change intended.

Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Link: https://lore.kernel.org/20250819085855.333380-2-ubizjak@gmail.com
2025-08-21 12:23:28 +02:00
Uros Bizjak
a35da57357 crypto: x86 - Remove CONFIG_AS_GFNI
Current minimum required version of binutils is 2.30, which supports GFNI
instruction mnemonics.

Remove check for assembler support of GFNI instructions and all relevant
macros for conditional compilation.

No functional change intended.

Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Link: https://lore.kernel.org/20250819085855.333380-1-ubizjak@gmail.com
2025-08-20 20:48:07 +02:00
Linus Torvalds
44a8c96edd This update includes the following changes:
API:
 
 - Allow hash drivers without fallbacks (e.g., hardware key).
 
 Algorithms:
 
 - Add hmac hardware key support (phmac) on s390.
 - Re-enable sha384 in FIPS mode.
 - Disable sha1 in FIPS mode.
 - Convert zstd to acomp.
 
 Drivers:
 
 - Lower priority of qat skcipher and aead.
 - Convert aspeed to partial block API.
 - Add iMX8QXP support in caam.
 - Add rate limiting support for GEN6 devices in qat.
 - Enable telemetry for GEN6 devices in qat.
 - Implement full backlog mode for hisilicon/sec2.

Merge tag 'v6.17-p1' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6

Pull crypto update from Herbert Xu:
 "API:
   - Allow hash drivers without fallbacks (e.g., hardware key)

  Algorithms:
   - Add hmac hardware key support (phmac) on s390
   - Re-enable sha384 in FIPS mode
   - Disable sha1 in FIPS mode
   - Convert zstd to acomp

  Drivers:
   - Lower priority of qat skcipher and aead
   - Convert aspeed to partial block API
   - Add iMX8QXP support in caam
   - Add rate limiting support for GEN6 devices in qat
   - Enable telemetry for GEN6 devices in qat
   - Implement full backlog mode for hisilicon/sec2"

* tag 'v6.17-p1' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (116 commits)
  crypto: keembay - Use min() to simplify ocs_create_linked_list_from_sg()
  crypto: hisilicon/hpre - fix dma unmap sequence
  crypto: qat - make adf_dev_autoreset() static
  crypto: ccp - reduce stack usage in ccp_run_aes_gcm_cmd
  crypto: qat - refactor ring-related debug functions
  crypto: qat - fix seq_file position update in adf_ring_next()
  crypto: qat - fix DMA direction for compression on GEN2 devices
  crypto: jitter - replace ARRAY_SIZE definition with header include
  crypto: engine - remove {prepare,unprepare}_crypt_hardware callbacks
  crypto: engine - remove request batching support
  crypto: qat - flush misc workqueue during device shutdown
  crypto: qat - enable rate limiting feature for GEN6 devices
  crypto: qat - add compression slice count for rate limiting
  crypto: qat - add get_svc_slice_cnt() in device data structure
  crypto: qat - add adf_rl_get_num_svc_aes() in rate limiting
  crypto: qat - relocate service related functions
  crypto: qat - consolidate service enums
  crypto: qat - add decompression service for rate limiting
  crypto: qat - validate service in rate limiting sysfs api
  crypto: hisilicon/sec2 - implement full backlog mode for sec
  ...
2025-07-31 09:45:28 -07:00
Eric Biggers
3d9eb180fb crypto: x86/aegis - Add missing error checks
The skcipher_walk functions can allocate memory and can fail, so
checking for errors is necessary.

Fixes: 1d373d4e8e ("crypto: x86 - Add optimized AEGIS implementations")
Cc: stable@vger.kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-07-18 20:51:59 +10:00
Eric Biggers
c7f49dadfc crypto: x86/aegis - Fix sleeping when disallowed on PREEMPT_RT
skcipher_walk_done() can call kfree(), which takes a spinlock, which
makes it incorrect to call while preemption is disabled on PREEMPT_RT.
Therefore, end the kernel-mode FPU section before calling
skcipher_walk_done(), and restart it afterwards.

Moreover, pass atomic=false to skcipher_walk_aead_encrypt() instead of
atomic=true.  The point of atomic=true was to make skcipher_walk_done()
safe to call while in a kernel-mode FPU section, but that does not
actually work.  So just use the usual atomic=false.
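
The loop shape described above can be sketched in userspace C.  This is
only an illustration under stated assumptions: every name ending in
_mock is hypothetical and stands in for a kernel facility (the FPU
section helpers and the skcipher walk), and the "SIMD work" is reduced
to a byte counter.  It shows the FPU section being closed before each
walk step is finished, and the walk's return value being checked.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-ins; all *_mock names are hypothetical. */
static bool fpu_active;        /* true while inside the mock FPU section */
static int done_calls_in_fpu;  /* walk_done_mock() calls made inside it */

static void fpu_begin_mock(void) { assert(!fpu_active); fpu_active = true; }
static void fpu_end_mock(void)   { assert(fpu_active);  fpu_active = false; }

struct walk_mock {
	size_t total;    /* bytes left in the whole request */
	size_t step;     /* maximum bytes handed out per step */
	int fail_after;  /* fail after this many successful steps; -1 = never */
	size_t nbytes;   /* bytes available in the current step */
	int calls;       /* successful walk_done_mock() calls so far */
};

static size_t min_size(size_t a, size_t b) { return a < b ? a : b; }

static int walk_start_mock(struct walk_mock *w)
{
	w->calls = 0;
	w->nbytes = min_size(w->total, w->step);
	return 0;
}

/* May "allocate/free" and may fail, so it must run outside the FPU section. */
static int walk_done_mock(struct walk_mock *w, size_t remaining)
{
	if (fpu_active)
		done_calls_in_fpu++;
	if (w->fail_after >= 0 && w->calls == w->fail_after) {
		w->nbytes = 0;
		return -12; /* stand-in for an allocation failure */
	}
	w->calls++;
	w->total -= w->nbytes - remaining;
	w->nbytes = min_size(w->total, w->step);
	return 0;
}

/* End the FPU section before finishing each step, and propagate errors. */
static int crypt_request(struct walk_mock *w, size_t *processed)
{
	int err = walk_start_mock(w);

	*processed = 0;
	while (err == 0 && w->nbytes > 0) {
		size_t n = w->nbytes;

		fpu_begin_mock();
		*processed += n;   /* stand-in for the SIMD work */
		fpu_end_mock();

		err = walk_done_mock(w, 0);
	}
	return err;
}
```

A caller would fill in total, step, and fail_after, call
crypt_request(), and treat any nonzero return as a failed request.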

Fixes: 1d373d4e8e ("crypto: x86 - Add optimized AEGIS implementations")
Cc: stable@vger.kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-07-18 20:51:59 +10:00
Eric Biggers
f3d6cb3dc0 lib/crypto: x86/sha1: Migrate optimized code into library
Instead of exposing the x86-optimized SHA-1 code via x86-specific
crypto_shash algorithms, just implement the sha1_blocks() library
function.  This is much simpler, it makes the SHA-1 library functions
x86-optimized, and it fixes the longstanding issue where the
x86-optimized SHA-1 code was disabled by default.  SHA-1 remains
available through crypto_shash, but individual architectures no longer
need to handle it.

To match sha1_blocks(), change the type of the nblocks parameter of the
assembly functions from int to size_t.  The assembly functions actually
already treated it as size_t.

Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20250712232329.818226-14-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-07-14 11:28:35 -07:00
Eric Biggers
56119446f8 crypto: x86/sha1 - Rename conflicting symbol
Rename x86's sha1_update() to sha1_update_x86(), since it conflicts with
the upcoming sha1_update() library function.

Note: the affected code will be superseded by later commits that migrate
the arch-optimized SHA-1 code into the library.  This commit simply
keeps the kernel building for the initial introduction of the library.

Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20250712232329.818226-2-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-07-14 08:22:31 -07:00
Eric Biggers
484c18119f lib/crypto: x86/sha512: Migrate optimized SHA-512 code to library
Instead of exposing the x86-optimized SHA-512 code via x86-specific
crypto_shash algorithms, just implement the sha512_blocks() library
function.  This is much simpler, it makes the SHA-512 (and SHA-384)
library functions x86-optimized, and it fixes the longstanding issue
where the x86-optimized SHA-512 code was disabled by default.  SHA-512
remains available through crypto_shash, but individual architectures no
longer need to handle it.

To match sha512_blocks(), change the type of the nblocks parameter of
the assembly functions from int to size_t.  The assembly functions
actually already treated it as size_t.

Acked-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20250630160320.2888-15-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-06-30 09:26:20 -07:00
Eric Biggers
e0fca17755 crypto: sha512 - Rename conflicting symbols
Rename existing functions and structs in architecture-optimized SHA-512
code that had names conflicting with the upcoming library interface
which will be added to <crypto/sha2.h>: sha384_init, sha512_init,
sha512_update, sha384, and sha512.

Note: all affected code will be superseded by later commits that migrate
the arch-optimized SHA-512 code into the library.  This commit simply
keeps the kernel building for the initial introduction of the library.

Acked-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20250630160320.2888-2-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-06-30 09:26:19 -07:00
ChengZhenghan
d2b23a8dd8 crypto: x86 - Fix build warnings about export.h
I got some build warnings with W=1:
arch/x86/coco/sev/core.c:
arch/x86/crypto/aria_aesni_avx2_glue.c:
 warning: EXPORT_SYMBOL() is used,
 but #include <linux/export.h> is missing
arch/x86/crypto/aria_aesni_avx_glue.c:
 warning: EXPORT_SYMBOL() is used,
 but #include <linux/export.h> is missing
arch/x86/crypto/camellia_aesni_avx_glue.c:
 warning: EXPORT_SYMBOL() is used,
 but #include <linux/export.h> is missing
arch/x86/crypto/camellia_glue.c:
 warning: EXPORT_SYMBOL() is used,
 but #include <linux/export.h> is missing
arch/x86/crypto/curve25519-x86_64.c:
 warning: EXPORT_SYMBOL() is used,
 but #include <linux/export.h> is missing
arch/x86/crypto/serpent_avx_glue.c:
 warning: EXPORT_SYMBOL() is used,
 but #include <linux/export.h> is missing
arch/x86/crypto/sm4_aesni_avx_glue.c:
 warning: EXPORT_SYMBOL() is used,
 but #include <linux/export.h> is missing
arch/x86/crypto/twofish_glue.c:
 warning: EXPORT_SYMBOL() is used,
 but #include <linux/export.h> is missing
arch/x86/crypto/twofish_glue_3way.c:
 warning: EXPORT_SYMBOL() is used,
 but #include <linux/export.h> is missing
so I fixed these build warnings for x86_64.

Signed-off-by: ChengZhenghan <chengzhenghan@uniontech.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-06-23 16:59:38 +08:00
Eric Biggers
11d7956d52 crypto: x86/sha256 - implement library instead of shash
Instead of providing crypto_shash algorithms for the arch-optimized
SHA-256 code, implement the SHA-256 library.  This is much simpler, it
makes the SHA-256 library functions arch-optimized, and it fixes the
longstanding issue where the arch-optimized SHA-256 was disabled by
default.  SHA-256 remains available through crypto_shash, but
individual architectures no longer need to handle it.

To match sha256_blocks_arch(), change the type of the nblocks parameter
of the assembly functions from int to size_t.  The assembly functions
actually already treated it as size_t.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-05-05 18:20:44 +08:00
Herbert Xu
74df89ff76 crypto: x86/polyval - Use API partial block handling
Use the Crypto API partial block handling.

Also remove the unnecessary SIMD fallback path.
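
As a rough userspace sketch of what "API partial block handling" means
for a driver, assume the driver-side update consumes whole blocks only
and returns the number of leftover bytes, while a core-side wrapper
buffers that tail until the next call.  All toy_* names, the block size,
and the checksum are hypothetical stand-ins, not the kernel's code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_BLOCK_SIZE 16 /* hypothetical block size for this sketch */

struct toy_state {
	uint32_t acc;   /* toy running checksum, not a real hash */
	size_t blocks;  /* whole blocks consumed so far */
};

/* "Driver" side: processes whole blocks only, returns the leftover count. */
static size_t toy_driver_update(struct toy_state *st, const uint8_t *data,
				size_t len)
{
	while (len >= TOY_BLOCK_SIZE) {
		for (size_t i = 0; i < TOY_BLOCK_SIZE; i++)
			st->acc = st->acc * 31u + data[i];
		st->blocks++;
		data += TOY_BLOCK_SIZE;
		len -= TOY_BLOCK_SIZE;
	}
	return len;
}

/* "Core" side: owns the partial-block buffer on behalf of every driver. */
struct toy_core_ctx {
	struct toy_state st;
	uint8_t buf[TOY_BLOCK_SIZE];
	size_t buflen;  /* always < TOY_BLOCK_SIZE between calls */
};

static void toy_core_update(struct toy_core_ctx *ctx, const uint8_t *data,
			    size_t len)
{
	size_t left;

	assert(ctx->buflen < TOY_BLOCK_SIZE);

	if (ctx->buflen) {
		/* Top up the buffered tail first. */
		size_t fill = TOY_BLOCK_SIZE - ctx->buflen;

		if (fill > len)
			fill = len;
		memcpy(ctx->buf + ctx->buflen, data, fill);
		ctx->buflen += fill;
		data += fill;
		len -= fill;
		if (ctx->buflen < TOY_BLOCK_SIZE)
			return;
		toy_driver_update(&ctx->st, ctx->buf, TOY_BLOCK_SIZE);
		ctx->buflen = 0;
	}

	/* Hand the bulk to the driver and keep whatever it did not consume. */
	left = toy_driver_update(&ctx->st, data, len);
	if (left)
		memcpy(ctx->buf, data + (len - left), left);
	ctx->buflen = left;
}
```

With this split, feeding the same bytes in one call or in several
unaligned calls leaves the state identical, and the driver never needs
its own buffering code.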

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-04-28 19:40:54 +08:00
Eric Biggers
c7c18c94a6 crypto: x86 - move library functions to arch/x86/lib/crypto/
Continue disentangling the crypto library functions from the generic
crypto infrastructure by moving the x86 BLAKE2s, ChaCha, and Poly1305
library functions into a new directory arch/x86/lib/crypto/ that does
not depend on CRYPTO.  This mirrors the distinction between crypto/ and
lib/crypto/.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-04-28 19:40:54 +08:00
Eric Biggers
67128a90b3 crypto: x86 - drop redundant dependencies on X86
arch/x86/crypto/Kconfig is sourced only when CONFIG_X86=y, so there is
no need for the symbols defined inside it to depend on X86.

In the case of CRYPTO_TWOFISH_586 and CRYPTO_TWOFISH_X86_64, the
dependency was actually on '(X86 || UML_X86)', which suggests that these
two symbols were intended to be available under user-mode Linux as well.
Yet, again these symbols were defined only when CONFIG_X86=y, so that
was not the case.  Just remove this redundant dependency.

Acked-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-04-28 19:40:53 +08:00
Herbert Xu
68932c6be3 crypto: x86/sm3 - Use API partial block handling
Use the Crypto API partial block handling.

Also remove the unnecessary SIMD fallback path.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-04-23 15:52:47 +08:00
Herbert Xu
ff3cb9de53 crypto: x86/sha512 - Use API partial block handling
Use the Crypto API partial block handling.

Also remove the unnecessary SIMD fallback path.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-04-23 15:52:46 +08:00
Herbert Xu
8ba81fef40 crypto: sha256_base - Remove partial block helpers
Now that all sha256_base users have been converted to use the API
partial block handling, remove the partial block helpers.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-04-23 15:52:46 +08:00
Herbert Xu
eba187a6e7 crypto: x86/sha256 - Use API partial block handling
Use the Crypto API partial block handling.

Also remove the unnecessary SIMD fallback path.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-04-23 15:52:36 +08:00
Herbert Xu
0865a89413 crypto: x86/sha1 - Use API partial block handling
Use the Crypto API partial block handling.

Also remove the unnecessary SIMD fallback path.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-04-23 11:33:47 +08:00
Herbert Xu
3942654223 crypto: x86/ghash - Use API partial block handling
Use the Crypto API partial block handling.

Also remove the unnecessary SIMD fallback path.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-04-23 11:33:47 +08:00
Eric Biggers
bb9c648b33 crypto: lib/poly1305 - restore ability to remove modules
Though the module_exit functions are now no-ops, they should still be
defined, since otherwise the modules become unremovable.

Fixes: 1f81c58279 ("crypto: arm/poly1305 - remove redundant shash algorithm")
Fixes: f4b1a73aec ("crypto: arm64/poly1305 - remove redundant shash algorithm")
Fixes: 378a337ab4 ("crypto: powerpc/poly1305 - implement library instead of shash")
Fixes: 21969da642 ("crypto: x86/poly1305 - remove redundant shash algorithm")
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-04-19 11:18:28 +08:00
Eric Biggers
8821d26926 crypto: lib/chacha - restore ability to remove modules
Though the module_exit functions are now no-ops, they should still be
defined, since otherwise the modules become unremovable.

Fixes: 08820553f3 ("crypto: arm/chacha - remove the redundant skcipher algorithms")
Fixes: 8c28abede1 ("crypto: arm64/chacha - remove the skcipher algorithms")
Fixes: f7915484c0 ("crypto: powerpc/chacha - remove the skcipher algorithms")
Fixes: ceba0eda83 ("crypto: riscv/chacha - implement library instead of skcipher")
Fixes: 632ab0978f ("crypto: x86/chacha - remove the skcipher algorithms")
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-04-19 11:18:28 +08:00
Eric Biggers
34374f76af crypto: x86/poly1305 - don't select CRYPTO_LIB_POLY1305_GENERIC
The x86 Poly1305 code never falls back to the generic code, so selecting
CRYPTO_LIB_POLY1305_GENERIC is unnecessary.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-04-16 15:36:25 +08:00
Eric Biggers
21969da642 crypto: x86/poly1305 - remove redundant shash algorithm
Since crypto/poly1305.c now registers a poly1305-$(ARCH) shash algorithm
that uses the architecture's Poly1305 library functions, individual
architectures no longer need to do the same.  Therefore, remove the
redundant shash algorithm from the arch-specific code and leave just the
library functions there.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-04-16 15:36:25 +08:00
Eric Biggers
ecaa4be128 crypto: poly1305 - centralize the shash wrappers for arch code
Following the example of the crc32, crc32c, and chacha code, make the
crypto subsystem register both generic and architecture-optimized
poly1305 shash algorithms, both implemented on top of the appropriate
library functions.  This eliminates the need for every architecture to
implement the same shash glue code.

Note that the poly1305 shash requires that the key be prepended to the
data, which differs from the library functions where the key is simply a
parameter to poly1305_init().  Previously this was handled at a fairly
low level, polluting the library code with shash-specific code.
Reorganize things so that the shash code handles this quirk itself.
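
The key-prepending quirk can be illustrated with a small userspace
sketch.  It assumes a library-style interface where the key is passed to
init, and a wrapper that peels the first 32 bytes off the data stream
and uses them as the key.  The toy_* names are hypothetical and the
"MAC" is a plain byte sum, not Poly1305.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_KEY_SIZE 32 /* a Poly1305 key is 32 bytes */

/* Library-style interface: the key is simply a parameter to init. */
struct toy_lib_ctx {
	uint8_t key[TOY_KEY_SIZE];
	size_t msg_bytes;  /* message bytes seen (key excluded) */
	uint32_t sum;      /* toy accumulator, not a real MAC */
};

static void toy_lib_init(struct toy_lib_ctx *c, const uint8_t *key)
{
	memcpy(c->key, key, TOY_KEY_SIZE);
	c->msg_bytes = 0;
	c->sum = 0;
}

static void toy_lib_update(struct toy_lib_ctx *c, const uint8_t *data,
			   size_t len)
{
	for (size_t i = 0; i < len; i++)
		c->sum += data[i];
	c->msg_bytes += len;
}

/* Wrapper: the key arrives as the first TOY_KEY_SIZE bytes of the data. */
struct toy_shash_ctx {
	struct toy_lib_ctx lib;
	uint8_t keybuf[TOY_KEY_SIZE];
	size_t keylen;  /* key bytes collected so far */
};

static void toy_shash_init(struct toy_shash_ctx *c)
{
	c->keylen = 0;
}

static void toy_shash_update(struct toy_shash_ctx *c, const uint8_t *data,
			     size_t len)
{
	assert(c->keylen <= TOY_KEY_SIZE);

	if (c->keylen < TOY_KEY_SIZE) {
		/* Still collecting the key; it may span several updates. */
		size_t take = TOY_KEY_SIZE - c->keylen;

		if (take > len)
			take = len;
		memcpy(c->keybuf + c->keylen, data, take);
		c->keylen += take;
		data += take;
		len -= take;
		if (c->keylen < TOY_KEY_SIZE)
			return;
		toy_lib_init(&c->lib, c->keybuf);
	}
	toy_lib_update(&c->lib, data, len);
}
```

Keeping the key-collection logic in the wrapper means the library
functions never have to know that some callers deliver the key in-band.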

Also, to register the architecture-optimized shashes only when
architecture-optimized code is actually being used, add a function
poly1305_is_arch_optimized() and make each arch implement it.  Change
each architecture's Poly1305 module_init function to arch_initcall so
that the CPU feature detection is guaranteed to run before
poly1305_is_arch_optimized() gets called by crypto/poly1305.c.  (In
cases where poly1305_is_arch_optimized() just returns true
unconditionally, using arch_initcall is not strictly needed, but it's
still good to be consistent across architectures.)

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-04-16 15:36:24 +08:00