github-mirror/linux

mirror of https://github.com/torvalds/linux.git synced 2026-06-03 12:03:54 +02:00

Commit Graph

Author

SHA1

Message

Date

Ard Biesheuvel

e0718ed60d

lib/crc: arm64: Drop unnecessary chunking logic from crc64

On arm64, kernel mode NEON executes with preemption enabled, so there is
no need to chunk the input by hand.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20260330144630.33026-8-ardb@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>

2026-04-02 16:14:53 -07:00

Demian Shulhan

63432fd625

lib/crc: arm64: add NEON accelerated CRC64-NVMe implementation

Implement an optimized CRC64 (NVMe) algorithm for ARM64 using NEON
Polynomial Multiply Long (PMULL) instructions. The generic shift-and-XOR
software implementation is slow, which creates a bottleneck in NVMe and
other storage subsystems.
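
For reference, a shift-and-XOR CRC64 of the kind described above can be sketched in portable C as follows. This is a bit-at-a-time illustration, not the kernel's generic code; the function name crc64_nvme_bitwise() is hypothetical, and the parameters (reflected polynomial 0x9A6C9329AC4BC9B5, all-ones initial value and final XOR) are the published CRC-64/NVME parameters rather than anything stated in this commit message.

```c
#include <stddef.h>
#include <stdint.h>

/* Bit-reflected form of the CRC-64/NVME polynomial 0xAD93D23594C93659. */
#define CRC64_NVME_POLY_REFLECTED 0x9A6C9329AC4BC9B5ULL

/*
 * Hypothetical reference routine: processes one bit per inner iteration.
 * Pass 0 as crc to start, or the previous return value to continue.
 */
static uint64_t crc64_nvme_bitwise(uint64_t crc, const void *buf, size_t len)
{
	const unsigned char *p = buf;

	crc = ~crc;				/* undo the final XOR / apply the initial value */
	while (len--) {
		crc ^= *p++;			/* fold the next byte into the low bits */
		for (int bit = 0; bit < 8; bit++)	/* shift out one bit at a time */
			crc = (crc >> 1) ^
			      ((crc & 1) ? CRC64_NVME_POLY_REFLECTED : 0);
	}
	return ~crc;				/* final XOR with all ones */
}
```

With these parameters the standard check input "123456789" yields 0xAE8B14860A799888, and splitting the input across two calls gives the same result as one call.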

The acceleration is implemented using C intrinsics (<arm_neon.h>) rather
than raw assembly for better readability and maintainability.
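
As background on what PMULL computes, the following portable sketch emulates a 64x64 -> 128 bit carry-less (polynomial) multiplication over GF(2). It is an illustration only: the names clmul64() and struct clmul128 are hypothetical and do not come from the patch. On arm64 the vmull_p64() intrinsic from <arm_neon.h> performs this operation in hardware; which intrinsics the patch actually uses is not stated in this message.

```c
#include <stdint.h>

/* Hypothetical 128-bit result type, split into two 64-bit halves. */
struct clmul128 {
	uint64_t lo;
	uint64_t hi;
};

/*
 * Carry-less multiply: like schoolbook multiplication, but partial
 * products are combined with XOR instead of addition, so no carries
 * propagate between bit positions.
 */
static struct clmul128 clmul64(uint64_t a, uint64_t b)
{
	struct clmul128 r = { 0, 0 };

	for (int i = 0; i < 64; i++) {
		if ((b >> i) & 1) {
			r.lo ^= a << i;
			if (i)			/* avoid an undefined shift by 64 */
				r.hi ^= a >> (64 - i);
		}
	}
	return r;
}
```

For example, clmul64(3, 3) gives 5, since (x + 1)^2 = x^2 + 1 over GF(2), whereas ordinary integer multiplication would give 9.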

Key highlights of this implementation:
- Uses 4KB chunking inside scoped_ksimd() to avoid preemption latency
  spikes on large buffers.
- Pre-calculates and loads fold constants via vld1q_u64() to minimize
  register spilling.
- Benchmarks show the break-even point against the generic implementation
  is around 128 bytes. The PMULL path is enabled only for len >= 128.
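
The length threshold and 4KB chunking described in the list above can be sketched in portable C as follows. This is a sketch under stated assumptions, not the patch's code: every name here is hypothetical, the "fast" routine is a stand-in that reuses the slow update and only counts how often a SIMD section would be entered, and the handling of a short tail by the generic path is an assumption. In the kernel, each fast call would sit inside scoped_ksimd().

```c
#include <stddef.h>
#include <stdint.h>

#define PMULL_MIN_LEN	128	/* break-even length from the commit message */
#define SIMD_CHUNK	4096	/* 4KB per SIMD section */

/* Counts simulated SIMD sections; exists only for this illustration. */
static unsigned int simd_sections_entered;

/* Stand-in for the generic path: raw CRC state update, no inversion. */
static uint64_t crc64_generic_stand_in(uint64_t crc, const unsigned char *p,
				       size_t len)
{
	while (len--) {
		crc ^= *p++;
		for (int bit = 0; bit < 8; bit++)
			crc = (crc >> 1) ^
			      ((crc & 1) ? 0x9A6C9329AC4BC9B5ULL : 0);
	}
	return crc;
}

/* Stand-in for the PMULL path: same result, but records the section. */
static uint64_t crc64_pmull_stand_in(uint64_t crc, const unsigned char *p,
				     size_t len)
{
	simd_sections_entered++;
	return crc64_generic_stand_in(crc, p, len);
}

/* Use the fast path in chunks of at most 4KB while len >= 128. */
static uint64_t crc64_sketch(uint64_t crc, const unsigned char *p, size_t len)
{
	while (len >= PMULL_MIN_LEN) {
		size_t chunk = len < SIMD_CHUNK ? len : SIMD_CHUNK;

		crc = crc64_pmull_stand_in(crc, p, chunk);
		p += chunk;
		len -= chunk;
	}
	if (len)	/* short input or short tail: generic path (assumption) */
		crc = crc64_generic_stand_in(crc, p, len);
	return crc;
}
```

In this sketch a 10000-byte buffer enters three SIMD sections (4096 + 4096 + 1808 bytes), while a 100-byte buffer never leaves the generic path.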

Performance results (kunit crc_benchmark on Cortex-A72):
- Generic (len=4096): ~268 MB/s
- PMULL (len=4096): ~1556 MB/s (nearly 6x improvement)

Signed-off-by: Demian Shulhan <demyansh@gmail.com>
Link: https://lore.kernel.org/r/20260329074338.1053550-1-demyansh@gmail.com
Signed-off-by: Eric Biggers <ebiggers@kernel.org>

2026-03-29 13:22:13 -07:00

2 Commits