Commit Graph

12 Commits

Author SHA1 Message Date
Linus Torvalds
ff57d59200 LoongArch changes for v7.1
1, Adjust build infrastructure for 32BIT/64BIT;
 2, Add HIGHMEM (PKMAP and FIX_KMAP) support;
 3, Show and handle CPU vulnerabilites correctly;
 4, Batch the icache maintenance for jump_label;
 5, Add more atomic instructions support for BPF JIT;
 6, Add more features (e.g. fsession) support for BPF trampoline;
 7, Some bug fixes and other small changes.
 -----BEGIN PGP SIGNATURE-----
 
 iQJKBAABCAA0FiEEzOlt8mkP+tbeiYy5AoYrw/LiJnoFAmnpwWgWHGNoZW5odWFj
 YWlAa2VybmVsLm9yZwAKCRAChivD8uImeiAXD/0RSRhj2y8LYGhVSPStMgN4uwMl
 1ylbkRg0biTvV0g8sD1R3MQ58/tKBZY5wTeLjwT50rl+JgOqVdrN6OMAxjwOKzJ6
 7C0rgpxBG5/YHI93paFVIYszsiWhRQaB5qfZCUOr230ZDJzvnfF1aH6JLybeHoMp
 HvERNURQsRbZo9yc69YxhrmHETEbum37u9hsrY5mJSEs5Fh+QxvTSYjE36z3Dtal
 YFqopTCaBgAhVw6BldVAcyvGvVK+d6iQEA035jObNLKKReNkwsQixxgnJhDSkbbG
 Z3md+hWp+YQQElGIP5q6+rj1rJZGrs/XL3HAnTQfXN+8bXIUO9AOf2/l5f9fZx7o
 2Vtt8n2/vVdzsVnKiHXGtsZ5uXrw4/kLiMZSCrUMZCtEOxJV9mmrVskPeie0iq0/
 nDG9uCgRldL8Xpg7d5NM9coECui3J+ztNkv06tL/JLm02bJPuqNwt3FeA1T/aH1c
 l2Hpw3Xuzl7lYuAYoa5CMm4X6yD/RA6w44pW1NKnb6j6llIOk6V6NwcwggWUnqht
 oB5VIqPKMOYjZ+fLurI2o9VWqWokJxDdzyrHhXyaG0JRK9Vak06C8UI5BQuosu88
 9WBoxK77PyNa60m56C32OZ5tu4UoPT8PgZWXDQDwn82SWzuYKWRruS2ng5A/JF7r
 H2Ez4iBjs2/P7vTQHA==
 =FiFl
 -----END PGP SIGNATURE-----

Merge tag 'loongarch-7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson

Pull LoongArch updates from Huacai Chen:

 - Adjust build infrastructure for 32BIT/64BIT

 - Add HIGHMEM (PKMAP and FIX_KMAP) support

 - Show and handle CPU vulnerabilites correctly

 - Batch the icache maintenance for jump_label

 - Add more atomic instructions support for BPF JIT

 - Add more features (e.g. fsession) support for BPF trampoline

 - Some bug fixes and other small changes

* tag 'loongarch-7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson: (21 commits)
  selftests/bpf: Enable CAN_USE_LOAD_ACQ_STORE_REL for LoongArch
  LoongArch: BPF: Add fsession support for trampolines
  LoongArch: BPF: Introduce emit_store_stack_imm64() helper
  LoongArch: BPF: Support up to 12 function arguments for trampoline
  LoongArch: BPF: Support small struct arguments for trampoline
  LoongArch: BPF: Open code and remove invoke_bpf_mod_ret()
  LoongArch: BPF: Support load-acquire and store-release instructions
  LoongArch: BPF: Support 8 and 16 bit read-modify-write instructions
  LoongArch: BPF: Add the default case in emit_atomic() and rename it
  LoongArch: Define instruction formats for AM{SWAP/ADD}.{B/H} and DBAR
  LoongArch: Batch the icache maintenance for jump_label
  LoongArch: Add flush_icache_all()/local_flush_icache_all()
  LoongArch: Add spectre boundry for syscall dispatch table
  LoongArch: Show CPU vulnerabilites correctly
  LoongArch: Make arch_irq_work_has_interrupt() true only if IPI HW exist
  LoongArch: Use get_random_canary() for stack canary init
  LoongArch: Improve the logging of disabling KASLR
  LoongArch: Align FPU register state to 32 bytes
  LoongArch: Handle CONFIG_32BIT in syscall_get_arch()
  LoongArch: Add HIGHMEM (PKMAP and FIX_KMAP) support
  ...
2026-04-24 09:54:45 -07:00
Huacai Chen
3d9aba6618 LoongArch: Adjust build infrastructure for 32BIT/64BIT
Adjust build infrastructure (Kconfig, Makefile and ld scripts) to let
us enable both 32BIT/64BIT kernel build.

Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2026-04-22 15:44:26 +08:00
Christoph Hellwig
033bee3e49 loongarch: move the XOR code to lib/raid/
Move the optimized XOR into lib/raid and include it it in xor.ko instead
of always building it into the main kernel image.

Link: https://lkml.kernel.org/r/20260327061704.3707577-15-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Eric Biggers <ebiggers@kernel.org>
Tested-by: Eric Biggers <ebiggers@kernel.org>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Alexandre Ghiti <alex@ghiti.fr>
Cc: Andreas Larsson <andreas@gaisler.com>
Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: "Borislav Petkov (AMD)" <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chris Mason <clm@fb.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: David Sterba <dsterba@suse.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jason A. Donenfeld <jason@zx2c4.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Li Nan <linan122@huawei.com>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Magnus Lindholm <linmag7@gmail.com>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Richard Henderson <richard.henderson@linaro.org>
Cc: Richard Weinberger <richard@nod.at>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Song Liu <song@kernel.org>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Ted Ts'o <tytso@mit.edu>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: WANG Xuerui <kernel@xen0n.name>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-02 23:36:18 -07:00
Eric Biggers
b10d2d20d9 lib/crc: loongarch: Migrate optimized CRC code into lib/crc/
Move the loongarch-optimized CRC code from arch/loongarch/lib/crc* into
its new location in lib/crc/loongarch/, and wire it up in the new way.
This new way of organizing the CRC code eliminates the need to
artificially split the code for each CRC variant into separate arch and
generic modules, enabling better inlining and dead code elimination.
For more details, see "lib/crc: Prepare for arch-optimized code in
subdirs of lib/crc/".

Reviewed-by: "Martin K. Petersen" <martin.petersen@oracle.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Acked-by: "Jason A. Donenfeld" <Jason@zx2c4.com>
Link: https://lore.kernel.org/r/20250607200454.73587-6-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-06-30 09:31:57 -07:00
Eric Biggers
72f51a4f4b loongarch/crc32: expose CRC32 functions through lib
Move the loongarch CRC32 assembly code into the lib directory and wire
it up to the library interface.  This allows it to be used without going
through the crypto API.  It remains usable via the crypto API too via
the shash algorithms that use the library interface.  Thus all the
arch-specific "shash" code becomes unnecessary and is removed.

Note: to see the diff from arch/loongarch/crypto/crc32-loongarch.c to
arch/loongarch/lib/crc32-loongarch.c, view this commit with
'git show -M10'.

Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: WangYuli <wangyuli@uniontech.com>
Link: https://lore.kernel.org/r/20241202010844.144356-7-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
2024-12-01 17:23:01 -08:00
Xi Ruoyao
5125d033c8 LoongArch: Select ARCH_SUPPORTS_INT128 if CC_HAS_INT128
This allows compiling a full 128-bit product of two 64-bit integers as a
mul/mulh pair, instead of a nasty long sequence of 20+ instructions.

However, after selecting ARCH_SUPPORTS_INT128, when optimizing for size
the compiler generates calls to __ashlti3, __ashrti3, and __lshrti3 for
shifting __int128 values, causing a link failure:

    loongarch64-unknown-linux-gnu-ld: kernel/sched/fair.o: in
    function `mul_u64_u32_shr':
    <PATH>/include/linux/math64.h:161:(.text+0x5e4): undefined
    reference to `__lshrti3'

So provide the implementation of these functions if ARCH_SUPPORTS_INT128.

Closes: https://lore.kernel.org/loongarch/CAAhV-H5EZ=7OF7CSiYyZ8_+wWuenpo=K2WT8-6mAT4CvzUC_4g@mail.gmail.com/
Signed-off-by: Xi Ruoyao <xry111@xry111.site>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-05-14 12:24:18 +08:00
WANG Xuerui
75ded18a5e LoongArch: Add SIMD-optimized XOR routines
Add LSX and LASX implementations of xor operations, operating on 64
bytes (one L1 cache line) at a time, for a balance between memory
utilization and instruction mix. Huacai confirmed that all future
LoongArch implementations by Loongson (that we care) will likely also
feature 64-byte cache lines, and experiments show no throughput
improvement with further unrolling.

Performance numbers measured during system boot on a 3A5000 @ 2.5GHz:

> 8regs           : 12702 MB/sec
> 8regs_prefetch  : 10920 MB/sec
> 32regs          : 12686 MB/sec
> 32regs_prefetch : 10918 MB/sec
> lsx             : 17589 MB/sec
> lasx            : 26116 MB/sec

Acked-by: Song Liu <song@kernel.org>
Signed-off-by: WANG Xuerui <git@xen0n.name>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2023-09-06 22:53:55 +08:00
Tiezhu Yang
8b5ee2c66d LoongArch: Add support for function error injection
Inspired by the commit 42d038c4fb ("arm64: Add support for function
error injection") and the commit ee55ff803b ("riscv: Add support for
function error injection"), this patch supports function error injection
for LoongArch.

Mainly implement two functions:
(1) regs_set_return_value() which is used to overwrite the return value,
(2) override_function_with_return() which is used to override the probed
function returning and jump to its caller.

Here is a simple test under CONFIG_FUNCTION_ERROR_INJECTION and
CONFIG_FAIL_FUNCTION:

  # echo sys_clone > /sys/kernel/debug/fail_function/inject
  # echo 100 > /sys/kernel/debug/fail_function/probability
  # dmesg
  bash: fork: Invalid argument
  # dmesg
  ...
  FAULT_INJECTION: forcing a failure.
  name fail_function, interval 1, probability 100, space 0, times 1
  ...
  Call Trace:
  [<90000000002238f4>] show_stack+0x5c/0x180
  [<90000000012e384c>] dump_stack_lvl+0x60/0x88
  [<9000000000b1879c>] should_fail_ex+0x1b0/0x1f4
  [<900000000032ead4>] fei_kprobe_handler+0x28/0x6c
  [<9000000000230970>] kprobe_breakpoint_handler+0xf0/0x118
  [<90000000012e3e60>] do_bp+0x2c4/0x358
  [<9000000002241924>] exception_handlers+0x1924/0x10000
  [<900000000023b7d0>] sys_clone+0x0/0x4
  [<90000000012e4744>] do_syscall+0x7c/0x94
  [<9000000000221e44>] handle_syscall+0xc4/0x160

Tested-by: Hengqi Chen <hengqi.chen@gmail.com>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2023-05-01 17:19:52 +08:00
Bibo Mao
69e3a6aa6b LoongArch: Add checksum optimization for 64-bit system
LoongArch platform is 64-bit system, which supports 8-bytes memory
accessing, but generic checksum functions use 4-byte memory access.
So add 8-bytes memory access optimization for checksum functions on
LoongArch. And the code comes from arm64 system.

When network hw checksum is disabled, iperf performance improves about
10% with this patch.

Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2023-05-01 17:19:43 +08:00
Huacai Chen
a275a82dcd LoongArch: Use alternative to optimize libraries
Use the alternative to optimize common libraries according whether CPU
has UAL (hardware unaligned access support) feature, including memset(),
memcopy(), memmove(), copy_user() and clear_user().

We have tested UnixBench on a Loongson-3A5000 quad-core machine (1.6GHz):

1, One copy, before patch:

System Benchmarks Index Values               BASELINE       RESULT    INDEX
Dhrystone 2 using register variables         116700.0    9566582.0    819.8
Double-Precision Whetstone                       55.0       2805.3    510.1
Execl Throughput                                 43.0       2120.0    493.0
File Copy 1024 bufsize 2000 maxblocks          3960.0     209833.0    529.9
File Copy 256 bufsize 500 maxblocks            1655.0      89400.0    540.2
File Copy 4096 bufsize 8000 maxblocks          5800.0     320036.0    551.8
Pipe Throughput                               12440.0     340624.0    273.8
Pipe-based Context Switching                   4000.0     109939.1    274.8
Process Creation                                126.0       4728.7    375.3
Shell Scripts (1 concurrent)                     42.4       2223.1    524.3
Shell Scripts (8 concurrent)                      6.0        883.1   1471.9
System Call Overhead                          15000.0     518639.1    345.8
                                                                   ========
System Benchmarks Index Score                                         500.2

2, One copy, after patch:

System Benchmarks Index Values               BASELINE       RESULT    INDEX
Dhrystone 2 using register variables         116700.0    9567674.7    819.9
Double-Precision Whetstone                       55.0       2805.5    510.1
Execl Throughput                                 43.0       2392.7    556.4
File Copy 1024 bufsize 2000 maxblocks          3960.0     417804.0   1055.1
File Copy 256 bufsize 500 maxblocks            1655.0     112909.5    682.2
File Copy 4096 bufsize 8000 maxblocks          5800.0    1255207.4   2164.2
Pipe Throughput                               12440.0     555712.0    446.7
Pipe-based Context Switching                   4000.0      99964.5    249.9
Process Creation                                126.0       5192.5    412.1
Shell Scripts (1 concurrent)                     42.4       2302.4    543.0
Shell Scripts (8 concurrent)                      6.0        919.6   1532.6
System Call Overhead                          15000.0     511159.3    340.8
                                                                   ========
System Benchmarks Index Score                                         640.1

3, Four copies, before patch:

System Benchmarks Index Values               BASELINE       RESULT    INDEX
Dhrystone 2 using register variables         116700.0   38268610.5   3279.2
Double-Precision Whetstone                       55.0      11222.2   2040.4
Execl Throughput                                 43.0       7892.0   1835.3
File Copy 1024 bufsize 2000 maxblocks          3960.0     235149.6    593.8
File Copy 256 bufsize 500 maxblocks            1655.0      74959.6    452.9
File Copy 4096 bufsize 8000 maxblocks          5800.0     545048.5    939.7
Pipe Throughput                               12440.0    1337359.0   1075.0
Pipe-based Context Switching                   4000.0     473663.9   1184.2
Process Creation                                126.0      17491.2   1388.2
Shell Scripts (1 concurrent)                     42.4       6865.7   1619.3
Shell Scripts (8 concurrent)                      6.0       1015.9   1693.1
System Call Overhead                          15000.0    1899535.2   1266.4
                                                                   ========
System Benchmarks Index Score                                        1278.3

4, Four copies, after patch:

System Benchmarks Index Values               BASELINE       RESULT    INDEX
Dhrystone 2 using register variables         116700.0   38272815.5   3279.6
Double-Precision Whetstone                       55.0      11222.8   2040.5
Execl Throughput                                 43.0       8839.2   2055.6
File Copy 1024 bufsize 2000 maxblocks          3960.0     313912.9    792.7
File Copy 256 bufsize 500 maxblocks            1655.0      80976.1    489.3
File Copy 4096 bufsize 8000 maxblocks          5800.0    1176594.3   2028.6
Pipe Throughput                               12440.0    2100941.9   1688.9
Pipe-based Context Switching                   4000.0     476696.4   1191.7
Process Creation                                126.0      18394.7   1459.9
Shell Scripts (1 concurrent)                     42.4       7172.2   1691.6
Shell Scripts (8 concurrent)                      6.0       1058.3   1763.9
System Call Overhead                          15000.0    1874714.7   1249.8
                                                                   ========
System Benchmarks Index Score                                        1488.8

Signed-off-by: Jun Yi <yijun@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2022-12-14 08:36:11 +08:00
Huacai Chen
61a6fccc0b LoongArch: Add unaligned access support
Loongson-2 series (Loongson-2K500, Loongson-2K1000) don't support
unaligned access in hardware, while Loongson-3 series (Loongson-3A5000,
Loongson-3C5000) are configurable whether support unaligned access in
hardware. This patch add unaligned access emulation for those LoongArch
processors without hardware support.

Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2022-12-14 08:36:11 +08:00
Huacai Chen
fa96b57c14 LoongArch: Add build infrastructure
Add Kbuild, Makefile, Kconfig and link script for LoongArch build
infrastructure.

Reviewed-by: Guo Ren <guoren@kernel.org>
Reviewed-by: WANG Xuerui <git@xen0n.name>
Reviewed-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2022-06-03 20:09:27 +08:00