Commit Graph

1748 Commits

Author SHA1 Message Date
Deepak Gupta
9d42fc28fc riscv/traps: Introduce software check exception and uprobe handling
The Zicfiss and Zicfilp extensions introduce a new exception, the
'software check exception', in the privileged ISA, with cause code =
18. This patch implements support for software check exceptions.

Additionally, the patch implements a CFI violation handler which
checks the code in the xtval register. If xtval=2, the software check
exception happened because of an indirect branch that didn't land on a
4 byte aligned PC or on a 'lpad' instruction, or the label value
embedded in 'lpad' didn't match the label value set in the x7
register. If xtval=3, the software check exception happened due to a
mismatch between the link register (x1 or x5) and the top of shadow
stack (on execution of `sspopchk`).

In case of a CFI violation, SIGSEGV is raised with code=SEGV_CPERR.
SEGV_CPERR was introduced by the x86 shadow stack patches.

To keep uprobes working, handle the uprobe event first before
reporting the CFI violation in the software check exception
handler. This is because, when the landing pad is activated, if the
uprobe point is set at the lpad instruction at the beginning of a
function, the system triggers a software check exception instead of an
ebreak exception due to the exception priority.  This would prevent
uprobe from working.

Reviewed-by: Zong Li <zong.li@sifive.com>
Co-developed-by: Zong Li <zong.li@sifive.com>
Signed-off-by: Zong Li <zong.li@sifive.com>
Signed-off-by: Deepak Gupta <debug@rivosinc.com>
Tested-by: Andreas Korb <andreas.korb@aisec.fraunhofer.de> # QEMU, custom CVA6
Tested-by: Valentin Haudiquet <valentin.haudiquet@canonical.com>
Link: https://patch.msgid.link/20251112-v5_user_cfi_series-v23-15-b55691eacf4f@rivosinc.com
[pjw@kernel.org: cleaned up the patch description]
Signed-off-by: Paul Walmsley <pjw@kernel.org>
2026-01-29 02:38:40 -07:00
Deepak Gupta
8a9e22d2ca riscv: Implement indirect branch tracking prctls
This patch adds a RISC-V implementation of the following prctls:
PR_SET_INDIR_BR_LP_STATUS, PR_GET_INDIR_BR_LP_STATUS and
PR_LOCK_INDIR_BR_LP_STATUS.

Reviewed-by: Zong Li <zong.li@sifive.com>
Signed-off-by: Deepak Gupta <debug@rivosinc.com>
Tested-by: Andreas Korb <andreas.korb@aisec.fraunhofer.de>
Tested-by: Valentin Haudiquet <valentin.haudiquet@canonical.com>
Link: https://patch.msgid.link/20251112-v5_user_cfi_series-v23-14-b55691eacf4f@rivosinc.com
[pjw@kernel.org: clean up patch description]
Signed-off-by: Paul Walmsley <pjw@kernel.org>
2026-01-29 02:38:40 -07:00
Deepak Gupta
61a0200211 riscv: Implement arch-agnostic shadow stack prctls
Implement an architecture-agnostic prctl() interface for setting and
getting shadow stack status.  The prctls implemented are
PR_GET_SHADOW_STACK_STATUS, PR_SET_SHADOW_STACK_STATUS and
PR_LOCK_SHADOW_STACK_STATUS.

As part of PR_SET_SHADOW_STACK_STATUS/PR_GET_SHADOW_STACK_STATUS, only
PR_SHADOW_STACK_ENABLE is implemented because RISCV allows each mode to
write to their own shadow stack using 'sspush' or 'ssamoswap'.

PR_LOCK_SHADOW_STACK_STATUS locks the current shadow stack enablement
configuration.

Reviewed-by: Zong Li <zong.li@sifive.com>
Signed-off-by: Deepak Gupta <debug@rivosinc.com>
Tested-by: Andreas Korb <andreas.korb@aisec.fraunhofer.de> # QEMU, custom CVA6
Tested-by: Valentin Haudiquet <valentin.haudiquet@canonical.com>
Link: https://patch.msgid.link/20251112-v5_user_cfi_series-v23-12-b55691eacf4f@rivosinc.com
[pjw@kernel.org: cleaned up patch description]
Signed-off-by: Paul Walmsley <pjw@kernel.org>
2026-01-29 02:34:39 -07:00
Deepak Gupta
fd44a4a855 riscv/shstk: If needed allocate a new shadow stack on clone
Userspace specifies CLONE_VM to share address space and spawn new
thread.  'clone' allows userspace to specify a new stack for a new
thread. However there is no way to specify a new shadow stack base
address without changing the API. This patch allocates a new shadow
stack whenever CLONE_VM is given.

In case of CLONE_VFORK, the parent is suspended until the child
finishes; thus the child can use the parent's shadow stack. In case of
!CLONE_VM, COW kicks in because entire address space is copied from
parent to child.

'clone3' is extensible and can provide mechanisms for specifying the
shadow stack as an input parameter. This is not settled yet and is
being extensively discussed on the mailing list. Once that's settled,
this code should be adapted.

Reviewed-by: Zong Li <zong.li@sifive.com>
Signed-off-by: Deepak Gupta <debug@rivosinc.com>
Tested-by: Andreas Korb <andreas.korb@aisec.fraunhofer.de> # QEMU, custom CVA6
Tested-by: Valentin Haudiquet <valentin.haudiquet@canonical.com>
Link: https://patch.msgid.link/20251112-v5_user_cfi_series-v23-11-b55691eacf4f@rivosinc.com
[pjw@kernel.org: cleaned up patch description]
Signed-off-by: Paul Walmsley <pjw@kernel.org>
2026-01-29 02:34:21 -07:00
Han Gao
0ea05c4f75 riscv: compat: fix COMPAT_UTS_MACHINE definition
The COMPAT_UTS_MACHINE for riscv was incorrectly defined as "riscv".
Change it to "riscv32" to reflect the correct 32-bit compat name.

Fixes: 06d0e37236 ("riscv: compat: Add basic compat data type implementation")
Cc: stable@vger.kernel.org
Signed-off-by: Han Gao <gaohan@iscas.ac.cn>
Reviewed-by: Guo Ren (Alibaba Damo Academy) <guoren@kernel.org>
Link: https://patch.msgid.link/20260127190711.2264664-1-gaohan@iscas.ac.cn
Signed-off-by: Paul Walmsley <pjw@kernel.org>
2026-01-29 00:59:17 -07:00
Rohan McLure
d79f9c9cf7 mm: provide address parameter to p{te,md,ud}_user_accessible_page()
On several powerpc platforms, a page table entry may not imply whether the
relevant mapping is for userspace or kernelspace.  Instead, such platforms
infer this by the address which is being accessed.

Add an additional address argument to each of these routines in order to
provide support for page table check on powerpc.

[ajd@linux.ibm.com: rebase on arm64 changes]
Link: https://lkml.kernel.org/r/20251219-pgtable_check_v18rebase-v18-9-755bc151a50b@linux.ibm.com
Signed-off-by: Rohan McLure <rmclure@linux.ibm.com>
Signed-off-by: Andrew Donnellan <ajd@linux.ibm.com>
Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com>
Acked-by: Ingo Molnar <mingo@kernel.org>  # x86
Acked-by: Alexandre Ghiti <alexghiti@rivosinc.com> # riscv
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Alexandre Ghiti <alex@ghiti.fr>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: "Christophe Leroy (CS GROUP)" <chleroy@kernel.org>
Cc: David Hildenbrand <david@kernel.org>
Cc: Donet Tom <donettom@linux.ibm.com>
Cc: Guo Weikang <guoweikang.kernel@gmail.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Kevin Brodsky <kevin.brodsky@arm.com>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Magnus Lindholm <linmag7@gmail.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Miehlbradt <nicholas@linux.ibm.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Paul Mackerras <paulus@ozlabs.org>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: "Ritesh Harjani (IBM)" <ritesh.list@gmail.com>
Cc: Sweet Tea Dorminy <sweettea-kernel@dorminy.me>
Cc: Thomas Huth <thuth@redhat.com>
Cc: "Vishal Moola (Oracle)" <vishal.moola@gmail.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-01-26 20:02:35 -08:00
Rohan McLure
d7b4b67eb6 mm/page_table_check: reinstate address parameter in [__]page_table_check_pte_clear()
This reverts commit aa232204c4 ("mm/page_table_check: remove unused
parameter in [__]page_table_check_pte_clear").

Reinstate previously unused parameters for the purpose of supporting
powerpc platforms, as many do not encode user/kernel ownership of the page
in the pte, but instead in the address of the access.

[ajd@linux.ibm.com: rebase, fix additional occurrence and loop handling]
Link: https://lkml.kernel.org/r/20251219-pgtable_check_v18rebase-v18-8-755bc151a50b@linux.ibm.com
Signed-off-by: Rohan McLure <rmclure@linux.ibm.com>
Signed-off-by: Andrew Donnellan <ajd@linux.ibm.com>
Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com>
Acked-by: Ingo Molnar <mingo@kernel.org>  # x86
Acked-by: Alexandre Ghiti <alexghiti@rivosinc.com> # riscv
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Alexandre Ghiti <alex@ghiti.fr>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: "Christophe Leroy (CS GROUP)" <chleroy@kernel.org>
Cc: David Hildenbrand <david@kernel.org>
Cc: Donet Tom <donettom@linux.ibm.com>
Cc: Guo Weikang <guoweikang.kernel@gmail.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Kevin Brodsky <kevin.brodsky@arm.com>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Magnus Lindholm <linmag7@gmail.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Miehlbradt <nicholas@linux.ibm.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Paul Mackerras <paulus@ozlabs.org>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: "Ritesh Harjani (IBM)" <ritesh.list@gmail.com>
Cc: Sweet Tea Dorminy <sweettea-kernel@dorminy.me>
Cc: Thomas Huth <thuth@redhat.com>
Cc: "Vishal Moola (Oracle)" <vishal.moola@gmail.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-01-26 20:02:35 -08:00
Rohan McLure
649ec9e3d0 mm/page_table_check: reinstate address parameter in [__]page_table_check_pmd_clear()
This reverts commit 1831414cd7 ("mm/page_table_check: remove unused
parameter in [__]page_table_check_pmd_clear").

Reinstate previously unused parameters for the purpose of supporting
powerpc platforms, as many do not encode user/kernel ownership of the page
in the pte, but instead in the address of the access.

[ajd@linux.ibm.com: rebase on arm64 changes]
Link: https://lkml.kernel.org/r/20251219-pgtable_check_v18rebase-v18-7-755bc151a50b@linux.ibm.com
Signed-off-by: Rohan McLure <rmclure@linux.ibm.com>
Signed-off-by: Andrew Donnellan <ajd@linux.ibm.com>
Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com>
Acked-by: Ingo Molnar <mingo@kernel.org>  # x86
Acked-by: Alexandre Ghiti <alexghiti@rivosinc.com> # riscv
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Alexandre Ghiti <alex@ghiti.fr>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: "Christophe Leroy (CS GROUP)" <chleroy@kernel.org>
Cc: David Hildenbrand <david@kernel.org>
Cc: Donet Tom <donettom@linux.ibm.com>
Cc: Guo Weikang <guoweikang.kernel@gmail.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Kevin Brodsky <kevin.brodsky@arm.com>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Magnus Lindholm <linmag7@gmail.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Miehlbradt <nicholas@linux.ibm.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Paul Mackerras <paulus@ozlabs.org>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: "Ritesh Harjani (IBM)" <ritesh.list@gmail.com>
Cc: Sweet Tea Dorminy <sweettea-kernel@dorminy.me>
Cc: Thomas Huth <thuth@redhat.com>
Cc: "Vishal Moola (Oracle)" <vishal.moola@gmail.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-01-26 20:02:35 -08:00
Rohan McLure
2e6ac078ce mm/page_table_check: reinstate address parameter in [__]page_table_check_pud_clear()
This reverts commit 931c38e164 ("mm/page_table_check: remove unused
parameter in [__]page_table_check_pud_clear").

Reinstate previously unused parameters for the purpose of supporting
powerpc platforms, as many do not encode user/kernel ownership of the page
in the pte, but instead in the address of the access.

[ajd@linux.ibm.com: rebase on arm64 changes]
Link: https://lkml.kernel.org/r/20251219-pgtable_check_v18rebase-v18-6-755bc151a50b@linux.ibm.com
Signed-off-by: Rohan McLure <rmclure@linux.ibm.com>
Signed-off-by: Andrew Donnellan <ajd@linux.ibm.com>
Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com>
Acked-by: Ingo Molnar <mingo@kernel.org>  # x86
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Alexandre Ghiti <alex@ghiti.fr>
Cc: Alexandre Ghiti <alexghiti@rivosinc.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: "Christophe Leroy (CS GROUP)" <chleroy@kernel.org>
Cc: David Hildenbrand <david@kernel.org>
Cc: Donet Tom <donettom@linux.ibm.com>
Cc: Guo Weikang <guoweikang.kernel@gmail.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Kevin Brodsky <kevin.brodsky@arm.com>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Magnus Lindholm <linmag7@gmail.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Miehlbradt <nicholas@linux.ibm.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Paul Mackerras <paulus@ozlabs.org>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: "Ritesh Harjani (IBM)" <ritesh.list@gmail.com>
Cc: Sweet Tea Dorminy <sweettea-kernel@dorminy.me>
Cc: Thomas Huth <thuth@redhat.com>
Cc: "Vishal Moola (Oracle)" <vishal.moola@gmail.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-01-26 20:02:34 -08:00
Rohan McLure
0a5ae44831 mm/page_table_check: provide addr parameter to page_table_check_ptes_set()
To provide support for powerpc platforms, provide an addr parameter to the
__page_table_check_ptes_set() and page_table_check_ptes_set() routines. 
This parameter is needed on some powerpc platforms which do not encode
whether a mapping is for user or kernel in the pte.  On such platforms,
this can be inferred from the addr parameter.

[ajd@linux.ibm.com: rebase on arm64 + riscv changes, update commit message]
Link: https://lkml.kernel.org/r/20251219-pgtable_check_v18rebase-v18-5-755bc151a50b@linux.ibm.com
Signed-off-by: Rohan McLure <rmclure@linux.ibm.com>
Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com>
Acked-by: Alexandre Ghiti <alexghiti@rivosinc.com> # riscv
Signed-off-by: Andrew Donnellan <ajd@linux.ibm.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Alexandre Ghiti <alex@ghiti.fr>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: "Christophe Leroy (CS GROUP)" <chleroy@kernel.org>
Cc: David Hildenbrand <david@kernel.org>
Cc: Donet Tom <donettom@linux.ibm.com>
Cc: Guo Weikang <guoweikang.kernel@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Kevin Brodsky <kevin.brodsky@arm.com>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Magnus Lindholm <linmag7@gmail.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Miehlbradt <nicholas@linux.ibm.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Paul Mackerras <paulus@ozlabs.org>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: "Ritesh Harjani (IBM)" <ritesh.list@gmail.com>
Cc: Sweet Tea Dorminy <sweettea-kernel@dorminy.me>
Cc: Thomas Huth <thuth@redhat.com>
Cc: "Vishal Moola (Oracle)" <vishal.moola@gmail.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-01-26 20:02:34 -08:00
Rohan McLure
6e2d8f9fc4 mm/page_table_check: reinstate address parameter in [__]page_table_check_pmd[s]_set()
This reverts commit a3b837130b ("mm/page_table_check: remove unused
parameter in [__]page_table_check_pmd_set").

Reinstate previously unused parameters for the purpose of supporting
powerpc platforms, as many do not encode user/kernel ownership of the
page in the pte, but instead in the address of the access.

Apply this to __page_table_check_pmds_set(), page_table_check_pmd_set(), and
the page_table_check_pmd_set() wrapper macro.

[ajd@linux.ibm.com: rebase on arm64 + riscv changes, update commit message]
Link: https://lkml.kernel.org/r/20251219-pgtable_check_v18rebase-v18-4-755bc151a50b@linux.ibm.com
Signed-off-by: Rohan McLure <rmclure@linux.ibm.com>
Signed-off-by: Andrew Donnellan <ajd@linux.ibm.com>
Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com>
Acked-by: Ingo Molnar <mingo@kernel.org>  # x86
Acked-by: Alexandre Ghiti <alexghiti@rivosinc.com> # riscv
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Alexandre Ghiti <alex@ghiti.fr>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: "Christophe Leroy (CS GROUP)" <chleroy@kernel.org>
Cc: David Hildenbrand <david@kernel.org>
Cc: Donet Tom <donettom@linux.ibm.com>
Cc: Guo Weikang <guoweikang.kernel@gmail.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Kevin Brodsky <kevin.brodsky@arm.com>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Magnus Lindholm <linmag7@gmail.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Miehlbradt <nicholas@linux.ibm.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Paul Mackerras <paulus@ozlabs.org>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: "Ritesh Harjani (IBM)" <ritesh.list@gmail.com>
Cc: Sweet Tea Dorminy <sweettea-kernel@dorminy.me>
Cc: Thomas Huth <thuth@redhat.com>
Cc: "Vishal Moola (Oracle)" <vishal.moola@gmail.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-01-26 20:02:34 -08:00
Rohan McLure
c4a0c5ff85 mm/page_table_check: reinstate address parameter in [__]page_table_check_pud[s]_set()
This reverts commit 6d144436d9 ("mm/page_table_check: remove unused
parameter in [__]page_table_check_pud_set").

Reinstate previously unused parameters for the purpose of supporting
powerpc platforms, as many do not encode user/kernel ownership of the page
in the pte, but instead in the address of the access.

Apply this to __page_table_check_puds_set(), page_table_check_puds_set()
and the page_table_check_pud_set() wrapper macro.

[ajd@linux.ibm.com: rebase on riscv + arm64 changes, update commit message]
Link: https://lkml.kernel.org/r/20251219-pgtable_check_v18rebase-v18-3-755bc151a50b@linux.ibm.com
Signed-off-by: Rohan McLure <rmclure@linux.ibm.com>
Signed-off-by: Andrew Donnellan <ajd@linux.ibm.com>
Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com>
Acked-by: Ingo Molnar <mingo@kernel.org>  # x86
Acked-by: Alexandre Ghiti <alexghiti@rivosinc.com> # riscv
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Alexandre Ghiti <alex@ghiti.fr>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: "Christophe Leroy (CS GROUP)" <chleroy@kernel.org>
Cc: David Hildenbrand <david@kernel.org>
Cc: Donet Tom <donettom@linux.ibm.com>
Cc: Guo Weikang <guoweikang.kernel@gmail.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Kevin Brodsky <kevin.brodsky@arm.com>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Magnus Lindholm <linmag7@gmail.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Miehlbradt <nicholas@linux.ibm.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Paul Mackerras <paulus@ozlabs.org>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: "Ritesh Harjani (IBM)" <ritesh.list@gmail.com>
Cc: Sweet Tea Dorminy <sweettea-kernel@dorminy.me>
Cc: Thomas Huth <thuth@redhat.com>
Cc: "Vishal Moola (Oracle)" <vishal.moola@gmail.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-01-26 20:02:33 -08:00
Deepak Gupta
540de7ade1 riscv/mm: update write protect to work on shadow stacks
'fork' implements copy-on-write (COW) by making pages readonly in both
child and parent.

ptep_set_wrprotect() and pte_wrprotect() clear _PAGE_WRITE in PTE.
The assumption is that the page is readable and, on a fault,
copy-on-write happens.

To implement COW on shadow stack pages, clearing the W bit makes them
XWR = 000. This will result in the wrong PTE setting, which allows no
permissions, but with V=1 and the PFN field pointing to the final
page. Instead, the desired behavior is to turn it into a readable
page, take an access (load/store) fault on sspush/sspop (shadow stack)
and then perform COW on such pages. This way regular reads would still
be allowed and not lead to COW maintaining current behavior of COW on
non-shadow stack but writeable memory.

On the other hand, this doesn't interfere with existing COW for
read-write memory.  The assumption is always that _PAGE_READ must have
been set, and thus, setting _PAGE_READ is harmless.

Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Reviewed-by: Zong Li <zong.li@sifive.com>
Signed-off-by: Deepak Gupta <debug@rivosinc.com>
Tested-by: Andreas Korb <andreas.korb@aisec.fraunhofer.de> # QEMU, custom CVA6
Tested-by: Valentin Haudiquet <valentin.haudiquet@canonical.com>
Link: https://patch.msgid.link/20251112-v5_user_cfi_series-v23-9-b55691eacf4f@rivosinc.com
[pjw@kernel.org: clarify patch description]
Signed-off-by: Paul Walmsley <pjw@kernel.org>
2026-01-25 21:09:54 -07:00
Deepak Gupta
c68c2ef9d6 riscv/mm: teach pte_mkwrite to manufacture shadow stack PTEs
pte_mkwrite() creates PTEs with WRITE encodings for the underlying
architecture.  The underlying architecture can have two types of
writeable mappings: one that can be written using regular store
instructions, and another one that can only be written using
specialized store instructions (like shadow stack stores).
pte_mkwrite can select write PTE encoding based on VMA range (i.e.
VM_SHADOW_STACK)

Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Reviewed-by: Zong Li <zong.li@sifive.com>
Signed-off-by: Deepak Gupta <debug@rivosinc.com>
Tested-by: Andreas Korb <andreas.korb@aisec.fraunhofer.de> # QEMU, custom CVA6
Tested-by: Valentin Haudiquet <valentin.haudiquet@canonical.com>
Link: https://patch.msgid.link/20251112-v5_user_cfi_series-v23-8-b55691eacf4f@rivosinc.com
[pjw@kernel.org: cleaned up patch description]
Signed-off-by: Paul Walmsley <pjw@kernel.org>
2026-01-25 21:09:53 -07:00
Deepak Gupta
f56ffb8ada riscv/mm: manufacture shadow stack ptes
This patch implements the creation of a shadow stack pte on
riscv. Creating shadow stack PTE on riscv means clearing RWX and then
setting W=1.

Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Reviewed-by: Zong Li <zong.li@sifive.com>
Signed-off-by: Deepak Gupta <debug@rivosinc.com>
Tested-by: Andreas Korb <andreas.korb@aisec.fraunhofer.de> # QEMU, custom CVA6
Tested-by: Valentin Haudiquet <valentin.haudiquet@canonical.com>
Link: https://patch.msgid.link/20251112-v5_user_cfi_series-v23-7-b55691eacf4f@rivosinc.com
[pjw@kernel.org: cleaned up patch description]
Signed-off-by: Paul Walmsley <pjw@kernel.org>
2026-01-25 21:09:53 -07:00
Deepak Gupta
6c7559f22b riscv/mm: ensure PROT_WRITE leads to VM_READ | VM_WRITE
'arch_calc_vm_prot_bits' is implemented on risc-v to return VM_READ |
VM_WRITE if PROT_WRITE is specified. Similarly 'riscv_sys_mmap' is
updated to convert all incoming PROT_WRITE to (PROT_WRITE | PROT_READ).
This is to make sure that any existing apps using PROT_WRITE still work.

Earlier 'protection_map[VM_WRITE]' used to pick read-write PTE encodings.
Now 'protection_map[VM_WRITE]' will always pick PAGE_SHADOWSTACK PTE
encodings for shadow stack. The above changes ensure that existing apps
continue to work because underneath, the kernel will be picking
'protection_map[VM_WRITE|VM_READ]' PTE encodings.

Reviewed-by: Zong Li <zong.li@sifive.com>
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Signed-off-by: Deepak Gupta <debug@rivosinc.com>
Tested-by: Andreas Korb <andreas.korb@aisec.fraunhofer.de> # QEMU, custom CVA6
Tested-by: Valentin Haudiquet <valentin.haudiquet@canonical.com>
Link: https://patch.msgid.link/20251112-v5_user_cfi_series-v23-6-b55691eacf4f@rivosinc.com
Signed-off-by: Paul Walmsley <pjw@kernel.org>
2026-01-25 21:09:53 -07:00
Deepak Gupta
79dd4f2f40 riscv: Add usercfi state for task and save/restore of CSR_SSP on trap entry/exit
Carve out space in the RISC-V architecture-specific thread struct for
cfi status and shadow stack in usermode.

This patch:
- defines a new structure cfi_status with status bit for cfi feature
- defines shadow stack pointer, base and size in cfi_status structure
- defines offsets to new member fields in thread in asm-offsets.c
- saves and restores shadow stack pointer on trap entry (U --> S) and exit
  (S --> U)

Shadow stack save/restore is gated on feature availability and is
implemented using alternatives. CSR_SSP can be context-switched in
'switch_to' as well, but as soon as kernel shadow stack support gets
rolled in, the shadow stack pointer will need to be switched at trap
entry/exit point (much like 'sp'). It can be argued that a kernel
using a shadow stack deployment scenario may not be as prevalent as
user mode using this feature. But even if there is some minimal
deployment of kernel shadow stack, that means that it needs to be
supported.  Thus save/restore of shadow stack pointer is implemented
in entry.S instead of in 'switch_to.h'.

Reviewed-by: Charlie Jenkins <charlie@rivosinc.com>
Reviewed-by: Zong Li <zong.li@sifive.com>
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Signed-off-by: Deepak Gupta <debug@rivosinc.com>
Tested-by: Andreas Korb <andreas.korb@aisec.fraunhofer.de> # QEMU, custom CVA6
Tested-by: Valentin Haudiquet <valentin.haudiquet@canonical.com>
Link: https://patch.msgid.link/20251112-v5_user_cfi_series-v23-5-b55691eacf4f@rivosinc.com
[pjw@kernel.org: cleaned up patch description]
Signed-off-by: Paul Walmsley <pjw@kernel.org>
2026-01-25 21:09:53 -07:00
Deepak Gupta
41a2452c99 riscv: add Zicfiss / Zicfilp extension CSR and bit definitions
The Zicfiss and Zicfilp extensions are enabled via b3 and b2 in
*envcfg CSRs.  menvcfg controls enabling for S/HS mode.  henvcfg
controls enabling for VS.  senvcfg controls enabling for U/VU mode.

The Zicfilp extension extends *status CSRs to hold an 'expected
landing pad' bit.  A trap or interrupt can occur between an indirect
jmp/call and target instruction.  The 'expected landing pad' bit from
the CPU is recorded into the xstatus CSR so that when the supervisor
performs xret, the 'expected landing pad' state of the CPU can be
restored.

Zicfiss adds one new CSR, CSR_SSP, which contains the current shadow
stack pointer.

Signed-off-by: Deepak Gupta <debug@rivosinc.com>
Reviewed-by: Charlie Jenkins <charlie@rivosinc.com>
Tested-by: Andreas Korb <andreas.korb@aisec.fraunhofer.de> # QEMU, custom CVA6
Tested-by: Valentin Haudiquet <valentin.haudiquet@canonical.com>
Link: https://patch.msgid.link/20251112-v5_user_cfi_series-v23-4-b55691eacf4f@rivosinc.com
[pjw@kernel.org: grouped CSR_SSP macro with the other CSR macros; clarified patch description]
Signed-off-by: Paul Walmsley <pjw@kernel.org>
2026-01-25 21:09:53 -07:00
Deepak Gupta
df11708566 riscv: zicfiss / zicfilp enumeration
This patch adds support for detecting the RISC-V ISA extensions
Zicfiss and Zicfilp.  Zicfiss and Zicfilp stand for the unprivileged
integer spec extensions for shadow stack and indirect branch tracking,
respectively.

This patch looks for Zicfiss and Zicfilp in the device tree and
accordingly lights up the corresponding bits in the cpu feature
bitmap. Furthermore this patch adds detection utility functions to
return whether shadow stack or landing pads are supported by the cpu.

Reviewed-by: Zong Li <zong.li@sifive.com>
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Signed-off-by: Deepak Gupta <debug@rivosinc.com>
Tested-by: Andreas Korb <andreas.korb@aisec.fraunhofer.de> # QEMU, custom CVA6
Tested-by: Valentin Haudiquet <valentin.haudiquet@canonical.com>
Link: https://patch.msgid.link/20251112-v5_user_cfi_series-v23-3-b55691eacf4f@rivosinc.com
[pjw@kernel.org: updated to apply; cleaned up patch description]
Signed-off-by: Paul Walmsley <pjw@kernel.org>
2026-01-25 21:09:53 -07:00
Florian Schmaus
9f77b4c5c3 riscv: mm: define copy_user_page() as copy_page()
Currently, the implementation of copy_user_page() is identical to
copy_page().

Align riscv with other architectures (alpha, arc, arm64, hexagon,
longarch, m68k, openrisc, s390, um, xtensa) and map copy_user_page()
to copy_page() given that their implementation is identical.

In addition to following a common pattern, this centralizes the
implementation. Any changes to the underlying page copy logic (e.g.,
for CHERI) will now automatically propagate to copy_user_page().

Signed-off-by: Florian Schmaus <florian.schmaus@codasip.com>
Link: https://patch.msgid.link/20260113134025.905627-1-florian.schmaus@codasip.com
Signed-off-by: Paul Walmsley <pjw@kernel.org>
2026-01-25 21:09:52 -07:00
Austin Kim
494d4a051c riscv: fix minor typo in syscall.h comment
Some developers may be confused because RISC-V does not have
a register named r0. Also, orig_r0 is not available in pt_regs structure,
which is specific to riscv. So we had better fix this minor typo.

Signed-off-by: Austin Kim <austin.kim@lge.com>
Link: https://patch.msgid.link/aW3Z4zTBvGJpk7a7@adminpc-PowerEdge-R7525
Signed-off-by: Paul Walmsley <pjw@kernel.org>
2026-01-25 21:08:59 -07:00
Nathan Chancellor
841e47d56c riscv: Add intermediate cast to 'unsigned long' in __get_user_asm
After commit bdce162f2e ("riscv: Use 64-bit variable for output in
__get_user_asm"), there is a warning when building for 32-bit RISC-V:

  In file included from include/linux/uaccess.h:13,
                   from include/linux/sched/task.h:13,
                   from include/linux/sched/signal.h:9,
                   from include/linux/rcuwait.h:6,
                   from include/linux/mm.h:36,
                   from include/linux/migrate.h:5,
                   from mm/migrate.c:16:
  mm/migrate.c: In function 'do_pages_move':
  arch/riscv/include/asm/uaccess.h:115:15: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
    115 |         (x) = (__typeof__(x))__tmp;                             \
        |               ^
  arch/riscv/include/asm/uaccess.h:198:17: note: in expansion of macro '__get_user_asm'
    198 |                 __get_user_asm("lb", (x), __gu_ptr, label);     \
        |                 ^~~~~~~~~~~~~~
  arch/riscv/include/asm/uaccess.h:218:9: note: in expansion of macro '__get_user_nocheck'
    218 |         __get_user_nocheck(x, ptr, __gu_failed);                        \
        |         ^~~~~~~~~~~~~~~~~~
  arch/riscv/include/asm/uaccess.h:255:9: note: in expansion of macro '__get_user_error'
    255 |         __get_user_error(__gu_val, __gu_ptr, __gu_err);         \
        |         ^~~~~~~~~~~~~~~~
  arch/riscv/include/asm/uaccess.h:285:17: note: in expansion of macro '__get_user'
    285 |                 __get_user((x), __p) :                          \
        |                 ^~~~~~~~~~
  mm/migrate.c:2358:29: note: in expansion of macro 'get_user'
   2358 |                         if (get_user(p, pages + i))
        |                             ^~~~~~~~

Add an intermediate cast to 'unsigned long', which is guaranteed to be the same
width as a pointer, before the cast to the type of the output variable to clear
up the warning.

Fixes: bdce162f2e ("riscv: Use 64-bit variable for output in __get_user_asm")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202601210526.OT45dlOZ-lkp@intel.com/
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Link: https://patch.msgid.link/20260121-riscv-fix-int-to-pointer-cast-v1-1-b83eebe57c76@kernel.org
Signed-off-by: Paul Walmsley <pjw@kernel.org>
2026-01-22 20:21:14 -07:00
David Hildenbrand
8e38607aa4 treewide: provide a generic clear_user_page() variant
Patch series "mm: folio_zero_user: clear page ranges", v11.

This series adds clearing of contiguous page ranges for hugepages.

The series improves on the current discontiguous clearing approach in two
ways:

  - clear pages in a contiguous fashion.
  - use batched clearing via clear_pages() wherever exposed.

The first is useful because it allows us to make much better use of
hardware prefetchers.

The second, enables advertising the real extent to the processor.  Where
specific instructions support it (ex.  string instructions on x86; "mops"
on arm64 etc), a processor can optimize based on this because, instead of
seeing a sequence of 8-byte stores, or a sequence of 4KB pages, it sees a
larger unit being operated on.

For instance, AMD Zen uarchs (for extents larger than LLC-size) switch to
a mode where they start eliding cacheline allocation.  This is helpful not
just because it results in higher bandwidth, but also because now the
cache is not evicting useful cachelines and replacing them with zeroes.

Demand faulting a 64GB region shows performance improvement:

 $ perf bench mem mmap -p $pg-sz -f demand -s 64GB -l 5

                       baseline              +series
                   (GBps +- %stdev)      (GBps +- %stdev)

   pg-sz=2MB       11.76 +- 1.10%        25.34 +- 1.18% [*]   +115.47%  	preempt=*

   pg-sz=1GB       24.85 +- 2.41%        39.22 +- 2.32%       + 57.82%  	preempt=none|voluntary
   pg-sz=1GB         (similar)           52.73 +- 0.20% [#]   +112.19%  	preempt=full|lazy

 [*] This improvement is because switching to sequential clearing
  allows the hardware prefetchers to do a much better job.

 [#] For pg-sz=1GB a large part of the improvement is because of the
  cacheline elision mentioned above. preempt=full|lazy improves upon
  that because, not needing explicit invocations of cond_resched() to
  ensure reasonable preemption latency, it can clear the full extent
  as a single unit. In comparison the maximum extent used for
  preempt=none|voluntary is PROCESS_PAGES_NON_PREEMPT_BATCH (32MB).

  When provided the full extent the processor forgoes allocating
  cachelines on this path almost entirely.

  (The hope is that eventually, in the fullness of time, the lazy
   preemption model will be able to do the same job that none or
   voluntary models are used for, allowing us to do away with
   cond_resched().)

Raghavendra also tested previous version of the series on AMD Genoa and
sees similar improvement [1] with preempt=lazy.

  $ perf bench mem map -p $page-size -f populate -s 64GB -l 10

                    base               patched              change
   pg-sz=2MB       12.731939 GB/sec    26.304263 GB/sec     106.6%
   pg-sz=1GB       26.232423 GB/sec    61.174836 GB/sec     133.2%


This patch (of 8):

Let's drop all variants that effectively map to clear_page() and provide
it in a generic variant instead.

We'll use the macro clear_user_page to indicate whether an architecture
provides it's own variant.

Also, clear_user_page() is only called from the generic variant of
clear_user_highpage(), so define it only if the architecture does not
provide a clear_user_highpage().  And, for simplicity define it in
linux/highmem.h.

Note that for parisc, clear_page() and clear_user_page() map to
clear_page_asm(), so we can just get rid of the custom clear_user_page()
implementation.  There is a clear_user_page_asm() function on parisc, that
seems to be unused.  Not sure what's up with that.

Link: https://lkml.kernel.org/r/20260107072009.1615991-1-ankur.a.arora@oracle.com
Link: https://lkml.kernel.org/r/20260107072009.1615991-2-ankur.a.arora@oracle.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Co-developed-by: Ankur Arora <ankur.a.arora@oracle.com>
Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ankur Arora <ankur.a.arora@oracle.com>
Cc: "Borislav Petkov (AMD)" <bp@alien8.de>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: David Hildenbrand <david@kernel.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Konrad Rzessutek Wilk <konrad.wilk@oracle.com>
Cc: Lance Yang <ioworker0@gmail.com>
Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com>
Cc: Li Zhe <lizhe.67@bytedance.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Mateusz Guzik <mjguzik@gmail.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Raghavendra K T <raghavendra.kt@amd.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-01-20 19:24:39 -08:00
Nathan Chancellor
bdce162f2e riscv: Use 64-bit variable for output in __get_user_asm
After commit f6bff7827a ("riscv: uaccess: use 'asm_goto_output' for
get_user()"), which was the first commit that started using asm goto
with outputs on RISC-V, builds of clang built with assertions enabled
start crashing in certain files that use get_user() with:

  clang: llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp:12743: Register FollowCopyChain(MachineRegisterInfo &, Register): Assertion `MI->getOpcode() == TargetOpcode::COPY && "start of copy chain MUST be COPY"' failed.

Internally, LLVM generates an addiw instruction when the output of the
inline asm (which may be any scalar type) needs to be sign extended for
ABI reasons, such as a later function call, so that basic block does not
have to do it.

Use a temporary 64-bit variable as the output of the inline assembly in
__get_user_asm() and explicitly cast it to truncate it if necessary,
avoiding the addiw that triggers the assertion.

Link: https://github.com/ClangBuiltLinux/linux/issues/2092
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Link: https://patch.msgid.link/20260116-riscv-wa-llvm-asm-goto-outputs-assertion-failure-v3-1-55b5775f989b@kernel.org
Signed-off-by: Paul Walmsley <pjw@kernel.org>
2026-01-16 18:05:47 -07:00
Juergen Gross
ee9ffcf99f riscv/paravirt: Use common code for paravirt_steal_clock()
Remove the arch specific variant of paravirt_steal_clock() and use
the common one instead.

Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20260105110520.21356-11-jgross@suse.com
2026-01-12 16:47:33 +01:00
Juergen Gross
e6b2aa6d40 sched: Move clock related paravirt code to kernel/sched
Paravirt clock related functions are available in multiple archs.

In order to share the common parts, move the common static keys
to kernel/sched/ and remove them from the arch specific files.

Make a common paravirt_steal_clock() implementation available in
kernel/sched/cputime.c, guarding it with a new config option
CONFIG_HAVE_PV_STEAL_CLOCK_GEN, which can be selected by an arch
in case it wants to use that common variant.

Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20260105110520.21356-7-jgross@suse.com
2026-01-12 15:39:14 +01:00
Juergen Gross
68b10fd40d paravirt: Remove asm/paravirt_api_clock.h
All architectures supporting CONFIG_PARAVIRT share the same contents
of asm/paravirt_api_clock.h:

  #include <asm/paravirt.h>

So remove all incarnations of asm/paravirt_api_clock.h and remove the
only place where it is included, as there asm/paravirt.h is included
anyway.

Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Shrikanth Hegde <sshegde@linux.ibm.com> # powerpc, scheduler bits
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20260105110520.21356-6-jgross@suse.com
2026-01-12 15:34:33 +01:00
Yunhui Cui
957afeb99b riscv: remove irqflags.h inclusion in asm/bitops.h
The arch/riscv/include/asm/bitops.h does not functionally require
including /linux/irqflags.h. Additionally, adding
arch/riscv/include/asm/percpu.h causes a circular inclusion:
kernel/bounds.c
->include/linux/log2.h
->include/linux/bitops.h
->arch/riscv/include/asm/bitops.h
->include/linux/irqflags.h
->include/linux/find.h
->return val ? __ffs(val) : size;
->arch/riscv/include/asm/bitops.h

The compilation log is as follows:
CC      kernel/bounds.s
In file included from ./include/linux/bitmap.h:11,
               from ./include/linux/cpumask.h:12,
               from ./arch/riscv/include/asm/processor.h:55,
               from ./arch/riscv/include/asm/thread_info.h:42,
               from ./include/linux/thread_info.h:60,
               from ./include/asm-generic/preempt.h:5,
               from ./arch/riscv/include/generated/asm/preempt.h:1,
               from ./include/linux/preempt.h:79,
               from ./arch/riscv/include/asm/percpu.h:8,
               from ./include/linux/irqflags.h:19,
               from ./arch/riscv/include/asm/bitops.h:14,
               from ./include/linux/bitops.h:68,
               from ./include/linux/log2.h:12,
               from kernel/bounds.c:13:
./include/linux/find.h: In function 'find_next_bit':
./include/linux/find.h:66:30: error: implicit declaration of function '__ffs' [-Wimplicit-function-declaration]
   66 |                 return val ? __ffs(val) : size;
      |                              ^~~~~

Signed-off-by: Yunhui Cui <cuiyunhui@bytedance.com>
Acked-by: Yury Norov (NVIDIA) <yury.norov@gmail.com>
Link: https://patch.msgid.link/20251216014721.42262-2-cuiyunhui@bytedance.com
Signed-off-by: Paul Walmsley <pjw@kernel.org>
2026-01-07 13:16:38 -07:00
Guo Ren (Alibaba DAMO Academy)
5e5be092ff riscv: pgtable: Cleanup useless VA_USER_XXX definitions
These marcos are not used after commit b5b4287acc ("riscv: mm: Use
hint address in mmap if available"). Cleanup VA_USER_XXX definitions
in asm/pgtable.h.

Fixes: b5b4287acc ("riscv: mm: Use hint address in mmap if available")
Signed-off-by: Guo Ren (Alibaba DAMO Academy) <guoren@kernel.org>
Reviewed-by: Jinjie Ruan <ruanjinjie@huawei.com>
Link: https://patch.msgid.link/20251201005850.702569-1-guoren@kernel.org
Signed-off-by: Paul Walmsley <pjw@kernel.org>
2026-01-05 17:49:18 -07:00
Himanshu Chauhan
5efaf92da4 riscv: Add SBI debug trigger extension and function ids
Debug trigger extension is an SBI extension to support native debugging
in S-mode and VS-mode. This patch adds the extension and the function
IDs defined by the extension.

Signed-off-by: Himanshu Chauhan <hchauhan@ventanamicro.com>
Link: https://patch.msgid.link/20250710125231.653967-2-hchauhan@ventanamicro.com
[pjw@kernel.org: updated to apply]
Signed-off-by: Paul Walmsley <pjw@kernel.org>
2025-12-19 00:22:30 -07:00
Zongmin Zhou
f02dd25472 riscv/atomic.h: use RISCV_FULL_BARRIER in _arch_atomic* function.
Replace the same code with the pre-defined macro
RISCV_FULL_BARRIER to simplify the code.

Signed-off-by: Zongmin Zhou <zhouzongmin@kylinos.cn>
Link: https://patch.msgid.link/20251120095831.64211-1-min_halo@163.com
Signed-off-by: Paul Walmsley <pjw@kernel.org>
2025-12-19 00:22:30 -07:00
Pincheng Wang
3f0cbfb8a1 riscv: add ISA extension parsing for Zilsd and Zclsd
Add parsing for Zilsd and Zclsd ISA extensions which were ratified in
commit f88abf1 ("Integrating load/store pair for RV32 with the
main manual") of the riscv-isa-manual.

Signed-off-by: Pincheng Wang <pincheng.plct@isrc.iscas.ac.cn>
Reviewed-by: Nutty Liu <nutty.liu@hotmail.com>
Link: https://patch.msgid.link/20250826162939.1494021-3-pincheng.plct@isrc.iscas.ac.cn
[pjw@kernel.org: cleaned up checkpatch issues, whitespace; updated to apply]
Signed-off-by: Paul Walmsley <pjw@kernel.org>
2025-12-19 00:18:34 -07:00
Paul Walmsley
e0e51a0de0 riscv: mm: use xchg() on non-atomic_long_t variables, not atomic_long_xchg()
Let's not call atomic_long_xchg() on something that's not an
atomic_long_t, and just use xchg() instead.  Continues the cleanup
from commit 546e42c8c6 ("riscv: Use an atomic xchg in
pudp_huge_get_and_clear()"),

Cc: Alexandre Ghiti <alex@ghiti.fr>
Signed-off-by: Paul Walmsley <pjw@kernel.org>
2025-12-19 00:18:33 -07:00
Paul Walmsley
425cc087fb riscv: mm: ptep_get_and_clear(): avoid atomic ops when !CONFIG_SMP
When !CONFIG_SMP, there's no need for atomic operations in
ptep_get_and_clear(), so, similar to x86, let's not use atomics in
this case.

Cc: Alexandre Ghiti <alex@ghiti.fr>
Signed-off-by: Paul Walmsley <pjw@kernel.org>
2025-12-19 00:18:33 -07:00
Paul Walmsley
1e6084d5c4 riscv: mm: pmdp_huge_get_and_clear(): avoid atomic ops when !CONFIG_SMP
When !CONFIG_SMP, there's no need for atomic operations in
pmdp_huge_get_and_clear(), so, similar to what x86 does, let's not use
atomics in this case.  See also commit 546e42c8c6 ("riscv: Use an
atomic xchg in pudp_huge_get_and_clear()").

Cc: Alexandre Ghiti <alex@ghiti.fr>
Signed-off-by: Paul Walmsley <pjw@kernel.org>
2025-12-19 00:18:33 -07:00
Andy Chiu
818d78ba1b riscv: signal: abstract header saving for setup_sigcontext
The function save_v_state() served two purposes. First, it saved
extension context into the signal stack. Then, it constructed the
extension header if there was no fault. The second part is independent
of the extension itself. As a result, we can pull that part out, so
future extensions may reuse it. This patch adds arch_ext_list and makes
setup_sigcontext() go through all possible extensions' save() callback.
The callback returns a positive value indicating the size of the
successfully saved extension. Then the kernel proceeds to construct the
header for that extension. The kernel skips an extension if it does
not exist, or if the saving fails for some reasons. The error code is
propagated out on the later case.

This patch does not introduce any functional changes.

Signed-off-by: Andy Chiu <andybnac@gmail.com>
Link: https://patch.msgid.link/20251112-v5_user_cfi_series-v23-16-b55691eacf4f@rivosinc.com
Signed-off-by: Paul Walmsley <pjw@kernel.org>
2025-12-19 00:18:33 -07:00
Linus Torvalds
51d90a15fe ARM:
- Support for userspace handling of synchronous external aborts (SEAs),
   allowing the VMM to potentially handle the abort in a non-fatal
   manner.
 
 - Large rework of the VGIC's list register handling with the goal of
   supporting more active/pending IRQs than available list registers in
   hardware. In addition, the VGIC now supports EOImode==1 style
   deactivations for IRQs which may occur on a separate vCPU than the
   one that acked the IRQ.
 
 - Support for FEAT_XNX (user / privileged execute permissions) and
   FEAT_HAF (hardware update to the Access Flag) in the software page
   table walkers and shadow MMU.
 
 - Allow page table destruction to reschedule, fixing long need_resched
   latencies observed when destroying a large VM.
 
 - Minor fixes to KVM and selftests
 
 Loongarch:
 
 - Get VM PMU capability from HW GCFG register.
 
 - Add AVEC basic support.
 
 - Use 64-bit register definition for EIOINTC.
 
 - Add KVM timer test cases for tools/selftests.
 
 RISC/V:
 
 - SBI message passing (MPXY) support for KVM guest
 
 - Give a new, more specific error subcode for the case when in-kernel
   AIA virtualization fails to allocate IMSIC VS-file
 
 - Support KVM_DIRTY_LOG_INITIALLY_SET, enabling dirty log gradually
   in small chunks
 
 - Fix guest page fault within HLV* instructions
 
 - Flush VS-stage TLB after VCPU migration for Andes cores
 
 s390:
 
 - Always allocate ESCA (Extended System Control Area), instead of
   starting with the basic SCA and converting to ESCA with the
   addition of the 65th vCPU.  The price is increased number of
   exits (and worse performance) on z10 and earlier processor;
   ESCA was introduced by z114/z196 in 2010.
 
 - VIRT_XFER_TO_GUEST_WORK support
 
 - Operation exception forwarding support
 
 - Cleanups
 
 x86:
 
 - Skip the costly "zap all SPTEs" on an MMIO generation wrap if MMIO SPTE
   caching is disabled, as there can't be any relevant SPTEs to zap.
 
 - Relocate a misplaced export.
 
 - Fix an async #PF bug where KVM would clear the completion queue when the
   guest transitioned in and out of paging mode, e.g. when handling an SMI and
   then returning to paged mode via RSM.
 
 - Leave KVM's user-return notifier registered even when disabling
   virtualization, as long as kvm.ko is loaded.  On reboot/shutdown, keeping
   the notifier registered is ok; the kernel does not use the MSRs and the
   callback will run cleanly and restore host MSRs if the CPU manages to
   return to userspace before the system goes down.
 
 - Use the checked version of {get,put}_user().
 
 - Fix a long-lurking bug where KVM's lack of catch-up logic for periodic APIC
   timers can result in a hard lockup in the host.
 
 - Revert the periodic kvmclock sync logic now that KVM doesn't use a
   clocksource that's subject to NTP corrections.
 
 - Clean up KVM's handling of MMIO Stale Data and L1TF, and bury the latter
   behind CONFIG_CPU_MITIGATIONS.
 
 - Context switch XCR0, XSS, and PKRU outside of the entry/exit fast path;
   the only reason they were handled in the fast path was to paper of a bug
   in the core #MC code, and that has long since been fixed.
 
 - Add emulator support for AVX MOV instructions, to play nice with emulated
   devices whose guest drivers like to access PCI BARs with large multi-byte
   instructions.
 
 x86 (AMD):
 
 - Fix a few missing "VMCB dirty" bugs.
 
 - Fix the worst of KVM's lack of EFER.LMSLE emulation.
 
 - Add AVIC support for addressing 4k vCPUs in x2AVIC mode.
 
 - Fix incorrect handling of selective CR0 writes when checking intercepts
   during emulation of L2 instructions.
 
 - Fix a currently-benign bug where KVM would clobber SPEC_CTRL[63:32] on
   VMRUN and #VMEXIT.
 
 - Fix a bug where KVM corrupt the guest code stream when re-injecting a soft
   interrupt if the guest patched the underlying code after the VM-Exit, e.g.
   when Linux patches code with a temporary INT3.
 
 - Add KVM_X86_SNP_POLICY_BITS to advertise supported SNP policy bits to
   userspace, and extend KVM "support" to all policy bits that don't require
   any actual support from KVM.
 
 x86 (Intel):
 
 - Use the root role from kvm_mmu_page to construct EPTPs instead of the
   current vCPU state, partly as worthwhile cleanup, but mostly to pave the
   way for tracking per-root TLB flushes, and elide EPT flushes on pCPU
   migration if the root is clean from a previous flush.
 
 - Add a few missing nested consistency checks.
 
 - Rip out support for doing "early" consistency checks via hardware as the
   functionality hasn't been used in years and is no longer useful in general;
   replace it with an off-by-default module param to WARN if hardware fails
   a check that KVM does not perform.
 
 - Fix a currently-benign bug where KVM would drop the guest's SPEC_CTRL[63:32]
   on VM-Enter.
 
 - Misc cleanups.
 
 - Overhaul the TDX code to address systemic races where KVM (acting on behalf
   of userspace) could inadvertantly trigger lock contention in the TDX-Module;
   KVM was either working around these in weird, ugly ways, or was simply
   oblivious to them (though even Yan's devilish selftests could only break
   individual VMs, not the host kernel)
 
 - Fix a bug where KVM could corrupt a vCPU's cpu_list when freeing a TDX vCPU,
   if creating said vCPU failed partway through.
 
 - Fix a few sparse warnings (bad annotation, 0 != NULL).
 
 - Use struct_size() to simplify copying TDX capabilities to userspace.
 
 - Fix a bug where TDX would effectively corrupt user-return MSR values if the
   TDX Module rejects VP.ENTER and thus doesn't clobber host MSRs as expected.
 
 Selftests:
 
 - Fix a math goof in mmu_stress_test when running on a single-CPU system/VM.
 
 - Forcefully override ARCH from x86_64 to x86 to play nice with specifying
   ARCH=x86_64 on the command line.
 
 - Extend a bunch of nested VMX to validate nested SVM as well.
 
 - Add support for LA57 in the core VM_MODE_xxx macro, and add a test to
   verify KVM can save/restore nested VMX state when L1 is using 5-level
   paging, but L2 is not.
 
 - Clean up the guest paging code in anticipation of sharing the core logic for
   nested EPT and nested NPT.
 
 guest_memfd:
 
 - Add NUMA mempolicy support for guest_memfd, and clean up a variety of
   rough edges in guest_memfd along the way.
 
 - Define a CLASS to automatically handle get+put when grabbing a guest_memfd
   from a memslot to make it harder to leak references.
 
 - Enhance KVM selftests to make it easer to develop and debug selftests like
   those added for guest_memfd NUMA support, e.g. where test and/or KVM bugs
   often result in hard-to-debug SIGBUS errors.
 
 - Misc cleanups.
 
 Generic:
 
 - Use the recently-added WQ_PERCPU when creating the per-CPU workqueue for
   irqfd cleanup.
 
 - Fix a goof in the dirty ring documentation.
 
 - Fix choice of target for directed yield across different calls to
   kvm_vcpu_on_spin(); the function was always starting from the first
   vCPU instead of continuing the round-robin search.
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCgAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmkvMa8UHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroMlFwf+Ow7zOYUuELSQ+Jn+hOYXiCNrdBDx
 ZamvMU8kLPr7XX0Zog6HgcMm//qyA6k5nSfqCjfsQZrIhRA/gWJ61jz1OX/Jxq18
 pJ9Vz6epnEPYiOtBwz+v8OS8MqDqVNzj2i6W1/cLPQE50c1Hhw64HWS5CSxDQiHW
 A7PVfl5YU12lW1vG3uE0sNESDt4Eh/spNM17iddXdF4ZUOGublserjDGjbc17E7H
 8BX3DkC2plqkJKwtjg0ae62hREkITZZc7RqsnftUkEhn0N0H9+rb6NKUyzIVh9NZ
 bCtCjtrKN9zfZ0Mujnms3ugBOVqNIputu/DtPnnFKXtXWSrHrgGSNv5ewA==
 =PEcw
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM updates from Paolo Bonzini:
 "ARM:

   - Support for userspace handling of synchronous external aborts
     (SEAs), allowing the VMM to potentially handle the abort in a
     non-fatal manner

   - Large rework of the VGIC's list register handling with the goal of
     supporting more active/pending IRQs than available list registers
     in hardware. In addition, the VGIC now supports EOImode==1 style
     deactivations for IRQs which may occur on a separate vCPU than the
     one that acked the IRQ

   - Support for FEAT_XNX (user / privileged execute permissions) and
     FEAT_HAF (hardware update to the Access Flag) in the software page
     table walkers and shadow MMU

   - Allow page table destruction to reschedule, fixing long
     need_resched latencies observed when destroying a large VM

   - Minor fixes to KVM and selftests

  Loongarch:

   - Get VM PMU capability from HW GCFG register

   - Add AVEC basic support

   - Use 64-bit register definition for EIOINTC

   - Add KVM timer test cases for tools/selftests

  RISC/V:

   - SBI message passing (MPXY) support for KVM guest

   - Give a new, more specific error subcode for the case when in-kernel
     AIA virtualization fails to allocate IMSIC VS-file

   - Support KVM_DIRTY_LOG_INITIALLY_SET, enabling dirty log gradually
     in small chunks

   - Fix guest page fault within HLV* instructions

   - Flush VS-stage TLB after VCPU migration for Andes cores

  s390:

   - Always allocate ESCA (Extended System Control Area), instead of
     starting with the basic SCA and converting to ESCA with the
     addition of the 65th vCPU. The price is increased number of exits
     (and worse performance) on z10 and earlier processor; ESCA was
     introduced by z114/z196 in 2010

   - VIRT_XFER_TO_GUEST_WORK support

   - Operation exception forwarding support

   - Cleanups

  x86:

   - Skip the costly "zap all SPTEs" on an MMIO generation wrap if MMIO
     SPTE caching is disabled, as there can't be any relevant SPTEs to
     zap

   - Relocate a misplaced export

   - Fix an async #PF bug where KVM would clear the completion queue
     when the guest transitioned in and out of paging mode, e.g. when
     handling an SMI and then returning to paged mode via RSM

   - Leave KVM's user-return notifier registered even when disabling
     virtualization, as long as kvm.ko is loaded. On reboot/shutdown,
     keeping the notifier registered is ok; the kernel does not use the
     MSRs and the callback will run cleanly and restore host MSRs if the
     CPU manages to return to userspace before the system goes down

   - Use the checked version of {get,put}_user()

   - Fix a long-lurking bug where KVM's lack of catch-up logic for
     periodic APIC timers can result in a hard lockup in the host

   - Revert the periodic kvmclock sync logic now that KVM doesn't use a
     clocksource that's subject to NTP corrections

   - Clean up KVM's handling of MMIO Stale Data and L1TF, and bury the
     latter behind CONFIG_CPU_MITIGATIONS

   - Context switch XCR0, XSS, and PKRU outside of the entry/exit fast
     path; the only reason they were handled in the fast path was to
     paper of a bug in the core #MC code, and that has long since been
     fixed

   - Add emulator support for AVX MOV instructions, to play nice with
     emulated devices whose guest drivers like to access PCI BARs with
     large multi-byte instructions

  x86 (AMD):

   - Fix a few missing "VMCB dirty" bugs

   - Fix the worst of KVM's lack of EFER.LMSLE emulation

   - Add AVIC support for addressing 4k vCPUs in x2AVIC mode

   - Fix incorrect handling of selective CR0 writes when checking
     intercepts during emulation of L2 instructions

   - Fix a currently-benign bug where KVM would clobber SPEC_CTRL[63:32]
     on VMRUN and #VMEXIT

   - Fix a bug where KVM corrupt the guest code stream when re-injecting
     a soft interrupt if the guest patched the underlying code after the
     VM-Exit, e.g. when Linux patches code with a temporary INT3

   - Add KVM_X86_SNP_POLICY_BITS to advertise supported SNP policy bits
     to userspace, and extend KVM "support" to all policy bits that
     don't require any actual support from KVM

  x86 (Intel):

   - Use the root role from kvm_mmu_page to construct EPTPs instead of
     the current vCPU state, partly as worthwhile cleanup, but mostly to
     pave the way for tracking per-root TLB flushes, and elide EPT
     flushes on pCPU migration if the root is clean from a previous
     flush

   - Add a few missing nested consistency checks

   - Rip out support for doing "early" consistency checks via hardware
     as the functionality hasn't been used in years and is no longer
     useful in general; replace it with an off-by-default module param
     to WARN if hardware fails a check that KVM does not perform

   - Fix a currently-benign bug where KVM would drop the guest's
     SPEC_CTRL[63:32] on VM-Enter

   - Misc cleanups

   - Overhaul the TDX code to address systemic races where KVM (acting
     on behalf of userspace) could inadvertantly trigger lock contention
     in the TDX-Module; KVM was either working around these in weird,
     ugly ways, or was simply oblivious to them (though even Yan's
     devilish selftests could only break individual VMs, not the host
     kernel)

   - Fix a bug where KVM could corrupt a vCPU's cpu_list when freeing a
     TDX vCPU, if creating said vCPU failed partway through

   - Fix a few sparse warnings (bad annotation, 0 != NULL)

   - Use struct_size() to simplify copying TDX capabilities to userspace

   - Fix a bug where TDX would effectively corrupt user-return MSR
     values if the TDX Module rejects VP.ENTER and thus doesn't clobber
     host MSRs as expected

  Selftests:

   - Fix a math goof in mmu_stress_test when running on a single-CPU
     system/VM

   - Forcefully override ARCH from x86_64 to x86 to play nice with
     specifying ARCH=x86_64 on the command line

   - Extend a bunch of nested VMX to validate nested SVM as well

   - Add support for LA57 in the core VM_MODE_xxx macro, and add a test
     to verify KVM can save/restore nested VMX state when L1 is using
     5-level paging, but L2 is not

   - Clean up the guest paging code in anticipation of sharing the core
     logic for nested EPT and nested NPT

  guest_memfd:

   - Add NUMA mempolicy support for guest_memfd, and clean up a variety
     of rough edges in guest_memfd along the way

   - Define a CLASS to automatically handle get+put when grabbing a
     guest_memfd from a memslot to make it harder to leak references

   - Enhance KVM selftests to make it easer to develop and debug
     selftests like those added for guest_memfd NUMA support, e.g. where
     test and/or KVM bugs often result in hard-to-debug SIGBUS errors

   - Misc cleanups

  Generic:

   - Use the recently-added WQ_PERCPU when creating the per-CPU
     workqueue for irqfd cleanup

   - Fix a goof in the dirty ring documentation

   - Fix choice of target for directed yield across different calls to
     kvm_vcpu_on_spin(); the function was always starting from the first
     vCPU instead of continuing the round-robin search"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (260 commits)
  KVM: arm64: at: Update AF on software walk only if VM has FEAT_HAFDBS
  KVM: arm64: at: Use correct HA bit in TCR_EL2 when regime is EL2
  KVM: arm64: Document KVM_PGTABLE_PROT_{UX,PX}
  KVM: arm64: Fix spelling mistake "Unexpeced" -> "Unexpected"
  KVM: arm64: Add break to default case in kvm_pgtable_stage2_pte_prot()
  KVM: arm64: Add endian casting to kvm_swap_s[12]_desc()
  KVM: arm64: Fix compilation when CONFIG_ARM64_USE_LSE_ATOMICS=n
  KVM: arm64: selftests: Add test for AT emulation
  KVM: arm64: nv: Expose hardware access flag management to NV guests
  KVM: arm64: nv: Implement HW access flag management in stage-2 SW PTW
  KVM: arm64: Implement HW access flag management in stage-1 SW PTW
  KVM: arm64: Propagate PTW errors up to AT emulation
  KVM: arm64: Add helper for swapping guest descriptor
  KVM: arm64: nv: Use pgtable definitions in stage-2 walk
  KVM: arm64: Handle endianness in read helper for emulated PTW
  KVM: arm64: nv: Stop passing vCPU through void ptr in S2 PTW
  KVM: arm64: Call helper for reading descriptors directly
  KVM: arm64: nv: Advertise support for FEAT_XNX
  KVM: arm64: Teach ptdump about FEAT_XNX permissions
  KVM: s390: Use generic VIRT_XFER_TO_GUEST_WORK functions
  ...
2025-12-05 17:01:20 -08:00
Linus Torvalds
07025b51c1 First set of RISC-V updates for v6.19-rc1
- Enable parallel hotplug for RISC-V
 
 - Optimize vector regset allocation for ptrace()
 
 - Add a kernel selftest for the vector ptrace interface
 
 - Enable the userspace RAID6 test to build and run using RISC-V
   vectors
 
 - Add initial support for the Zalasr RISC-V ratified ISA extension
 
 - For the Zicbop RISC-V ratified ISA extension to userspace, expose
   hardware and kernel support to userspace and add a kselftest for
   Zicbop
 
 - Convert open-coded instances of 'asm goto's that are controlled by
   runtime ALTERNATIVEs to use riscv_has_extension_{un,}likely(),
   following arm64's alternative_has_cap_{un,}likely()
 
 - Remove an unnecessary mask in the GFP flags used in some calls to
   pagetable_alloc()
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEElRDoIDdEz9/svf2Kx4+xDQu9KksFAmkx2zkACgkQx4+xDQu9
 KkvagQ/+Iv94J0S8VWS4vvRMtvPMkEHvPADCmEOrMrVsQ+hZdasKsSFBKY4DHZt6
 baGKEQoyJ0NDrIt51uNWzmR503/PGPwVYSDfrgrpS87IkcO9OWe/HiMSEAu0aCyp
 dhKGYnjWUtawWzFQg1GQ2n1vLOm5cQ2u1vTptISqU1yg9XlXUN0LNzIqNB8GpQv3
 Hm7qqNFyxZOyAC9/ZFGbX2/0KpKQh1xkXSEONxzRGADJ/bqHKKz3hdaOj3aVcoMM
 zObvHBm+I0U6AnGSgiZm71dvO0vlijg1RsuD/wd1DlcGO9QAuaPHX+RKqmJUJFe8
 d0JjOon+d6n/pBKxoPnZyfB1IHxFNb3kX3LjmKdP6NYeOmdHZ7LI3dzB+LFyPXZe
 mkC948+9GExEqmHQx5ZqyCWDwKtknIQPA45ZjBi8e5YOU8nJQiapShYWQu/6ybKV
 GFHRqjDIGfhpjZGVdxnUX2Iok1XcwaGKrI6/6P6WlQ/zHKGurtEEtGiI7XHDYS8m
 FSGxYay4GIWUsNbNSBWcZp5QaGH0jW7qiZle3DR+8gDLe1DJzbIo6pWMiArm5v7x
 fUmroR4FcF9x2X7A01IB2tEUGf/0nfuHgVfMNPNoHBGsiRs9mXzLMngJjbFXOGyF
 EP61/W+K+eaJ0jjkfmnscBQEy8URLSNoACnwQLT9SQcAIC4xWTs=
 =J1nv
 -----END PGP SIGNATURE-----

Merge tag 'riscv-for-linus-6.19-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux

Pull RISC-V updates from Paul Walmsley:

 - Enable parallel hotplug for RISC-V

 - Optimize vector regset allocation for ptrace()

 - Add a kernel selftest for the vector ptrace interface

 - Enable the userspace RAID6 test to build and run using RISC-V vectors

 - Add initial support for the Zalasr RISC-V ratified ISA extension

 - For the Zicbop RISC-V ratified ISA extension to userspace, expose
   hardware and kernel support to userspace and add a kselftest for
   Zicbop

 - Convert open-coded instances of 'asm goto's that are controlled by
   runtime ALTERNATIVEs to use riscv_has_extension_{un,}likely(),
   following arm64's alternative_has_cap_{un,}likely()

 - Remove an unnecessary mask in the GFP flags used in some calls to
   pagetable_alloc()

* tag 'riscv-for-linus-6.19-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
  selftests/riscv: Add Zicbop prefetch test
  riscv: hwprobe: Expose Zicbop extension and its block size
  riscv: Introduce Zalasr instructions
  riscv: hwprobe: Export Zalasr extension
  dt-bindings: riscv: Add Zalasr ISA extension description
  riscv: Add ISA extension parsing for Zalasr
  selftests: riscv: Add test for the Vector ptrace interface
  riscv: ptrace: Optimize the allocation of vector regset
  raid6: test: Add support for RISC-V
  raid6: riscv: Allow code to be compiled in userspace
  raid6: riscv: Prevent compiler from breaking inline vector assembly code
  riscv: cmpxchg: Use riscv_has_extension_likely
  riscv: bitops: Use riscv_has_extension_likely
  riscv: hweight: Use riscv_has_extension_likely
  riscv: checksum: Use riscv_has_extension_likely
  riscv: pgtable: Use riscv_has_extension_unlikely
  riscv: Remove __GFP_HIGHMEM masking
  RISC-V: Enable HOTPLUG_PARALLEL for secondary CPUs
2025-12-05 16:26:57 -08:00
Linus Torvalds
7203ca412f Significant patch series in this merge are as follows:
- The 10 patch series "__vmalloc()/kvmalloc() and no-block support" from
   Uladzislau Rezki reworks the vmalloc() code to support non-blocking
   allocations (GFP_ATOIC, GFP_NOWAIT).
 
 - The 2 patch series "ksm: fix exec/fork inheritance" from xu xin fixes
   a rare case where the KSM MMF_VM_MERGE_ANY prctl state is not inherited
   across fork/exec.
 
 - The 4 patch series "mm/zswap: misc cleanup of code and documentations"
   from SeongJae Park does some light maintenance work on the zswap code.
 
 - The 5 patch series "mm/page_owner: add debugfs files 'show_handles'
   and 'show_stacks_handles'" from Mauricio Faria de Oliveira enhances the
   /sys/kernel/debug/page_owner debug feature.  It adds unique identifiers
   to differentiate the various stack traces so that userspace monitoring
   tools can better match stack traces over time.
 
 - The 2 patch series "mm/page_alloc: pcp->batch cleanups" from Joshua
   Hahn makes some minor alterations to the page allocator's per-cpu-pages
   feature.
 
 - The 2 patch series "Improve UFFDIO_MOVE scalability by removing
   anon_vma lock" from Lokesh Gidra addresses a scalability issue in
   userfaultfd's UFFDIO_MOVE operation.
 
 - The 2 patch series "kasan: cleanups for kasan_enabled() checks" from
   Sabyrzhan Tasbolatov performs some cleanup in the KASAN code.
 
 - The 2 patch series "drivers/base/node: fold node register and
   unregister functions" from Donet Tom cleans up the NUMA node handling
   code a little.
 
 - The 4 patch series "mm: some optimizations for prot numa" from Kefeng
   Wang provides some cleanups and small optimizations to the NUMA
   allocation hinting code.
 
 - The 5 patch series "mm/page_alloc: Batch callers of
   free_pcppages_bulk" from Joshua Hahn addresses long lock hold times at
   boot on large machines.  These were causing (harmless) softlockup
   warnings.
 
 - The 2 patch series "optimize the logic for handling dirty file folios
   during reclaim" from Baolin Wang removes some now-unnecessary work from
   page reclaim.
 
 - The 10 patch series "mm/damon: allow DAMOS auto-tuned for per-memcg
   per-node memory usage" from SeongJae Park enhances the DAMOS auto-tuning
   feature.
 
 - The 2 patch series "mm/damon: fixes for address alignment issues in
   DAMON_LRU_SORT and DAMON_RECLAIM" from Quanmin Yan fixes DAMON_LRU_SORT
   and DAMON_RECLAIM with certain userspace configuration.
 
 - The 15 patch series "expand mmap_prepare functionality, port more
   users" from Lorenzo Stoakes enhances the new(ish)
   file_operations.mmap_prepare() method and ports additional callsites
   from the old ->mmap() over to ->mmap_prepare().
 
 - The 8 patch series "Fix stale IOTLB entries for kernel address space"
   from Lu Baolu fixes a bug (and possible security issue on non-x86) in
   the IOMMU code.  In some situations the IOMMU could be left hanging onto
   a stale kernel pagetable entry.
 
 - The 4 patch series "mm/huge_memory: cleanup __split_unmapped_folio()"
   from Wei Yang cleans up and optimizes the folio splitting code.
 
 - The 5 patch series "mm, swap: misc cleanup and bugfix" from Kairui
   Song implements some cleanups and a minor fix in the swap discard code.
 
 - The 8 patch series "mm/damon: misc documentation fixups" from SeongJae
   Park does as advertised.
 
 - The 9 patch series "mm/damon: support pin-point targets removal" from
   SeongJae Park permits userspace to remove a specific monitoring target
   in the middle of the current targets list.
 
 - The 2 patch series "mm: MISC follow-up patches for linux/pgalloc.h"
   from Harry Yoo implements a couple of cleanups related to mm header file
   inclusion.
 
 - The 2 patch series "mm/swapfile.c: select swap devices of default
   priority round robin" from Baoquan He improves the selection of swap
   devices for NUMA machines.
 
 - The 3 patch series "mm: Convert memory block states (MEM_*) macros to
   enums" from Israel Batista changes the memory block labels from macros
   to enums so they will appear in kernel debug info.
 
 - The 3 patch series "ksm: perform a range-walk to jump over holes in
   break_ksm" from Pedro Demarchi Gomes addresses an inefficiency when KSM
   unmerges an address range.
 
 - The 22 patch series "mm/damon/tests: fix memory bugs in kunit tests"
   from SeongJae Park fixes leaks and unhandled malloc() failures in DAMON
   userspace unit tests.
 
 - The 2 patch series "some cleanups for pageout()" from Baolin Wang
   cleans up a couple of minor things in the page scanner's
   writeback-for-eviction code.
 
 - The 2 patch series "mm/hugetlb: refactor sysfs/sysctl interfaces" from
   Hui Zhu moves hugetlb's sysfs/sysctl handling code into a new file.
 
 - The 9 patch series "introduce VM_MAYBE_GUARD and make it sticky" from
   Lorenzo Stoakes makes the VMA guard regions available in /proc/pid/smaps
   and improves the mergeability of guarded VMAs.
 
 - The 2 patch series "mm: perform guard region install/remove under VMA
   lock" from Lorenzo Stoakes reduces mmap lock contention for callers
   performing VMA guard region operations.
 
 - The 2 patch series "vma_start_write_killable" from Matthew Wilcox
   starts work in permitting applications to be killed when they are
   waiting on a read_lock on the VMA lock.
 
 - The 11 patch series "mm/damon/tests: add more tests for online
   parameters commit" from SeongJae Park adds additional userspace testing
   of DAMON's "commit" feature.
 
 - The 9 patch series "mm/damon: misc cleanups" from SeongJae Park does
   that.
 
 - The 2 patch series "make VM_SOFTDIRTY a sticky VMA flag" from Lorenzo
   Stoakes addresses the possible loss of a VMA's VM_SOFTDIRTY flag when
   that VMA is merged with another.
 
 - The 16 patch series "mm: support device-private THP" from Balbir Singh
   introduces support for Transparent Huge Page (THP) migration in zone
   device-private memory.
 
 - The 3 patch series "Optimize folio split in memory failure" from Zi
   Yan optimizes folio split operations in the memory failure code.
 
 - The 2 patch series "mm/huge_memory: Define split_type and consolidate
   split support checks" from Wei Yang provides some more cleanups in the
   folio splitting code.
 
 - The 16 patch series "mm: remove is_swap_[pte, pmd]() + non-swap
   entries, introduce leaf entries" from Lorenzo Stoakes cleans up our
   handling of pagetable leaf entries by introducing the concept of
   'software leaf entries', of type softleaf_t.
 
 - The 4 patch series "reparent the THP split queue" from Muchun Song
   reparents the THP split queue to its parent memcg.  This is in
   preparation for addressing the long-standing "dying memcg" problem,
   wherein dead memcg's linger for too long, consuming memory resources.
 
 - The 3 patch series "unify PMD scan results and remove redundant
   cleanup" from Wei Yang does a little cleanup in the hugepage collapse
   code.
 
 - The 6 patch series "zram: introduce writeback bio batching" from
   Sergey Senozhatsky improves zram writeback efficiency by introducing
   batched bio writeback support.
 
 - The 4 patch series "memcg: cleanup the memcg stats interfaces" from
   Shakeel Butt cleans up our handling of the interrupt safety of some
   memcg stats.
 
 - The 4 patch series "make vmalloc gfp flags usage more apparent" from
   Vishal Moola cleans up vmalloc's handling of incoming GFP flags.
 
 - The 6 patch series "mm: Add soft-dirty and uffd-wp support for RISC-V"
   from Chunyan Zhang teches soft dirty and userfaultfd write protect
   tracking to use RISC-V's Svrsw60t59b extension.
 
 - The 5 patch series "mm: swap: small fixes and comment cleanups" from
   Youngjun Park fixes a small bug and cleans up some of the swap code.
 
 - The 4 patch series "initial work on making VMA flags a bitmap" from
   Lorenzo Stoakes starts work on converting the vma struct's flags to a
   bitmap, so we stop running out of them, especially on 32-bit.
 
 - The 2 patch series "mm/swapfile: fix and cleanup swap list iterations"
   from Youngjun Park addresses a possible bug in the swap discard code and
   cleans things up a little.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCaTEb0wAKCRDdBJ7gKXxA
 jjfIAP94W4EkCCwNOupnChoG+YWw/JW21anXt5NN+i5svn1yugEAwzvv6A+cAFng
 o+ug/fyrfPZG7PLp2R8WFyGIP0YoBA4=
 =IUzS
 -----END PGP SIGNATURE-----

Merge tag 'mm-stable-2025-12-03-21-26' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull MM updates from Andrew Morton:

  "__vmalloc()/kvmalloc() and no-block support" (Uladzislau Rezki)
     Rework the vmalloc() code to support non-blocking allocations
     (GFP_ATOIC, GFP_NOWAIT)

  "ksm: fix exec/fork inheritance" (xu xin)
     Fix a rare case where the KSM MMF_VM_MERGE_ANY prctl state is not
     inherited across fork/exec

  "mm/zswap: misc cleanup of code and documentations" (SeongJae Park)
     Some light maintenance work on the zswap code

  "mm/page_owner: add debugfs files 'show_handles' and 'show_stacks_handles'" (Mauricio Faria de Oliveira)
     Enhance the /sys/kernel/debug/page_owner debug feature by adding
     unique identifiers to differentiate the various stack traces so
     that userspace monitoring tools can better match stack traces over
     time

  "mm/page_alloc: pcp->batch cleanups" (Joshua Hahn)
     Minor alterations to the page allocator's per-cpu-pages feature

  "Improve UFFDIO_MOVE scalability by removing anon_vma lock" (Lokesh Gidra)
     Address a scalability issue in userfaultfd's UFFDIO_MOVE operation

  "kasan: cleanups for kasan_enabled() checks" (Sabyrzhan Tasbolatov)

  "drivers/base/node: fold node register and unregister functions" (Donet Tom)
     Clean up the NUMA node handling code a little

  "mm: some optimizations for prot numa" (Kefeng Wang)
     Cleanups and small optimizations to the NUMA allocation hinting
     code

  "mm/page_alloc: Batch callers of free_pcppages_bulk" (Joshua Hahn)
     Address long lock hold times at boot on large machines. These were
     causing (harmless) softlockup warnings

  "optimize the logic for handling dirty file folios during reclaim" (Baolin Wang)
     Remove some now-unnecessary work from page reclaim

  "mm/damon: allow DAMOS auto-tuned for per-memcg per-node memory usage" (SeongJae Park)
     Enhance the DAMOS auto-tuning feature

  "mm/damon: fixes for address alignment issues in DAMON_LRU_SORT and DAMON_RECLAIM" (Quanmin Yan)
     Fix DAMON_LRU_SORT and DAMON_RECLAIM with certain userspace
     configuration

  "expand mmap_prepare functionality, port more users" (Lorenzo Stoakes)
     Enhance the new(ish) file_operations.mmap_prepare() method and port
     additional callsites from the old ->mmap() over to ->mmap_prepare()

  "Fix stale IOTLB entries for kernel address space" (Lu Baolu)
     Fix a bug (and possible security issue on non-x86) in the IOMMU
     code. In some situations the IOMMU could be left hanging onto a
     stale kernel pagetable entry

  "mm/huge_memory: cleanup __split_unmapped_folio()" (Wei Yang)
     Clean up and optimize the folio splitting code

  "mm, swap: misc cleanup and bugfix" (Kairui Song)
     Some cleanups and a minor fix in the swap discard code

  "mm/damon: misc documentation fixups" (SeongJae Park)

  "mm/damon: support pin-point targets removal" (SeongJae Park)
     Permit userspace to remove a specific monitoring target in the
     middle of the current targets list

  "mm: MISC follow-up patches for linux/pgalloc.h" (Harry Yoo)
     A couple of cleanups related to mm header file inclusion

  "mm/swapfile.c: select swap devices of default priority round robin" (Baoquan He)
     improve the selection of swap devices for NUMA machines

  "mm: Convert memory block states (MEM_*) macros to enums" (Israel Batista)
     Change the memory block labels from macros to enums so they will
     appear in kernel debug info

  "ksm: perform a range-walk to jump over holes in break_ksm" (Pedro Demarchi Gomes)
     Address an inefficiency when KSM unmerges an address range

  "mm/damon/tests: fix memory bugs in kunit tests" (SeongJae Park)
     Fix leaks and unhandled malloc() failures in DAMON userspace unit
     tests

  "some cleanups for pageout()" (Baolin Wang)
     Clean up a couple of minor things in the page scanner's
     writeback-for-eviction code

  "mm/hugetlb: refactor sysfs/sysctl interfaces" (Hui Zhu)
     Move hugetlb's sysfs/sysctl handling code into a new file

  "introduce VM_MAYBE_GUARD and make it sticky" (Lorenzo Stoakes)
     Make the VMA guard regions available in /proc/pid/smaps and
     improves the mergeability of guarded VMAs

  "mm: perform guard region install/remove under VMA lock" (Lorenzo Stoakes)
     Reduce mmap lock contention for callers performing VMA guard region
     operations

  "vma_start_write_killable" (Matthew Wilcox)
     Start work on permitting applications to be killed when they are
     waiting on a read_lock on the VMA lock

  "mm/damon/tests: add more tests for online parameters commit" (SeongJae Park)
     Add additional userspace testing of DAMON's "commit" feature

  "mm/damon: misc cleanups" (SeongJae Park)

  "make VM_SOFTDIRTY a sticky VMA flag" (Lorenzo Stoakes)
     Address the possible loss of a VMA's VM_SOFTDIRTY flag when that
     VMA is merged with another

  "mm: support device-private THP" (Balbir Singh)
     Introduce support for Transparent Huge Page (THP) migration in zone
     device-private memory

  "Optimize folio split in memory failure" (Zi Yan)

  "mm/huge_memory: Define split_type and consolidate split support checks" (Wei Yang)
     Some more cleanups in the folio splitting code

  "mm: remove is_swap_[pte, pmd]() + non-swap entries, introduce leaf entries" (Lorenzo Stoakes)
     Clean up our handling of pagetable leaf entries by introducing the
     concept of 'software leaf entries', of type softleaf_t

  "reparent the THP split queue" (Muchun Song)
     Reparent the THP split queue to its parent memcg. This is in
     preparation for addressing the long-standing "dying memcg" problem,
     wherein dead memcg's linger for too long, consuming memory
     resources

  "unify PMD scan results and remove redundant cleanup" (Wei Yang)
     A little cleanup in the hugepage collapse code

  "zram: introduce writeback bio batching" (Sergey Senozhatsky)
     Improve zram writeback efficiency by introducing batched bio
     writeback support

  "memcg: cleanup the memcg stats interfaces" (Shakeel Butt)
     Clean up our handling of the interrupt safety of some memcg stats

  "make vmalloc gfp flags usage more apparent" (Vishal Moola)
     Clean up vmalloc's handling of incoming GFP flags

  "mm: Add soft-dirty and uffd-wp support for RISC-V" (Chunyan Zhang)
     Teach soft dirty and userfaultfd write protect tracking to use
     RISC-V's Svrsw60t59b extension

  "mm: swap: small fixes and comment cleanups" (Youngjun Park)
     Fix a small bug and clean up some of the swap code

  "initial work on making VMA flags a bitmap" (Lorenzo Stoakes)
     Start work on converting the vma struct's flags to a bitmap, so we
     stop running out of them, especially on 32-bit

  "mm/swapfile: fix and cleanup swap list iterations" (Youngjun Park)
     Address a possible bug in the swap discard code and clean things
     up a little

[ This merge also reverts commit ebb9aeb980 ("vfio/nvgrace-gpu:
  register device memory for poison handling") because it looks
  broken to me, I've asked for clarification   - Linus ]

* tag 'mm-stable-2025-12-03-21-26' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (321 commits)
  mm: fix vma_start_write_killable() signal handling
  mm/swapfile: use plist_for_each_entry in __folio_throttle_swaprate
  mm/swapfile: fix list iteration when next node is removed during discard
  fs/proc/task_mmu.c: fix make_uffd_wp_huge_pte() huge pte handling
  mm/kfence: add reboot notifier to disable KFENCE on shutdown
  memcg: remove inc/dec_lruvec_kmem_state helpers
  selftests/mm/uffd: initialize char variable to Null
  mm: fix DEBUG_RODATA_TEST indentation in Kconfig
  mm: introduce VMA flags bitmap type
  tools/testing/vma: eliminate dependency on vma->__vm_flags
  mm: simplify and rename mm flags function for clarity
  mm: declare VMA flags by bit
  zram: fix a spelling mistake
  mm/page_alloc: optimize lowmem_reserve max lookup using its semantic monotonicity
  mm/vmscan: skip increasing kswapd_failures when reclaim was boosted
  pagemap: update BUDDY flag documentation
  mm: swap: remove scan_swap_map_slots() references from comments
  mm: swap: change swap_alloc_slow() to void
  mm, swap: remove redundant comment for read_swap_cache_async
  mm, swap: use SWP_SOLIDSTATE to determine if swap is rotational
  ...
2025-12-05 13:52:43 -08:00
Linus Torvalds
1dce50698a Scoped user mode access and related changes:
- Implement the missing u64 user access function on ARM when
    CONFIG_CPU_SPECTRE=n. This makes it possible to access a 64bit value in
    generic code with [unsafe_]get_user(). All other architectures and ARM
    variants provide the relevant accessors already.
 
  - Ensure that ASM GOTO jump label usage in the user mode access helpers
    always goes through a local C scope label indirection inside the
    helpers. This is required because compilers are not supporting that a
    ASM GOTO target leaves a auto cleanup scope. GCC silently fails to emit
    the cleanup invocation and CLANG fails the build.
 
    This provides generic wrapper macros and the conversion of affected
    architecture code to use them.
 
  - Scoped user mode access with auto cleanup
 
    Access to user mode memory can be required in hot code paths, but if it
    has to be done with user controlled pointers, the access is shielded
    with a speculation barrier, so that the CPU cannot speculate around the
    address range check. Those speculation barriers impact performance quite
    significantly. This can be avoided by "masking" the provided pointer so
    it is guaranteed to be in the valid user memory access range and
    otherwise to point to a guaranteed unpopulated address space. This has
    to be done without branches so it creates an address dependency for the
    access, which the CPU cannot speculate ahead.
 
    This results in repeating and error prone programming patterns:
 
      	    if (can_do_masked_user_access())
                     from = masked_user_read_access_begin((from));
             else if (!user_read_access_begin(from, sizeof(*from)))
                     return -EFAULT;
             unsafe_get_user(val, from, Efault);
             user_read_access_end();
             return 0;
       Efault:
             user_read_access_end();
             return -EFAULT;
 
     which can be replaced with scopes and automatic cleanup:
 
             scoped_user_read_access(from, Efault)
                     unsafe_get_user(val, from, Efault);
             return 0;
        Efault:
             return -EFAULT;
 
  - Convert code which implements the above pattern over to
    scope_user.*.access(). This also corrects a couple of imbalanced
    masked_*_begin() instances which are harmless on most architectures, but
    prevent PowerPC from implementing the masking optimization.
 
  - Add a missing speculation barrier in copy_from_user_iter()
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmksRfITHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoVhBEACEySjWcyCrD1e0ZFMFAOJZFI2BShav
 reotzCzmHYQdpVukDRxc64BgM2vN4yB04xnyMhi2o4hSTiIJhz1NzbKggsQJhVoA
 psYz+xEI161HuLZnUBUBuF9RRko/HVsbGqO2JFCuOKor4GCycvjVgupR3EIN9h5T
 HZEWGIgaTmN7MBj0QRrJgJkaaSTnPKOwWaNMV/F9pfk27zuB7vuV8WM9P3FaJYG+
 JGa9td7VGaBpWavxgMJqfdvXWBCVDDfZ1dunWx8tPTnLxKZZZD6HlfQXhZTr2n1e
 rtJpGgfVBx5Uqxn4RrhS0I7QeK1b9rrt3IU7EkFoaa3Z8LU5B7cHlm7KyicyoHhy
 SzFFUszssznT/0OhA5fmgPRlqI295HynW2p1L4Xy9hC0EZ2vXJPG5rO6X3x6QwSR
 asjRB7x/6JzWQUzE7/nhXd9KcB66wvQxhnjp7GqulF74aPBCtIdXXDD68YEDYkbi
 dPC3NRBr0ePbsGVGWbYvYIPWcvo1u814C2io1zKwmVbiN6lCYURgQK861vfAZUP8
 oP5D2a6ENgezDKoJo6eJ82inuDu64qZy7OOkU/aO3cbOuWGVyY9CjYD11x85Nr0k
 UNabSOfvcmhmobtYUiAgLLrjX1grQUG3F74ZQTw513mwgMObuDAAoS11GPjY6HL6
 b99WUJRv8jP66A==
 =6no0
 -----END PGP SIGNATURE-----

Merge tag 'core-uaccess-2025-11-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull scoped user access updates from Thomas Gleixner:
 "Scoped user mode access and related changes:

   - Implement the missing u64 user access function on ARM when
     CONFIG_CPU_SPECTRE=n.

     This makes it possible to access a 64bit value in generic code with
     [unsafe_]get_user(). All other architectures and ARM variants
     provide the relevant accessors already.

   - Ensure that ASM GOTO jump label usage in the user mode access
     helpers always goes through a local C scope label indirection
     inside the helpers.

     This is required because compilers are not supporting that a ASM
     GOTO target leaves a auto cleanup scope. GCC silently fails to emit
     the cleanup invocation and CLANG fails the build.

     [ Editor's note: gcc-16 will have fixed the code generation issue
       in commit f68fe3ddda4 ("eh: Invoke cleanups/destructors in asm
       goto jumps [PR122835]"). But we obviously have to deal with clang
       and older versions of gcc, so.. - Linus ]

     This provides generic wrapper macros and the conversion of affected
     architecture code to use them.

   - Scoped user mode access with auto cleanup

     Access to user mode memory can be required in hot code paths, but
     if it has to be done with user controlled pointers, the access is
     shielded with a speculation barrier, so that the CPU cannot
     speculate around the address range check. Those speculation
     barriers impact performance quite significantly.

     This cost can be avoided by "masking" the provided pointer so it is
     guaranteed to be in the valid user memory access range and
     otherwise to point to a guaranteed unpopulated address space. This
     has to be done without branches so it creates an address dependency
     for the access, which the CPU cannot speculate ahead.

     This results in repeating and error prone programming patterns:

       	    if (can_do_masked_user_access())
                      from = masked_user_read_access_begin((from));
              else if (!user_read_access_begin(from, sizeof(*from)))
                      return -EFAULT;
              unsafe_get_user(val, from, Efault);
              user_read_access_end();
              return 0;
        Efault:
              user_read_access_end();
              return -EFAULT;

      which can be replaced with scopes and automatic cleanup:

              scoped_user_read_access(from, Efault)
                      unsafe_get_user(val, from, Efault);
              return 0;
         Efault:
              return -EFAULT;

   - Convert code which implements the above pattern over to
     scope_user.*.access(). This also corrects a couple of imbalanced
     masked_*_begin() instances which are harmless on most
     architectures, but prevent PowerPC from implementing the masking
     optimization.

   - Add a missing speculation barrier in copy_from_user_iter()"

* tag 'core-uaccess-2025-11-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  lib/strn*,uaccess: Use masked_user_{read/write}_access_begin when required
  scm: Convert put_cmsg() to scoped user access
  iov_iter: Add missing speculation barrier to copy_from_user_iter()
  iov_iter: Convert copy_from_user_iter() to masked user access
  select: Convert to scoped user access
  x86/futex: Convert to scoped user access
  futex: Convert to get/put_user_inline()
  uaccess: Provide put/get_user_inline()
  uaccess: Provide scoped user access regions
  arm64: uaccess: Use unsafe wrappers for ASM GOTO
  s390/uaccess: Use unsafe wrappers for ASM GOTO
  riscv/uaccess: Use unsafe wrappers for ASM GOTO
  powerpc/uaccess: Use unsafe wrappers for ASM GOTO
  x86/uaccess: Use unsafe wrappers for ASM GOTO
  uaccess: Provide ASM GOTO safe wrappers for unsafe_*_user()
  ARM: uaccess: Implement missing __get_user_asm_dword()
2025-12-02 08:01:39 -08:00
Linus Torvalds
4a26e7032d Core kernel bug handling infrastructure changes for v6.19:
- Improve WARN(), which has vararg printf like arguments,
     to work with the x86 #UD based WARN-optimizing infrastructure
     by hiding the format in the bug_table and replacing this
     first argument with the address of the bug-table entry,
     while making the actual function that's called a UD1 instruction.
     (Peter Zijlstra)
 
   - Introduce the CONFIG_DEBUG_BUGVERBOSE_DETAILED Kconfig switch
     (Ingo Molnar, s390 support by Heiko Carstens)
 
 Fixes and cleanups:
 
   - bugs/s390: Remove private WARN_ON() implementation (Heiko Carstens)
 
   - <asm/bugs.h>: Make i386 use GENERIC_BUG_RELATIVE_POINTERS
     (Peter Zijlstra)
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmktffcRHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1gLCQ//ZMHTpv6UF+xY4Kq3rdVFJWRR7Prs2hDn
 4V/cFHu7Xb2lRwpJDBNWnbiEkLbGR5qlWD+0CpAMK4mGuL9VjnoeflEzXxOPP9W1
 7SWw1HDc6NkjHM3BNnvkPQbzWF4RL4ZGs4Lbb1nv7pjoTSdBMXNrD5RL+iNmfHHd
 QfOG9ZiRyD5A/b07bfyjIXNaeC/Hot+FeVXTMfD7/vCfc2ywhL2Sm5G/igY/shTY
 Una7q8sbFDD/bFFtWSR2eFQeHQQfT6c/Pu39ZcAIdTlLk1FtZ+A2wcRtwYv/FdPV
 6KDOAxZK7fLMHoCdNTswsg+LbazhABOb/V1J7TaHq2tF/PeyN+B+ucxVY2KFcxQJ
 V5eG5crMrUIL22QO/UaT3dPWxGbJYkNlAWl416tAKdgXA52W4OsPd+k4DGjeP569
 KogAy3SY0D/80v1QN2HcFEJMvr7W2SukxtErqdtA5OKt/ZevB49lGqZoBecqASDO
 QjI1K0yLKnb+erbMIpCpNj+o/Fr1JQgWbYVpwipL2GON4an6vrowimGTsUG7qqxN
 Hwb7IaTNnWQ/4iyCkVV44q6Ln1gyP2hz5Lnzo7QJINUdzp98UjJLAEK4NRG44T4L
 p0t5NMxnvREpAv35rwy03xhmITYeQMOqzN9JEBBvegQyld5ghbThxI7g9BDzcXyL
 Bd0mF7WOV9M=
 =bMUL
 -----END PGP SIGNATURE-----

Merge tag 'core-bugs-2025-12-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull bug handling infrastructure updates from Ingo Molnar:
 "Core updates:

   - Improve WARN(), which has vararg printf like arguments, to work
     with the x86 #UD based WARN-optimizing infrastructure by hiding the
     format in the bug_table and replacing this first argument with the
     address of the bug-table entry, while making the actual function
     that's called a UD1 instruction (Peter Zijlstra)

   - Introduce the CONFIG_DEBUG_BUGVERBOSE_DETAILED Kconfig switch (Ingo
     Molnar, s390 support by Heiko Carstens)

  Fixes and cleanups:

   - bugs/s390: Remove private WARN_ON() implementation (Heiko Carstens)

   - <asm/bugs.h>: Make i386 use GENERIC_BUG_RELATIVE_POINTERS (Peter
     Zijlstra)"

* tag 'core-bugs-2025-12-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (31 commits)
  x86/bugs: Make i386 use GENERIC_BUG_RELATIVE_POINTERS
  x86/bug: Fix BUG_FORMAT vs KASLR
  x86_64/bug: Inline the UD1
  x86/bug: Implement WARN_ONCE()
  x86_64/bug: Implement __WARN_printf()
  x86/bug: Use BUG_FORMAT for DEBUG_BUGVERBOSE_DETAILED
  x86/bug: Add BUG_FORMAT basics
  bug: Allow architectures to provide __WARN_printf()
  bug: Implement WARN_ON() using __WARN_FLAGS()
  bug: Add report_bug_entry()
  bug: Add BUG_FORMAT_ARGS infrastructure
  bug: Clean up CONFIG_GENERIC_BUG_RELATIVE_POINTERS
  bug: Add BUG_FORMAT infrastructure
  x86: Rework __bug_table helpers
  bugs/s390: Remove private WARN_ON() implementation
  bugs/core: Reorganize fields in the first line of WARNING output, add ->comm[] output
  bugs/sh: Concatenate 'cond_str' with '__FILE__' in __WARN_FLAGS(), to extend WARN_ON/BUG_ON output
  bugs/parisc: Concatenate 'cond_str' with '__FILE__' in __WARN_FLAGS(), to extend WARN_ON/BUG_ON output
  bugs/riscv: Concatenate 'cond_str' with '__FILE__' in __BUG_FLAGS(), to extend WARN_ON/BUG_ON output
  bugs/riscv: Pass in 'cond_str' to __BUG_FLAGS()
  ...
2025-12-01 21:33:01 -08:00
Chunyan Zhang
c64da3950c riscv: mm: add userfaultfd write-protect support
The Svrsw60t59b extension allows to free the PTE reserved bits 60 and 59
for software, this patch uses bit 60 for uffd-wp tracking

Additionally for tracking the uffd-wp state as a PTE swap bit, we borrow
bit 4 which is not involved into swap entry computation.

Link: https://lkml.kernel.org/r/20251113072806.795029-6-zhangchunyan@iscas.ac.cn
Signed-off-by: Chunyan Zhang <zhangchunyan@iscas.ac.cn>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexandre Ghiti <alex@ghiti.fr>
Cc: Alexandre Ghiti <alexghiti@rivosinc.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Jones <ajones@ventanamicro.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Conor Dooley <conor.dooley@microchip.com>
Cc: Conor Dooley <conor@kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Deepak Gupta <debug@rivosinc.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Rob Herring <robh@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Yuanchu Xie <yuanchu@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-24 15:08:55 -08:00
Chunyan Zhang
2a3ebad4db riscv: mm: add soft-dirty page tracking support
The Svrsw60t59b extension allows to free the PTE reserved bits 60 and 59
for software, this patch uses bit 59 for soft-dirty.

To add swap PTE soft-dirty tracking, we borrow bit 3 which is available
for swap PTEs on RISC-V systems.

Link: https://lkml.kernel.org/r/20251113072806.795029-5-zhangchunyan@iscas.ac.cn
Signed-off-by: Chunyan Zhang <zhangchunyan@iscas.ac.cn>
Reviewed-by: Deepak Gupta <debug@rivosinc.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexandre Ghiti <alex@ghiti.fr>
Cc: Alexandre Ghiti <alexghiti@rivosinc.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Jones <ajones@ventanamicro.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Conor Dooley <conor.dooley@microchip.com>
Cc: Conor Dooley <conor@kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Rob Herring <robh@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Yuanchu Xie <yuanchu@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-24 15:08:55 -08:00
Chunyan Zhang
59f6acb4be riscv: add RISC-V Svrsw60t59b extension support
The Svrsw60t59b extension allows to free the PTE reserved bits 60 and 59
for software to use.

Link: https://lkml.kernel.org/r/20251113072806.795029-4-zhangchunyan@iscas.ac.cn
Signed-off-by: Chunyan Zhang <zhangchunyan@iscas.ac.cn>
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Deepak Gupta <debug@rivosinc.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexandre Ghiti <alex@ghiti.fr>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Conor Dooley <conor.dooley@microchip.com>
Cc: Conor Dooley <conor@kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Rob Herring <robh@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Yuanchu Xie <yuanchu@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-24 15:08:55 -08:00
Hui Min Mina Chou
3239c52fd2 RISC-V: KVM: Flush VS-stage TLB after VCPU migration for Andes cores
Most implementations cache the combined result of two-stage translation,
but some, like Andes cores, use split TLBs that store VS-stage and
G-stage entries separately.

On such systems, when a VCPU migrates to another CPU, an additional
HFENCE.VVMA is required to avoid using stale VS-stage entries, which
could otherwise cause guest faults.

Introduce a static key to identify CPUs with split two-stage TLBs.
When enabled, KVM issues an extra HFENCE.VVMA on VCPU migration to
prevent stale VS-stage mappings.

Signed-off-by: Hui Min Mina Chou <minachou@andestech.com>
Signed-off-by: Ben Zong-You Xie <ben717@andestech.com>
Reviewed-by: Radim Krčmář <rkrcmar@ventanamicro.com>
Reviewed-by: Nutty Liu <nutty.liu@hotmail.com>
Link: https://lore.kernel.org/r/20251117084555.157642-1-minachou@andestech.com
Signed-off-by: Anup Patel <anup@brainfault.org>
2025-11-24 09:55:36 +05:30
Dong Yang
df60cb2e67 KVM: riscv: Support enabling dirty log gradually in small chunks
There is already support of enabling dirty log gradually in small chunks
for x86 in commit 3c9bd4006b ("KVM: x86: enable dirty log gradually in
small chunks") and c862626 ("KVM: arm64: Support enabling dirty log
gradually in small chunks"). This adds support for riscv.

x86 and arm64 writes protect both huge pages and normal pages now, so
riscv protect also protects both huge pages and normal pages.

On a nested virtualization setup (RISC-V KVM running inside a QEMU VM
on an [Intel® Core™ i5-12500H] host), I did some tests with a 2G Linux
VM using different backing page sizes. The time taken for
memory_global_dirty_log_start in the L2 QEMU is listed below:

Page Size      Before    After Optimization
  4K            4490.23ms         31.94ms
  2M             48.97ms          45.46ms
  1G             28.40ms          30.93ms

Signed-off-by: Quan Zhou <zhouquan@iscas.ac.cn>
Signed-off-by: Dong Yang <dayss1224@gmail.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Link: https://lore.kernel.org/r/20251103062825.9084-1-dayss1224@gmail.com
Signed-off-by: Anup Patel <anup@brainfault.org>
2025-11-24 09:55:36 +05:30
Anup Patel
7050f1d79f RISC-V: KVM: Add SBI MPXY extension support for Guest
The SBI MPXY extension is a platform-level functionality so KVM only
needs to forward SBI MPXY calls to KVM user-space.

Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Link: https://lore.kernel.org/r/20251017155925.361560-4-apatel@ventanamicro.com
Signed-off-by: Anup Patel <anup@brainfault.org>
2025-11-24 09:55:36 +05:30
Anup Patel
e2f3e2d37b RISC-V: KVM: Convert kvm_riscv_vcpu_sbi_forward() into extension handler
All uses of kvm_riscv_vcpu_sbi_forward() also updates retdata->uexit so
to further reduce code duplication move retdata->uexit assignment to
kvm_riscv_vcpu_sbi_forward() and convert it into SBI extension handler.

Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Link: https://lore.kernel.org/r/20251017155925.361560-2-apatel@ventanamicro.com
Signed-off-by: Anup Patel <anup@brainfault.org>
2025-11-24 09:55:36 +05:30
Peter Zijlstra
2ace527183 Merge branch 'objtool/core'
Bring in the UDB and objtool data annotations to avoid conflicts while further extending the bug exceptions.

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
2025-11-21 11:21:20 +01:00
Yao Zihong
e0a504984a riscv: hwprobe: Expose Zicbop extension and its block size
- Add `RISCV_HWPROBE_EXT_ZICBOP` to report the presence of the
  Zicbop extension.
- Add `RISCV_HWPROBE_KEY_ZICBOP_BLOCK_SIZE` to expose the block
  size (in bytes) when Zicbop is supported.
- Update hwprobe.rst to document the new extension bit and block
  size key, following the existing Zicbom/Zicboz style.

Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Signed-off-by: Yao Zihong <zihong.plct@isrc.iscas.ac.cn>
Link: https://patch.msgid.link/20251118162436.15485-2-zihong.plct@isrc.iscas.ac.cn
[pjw@kernel.org: updated to apply]
Signed-off-by: Paul Walmsley <pjw@kernel.org>
2025-11-19 09:19:29 -07:00
Xu Lu
ad1bb4b852 riscv: Introduce Zalasr instructions
Introduce l{b|h|w|d}.{aq|aqrl} and s{b|h|w|d}.{rl|aqrl} instruction
encodings.

Signed-off-by: Xu Lu <luxu.kernel@bytedance.com>
Reviewed-by: Guo Ren <guoren@kernel.org>
Link: https://patch.msgid.link/20251020042056.30283-5-luxu.kernel@bytedance.com
Signed-off-by: Paul Walmsley <pjw@kernel.org>
2025-11-19 09:19:28 -07:00
Xu Lu
c9651fbc60 riscv: Add ISA extension parsing for Zalasr
Add parsing for Zalasr ISA extension.

Signed-off-by: Xu Lu <luxu.kernel@bytedance.com>
Link: https://patch.msgid.link/20251020042056.30283-2-luxu.kernel@bytedance.com
[pjw@kernel.org: updated to apply]
Signed-off-by: Paul Walmsley <pjw@kernel.org>
2025-11-19 09:19:28 -07:00
Yong-Xuan Wang
6efb1a9462 riscv: ptrace: Optimize the allocation of vector regset
The vector regset uses the maximum possible vlen value to estimate the
.n field. But not all the hardwares support the maximum vlen. Linux
might wastes time to prepare a large memory buffer(about 2^6 pages) for
the vector regset.

The regset can only copy vector registers when the process are using
vector. Add .active callback and determine the n field of vector regset
in riscv_v_setup_ctx_cache() doesn't affect the ptrace syscall and
coredump. It can avoid oversized allocations and better matches real
hardware limits.

Signed-off-by: Yong-Xuan Wang <yongxuan.wang@sifive.com>
Reviewed-by: Greentime Hu <greentime.hu@sifive.com>
Reviewed-by: Andy Chiu <andybnac@gmail.com>
Tested-by: Andy Chiu <andybnac@gmail.com>
Link: https://patch.msgid.link/20251013091318.467864-2-yongxuan.wang@sifive.com
Signed-off-by: Paul Walmsley <pjw@kernel.org>
2025-11-19 09:19:28 -07:00
Vivian Wang
724c694479 riscv: cmpxchg: Use riscv_has_extension_likely
Use riscv_has_extension_likely() to check for RISCV_ISA_EXT_ZAWRS,
replacing the use of asm goto with ALTERNATIVE.

The "likely" variant is used to match the behavior of the original
implementation using ALTERNATIVE("j %l[no_zawrs]", "nop", ...).

Signed-off-by: Vivian Wang <wangruikang@iscas.ac.cn>
Link: https://patch.msgid.link/20251020-riscv-altn-helper-wip-v4-5-ef941c87669a@iscas.ac.cn
Signed-off-by: Paul Walmsley <pjw@kernel.org>
2025-11-19 09:19:28 -07:00
Vivian Wang
6b85e9ac4a riscv: bitops: Use riscv_has_extension_likely
Use riscv_has_extension_likely() to check for RISCV_ISA_EXT_ZBB,
replacing the use of asm goto with ALTERNATIVE.

The "likely" variant is used to match the behavior of the original
implementation using ALTERNATIVE("j %l[legacy]", "nop", ...).

Signed-off-by: Vivian Wang <wangruikang@iscas.ac.cn>
Link: https://patch.msgid.link/20251020-riscv-altn-helper-wip-v4-4-ef941c87669a@iscas.ac.cn
Signed-off-by: Paul Walmsley <pjw@kernel.org>
2025-11-19 09:19:28 -07:00
Vivian Wang
8261a9d167 riscv: hweight: Use riscv_has_extension_likely
Use riscv_has_extension_likely() to check for RISCV_ISA_EXT_ZBB,
replacing the use of asm goto with ALTERNATIVE.

The "likely" variant is used to match the behavior of the original
implementation using ALTERNATIVE("j %l[legacy]", "nop", ...).

Signed-off-by: Vivian Wang <wangruikang@iscas.ac.cn>
Link: https://patch.msgid.link/20251020-riscv-altn-helper-wip-v4-3-ef941c87669a@iscas.ac.cn
Signed-off-by: Paul Walmsley <pjw@kernel.org>
2025-11-19 09:19:27 -07:00
Vivian Wang
1c7d491d86 riscv: checksum: Use riscv_has_extension_likely
Use riscv_has_extension_likely() to check for RISCV_ISA_EXT_ZBB,
replacing the use of asm goto with ALTERNATIVE.

The "likely" variant is used to match the behavior of the original
implementation using ALTERNATIVE("j %l[no_zbb]", "nop", ...).

While we're at it, also remove bogus comment about Zbb being likely
available. We have to choose between "likely" and "unlikely" due to
limitations of the asm goto feature, but that does not mean we should
put a bad comment on why we pick "likely" over "unlikely".

Signed-off-by: Vivian Wang <wangruikang@iscas.ac.cn>
Link: https://patch.msgid.link/20251020-riscv-altn-helper-wip-v4-2-ef941c87669a@iscas.ac.cn
Signed-off-by: Paul Walmsley <pjw@kernel.org>
2025-11-19 09:19:27 -07:00
Vivian Wang
0a067ae21b riscv: pgtable: Use riscv_has_extension_unlikely
Use riscv_has_extension_unlikely() to check for RISCV_ISA_EXT_SVVPTC,
replacing the use of asm goto with ALTERNATIVE.

The "unlikely" variant is used to match the behavior of the original
implementation using ALTERNATIVE("nop", "j %l[svvptc]", ...).

Note that this makes the check for RISCV_ISA_EXT_SVVPTC a runtime one if
RISCV_ALTERNATIVE=n, but it should still be worthwhile to do so given
that TLB flushes are relatively slow.

Signed-off-by: Vivian Wang <wangruikang@iscas.ac.cn>
Link: https://patch.msgid.link/20251020-riscv-altn-helper-wip-v4-1-ef941c87669a@iscas.ac.cn
Signed-off-by: Paul Walmsley <pjw@kernel.org>
2025-11-19 09:19:27 -07:00
Chao-ying Fu
91f815b707 riscv: Update MIPS vendor id to 0x127
[1] defines MIPS vendor id as 0x127. All previous MIPS RISC-V patches
were tested on QEMU, also modified to use 0x722 as MIPS_VENDOR_ID. This
new value should reflect real hardware.

[1] https://mips.com/wp-content/uploads/2025/06/P8700_Programmers_Reference_Manual_Rev1.84_5-31-2025.pdf

Fixes: a8fed1bc03 ("riscv: Add xmipsexectl as a vendor extension")
Signed-off-by: Chao-ying Fu <cfu@wavecomp.com>
Signed-off-by: Aleksa Paunovic <aleksa.paunovic@htecgroup.com>
Link: https://patch.msgid.link/20251113-mips-vendorid-v2-1-3279489b7f84@htecgroup.com
Cc: <stable@vger.kernel.org>
Signed-off-by: Paul WAlmsley <pjw@kernel.org>
2025-11-15 15:27:02 -07:00
Thomas Gleixner
0988ea18c6 riscv/uaccess: Use unsafe wrappers for ASM GOTO
ASM GOTO is miscompiled by GCC when it is used inside a auto cleanup scope:

bool foo(u32 __user *p, u32 val)
{
	scoped_guard(pagefault)
		unsafe_put_user(val, p, efault);
	return true;
efault:
	return false;
}

It ends up leaking the pagefault disable counter in the fault path. clang
at least fails the build.

Rename unsafe_*_user() to arch_unsafe_*_user() which makes the generic
uaccess header wrap it with a local label that makes both compilers emit
correct code. Same for the kernel_nofault() variants.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://patch.msgid.link/20251027083745.419351819@linutronix.de
2025-11-03 15:26:10 +01:00
Ben Dooks
44aa25c000 riscv: asm: use .insn for making custom instructions
The assembler has .insn for building custom instructions
now, so change the .4byte to .insn. This ensures the output
is marked as an instruction and not as data which may
confuse both debuggers and anything else that relies on
this sort of marking.

Add an ASM_INSN_I() wrapper in asm.h to allow the selecting
of how this is output so older assemblers are still good.

Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
Link: https://lore.kernel.org/r/20251024171640.65232-1-ben.dooks@codethink.co.uk
Signed-off-by: Paul Walmsley <pjw@kernel.org>
2025-10-27 18:58:37 -06:00
Jingwei Wang
5d15d2ad36 riscv: hwprobe: Fix stale vDSO data for late-initialized keys at boot
The hwprobe vDSO data for some keys, like MISALIGNED_VECTOR_PERF,
is determined by an asynchronous kthread. This can create a race
condition where the kthread finishes after the vDSO data has
already been populated, causing userspace to read stale values.

To fix this race, a new 'ready' flag is added to the vDSO data,
initialized to 'false' during arch_initcall_sync. This flag is
checked by both the vDSO's user-space code and the riscv_hwprobe
syscall. The syscall serves as a one-time gate, using a completion
to wait for any pending probes before populating the data and
setting the flag to 'true', thus ensuring userspace reads fresh
values on its first request.

Reported-by: Tsukasa OI <research_trasio@irq.a4lg.com>
Closes: https://lore.kernel.org/linux-riscv/760d637b-b13b-4518-b6bf-883d55d44e7f@irq.a4lg.com/
Fixes: e7c9d66e31 ("RISC-V: Report vector unaligned access speed hwprobe")
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Alexandre Ghiti <alexghiti@rivosinc.com>
Cc: Olof Johansson <olof@lixom.net>
Cc: stable@vger.kernel.org
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Co-developed-by: Palmer Dabbelt <palmer@dabbelt.com>
Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com>
Signed-off-by: Jingwei Wang <wangjingwei@iscas.ac.cn>
Link: https://lore.kernel.org/r/20250811142035.105820-1-wangjingwei@iscas.ac.cn
[pjw@kernel.org: fix checkpatch issues]
Signed-off-by: Paul Walmsley <pjw@kernel.org>
2025-10-17 22:23:11 -06:00
Paul Walmsley
492c513ec6 riscv: add a forward declaration for cpuinfo_op
Add a forward declaration for cpuinfo_op to resolve a sparse warning.

Link: https://lore.kernel.org/r/b831f349-5d0c-f7ac-8362-acb20bc6221a@kernel.org
Signed-off-by: Paul Walmsley <pjw@kernel.org>
2025-10-17 22:10:01 -06:00
Samuel Holland
768e054de0 riscv: Remove the PER_CPU_OFFSET_SHIFT macro
__per_cpu_offset is an array of unsigned long, so we can reuse the
existing RISCV_LGPTR macro.

Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
Link: https://lore.kernel.org/r/20251015225604.3860409-1-samuel.holland@sifive.com
Signed-off-by: Paul Walmsley <pjw@kernel.org>
2025-10-17 22:00:29 -06:00
Samuel Holland
5898fc01ff riscv: mm: Define MAX_POSSIBLE_PHYSMEM_BITS for zsmalloc
This definition is used by zsmalloc to optimize memory allocation. On
riscv64, it is the same as MAX_PHYSMEM_BITS from asm/sparsemem.h, but
that definition depends on CONFIG_SPARSEMEM. The correct definition is
already provided for riscv32.

Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
Link: https://lore.kernel.org/r/20251015233327.3885003-1-samuel.holland@sifive.com
Signed-off-by: Paul Walmsley <pjw@kernel.org>
2025-10-17 22:00:29 -06:00
Anup Patel
ca525d53f9 RISC-V: Define pgprot_dmacoherent() for non-coherent devices
The pgprot_dmacoherent() is used when allocating memory for
non-coherent devices and by default pgprot_dmacoherent() is
same as pgprot_noncached() unless architecture overrides it.

Currently, there is no pgprot_dmacoherent() definition for
RISC-V hence non-coherent device memory is being mapped as
IO thereby making CPU access to such memory slow.

Define pgprot_dmacoherent() to be same as pgprot_writecombine()
for RISC-V so that CPU access non-coherent device memory as
NOCACHE which is better than accessing it as IO.

Fixes: ff689fd21c ("riscv: add RISC-V Svpbmt extension support")
Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Tested-by: Han Gao <rabenda.cn@gmail.com>
Tested-by: Guo Ren (Alibaba DAMO Academy) <guoren@kernel.org>
Link: https://lore.kernel.org/r/20250820152316.1012757-1-apatel@ventanamicro.com
Signed-off-by: Paul Walmsley <pjw@kernel.org>
2025-10-17 21:30:05 -06:00
Miquel Sabaté Solà
781380d2cd riscv: kgdb: Ensure that BUFMAX > NUMREGBYTES
The current value of BUFMAX is similar as in other architectures, but as
per documentation on KGDB (see
'Documentation/process/debugging/kgdb.rst'), BUFMAX has to be larger
than NUMREGBYTES.

Some NUMREGBYTES architectures (e.g. powerpc or hexagon) actually define
BUFMAX in relation to NUMREGBYTES, and thus this condition is always
guaranteed. Since 2048 is a value that is generally accepted on all
architectures, and that is larger than the current value of NUMREGBYTES,
we can keep this value in arch/riscv, but we can at least add an
'static_assert' as an extra measure just in case NUMREGBYTES changes in
the future for some unforseen reason.

Signed-off-by: Miquel Sabaté Solà <mikisabate@gmail.com>
Link: https://lore.kernel.org/r/20250915143252.154955-1-mikisabate@gmail.com
Signed-off-by: Paul Walmsley <pjw@kernel.org>
2025-10-09 19:36:45 -06:00
Linus Torvalds
86bcf7be1e RISC-V updates for the v6.18 merge window (part two)
Second set of RISC-V updates for the v6.18 merge window, consisting
 of:
 
 - Support for the RISC-V-standardized RPMI interface.
 
   RPMI is a platform management communication mechanism between OSes
   running on application processors, and a remote platform management
   processor.  Similar to ARM SCMI, TI SCI, etc.  This includes irqchip,
   mailbox, and clk changes.
 
 - Support for the RISC-V-standardized MPXY SBI extension.
 
   MPXY is a RISC-V-specific standard implementing a shared memory
   mailbox between S-mode operating systems (e.g., Linux) and M-mode
   firmware (e.g., OpenSBI).  It is part of this PR since one of its
   use cases is to enable M-mode firmware to act as a single RPMI client
   for all RPMI activity on a core (including S-mode RPMI activity).
   Includes a mailbox driver.
 
 - Some ACPI-related updates to enable the use of RPMI and MPXY.
 
 - The addition of Linux-wide memcpy_{from,to}_le32() static inline
   functions, for RPMI use.
 
 - An ACPI Kconfig change to enable boot logos on any ACPI-using
   architecture (including RISC-V)
 
 - A RISC-V defconfig change to add GPIO keyboard and event device
   support, for front panel shutdown or reboot buttons
 
 This PR also includes a recent, one-line Kconfig patch from Geert to
 keep non-RISC-V users from being asked about building the RPMI virtual
 clock driver when !COMPILE_TEST.  THere's nothing preventing
 non-RISC-V SoCs from implementing RPMI, but until some users show up,
 let's not annoy others with it.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEElRDoIDdEz9/svf2Kx4+xDQu9KksFAmjfNccACgkQx4+xDQu9
 KkugVBAAsa9Oc6iEV1rwj1clWDrvz/bFJRxr/oH3PX0yJ4JJcZaBTydeW3fyJTJ8
 CzmRClGJ18YWjapJXYhMp1AS/VfsULu2AVfUzZUqHTjVZyxgt8xFgygpI+BHIyUN
 vD26iKJz/JvnytRmUi7mMtS0O48nTzdMiiOmu4Ved68YMJCRJKFZw8+rWVcAzrwb
 FHZlIE5Fcb1PRaUDg/45Baj0nEr+NRGKDLsR1rbocbmCmRMnz3ufPTcXk128+3gC
 VB1rQplcMBf2RpCl7p4LW2N746hcbg/RogfpjFy7KLlnEH+Xoh2nCxcWHaiEgR9q
 6JPsYBeekA54ZZsdoNBg1i5rGk3j/G1XGaV1bo7HDLTvShSByhaYrhAedQZEbw//
 xC3Eb7EQ6rNYUUjXiX0y5nhvl+nVlu/FmcsZmcP30ppOV4MQasTZ0zqfso23xhjL
 2e06PwTqsmXDeDNDQ4ruBKrpu8tkA7ZZvjCMq1rvSWjTPObzuGBe/ENrdBUOBb2E
 6UUeAGCZpQm1IxTcKHHxaIDT5ami745kqaBrXanIMKPX1JdCs7ahUqqWzC0LEgSy
 qB/T12bYg5O/yKXdXJuAuTHFb3TOPn6l8aNxRJve+uFwv4r1XXptdal9Yg2xoBWo
 EoGktm8KAp5Ndn5BntXI4xG4Ia3HOsj9YA7y4Iep4EO94JZk3Fk=
 =Ys1m
 -----END PGP SIGNATURE-----

Merge tag 'riscv-for-linus-6.18-mw2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux

Pull more RISC-V updates from Paul Walmsley:

 - Support for the RISC-V-standardized RPMI interface.

   RPMI is a platform management communication mechanism between OSes
   running on application processors, and a remote platform management
   processor. Similar to ARM SCMI, TI SCI, etc. This includes irqchip,
   mailbox, and clk changes.

 - Support for the RISC-V-standardized MPXY SBI extension.

   MPXY is a RISC-V-specific standard implementing a shared memory
   mailbox between S-mode operating systems (e.g., Linux) and M-mode
   firmware (e.g., OpenSBI). It is part of this PR since one of its use
   cases is to enable M-mode firmware to act as a single RPMI client for
   all RPMI activity on a core (including S-mode RPMI activity).
   Includes a mailbox driver.

 - Some ACPI-related updates to enable the use of RPMI and MPXY.

 - The addition of Linux-wide memcpy_{from,to}_le32() static inline
   functions, for RPMI use.

 - An ACPI Kconfig change to enable boot logos on any ACPI-using
   architecture (including RISC-V)

 - A RISC-V defconfig change to add GPIO keyboard and event device
   support, for front panel shutdown or reboot buttons

* tag 'riscv-for-linus-6.18-mw2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (26 commits)
  clk: COMMON_CLK_RPMI should depend on RISCV
  ACPI: support BGRT table on RISC-V
  MAINTAINERS: Add entry for RISC-V RPMI and MPXY drivers
  RISC-V: Enable GPIO keyboard and event device in RV64 defconfig
  irqchip/riscv-rpmi-sysmsi: Add ACPI support
  mailbox/riscv-sbi-mpxy: Add ACPI support
  irqchip/irq-riscv-imsic-early: Export imsic_acpi_get_fwnode()
  ACPI: RISC-V: Add RPMI System MSI to GSI mapping
  ACPI: RISC-V: Add support to update gsi range
  ACPI: RISC-V: Create interrupt controller list in sorted order
  ACPI: scan: Update honor list for RPMI System MSI
  ACPI: Add support for nargs_prop in acpi_fwnode_get_reference_args()
  ACPI: property: Refactor acpi_fwnode_get_reference_args() to support nargs_prop
  irqchip: Add driver for the RPMI system MSI service group
  dt-bindings: Add RPMI system MSI interrupt controller bindings
  dt-bindings: Add RPMI system MSI message proxy bindings
  clk: Add clock driver for the RISC-V RPMI clock service group
  dt-bindings: clock: Add RPMI clock service controller bindings
  dt-bindings: clock: Add RPMI clock service message proxy bindings
  mailbox: Add RISC-V SBI message proxy (MPXY) based mailbox driver
  ...
2025-10-04 10:36:22 -07:00
Linus Torvalds
f3826aa996 guest_memfd:
* Add support for host userspace mapping of guest_memfd-backed memory for VM
   types that do NOT use support KVM_MEMORY_ATTRIBUTE_PRIVATE (which isn't
   precisely the same thing as CoCo VMs, since x86's SEV-MEM and SEV-ES have
   no way to detect private vs. shared).
 
   This lays the groundwork for removal of guest memory from the kernel direct
   map, as well as for limited mmap() for guest_memfd-backed memory.
 
   For more information see:
   * a6ad54137a ("Merge branch 'guest-memfd-mmap' into HEAD", 2025-08-27)
   * https://github.com/firecracker-microvm/firecracker/tree/feature/secret-hiding
     (guest_memfd in Firecracker)
   * https://lore.kernel.org/all/20250221160728.1584559-1-roypat@amazon.co.uk/
     (direct map removal)
   * https://lore.kernel.org/all/20250328153133.3504118-1-tabba@google.com/
     (mmap support)
 
 ARM:
 
 * Add support for FF-A 1.2 as the secure memory conduit for pKVM,
   allowing more registers to be used as part of the message payload.
 
 * Change the way pKVM allocates its VM handles, making sure that the
   privileged hypervisor is never tricked into using uninitialised
   data.
 
 * Speed up MMIO range registration by avoiding unnecessary RCU
   synchronisation, which results in VMs starting much quicker.
 
 * Add the dump of the instruction stream when panic-ing in the EL2
   payload, just like the rest of the kernel has always done. This will
   hopefully help debugging non-VHE setups.
 
 * Add 52bit PA support to the stage-1 page-table walker, and make use
   of it to populate the fault level reported to the guest on failing
   to translate a stage-1 walk.
 
 * Add NV support to the GICv3-on-GICv5 emulation code, ensuring
   feature parity for guests, irrespective of the host platform.
 
 * Fix some really ugly architecture problems when dealing with debug
   in a nested VM. This has some bad performance impacts, but is at
   least correct.
 
 * Add enough infrastructure to be able to disable EL2 features and
   give effective values to the EL2 control registers. This then allows
   a bunch of features to be turned off, which helps cross-host
   migration.
 
 * Large rework of the selftest infrastructure to allow most tests to
   transparently run at EL2. This is the first step towards enabling
   NV testing.
 
 * Various fixes and improvements all over the map, including one BE
   fix, just in time for the removal of the feature.
 
 LoongArch:
 
 * Detect page table walk feature on new hardware
 
 * Add sign extension with kernel MMIO/IOCSR emulation
 
 * Improve in-kernel IPI emulation
 
 * Improve in-kernel PCH-PIC emulation
 
 * Move kvm_iocsr tracepoint out of generic code
 
 RISC-V:
 
 * Added SBI FWFT extension for Guest/VM with misaligned delegation and
   pointer masking PMLEN features
 
 * Added ONE_REG interface for SBI FWFT extension
 
 * Added Zicbop and bfloat16 extensions for Guest/VM
 
 * Enabled more common KVM selftests for RISC-V
 
 * Added SBI v3.0 PMU enhancements in KVM and perf driver
 
 s390:
 
 * Improve interrupt cpu for wakeup, in particular the heuristic to decide
   which vCPU to deliver a floating interrupt to.
 
 * Clear the PTE when discarding a swapped page because of CMMA; this
   bug was introduced in 6.16 when refactoring gmap code.
 
 x86 selftests:
 
 * Add #DE coverage in the fastops test (the only exception that's guest-
   triggerable in fastop-emulated instructions).
 
 * Fix PMU selftests errors encountered on Granite Rapids (GNR), Sierra
   Forest (SRF) and Clearwater Forest (CWF).
 
 * Minor cleanups and improvements
 
 x86 (guest side):
 
 * For the legacy PCI hole (memory between TOLUD and 4GiB) to UC when
   overriding guest MTRR for TDX/SNP to fix an issue where ACPI auto-mapping
   could map devices as WB and prevent the device drivers from mapping their
   devices with UC/UC-.
 
 * Make kvm_async_pf_task_wake() a local static helper and remove its
   export.
 
 * Use native qspinlocks when running in a VM with dedicated vCPU=>pCPU
   bindings even when PV_UNHALT is unsupported.
 
 Generic:
 
 * Remove a redundant __GFP_NOWARN from kvm_setup_async_pf() as __GFP_NOWARN is
   now included in GFP_NOWAIT.
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmjcGSkUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroPSPAgAnJDswU4fZ5YdJr6jGzsbSQ6utlIV
 FeEltLKQIM7Aq/uvL6PLN5Kx1Pb/d9r9ag39mDT6lq9fOfJdOLjJr2SBXPTCsrPS
 6hyNL1mlgo5qzs54T8dkMbQThlSgA4zaehsc0zl8vnwil6ygoAdrtTHqZm6V0hu/
 F/sVlikCsLix1hC0KtzwscyWYcjWtXfVoi9eU5WY6ALpQaVXfRUtwyOhGDkldr+m
 i3iDiGiLAZ5Iu3igUCIOEzSSQY0FgLJpzbwJAeUxIvomDkHGJLaR14ijvM+NkRZi
 FBo2CLbjrwXb56Rbh2ABcq0CGJ3EiU3L+CC34UaRLzbtl/2BtpetkC3irA==
 =fyov
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm updates from Paolo Bonzini:
 "This excludes the bulk of the x86 changes, which I will send
  separately. They have two not complex but relatively unusual conflicts
  so I will wait for other dust to settle.

  guest_memfd:

   - Add support for host userspace mapping of guest_memfd-backed memory
     for VM types that do NOT use support KVM_MEMORY_ATTRIBUTE_PRIVATE
     (which isn't precisely the same thing as CoCo VMs, since x86's
     SEV-MEM and SEV-ES have no way to detect private vs. shared).

     This lays the groundwork for removal of guest memory from the
     kernel direct map, as well as for limited mmap() for
     guest_memfd-backed memory.

     For more information see:
       - commit a6ad54137a ("Merge branch 'guest-memfd-mmap' into HEAD")
       - guest_memfd in Firecracker:
           https://github.com/firecracker-microvm/firecracker/tree/feature/secret-hiding
       - direct map removal:
           https://lore.kernel.org/all/20250221160728.1584559-1-roypat@amazon.co.uk/
       - mmap support:
           https://lore.kernel.org/all/20250328153133.3504118-1-tabba@google.com/

  ARM:

   - Add support for FF-A 1.2 as the secure memory conduit for pKVM,
     allowing more registers to be used as part of the message payload.

   - Change the way pKVM allocates its VM handles, making sure that the
     privileged hypervisor is never tricked into using uninitialised
     data.

   - Speed up MMIO range registration by avoiding unnecessary RCU
     synchronisation, which results in VMs starting much quicker.

   - Add the dump of the instruction stream when panic-ing in the EL2
     payload, just like the rest of the kernel has always done. This
     will hopefully help debugging non-VHE setups.

   - Add 52bit PA support to the stage-1 page-table walker, and make use
     of it to populate the fault level reported to the guest on failing
     to translate a stage-1 walk.

   - Add NV support to the GICv3-on-GICv5 emulation code, ensuring
     feature parity for guests, irrespective of the host platform.

   - Fix some really ugly architecture problems when dealing with debug
     in a nested VM. This has some bad performance impacts, but is at
     least correct.

   - Add enough infrastructure to be able to disable EL2 features and
     give effective values to the EL2 control registers. This then
     allows a bunch of features to be turned off, which helps cross-host
     migration.

   - Large rework of the selftest infrastructure to allow most tests to
     transparently run at EL2. This is the first step towards enabling
     NV testing.

   - Various fixes and improvements all over the map, including one BE
     fix, just in time for the removal of the feature.

  LoongArch:

   - Detect page table walk feature on new hardware

   - Add sign extension with kernel MMIO/IOCSR emulation

   - Improve in-kernel IPI emulation

   - Improve in-kernel PCH-PIC emulation

   - Move kvm_iocsr tracepoint out of generic code

  RISC-V:

   - Added SBI FWFT extension for Guest/VM with misaligned delegation
     and pointer masking PMLEN features

   - Added ONE_REG interface for SBI FWFT extension

   - Added Zicbop and bfloat16 extensions for Guest/VM

   - Enabled more common KVM selftests for RISC-V

   - Added SBI v3.0 PMU enhancements in KVM and perf driver

  s390:

   - Improve interrupt cpu for wakeup, in particular the heuristic to
     decide which vCPU to deliver a floating interrupt to.

   - Clear the PTE when discarding a swapped page because of CMMA; this
     bug was introduced in 6.16 when refactoring gmap code.

  x86 selftests:

   - Add #DE coverage in the fastops test (the only exception that's
     guest- triggerable in fastop-emulated instructions).

   - Fix PMU selftests errors encountered on Granite Rapids (GNR),
     Sierra Forest (SRF) and Clearwater Forest (CWF).

   - Minor cleanups and improvements

  x86 (guest side):

   - For the legacy PCI hole (memory between TOLUD and 4GiB) to UC when
     overriding guest MTRR for TDX/SNP to fix an issue where ACPI
     auto-mapping could map devices as WB and prevent the device drivers
     from mapping their devices with UC/UC-.

   - Make kvm_async_pf_task_wake() a local static helper and remove its
     export.

   - Use native qspinlocks when running in a VM with dedicated
     vCPU=>pCPU bindings even when PV_UNHALT is unsupported.

  Generic:

   - Remove a redundant __GFP_NOWARN from kvm_setup_async_pf() as
     __GFP_NOWARN is now included in GFP_NOWAIT.

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (178 commits)
  KVM: s390: Fix to clear PTE when discarding a swapped page
  KVM: arm64: selftests: Cover ID_AA64ISAR3_EL1 in set_id_regs
  KVM: arm64: selftests: Remove a duplicate register listing in set_id_regs
  KVM: arm64: selftests: Cope with arch silliness in EL2 selftest
  KVM: arm64: selftests: Add basic test for running in VHE EL2
  KVM: arm64: selftests: Enable EL2 by default
  KVM: arm64: selftests: Initialize HCR_EL2
  KVM: arm64: selftests: Use the vCPU attr for setting nr of PMU counters
  KVM: arm64: selftests: Use hyp timer IRQs when test runs at EL2
  KVM: arm64: selftests: Select SMCCC conduit based on current EL
  KVM: arm64: selftests: Provide helper for getting default vCPU target
  KVM: arm64: selftests: Alias EL1 registers to EL2 counterparts
  KVM: arm64: selftests: Create a VGICv3 for 'default' VMs
  KVM: arm64: selftests: Add unsanitised helpers for VGICv3 creation
  KVM: arm64: selftests: Add helper to check for VGICv3 support
  KVM: arm64: selftests: Initialize VGICv3 only once
  KVM: arm64: selftests: Provide kvm_arch_vm_post_create() in library code
  KVM: selftests: Add ex_str() to print human friendly name of exception vectors
  selftests/kvm: remove stale TODO in xapic_state_test
  KVM: selftests: Handle Intel Atom errata that leads to PMU event overcount
  ...
2025-10-04 08:52:16 -07:00
Linus Torvalds
8804d970fa Summary of significant series in this pull request:
- The 3 patch series "mm, swap: improve cluster scan strategy" from
   Kairui Song improves performance and reduces the failure rate of swap
   cluster allocation.
 
 - The 4 patch series "support large align and nid in Rust allocators"
   from Vitaly Wool permits Rust allocators to set NUMA node and large
   alignment when perforning slub and vmalloc reallocs.
 
 - The 2 patch series "mm/damon/vaddr: support stat-purpose DAMOS" from
   Yueyang Pan extend DAMOS_STAT's handling of the DAMON operations sets
   for virtual address spaces for ops-level DAMOS filters.
 
 - The 3 patch series "execute PROCMAP_QUERY ioctl under per-vma lock"
   from Suren Baghdasaryan reduces mmap_lock contention during reads of
   /proc/pid/maps.
 
 - The 2 patch series "mm/mincore: minor clean up for swap cache
   checking" from Kairui Song performs some cleanup in the swap code.
 
 - The 11 patch series "mm: vm_normal_page*() improvements" from David
   Hildenbrand provides code cleanup in the pagemap code.
 
 - The 5 patch series "add persistent huge zero folio support" from
   Pankaj Raghav provides a block layer speedup by optionalls making the
   huge_zero_pagepersistent, instead of releasing it when its refcount
   falls to zero.
 
 - The 3 patch series "kho: fixes and cleanups" from Mike Rapoport adds a
   few touchups to the recently added Kexec Handover feature.
 
 - The 10 patch series "mm: make mm->flags a bitmap and 64-bit on all
   arches" from Lorenzo Stoakes turns mm_struct.flags into a bitmap.  To
   end the constant struggle with space shortage on 32-bit conflicting with
   64-bit's needs.
 
 - The 2 patch series "mm/swapfile.c and swap.h cleanup" from Chris Li
   cleans up some swap code.
 
 - The 7 patch series "selftests/mm: Fix false positives and skip
   unsupported tests" from Donet Tom fixes a few things in our selftests
   code.
 
 - The 7 patch series "prctl: extend PR_SET_THP_DISABLE to only provide
   THPs when advised" from David Hildenbrand "allows individual processes
   to opt-out of THP=always into THP=madvise, without affecting other
   workloads on the system".
 
   It's a long story - the [1/N] changelog spells out the considerations.
 
 - The 11 patch series "Add and use memdesc_flags_t" from Matthew Wilcox
   gets us started on the memdesc project.  Please see
   https://kernelnewbies.org/MatthewWilcox/Memdescs and
   https://blogs.oracle.com/linux/post/introducing-memdesc.
 
 - The 3 patch series "Tiny optimization for large read operations" from
   Chi Zhiling improves the efficiency of the pagecache read path.
 
 - The 5 patch series "Better split_huge_page_test result check" from Zi
   Yan improves our folio splitting selftest code.
 
 - The 2 patch series "test that rmap behaves as expected" from Wei Yang
   adds some rmap selftests.
 
 - The 3 patch series "remove write_cache_pages()" from Christoph Hellwig
   removes that function and converts its two remaining callers.
 
 - The 2 patch series "selftests/mm: uffd-stress fixes" from Dev Jain
   fixes some UFFD selftests issues.
 
 - The 3 patch series "introduce kernel file mapped folios" from Boris
   Burkov introduces the concept of "kernel file pages".  Using these
   permits btrfs to account its metadata pages to the root cgroup, rather
   than to the cgroups of random inappropriate tasks.
 
 - The 2 patch series "mm/pageblock: improve readability of some
   pageblock handling" from Wei Yang provides some readability improvements
   to the page allocator code.
 
 - The 11 patch series "mm/damon: support ARM32 with LPAE" from SeongJae
   Park teaches DAMON to understand arm32 highmem.
 
 - The 4 patch series "tools: testing: Use existing atomic.h for
   vma/maple tests" from Brendan Jackman performs some code cleanups and
   deduplication under tools/testing/.
 
 - The 2 patch series "maple_tree: Fix testing for 32bit compiles" from
   Liam Howlett fixes a couple of 32-bit issues in
   tools/testing/radix-tree.c.
 
 - The 2 patch series "kasan: unify kasan_enabled() and remove
   arch-specific implementations" from Sabyrzhan Tasbolatov moves KASAN
   arch-specific initialization code into a common arch-neutral
   implementation.
 
 - The 3 patch series "mm: remove zpool" from Johannes Weiner removes
   zspool - an indirection layer which now only redirects to a single thing
   (zsmalloc).
 
 - The 2 patch series "mm: task_stack: Stack handling cleanups" from
   Pasha Tatashin makes a couple of cleanups in the fork code.
 
 - The 37 patch series "mm: remove nth_page()" from David Hildenbrand
   makes rather a lot of adjustments at various nth_page() callsites,
   eventually permitting the removal of that undesirable helper function.
 
 - The 2 patch series "introduce kasan.write_only option in hw-tags" from
   Yeoreum Yun creates a KASAN read-only mode for ARM, using that
   architecture's memory tagging feature.  It is felt that a read-only mode
   KASAN is suitable for use in production systems rather than debug-only.
 
 - The 3 patch series "mm: hugetlb: cleanup hugetlb folio allocation"
   from Kefeng Wang does some tidying in the hugetlb folio allocation code.
 
 - The 12 patch series "mm: establish const-correctness for pointer
   parameters" from Max Kellermann makes quite a number of the MM API
   functions more accurate about the constness of their arguments.  This
   was getting in the way of subsystems (in this case CEPH) when they
   attempt to improving their own const/non-const accuracy.
 
 - The 7 patch series "Cleanup free_pages() misuse" from Vishal Moola
   fixes a number of code sites which were confused over when to use
   free_pages() vs __free_pages().
 
 - The 3 patch series "Add Rust abstraction for Maple Trees" from Alice
   Ryhl makes the mapletree code accessible to Rust.  Required by nouveau
   and by its forthcoming successor: the new Rust Nova driver.
 
 - The 2 patch series "selftests/mm: split_huge_page_test:
   split_pte_mapped_thp improvements" from David Hildenbrand adds a fix and
   some cleanups to the thp selftesting code.
 
 - The 14 patch series "mm, swap: introduce swap table as swap cache
   (phase I)" from Chris Li and Kairui Song is the first step along the
   path to implementing "swap tables" - a new approach to swap allocation
   and state tracking which is expected to yield speed and space
   improvements.  This patchset itself yields a 5-20% performance benefit
   in some situations.
 
 - The 3 patch series "Some ptdesc cleanups" from Matthew Wilcox utilizes
   the new memdesc layer to clean up the ptdesc code a little.
 
 - The 3 patch series "Fix va_high_addr_switch.sh test failure" from
   Chunyu Hu fixes some issues in our 5-level pagetable selftesting code.
 
 - The 2 patch series "Minor fixes for memory allocation profiling" from
   Suren Baghdasaryan addresses a couple of minor issues in relatively new
   memory allocation profiling feature.
 
 - The 3 patch series "Small cleanups" from Matthew Wilcox has a few
   cleanups in preparation for more memdesc work.
 
 - The 2 patch series "mm/damon: add addr_unit for DAMON_LRU_SORT and
   DAMON_RECLAIM" from Quanmin Yan makes some changes to DAMON in
   furtherance of supporting arm highmem.
 
 - The 2 patch series "selftests/mm: Add -Wunreachable-code and fix
   warnings" from Muhammad Anjum adds that compiler check to selftests code
   and fixes the fallout, by removing dead code.
 
 - The 10 patch series "Improvements to Victim Process Thawing and OOM
   Reaper Traversal Order" from zhongjinji makes a number of improvements
   in the OOM killer: mainly thawing a more appropriate group of victim
   threads so they can release resources.
 
 - The 5 patch series "mm/damon: misc fixups and improvements for 6.18"
   from SeongJae Park is a bunch of small and unrelated fixups for DAMON.
 
 - The 7 patch series "mm/damon: define and use DAMON initialization
   check function" from SeongJae Park implement reliability and
   maintainability improvements to a recently-added bug fix.
 
 - The 2 patch series "mm/damon/stat: expose auto-tuned intervals and
   non-idle ages" from SeongJae Park provides additional transparency to
   userspace clients of the DAMON_STAT information.
 
 - The 2 patch series "Expand scope of khugepaged anonymous collapse"
   from Dev Jain removes some constraints on khubepaged's collapsing of
   anon VMAs.  It also increases the success rate of MADV_COLLAPSE against
   an anon vma.
 
 - The 2 patch series "mm: do not assume file == vma->vm_file in
   compat_vma_mmap_prepare()" from Lorenzo Stoakes moves us further towards
   removal of file_operations.mmap().  This patchset concentrates upon
   clearing up the treatment of stacked filesystems.
 
 - The 6 patch series "mm: Improve mlock tracking for large folios" from
   Kiryl Shutsemau provides some fixes and improvements to mlock's tracking
   of large folios.  /proc/meminfo's "Mlocked" field became more accurate.
 
 - The 2 patch series "mm/ksm: Fix incorrect accounting of KSM counters
   during fork" from Donet Tom fixes several user-visible KSM stats
   inaccuracies across forks and adds selftest code to verify these
   counters.
 
 - The 2 patch series "mm_slot: fix the usage of mm_slot_entry" from Wei
   Yang addresses some potential but presently benign issues in KSM's
   mm_slot handling.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCaN3cywAKCRDdBJ7gKXxA
 jtaPAQDmIuIu7+XnVUK5V11hsQ/5QtsUeLHV3OsAn4yW5/3dEQD/UddRU08ePN+1
 2VRB0EwkLAdfMWW7TfiNZ+yhuoiL/AA=
 =4mhY
 -----END PGP SIGNATURE-----

Merge tag 'mm-stable-2025-10-01-19-00' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull MM updates from Andrew Morton:

 - "mm, swap: improve cluster scan strategy" from Kairui Song improves
   performance and reduces the failure rate of swap cluster allocation

 - "support large align and nid in Rust allocators" from Vitaly Wool
   permits Rust allocators to set NUMA node and large alignment when
   perforning slub and vmalloc reallocs

 - "mm/damon/vaddr: support stat-purpose DAMOS" from Yueyang Pan extend
   DAMOS_STAT's handling of the DAMON operations sets for virtual
   address spaces for ops-level DAMOS filters

 - "execute PROCMAP_QUERY ioctl under per-vma lock" from Suren
   Baghdasaryan reduces mmap_lock contention during reads of
   /proc/pid/maps

 - "mm/mincore: minor clean up for swap cache checking" from Kairui Song
   performs some cleanup in the swap code

 - "mm: vm_normal_page*() improvements" from David Hildenbrand provides
   code cleanup in the pagemap code

 - "add persistent huge zero folio support" from Pankaj Raghav provides
   a block layer speedup by optionalls making the
   huge_zero_pagepersistent, instead of releasing it when its refcount
   falls to zero

 - "kho: fixes and cleanups" from Mike Rapoport adds a few touchups to
   the recently added Kexec Handover feature

 - "mm: make mm->flags a bitmap and 64-bit on all arches" from Lorenzo
   Stoakes turns mm_struct.flags into a bitmap. To end the constant
   struggle with space shortage on 32-bit conflicting with 64-bit's
   needs

 - "mm/swapfile.c and swap.h cleanup" from Chris Li cleans up some swap
   code

 - "selftests/mm: Fix false positives and skip unsupported tests" from
   Donet Tom fixes a few things in our selftests code

 - "prctl: extend PR_SET_THP_DISABLE to only provide THPs when advised"
   from David Hildenbrand "allows individual processes to opt-out of
   THP=always into THP=madvise, without affecting other workloads on the
   system".

   It's a long story - the [1/N] changelog spells out the considerations

 - "Add and use memdesc_flags_t" from Matthew Wilcox gets us started on
   the memdesc project. Please see

      https://kernelnewbies.org/MatthewWilcox/Memdescs and
      https://blogs.oracle.com/linux/post/introducing-memdesc

 - "Tiny optimization for large read operations" from Chi Zhiling
   improves the efficiency of the pagecache read path

 - "Better split_huge_page_test result check" from Zi Yan improves our
   folio splitting selftest code

 - "test that rmap behaves as expected" from Wei Yang adds some rmap
   selftests

 - "remove write_cache_pages()" from Christoph Hellwig removes that
   function and converts its two remaining callers

 - "selftests/mm: uffd-stress fixes" from Dev Jain fixes some UFFD
   selftests issues

 - "introduce kernel file mapped folios" from Boris Burkov introduces
   the concept of "kernel file pages". Using these permits btrfs to
   account its metadata pages to the root cgroup, rather than to the
   cgroups of random inappropriate tasks

 - "mm/pageblock: improve readability of some pageblock handling" from
   Wei Yang provides some readability improvements to the page allocator
   code

 - "mm/damon: support ARM32 with LPAE" from SeongJae Park teaches DAMON
   to understand arm32 highmem

 - "tools: testing: Use existing atomic.h for vma/maple tests" from
   Brendan Jackman performs some code cleanups and deduplication under
   tools/testing/

 - "maple_tree: Fix testing for 32bit compiles" from Liam Howlett fixes
   a couple of 32-bit issues in tools/testing/radix-tree.c

 - "kasan: unify kasan_enabled() and remove arch-specific
   implementations" from Sabyrzhan Tasbolatov moves KASAN arch-specific
   initialization code into a common arch-neutral implementation

 - "mm: remove zpool" from Johannes Weiner removes zspool - an
   indirection layer which now only redirects to a single thing
   (zsmalloc)

 - "mm: task_stack: Stack handling cleanups" from Pasha Tatashin makes a
   couple of cleanups in the fork code

 - "mm: remove nth_page()" from David Hildenbrand makes rather a lot of
   adjustments at various nth_page() callsites, eventually permitting
   the removal of that undesirable helper function

 - "introduce kasan.write_only option in hw-tags" from Yeoreum Yun
   creates a KASAN read-only mode for ARM, using that architecture's
   memory tagging feature. It is felt that a read-only mode KASAN is
   suitable for use in production systems rather than debug-only

 - "mm: hugetlb: cleanup hugetlb folio allocation" from Kefeng Wang does
   some tidying in the hugetlb folio allocation code

 - "mm: establish const-correctness for pointer parameters" from Max
   Kellermann makes quite a number of the MM API functions more accurate
   about the constness of their arguments. This was getting in the way
   of subsystems (in this case CEPH) when they attempt to improving
   their own const/non-const accuracy

 - "Cleanup free_pages() misuse" from Vishal Moola fixes a number of
   code sites which were confused over when to use free_pages() vs
   __free_pages()

 - "Add Rust abstraction for Maple Trees" from Alice Ryhl makes the
   mapletree code accessible to Rust. Required by nouveau and by its
   forthcoming successor: the new Rust Nova driver

 - "selftests/mm: split_huge_page_test: split_pte_mapped_thp
   improvements" from David Hildenbrand adds a fix and some cleanups to
   the thp selftesting code

 - "mm, swap: introduce swap table as swap cache (phase I)" from Chris
   Li and Kairui Song is the first step along the path to implementing
   "swap tables" - a new approach to swap allocation and state tracking
   which is expected to yield speed and space improvements. This
   patchset itself yields a 5-20% performance benefit in some situations

 - "Some ptdesc cleanups" from Matthew Wilcox utilizes the new memdesc
   layer to clean up the ptdesc code a little

 - "Fix va_high_addr_switch.sh test failure" from Chunyu Hu fixes some
   issues in our 5-level pagetable selftesting code

 - "Minor fixes for memory allocation profiling" from Suren Baghdasaryan
   addresses a couple of minor issues in relatively new memory
   allocation profiling feature

 - "Small cleanups" from Matthew Wilcox has a few cleanups in
   preparation for more memdesc work

 - "mm/damon: add addr_unit for DAMON_LRU_SORT and DAMON_RECLAIM" from
   Quanmin Yan makes some changes to DAMON in furtherance of supporting
   arm highmem

 - "selftests/mm: Add -Wunreachable-code and fix warnings" from Muhammad
   Anjum adds that compiler check to selftests code and fixes the
   fallout, by removing dead code

 - "Improvements to Victim Process Thawing and OOM Reaper Traversal
   Order" from zhongjinji makes a number of improvements in the OOM
   killer: mainly thawing a more appropriate group of victim threads so
   they can release resources

 - "mm/damon: misc fixups and improvements for 6.18" from SeongJae Park
   is a bunch of small and unrelated fixups for DAMON

 - "mm/damon: define and use DAMON initialization check function" from
   SeongJae Park implement reliability and maintainability improvements
   to a recently-added bug fix

 - "mm/damon/stat: expose auto-tuned intervals and non-idle ages" from
   SeongJae Park provides additional transparency to userspace clients
   of the DAMON_STAT information

 - "Expand scope of khugepaged anonymous collapse" from Dev Jain removes
   some constraints on khubepaged's collapsing of anon VMAs. It also
   increases the success rate of MADV_COLLAPSE against an anon vma

 - "mm: do not assume file == vma->vm_file in compat_vma_mmap_prepare()"
   from Lorenzo Stoakes moves us further towards removal of
   file_operations.mmap(). This patchset concentrates upon clearing up
   the treatment of stacked filesystems

 - "mm: Improve mlock tracking for large folios" from Kiryl Shutsemau
   provides some fixes and improvements to mlock's tracking of large
   folios. /proc/meminfo's "Mlocked" field became more accurate

 - "mm/ksm: Fix incorrect accounting of KSM counters during fork" from
   Donet Tom fixes several user-visible KSM stats inaccuracies across
   forks and adds selftest code to verify these counters

 - "mm_slot: fix the usage of mm_slot_entry" from Wei Yang addresses
   some potential but presently benign issues in KSM's mm_slot handling

* tag 'mm-stable-2025-10-01-19-00' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (372 commits)
  mm: swap: check for stable address space before operating on the VMA
  mm: convert folio_page() back to a macro
  mm/khugepaged: use start_addr/addr for improved readability
  hugetlbfs: skip VMAs without shareable locks in hugetlb_vmdelete_list
  alloc_tag: fix boot failure due to NULL pointer dereference
  mm: silence data-race in update_hiwater_rss
  mm/memory-failure: don't select MEMORY_ISOLATION
  mm/khugepaged: remove definition of struct khugepaged_mm_slot
  mm/ksm: get mm_slot by mm_slot_entry() when slot is !NULL
  hugetlb: increase number of reserving hugepages via cmdline
  selftests/mm: add fork inheritance test for ksm_merging_pages counter
  mm/ksm: fix incorrect KSM counter handling in mm_struct during fork
  drivers/base/node: fix double free in register_one_node()
  mm: remove PMD alignment constraint in execmem_vmalloc()
  mm/memory_hotplug: fix typo 'esecially' -> 'especially'
  mm/rmap: improve mlock tracking for large folios
  mm/filemap: map entire large folio faultaround
  mm/fault: try to map the entire file folio in finish_fault()
  mm/rmap: mlock large folios in try_to_unmap_one()
  mm/rmap: fix a mlock race condition in folio_referenced_one()
  ...
2025-10-02 18:18:33 -07:00
Linus Torvalds
ae28ed4578 bpf-next-6.18
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE+soXsSLHKoYyzcli6rmadz2vbToFAmjZH40ACgkQ6rmadz2v
 bTrG7w//X/5CyDoKIYJCqynYRdMtfqYuCe8Jhud4p5++iBVqkDyS6Y8EFLqZVyg/
 UHTqaSE4Nz8/pma0WSjhUYn6Chs1AeH+Rw/g109SovE/YGkek2KNwY3o2hDrtPMX
 +oD0my8qF2HLKgEyteXXyZ5Ju+AaF92JFiGko4/wNTX8O99F9nyz2pTkrctS9Vl9
 VwuTxrEXpmhqrhP3WCxkfNfcbs9HP+AALpgOXZKdMI6T4KI0N1gnJ0ZWJbiXZ8oT
 tug0MTPkNRidYMl0wHY2LZ6ZG8Q3a7Sgc+M0xFzaHGvGlJbBg1HjsDMtT6j34CrG
 TIVJ/O8F6EJzAnQ5Hio0FJk8IIgMRgvng5Kd5GXidU+mE6zokTyHIHOXitYkBQNH
 Hk+lGA7+E2cYqUqKvB5PFoyo+jlucuIH7YwrQlyGfqz+98n65xCgZKcmdVXr0hdB
 9v3WmwJFtVIoPErUvBC3KRANQYhFk4eVk1eiGV/20+eIVyUuNbX6wqSWSA9uEXLy
 n5fm/vlk4RjZmrPZHxcJ0dsl9LTF1VvQQHkgoC1Sz/Cc+jA6k4I+ECVHAqEbk36p
 1TUF52yPOD2ViaJKkj+962JaaaXlUn6+Dq7f1GMP6VuyHjz4gsI3mOo4XarqNdWd
 c7TnYmlGO/cGwqd4DdbmWiF1DDsrBcBzdbC8+FgffxQHLPXGzUg=
 =LeQi
 -----END PGP SIGNATURE-----

Merge tag 'bpf-next-6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next

Pull bpf updates from Alexei Starovoitov:

 - Support pulling non-linear xdp data with bpf_xdp_pull_data() kfunc
   (Amery Hung)

   Applied as a stable branch in bpf-next and net-next trees.

 - Support reading skb metadata via bpf_dynptr (Jakub Sitnicki)

   Also a stable branch in bpf-next and net-next trees.

 - Enforce expected_attach_type for tailcall compatibility (Daniel
   Borkmann)

 - Replace path-sensitive with path-insensitive live stack analysis in
   the verifier (Eduard Zingerman)

   This is a significant change in the verification logic. More details,
   motivation, long term plans are in the cover letter/merge commit.

 - Support signed BPF programs (KP Singh)

   This is another major feature that took years to materialize.

   Algorithm details are in the cover letter/marge commit

 - Add support for may_goto instruction to s390 JIT (Ilya Leoshkevich)

 - Add support for may_goto instruction to arm64 JIT (Puranjay Mohan)

 - Fix USDT SIB argument handling in libbpf (Jiawei Zhao)

 - Allow uprobe-bpf program to change context registers (Jiri Olsa)

 - Support signed loads from BPF arena (Kumar Kartikeya Dwivedi and
   Puranjay Mohan)

 - Allow access to union arguments in tracing programs (Leon Hwang)

 - Optimize rcu_read_lock() + migrate_disable() combination where it's
   used in BPF subsystem (Menglong Dong)

 - Introduce bpf_task_work_schedule*() kfuncs to schedule deferred
   execution of BPF callback in the context of a specific task using the
   kernel’s task_work infrastructure (Mykyta Yatsenko)

 - Enforce RCU protection for KF_RCU_PROTECTED kfuncs (Kumar Kartikeya
   Dwivedi)

 - Add stress test for rqspinlock in NMI (Kumar Kartikeya Dwivedi)

 - Improve the precision of tnum multiplier verifier operation
   (Nandakumar Edamana)

 - Use tnums to improve is_branch_taken() logic (Paul Chaignon)

 - Add support for atomic operations in arena in riscv JIT (Pu Lehui)

 - Report arena faults to BPF error stream (Puranjay Mohan)

 - Search for tracefs at /sys/kernel/tracing first in bpftool (Quentin
   Monnet)

 - Add bpf_strcasecmp() kfunc (Rong Tao)

 - Support lookup_and_delete_elem command in BPF_MAP_STACK_TRACE (Tao
   Chen)

* tag 'bpf-next-6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (197 commits)
  libbpf: Replace AF_ALG with open coded SHA-256
  selftests/bpf: Add stress test for rqspinlock in NMI
  selftests/bpf: Add test case for different expected_attach_type
  bpf: Enforce expected_attach_type for tailcall compatibility
  bpftool: Remove duplicate string.h header
  bpf: Remove duplicate crypto/sha2.h header
  libbpf: Fix error when st-prefix_ops and ops from differ btf
  selftests/bpf: Test changing packet data from kfunc
  selftests/bpf: Add stacktrace map lookup_and_delete_elem test case
  selftests/bpf: Refactor stacktrace_map case with skeleton
  bpf: Add lookup_and_delete_elem for BPF_MAP_STACK_TRACE
  selftests/bpf: Fix flaky bpf_cookie selftest
  selftests/bpf: Test changing packet data from global functions with a kfunc
  bpf: Emit struct bpf_xdp_sock type in vmlinux BTF
  selftests/bpf: Task_work selftest cleanup fixes
  MAINTAINERS: Delete inactive maintainers from AF_XDP
  bpf: Mark kfuncs as __noclone
  selftests/bpf: Add kprobe multi write ctx attach test
  selftests/bpf: Add kprobe write ctx attach test
  selftests/bpf: Add uprobe context ip register change test
  ...
2025-09-30 17:58:11 -07:00
Linus Torvalds
7601d18be0 A set of changes to consolidate the generic TIF bits accross architectures
All architectures define the same set of generic TIF bits. This makes it
 pointlessly hard to add a new generic TIF bit or to change an existing one.
 
 Provide a generic variant and convert the architectures which utilize the
 generic entry code over to use it. The TIF space is divided into 16 generic
 bits and 16 architecture specific bits, which turned out to provide enough
 space on both sides.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmjaP78THHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoTKyD/9NEg3DiN6aM959o1MhvhDk7jNdECXm
 QOvZ85wJYhCv3Pb7RSgMxJL/dR9aWV2b24TSzEeki0awbR4nPeUz82vExmgS7rWX
 9rQLPrOZsrYd76IcAVoV5Ua/g48c+9TM8kLzoZFa9JEYMPmyTEiA3gy4bgab/aov
 L2b903ZrCNkWRKM1Wz5V8xdyPEzhE+cLhbWoPbeCqfzxqbv4+WWKQlPmqamQw+yq
 /61Xhq0tmx3+4hn1IB/Rc8yMTbAK0EwN7SHM+l7yaJ3ijlkGSele4S3mKAHv2I3c
 vODIFwdQ8pRbC1C5eMBnUKRm7Cmf+8CB3m+OIA5ghj10TPFiTUzQ6iG6nUSVniJm
 QB21LHYSWroeQBRibnT5k7RiW5QjtTQmcsjvO7S2rZ/7CkMr7LMu6kN1P0TSiOvc
 SKxs3MO72KaRU/JrEjLqvT2tvdpg2hpffg69U0jA1xCeFULE1jrqo3GwL8dPDk7z
 zKbC73JNg4QJDdi+hIn5nl0fRGVszLzkDum5eyCpCLY/W7BSiQ7q/ayzt9upsSOm
 uc7sqeIgelQMRDMoMNcQUsMnApT0JHOS74WQ03SfahZESj8eFoOb8pr7vaqu4lfi
 6LV4fpwZPBTMDcQ36r2JLuUTqHNHNtWn4xXjQ72ngovIVUL9A2H0DqK+1JhLJ6yX
 tknZ9WVmWVf+JQ==
 =6L98
 -----END PGP SIGNATURE-----

Merge tag 'core-core-2025-09-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull TIF bit unification updates from Thomas Gleixner:
 "A set of changes to consolidate the generic TIF (thread info flag)
  bits accross architectures.

  All architectures define the same set of generic TIF bits. This makes
  it pointlessly hard to add a new generic TIF bit or to change an
  existing one.

  Provide a generic variant and convert the architectures which utilize
  the generic entry code over to use it. The TIF space is divided into
  16 generic bits and 16 architecture specific bits, which turned out to
  provide enough space on both sides"

* tag 'core-core-2025-09-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  LoongArch: Fix bitflag conflict for TIF_FIXADE
  riscv: Use generic TIF bits
  loongarch: Use generic TIF bits
  s390/entry: Remove unused TIF flags
  s390: Use generic TIF bits
  x86: Use generic TIF bits
  asm-generic: Provide generic TIF infrastructure
2025-09-30 14:36:20 -07:00
Linus Torvalds
cb7e3669c6 RISC-V updates for the v6.18 merge window (part one)
First set of RISC-V updates for the v6.18 merge window, including:
 
 - Replacement of __ASSEMBLY__ with __ASSEMBLER__ in header files (other
   architectures have already merged this type of cleanup)
 
 - The introduction of ioremap_wc() for RISC-V
 
 - Cleanup of the RISC-V kprobes code to use mostly-extant macros rather than
   open code
 
 - A RISC-V kprobes unit test
 
 - An architecture-specific endianness swap macro set implementation,
   leveraging some dedicated RISC-V instructions for this purpose if they
   are available
 
 - The ability to identity and communicate to userspace the presence of a
   MIPS P8700-specific ISA extension, and to leverage its MIPS-specific PAUSE
   implementation in cpu_relax()
 
 - Several other miscellaneous cleanups
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEElRDoIDdEz9/svf2Kx4+xDQu9KksFAmjaMVIACgkQx4+xDQu9
 Kkva4g/9GE4gzwM+KHlc4e1sfsaTo4oXe9eV+Hj3gUJdM+g8dCtNFchPRCKFjHYb
 X9lm2YVL9Q3cZ7yWWy/DJtZ66Yz9foQ4laX7uYbjsdbdjbsAXLXhjNp6uk4nBrqb
 uW7Uq+Qel8qq66J/B4Z/U0UzF3e5MaptQ6sLuNRONY9OJUxG76zTSJVKwiVNsGaX
 8W59b7ALFlCIwCVyXGm/KO1EELjl8FKDENWXFE1v6T8XvfXfYuhcvUMp84ebbzHV
 D4kQrO3nxKQVgKEdCtW8xxt0/aWkYQ8zbQg8bD0gDDzMYb5uiJDcMFaa8ZuEYiLg
 EcAJX8LmE5GGlTcJf8/jxvaA87hisNsGFvNPXX1OuI26w2TUNC2u80wE6Q6a3aNu
 74oUtZEaPhdFuB31A0rALC8Zb2zalnwwjAL7xRlyAozUuye8Ej7mE7w97WTjIfWz
 7ZL19/+C1uawljzYLn+FJBIcl1wTyvOgx6T4TJlmOsa4OFnCreFx+a0Az6+Rx1GC
 XGsRyDkGPSabIbNVGxUCyTr4w+VpW1WlDjrjwyLC0DQTFW8wyP44W9/K2scp+CqJ
 bSCcAz8QtGAeZ5UlSmXYTOV69xXqZPaom7fVk5RFHoy24en5DSo7kj1NJ3ChupRD
 8ACpALcIgw/VEo0Tyqqy3dyVhPDuYaZMY3WgGGj9Cz18U37e/Ho=
 =nK3D
 -----END PGP SIGNATURE-----

Merge tag 'riscv-for-linus-6.18-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux

Pull RISC-V updates from Paul Walmsley

 - Replacement of __ASSEMBLY__ with __ASSEMBLER__ in header files (other
   architectures have already merged this type of cleanup)

 - The introduction of ioremap_wc() for RISC-V

 - Cleanup of the RISC-V kprobes code to use mostly-extant macros rather
   than open code

 - A RISC-V kprobes unit test

 - An architecture-specific endianness swap macro set implementation,
   leveraging some dedicated RISC-V instructions for this purpose if
   they are available

 - The ability to identity and communicate to userspace the presence
   of a MIPS P8700-specific ISA extension, and to leverage its
   MIPS-specific PAUSE implementation in cpu_relax()

 - Several other miscellaneous cleanups

* tag 'riscv-for-linus-6.18-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (39 commits)
  riscv: errata: Fix the PAUSE Opcode for MIPS P8700
  riscv: hwprobe: Document MIPS xmipsexectl vendor extension
  riscv: hwprobe: Add MIPS vendor extension probing
  riscv: Add xmipsexectl instructions
  riscv: Add xmipsexectl as a vendor extension
  dt-bindings: riscv: Add xmipsexectl ISA extension description
  riscv: cpufeature: add validation for zfa, zfh and zfhmin
  perf: riscv: skip empty batches in counter start
  selftests: riscv: Add README for RISC-V KSelfTest
  riscv: sbi: Switch to new sys-off handler API
  riscv: Move vendor errata definitions to new header
  RISC-V: ACPI: enable parsing the BGRT table
  riscv: Enable ARCH_HAVE_NMI_SAFE_CMPXCHG
  riscv: pi: use 'targets' instead of extra-y in Makefile
  riscv: introduce asm/swab.h
  riscv: mmap(): use unsigned offset type in riscv_sys_mmap
  drivers/perf: riscv: Remove redundant ternary operators
  riscv: mm: Use mmu-type from FDT to limit SATP mode
  riscv: mm: Return intended SATP mode for noXlvl options
  riscv: kprobes: Remove duplication of RV_EXTRACT_ITYPE_IMM
  ...
2025-09-29 19:01:08 -07:00
Linus Torvalds
a5ba183bde hardening updates for v6.18-rc1
- Clean up usage of TRAILING_OVERLAP() (Gustavo A. R. Silva)
 
 - lkdtm: fortify: Fix potential NULL dereference on kmalloc failure
   (Junjie Cao)
 
 - Add str_assert_deassert() helper (Lad Prabhakar)
 
 - gcc-plugins: Remove TODO_verify_il for GCC >= 16
 
 - kconfig: Fix BrokenPipeError warnings in selftests
 
 - kconfig: Add transitional symbol attribute for migration support
 
 - kcfi: Rename CONFIG_CFI_CLANG to CONFIG_CFI
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRSPkdeREjth1dHnSE2KwveOeQkuwUCaNraNQAKCRA2KwveOeQk
 u/DkAPwKPP5BSmVR2wkdpQaXIr3PGA+cbBYp34DMJNujZ9piIwD/WZ+HfGTLoERy
 +2Q6HLj9hUdd+Rx3IZ8/w1QmnhUIUAU=
 =AwV9
 -----END PGP SIGNATURE-----

Merge tag 'hardening-v6.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull hardening updates from Kees Cook:
 "One notable addition is the creation of the 'transitional' keyword for
  kconfig so CONFIG renaming can go more smoothly.

  This has been a long-standing deficiency, and with the renaming of
  CONFIG_CFI_CLANG to CONFIG_CFI (since GCC will soon have KCFI
  support), this came up again.

  The breadth of the diffstat is mainly this renaming.

   - Clean up usage of TRAILING_OVERLAP() (Gustavo A. R. Silva)

   - lkdtm: fortify: Fix potential NULL dereference on kmalloc failure
     (Junjie Cao)

   - Add str_assert_deassert() helper (Lad Prabhakar)

   - gcc-plugins: Remove TODO_verify_il for GCC >= 16

   - kconfig: Fix BrokenPipeError warnings in selftests

   - kconfig: Add transitional symbol attribute for migration support

   - kcfi: Rename CONFIG_CFI_CLANG to CONFIG_CFI"

* tag 'hardening-v6.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  lib/string_choices: Add str_assert_deassert() helper
  kcfi: Rename CONFIG_CFI_CLANG to CONFIG_CFI
  kconfig: Add transitional symbol attribute for migration support
  kconfig: Fix BrokenPipeError warnings in selftests
  gcc-plugins: Remove TODO_verify_il for GCC >= 16
  stddef: Introduce __TRAILING_OVERLAP()
  stddef: Remove token-pasting in TRAILING_OVERLAP()
  lkdtm: fortify: Fix potential NULL dereference on kmalloc failure
2025-09-29 17:48:27 -07:00
Linus Torvalds
8c1ed30218 ffs-const update for v6.18-rc1
- PCI: Fix theoretical underflow in use of ffs().
 
 - Universally apply __attribute_const__ to all architecture's ffs()-family
   of functions.
 
 - Add KUnit tests for ffs() behavior and const-ness.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRSPkdeREjth1dHnSE2KwveOeQkuwUCaNrWngAKCRA2KwveOeQk
 u3ZGAPwJTscARU4MspnqpbuAV601dG1TNoJG+8JYH84r+R2jjQEAlmBZB0jaHbC2
 qFWjHivD/0ofvihKfAPFgxlakyV1XAg=
 =diXF
 -----END PGP SIGNATURE-----

Merge tag 'ffs-const-v6.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull ffs const-attribute cleanups from Kees Cook:
 "While working on various hardening refactoring a while back we
  encountered inconsistencies in the application of __attribute_const__
  on the ffs() family of functions.

  This series fixes this across all archs and adds KUnit tests.

  Notably, this found a theoretical underflow in PCI (also fixed here)
  and uncovered an inefficiency in ARC (fixed in the ARC arch PR). I
  kept the series separate from the general hardening PR since it is a
  stand-alone "topic".

   - PCI: Fix theoretical underflow in use of ffs().

   - Universally apply __attribute_const__ to all architecture's
     ffs()-family of functions.

   - Add KUnit tests for ffs() behavior and const-ness"

* tag 'ffs-const-v6.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  KUnit: ffs: Validate all the __attribute_const__ annotations
  sparc: Add __attribute_const__ to ffs()-family implementations
  xtensa: Add __attribute_const__ to ffs()-family implementations
  s390: Add __attribute_const__ to ffs()-family implementations
  parisc: Add __attribute_const__ to ffs()-family implementations
  mips: Add __attribute_const__ to ffs()-family implementations
  m68k: Add __attribute_const__ to ffs()-family implementations
  openrisc: Add __attribute_const__ to ffs()-family implementations
  riscv: Add __attribute_const__ to ffs()-family implementations
  hexagon: Add __attribute_const__ to ffs()-family implementations
  alpha: Add __attribute_const__ to ffs()-family implementations
  sh: Add __attribute_const__ to ffs()-family implementations
  powerpc: Add __attribute_const__ to ffs()-family implementations
  x86: Add __attribute_const__ to ffs()-family implementations
  csky: Add __attribute_const__ to ffs()-family implementations
  bitops: Add __attribute_const__ to generic ffs()-family implementations
  KUnit: Introduce ffs()-family tests
  PCI: Test for bit underflow in pcie_set_readrq()
2025-09-29 16:31:35 -07:00
Sunil V L
bb96fb5a79 ACPI: RISC-V: Add RPMI System MSI to GSI mapping
The RPMI System MSI device will provide GSIs to downstream devices
(such as GED) so add it to the RISC-V GSI to fwnode mapping.

Signed-off-by: Sunil V L <sunilvl@ventanamicro.com>
Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Acked-by: Jassi Brar <jassisinghbrar@gmail.com>
Link: https://lore.kernel.org/r/20250818040920.272664-20-apatel@ventanamicro.com
Signed-off-by: Paul Walmsley <pjw@kernel.org>
2025-09-25 19:49:09 -06:00
Sunil V L
4d185fdeef ACPI: RISC-V: Add support to update gsi range
Some RISC-V interrupt controllers like RPMI based system MSI interrupt
controllers do not have MADT entry defined. These interrupt controllers
exist only in the namespace. ACPI spec defines _GSB method to get the
GSI base of the interrupt controller, However, there is no such standard
method to get the GSI range. To support such interrupt controllers, set
the GSI range of such interrupt controllers to non-overlapping range and
provide API for interrupt controller driver to update it with proper
value.

Signed-off-by: Sunil V L <sunilvl@ventanamicro.com>
Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Acked-by: Jassi Brar <jassisinghbrar@gmail.com>
Link: https://lore.kernel.org/r/20250818040920.272664-19-apatel@ventanamicro.com
Signed-off-by: Paul Walmsley <pjw@kernel.org>
2025-09-25 19:48:54 -06:00
Anup Patel
508da38677 RISC-V: Add defines for the SBI message proxy extension
Add defines for the new SBI message proxy extension which is part
of the SBI v3.0 specification.

Reviewed-by: Atish Patra <atishp@rivosinc.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Co-developed-by: Rahul Pathak <rpathak@ventanamicro.com>
Signed-off-by: Rahul Pathak <rpathak@ventanamicro.com>
Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Link: https://lore.kernel.org/r/20250818040920.272664-4-apatel@ventanamicro.com
Signed-off-by: Paul Walmsley <pjw@kernel.org>
2025-09-24 16:57:43 -06:00
Kees Cook
23ef9d4397 kcfi: Rename CONFIG_CFI_CLANG to CONFIG_CFI
The kernel's CFI implementation uses the KCFI ABI specifically, and is
not strictly tied to a particular compiler. In preparation for GCC
supporting KCFI, rename CONFIG_CFI_CLANG to CONFIG_CFI (along with
associated options).

Use new "transitional" Kconfig option for old CONFIG_CFI_CLANG that will
enable CONFIG_CFI during olddefconfig.

Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Link: https://lore.kernel.org/r/20250923213422.1105654-3-kees@kernel.org
Signed-off-by: Kees Cook <kees@kernel.org>
2025-09-24 14:29:14 -07:00
Alexandre Ghiti
546e42c8c6 riscv: Use an atomic xchg in pudp_huge_get_and_clear()
Make sure we return the right pud value and not a value that could
have been overwritten in between by a different core.

Fixes: c3cc2a4a3a ("riscv: Add support for PUD THP")
Cc: stable@vger.kernel.org
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/20250814-dev-alex-thp_pud_xchg-v1-1-b4704dfae206@rivosinc.com
[pjw@kernel.org: use xchg rather than atomic_long_xchg; avoid atomic op for !CONFIG_SMP like x86]
Signed-off-by: Paul Walmsley <pjw@kernel.org>
2025-09-23 18:25:52 -06:00
Djordje Todorovic
0b0ca959d2 riscv: errata: Fix the PAUSE Opcode for MIPS P8700
Add ERRATA_MIPS and ERRATA_MIPS_P8700_PAUSE_OPCODE configs.
Handle errata for the MIPS PAUSE instruction.

Signed-off-by: Djordje Todorovic <djordje.todorovic@htecgroup.com>
Signed-off-by: Aleksandar Rikalo <arikalo@gmail.com>
Signed-off-by: Raj Vishwanathan4 <rvishwanathan@mips.com>
Signed-off-by: Aleksa Paunovic <aleksa.paunovic@htecgroup.com>
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/20250724-p8700-pause-v5-7-a6cbbe1c3412@htecgroup.com
[pjw@kernel.org: updated to apply and compile; fixed a checkpatch issue]
Signed-off-by: Paul Walmsley <pjw@kernel.org>
2025-09-19 10:33:56 -06:00
Aleksa Paunovic
bb4b0f8a1b riscv: hwprobe: Add MIPS vendor extension probing
Add a new hwprobe key "RISCV_HWPROBE_KEY_VENDOR_EXT_MIPS_0" which allows
userspace to probe for the new xmipsexectl vendor extension.

Signed-off-by: Aleksa Paunovic <aleksa.paunovic@htecgroup.com>
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/20250724-p8700-pause-v5-4-a6cbbe1c3412@htecgroup.com
[pjw@kernel.org: fixed some checkpatch issues]
Signed-off-by: Paul Walmsley <pjw@kernel.org>
2025-09-19 10:33:56 -06:00
Aleksa Paunovic
1d4ce63e33 riscv: Add xmipsexectl instructions
Add xmipsexectl instruction opcodes. This includes the MIPS.PAUSE,
MIPS.EHB, and MIPS.IHB instructions.

Signed-off-by: Aleksa Paunovic <aleksa.paunovic@htecgroup.com>
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/20250724-p8700-pause-v5-3-a6cbbe1c3412@htecgroup.com
Signed-off-by: Paul Walmsley <pjw@kernel.org>
2025-09-19 10:33:56 -06:00
Aleksa Paunovic
a8fed1bc03 riscv: Add xmipsexectl as a vendor extension
Add support for MIPS vendor extensions. Add support for the xmipsexectl
vendor extension.

Signed-off-by: Aleksa Paunovic <aleksa.paunovic@htecgroup.com>
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/20250724-p8700-pause-v5-2-a6cbbe1c3412@htecgroup.com
[pjw@kernel.org: added the MIPS vendor ID from another patch to fix the build]
Signed-off-by: Paul Walmsley <pjw@kernel.org>
2025-09-18 20:36:00 -06:00
Guo Ren (Alibaba DAMO Academy)
16d18e3eaf riscv: Move vendor errata definitions to new header
Move vendor errata definitions into errata_list_vendors.h.

Signed-off-by: Guo Ren (Alibaba DAMO Academy) <guoren@kernel.org>
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Tested-by: Han Gao <rabenda.cn@gmail.com>
Link: https://lore.kernel.org/r/20250713155321.2064856-2-guoren@kernel.org
[pjw@kernel.org: updated to apply and to make the whitespace consistent]
Signed-off-by: Paul Walmsley <pjw@kernel.org>
2025-09-18 08:22:00 -06:00
Ignacio Encinas
cc2294d3f9 riscv: introduce asm/swab.h
Implement endianness swap macros for RISC-V.

Use the rev8 instruction when Zbb is available. Otherwise, rely on the
default mask-and-shift implementation.

Acked-by: Palmer Dabbelt <palmer@rivosinc.com>
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Tested-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Signed-off-by: Ignacio Encinas <ignacio@iencinas.com>
Link: https://lore.kernel.org/r/20250723-riscv-swab-v6-1-fc11e9a2efc9@iencinas.com
Signed-off-by: Paul Walmsley <pjw@kernel.org>
2025-09-18 08:20:25 -06:00
Thomas Gleixner
41e871f2b6 riscv: Use generic TIF bits
No point in defining generic items and the upcoming RSEQ optimizations are
only available with this _and_ the generic entry infrastructure, which is
already used by RISCV. So no further action required here.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2025-09-17 08:14:05 +02:00
Nam Cao
518c550eeb riscv: kprobes: Move branch_funct3 to insn.h
Similar to other instruction-processing macros/functions, branch_funct3
should be in insn.h.

Move it into insn.h as RV_EXTRACT_FUNCT3. This new name matches the style
in insn.h.

Signed-off-by: Nam Cao <namcao@linutronix.de>
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/linux-riscv/200c29a26338f19d09963fa02562787e8cfa06f2.1747215274.git.namcao@linutronix.de/
[pjw@kernel.org: updated to use RV_X_MASK and to apply]
Signed-off-by: Paul Walmsley <pjw@kernel.org>
2025-09-16 18:46:43 -06:00
Nam Cao
5fe5914027 riscv: kprobes: Move branch_rs2_idx to insn.h
Similar to other instruction-processing macros/functions, branch_rs2_idx
should be in insn.h.

Move it into insn.h as RV_EXTRACT_RS2_REG. This new name matches the style
in insn.h.

Signed-off-by: Nam Cao <namcao@linutronix.de>
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/linux-riscv/107d4a6c1818bf169be2407b273a0483e6d55bbb.1747215274.git.namcao@linutronix.de/
[pjw@kernel.org: updated to use RV_X_MASK and to apply]
Signed-off-by: Paul Walmsley <pjw@kernel.org>
2025-09-16 18:46:43 -06:00
Alexandre Ghiti
a601732236 riscv: Move all duplicate insn parsing macros into asm/insn.h
kernel/traps_misaligned.c and kvm/vcpu_insn.c define the same macros to
extract information from the instructions.

Let's move the definitions into asm/insn.h to avoid this duplication.

Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Reviewed-by: Clément Léger <cleger@rivosinc.com>
Link: https://lore.kernel.org/r/20250620-dev-alex-insn_duplicate_v5_manual-v5-3-d865dc9ad180@rivosinc.com
[pjw@kernel.org: updated to apply]
Signed-off-by: Paul Walmsley <pjw@kernel.org>
2025-09-16 16:29:07 -06:00
Alexandre Ghiti
833bbb0d91 riscv: Strengthen duplicate and inconsistent definition of RV_X()
RV_X() macro is defined in two different ways which is error prone.

So harmonize its first definition and add another macro RV_X_MASK() for
the second one.

Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/20250620-dev-alex-insn_duplicate_v5_manual-v5-2-d865dc9ad180@rivosinc.com
[pjw@kernel.org: upcase the macro name to conform with previous practice]
Signed-off-by: Paul Walmsley <pjw@kernel.org>
2025-09-16 16:28:30 -06:00
Alexandre Ghiti
932131fd3e riscv: Fix typo EXRACT -> EXTRACT
Simply fix a typo.

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Reviewed-by: Clément Léger <cleger@rivosinc.com>
Link: https://lore.kernel.org/r/20250620-dev-alex-insn_duplicate_v5_manual-v5-1-d865dc9ad180@rivosinc.com
Signed-off-by: Paul Walmsley <pjw@kernel.org>
2025-09-16 16:28:02 -06:00
Thomas Huth
f811f58597 riscv: Replace __ASSEMBLY__ with __ASSEMBLER__ in non-uapi headers
While the GCC and Clang compilers already define __ASSEMBLER__
automatically when compiling assembly code, __ASSEMBLY__ is a
macro that only gets defined by the Makefiles in the kernel.
This can be very confusing when switching between userspace
and kernelspace coding, or when dealing with uapi headers that
rather should use __ASSEMBLER__ instead. So let's standardize on
the __ASSEMBLER__ macro that is provided by the compilers now.

This originally was a completely mechanical patch (done with a
simple "sed -i" statement), with some manual fixups during
rebasing of the patch later.

Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexandre Ghiti <alex@ghiti.fr>
Cc: linux-riscv@lists.infradead.org
Signed-off-by: Thomas Huth <thuth@redhat.com>
Link: https://lore.kernel.org/r/20250606070952.498274-3-thuth@redhat.com
Signed-off-by: Paul Walmsley <pjw@kernel.org>
2025-09-16 16:25:30 -06:00
Yunhui Cui
3a8ee3a9f4 riscv: introduce ioremap_wc()
Compared with IO attributes, NC attributes can improve performance,
specifically in these aspects: Relaxed Order, Gathering, Supports Read
Speculation, Supports Unaligned Access.

Signed-off-by: Yunhui Cui <cuiyunhui@bytedance.com>
Signed-off-by: Qingfang Deng <qingfang.deng@siflower.com.cn>
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/20250722091504.45974-2-cuiyunhui@bytedance.com
Signed-off-by: Paul Walmsley <pjw@kernel.org>
2025-09-16 16:24:27 -06:00
Atish Patra
dbdadd943a RISC-V: KVM: Upgrade the supported SBI version to 3.0
Upgrade the SBI version to v3.0 so that corresponding features
can be enabled in the guest.

Reviewed-by: Anup Patel <anup@brainfault.org>
Signed-off-by: Atish Patra <atishp@rivosinc.com>
Acked-by: Paul Walmsley <pjw@kernel.org>
Link: https://lore.kernel.org/r/20250909-pmu_event_info-v6-8-d8f80cacb884@rivosinc.com
Signed-off-by: Anup Patel <anup@brainfault.org>
2025-09-16 11:49:31 +05:30
Atish Patra
e309fd113b RISC-V: KVM: Implement get event info function
The new get_event_info funciton allows the guest to query the presence
of multiple events with single SBI call. Currently, the perf driver
in linux guest invokes it for all the standard SBI PMU events. Support
the SBI function implementation in KVM as well.

Reviewed-by: Anup Patel <anup@brainfault.org>
Signed-off-by: Atish Patra <atishp@rivosinc.com>
Acked-by: Paul Walmsley <pjw@kernel.org>
Link: https://lore.kernel.org/r/20250909-pmu_event_info-v6-7-d8f80cacb884@rivosinc.com
Signed-off-by: Anup Patel <anup@brainfault.org>
2025-09-16 11:49:31 +05:30
Atish Patra
adffbd06d0 drivers/perf: riscv: Implement PMU event info function
With the new SBI PMU event info function, we can query the availability
of the all standard SBI PMU events at boot time with a single ecall.
This improves the bootime by avoiding making an SBI call for each
standard PMU event. Since this function is defined only in SBI v3.0,
invoke this only if the underlying SBI implementation is v3.0 or higher.

Signed-off-by: Atish Patra <atishp@rivosinc.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Acked-by: Paul Walmsley <pjw@kernel.org>
Link: https://lore.kernel.org/r/20250909-pmu_event_info-v6-4-d8f80cacb884@rivosinc.com
Signed-off-by: Anup Patel <anup@brainfault.org>
2025-09-16 11:49:31 +05:30
Atish Patra
656ef2ea30 drivers/perf: riscv: Add raw event v2 support
SBI v3.0 introduced a new raw event type that allows wider
mhpmeventX width to be programmed via CFG_MATCH.

Use the raw event v2 if SBI v3.0 is available.

Reviewed-by: Anup Patel <anup@brainfault.org>
Signed-off-by: Atish Patra <atishp@rivosinc.com>
Acked-by: Paul Walmsley <pjw@kernel.org>
Link: https://lore.kernel.org/r/20250909-pmu_event_info-v6-2-d8f80cacb884@rivosinc.com
Signed-off-by: Anup Patel <anup@brainfault.org>
2025-09-16 11:49:31 +05:30
Anup Patel
48d67106f4 RISC-V: KVM: Implement ONE_REG interface for SBI FWFT state
The KVM user-space needs a way to save/restore the state of
SBI FWFT features so implement SBI extension ONE_REG callbacks.

Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Link: https://lore.kernel.org/r/20250823155947.1354229-6-apatel@ventanamicro.com
Signed-off-by: Anup Patel <anup@brainfault.org>
2025-09-16 10:54:21 +05:30
Anup Patel
85e7850e0d RISC-V: KVM: Move copy_sbi_ext_reg_indices() to SBI implementation
The ONE_REG handling of SBI extension enable/disable registers and
SBI extension state registers is already under SBI implementation.
On similar lines, let's move copy_sbi_ext_reg_indices() under SBI
implementation.

Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Link: https://lore.kernel.org/r/20250823155947.1354229-5-apatel@ventanamicro.com
Signed-off-by: Anup Patel <anup@brainfault.org>
2025-09-16 10:54:18 +05:30