mirror of
https://github.com/torvalds/linux.git
synced 2026-05-31 02:24:24 +02:00
master
1748 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
9d42fc28fc |
riscv/traps: Introduce software check exception and uprobe handling
The Zicfiss and Zicfilp extensions introduce a new exception, the 'software check exception', in the privileged ISA, with cause code = 18. This patch implements support for software check exceptions. Additionally, the patch implements a CFI violation handler which checks the code in the xtval register. If xtval=2, the software check exception happened because of an indirect branch that didn't land on a 4 byte aligned PC or on a 'lpad' instruction, or the label value embedded in 'lpad' didn't match the label value set in the x7 register. If xtval=3, the software check exception happened due to a mismatch between the link register (x1 or x5) and the top of shadow stack (on execution of `sspopchk`). In case of a CFI violation, SIGSEGV is raised with code=SEGV_CPERR. SEGV_CPERR was introduced by the x86 shadow stack patches. To keep uprobes working, handle the uprobe event first before reporting the CFI violation in the software check exception handler. This is because, when the landing pad is activated, if the uprobe point is set at the lpad instruction at the beginning of a function, the system triggers a software check exception instead of an ebreak exception due to the exception priority. This would prevent uprobe from working. Reviewed-by: Zong Li <zong.li@sifive.com> Co-developed-by: Zong Li <zong.li@sifive.com> Signed-off-by: Zong Li <zong.li@sifive.com> Signed-off-by: Deepak Gupta <debug@rivosinc.com> Tested-by: Andreas Korb <andreas.korb@aisec.fraunhofer.de> # QEMU, custom CVA6 Tested-by: Valentin Haudiquet <valentin.haudiquet@canonical.com> Link: https://patch.msgid.link/20251112-v5_user_cfi_series-v23-15-b55691eacf4f@rivosinc.com [pjw@kernel.org: cleaned up the patch description] Signed-off-by: Paul Walmsley <pjw@kernel.org> |
||
|
|
8a9e22d2ca |
riscv: Implement indirect branch tracking prctls
This patch adds a RISC-V implementation of the following prctls: PR_SET_INDIR_BR_LP_STATUS, PR_GET_INDIR_BR_LP_STATUS and PR_LOCK_INDIR_BR_LP_STATUS. Reviewed-by: Zong Li <zong.li@sifive.com> Signed-off-by: Deepak Gupta <debug@rivosinc.com> Tested-by: Andreas Korb <andreas.korb@aisec.fraunhofer.de> Tested-by: Valentin Haudiquet <valentin.haudiquet@canonical.com> Link: https://patch.msgid.link/20251112-v5_user_cfi_series-v23-14-b55691eacf4f@rivosinc.com [pjw@kernel.org: clean up patch description] Signed-off-by: Paul Walmsley <pjw@kernel.org> |
||
|
|
61a0200211 |
riscv: Implement arch-agnostic shadow stack prctls
Implement an architecture-agnostic prctl() interface for setting and getting shadow stack status. The prctls implemented are PR_GET_SHADOW_STACK_STATUS, PR_SET_SHADOW_STACK_STATUS and PR_LOCK_SHADOW_STACK_STATUS. As part of PR_SET_SHADOW_STACK_STATUS/PR_GET_SHADOW_STACK_STATUS, only PR_SHADOW_STACK_ENABLE is implemented because RISCV allows each mode to write to their own shadow stack using 'sspush' or 'ssamoswap'. PR_LOCK_SHADOW_STACK_STATUS locks the current shadow stack enablement configuration. Reviewed-by: Zong Li <zong.li@sifive.com> Signed-off-by: Deepak Gupta <debug@rivosinc.com> Tested-by: Andreas Korb <andreas.korb@aisec.fraunhofer.de> # QEMU, custom CVA6 Tested-by: Valentin Haudiquet <valentin.haudiquet@canonical.com> Link: https://patch.msgid.link/20251112-v5_user_cfi_series-v23-12-b55691eacf4f@rivosinc.com [pjw@kernel.org: cleaned up patch description] Signed-off-by: Paul Walmsley <pjw@kernel.org> |
||
|
|
fd44a4a855 |
riscv/shstk: If needed allocate a new shadow stack on clone
Userspace specifies CLONE_VM to share address space and spawn new thread. 'clone' allows userspace to specify a new stack for a new thread. However there is no way to specify a new shadow stack base address without changing the API. This patch allocates a new shadow stack whenever CLONE_VM is given. In case of CLONE_VFORK, the parent is suspended until the child finishes; thus the child can use the parent's shadow stack. In case of !CLONE_VM, COW kicks in because entire address space is copied from parent to child. 'clone3' is extensible and can provide mechanisms for specifying the shadow stack as an input parameter. This is not settled yet and is being extensively discussed on the mailing list. Once that's settled, this code should be adapted. Reviewed-by: Zong Li <zong.li@sifive.com> Signed-off-by: Deepak Gupta <debug@rivosinc.com> Tested-by: Andreas Korb <andreas.korb@aisec.fraunhofer.de> # QEMU, custom CVA6 Tested-by: Valentin Haudiquet <valentin.haudiquet@canonical.com> Link: https://patch.msgid.link/20251112-v5_user_cfi_series-v23-11-b55691eacf4f@rivosinc.com [pjw@kernel.org: cleaned up patch description] Signed-off-by: Paul Walmsley <pjw@kernel.org> |
||
|
|
0ea05c4f75 |
riscv: compat: fix COMPAT_UTS_MACHINE definition
The COMPAT_UTS_MACHINE for riscv was incorrectly defined as "riscv".
Change it to "riscv32" to reflect the correct 32-bit compat name.
Fixes:
|
||
|
|
d79f9c9cf7 |
mm: provide address parameter to p{te,md,ud}_user_accessible_page()
On several powerpc platforms, a page table entry may not imply whether the relevant mapping is for userspace or kernelspace. Instead, such platforms infer this by the address which is being accessed. Add an additional address argument to each of these routines in order to provide support for page table check on powerpc. [ajd@linux.ibm.com: rebase on arm64 changes] Link: https://lkml.kernel.org/r/20251219-pgtable_check_v18rebase-v18-9-755bc151a50b@linux.ibm.com Signed-off-by: Rohan McLure <rmclure@linux.ibm.com> Signed-off-by: Andrew Donnellan <ajd@linux.ibm.com> Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com> Acked-by: Ingo Molnar <mingo@kernel.org> # x86 Acked-by: Alexandre Ghiti <alexghiti@rivosinc.com> # riscv Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Alexandre Ghiti <alex@ghiti.fr> Cc: Alistair Popple <apopple@nvidia.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: "Christophe Leroy (CS GROUP)" <chleroy@kernel.org> Cc: David Hildenbrand <david@kernel.org> Cc: Donet Tom <donettom@linux.ibm.com> Cc: Guo Weikang <guoweikang.kernel@gmail.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Kevin Brodsky <kevin.brodsky@arm.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Magnus Lindholm <linmag7@gmail.com> Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Miehlbradt <nicholas@linux.ibm.com> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Paul Mackerras <paulus@ozlabs.org> Cc: Qi Zheng <zhengqi.arch@bytedance.com> Cc: "Ritesh Harjani (IBM)" <ritesh.list@gmail.com> Cc: Sweet Tea Dorminy <sweettea-kernel@dorminy.me> Cc: Thomas Huth <thuth@redhat.com> Cc: "Vishal Moola (Oracle)" <vishal.moola@gmail.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
||
|
|
d7b4b67eb6 |
mm/page_table_check: reinstate address parameter in [__]page_table_check_pte_clear()
This reverts commit
|
||
|
|
649ec9e3d0 |
mm/page_table_check: reinstate address parameter in [__]page_table_check_pmd_clear()
This reverts commit
|
||
|
|
2e6ac078ce |
mm/page_table_check: reinstate address parameter in [__]page_table_check_pud_clear()
This reverts commit
|
||
|
|
0a5ae44831 |
mm/page_table_check: provide addr parameter to page_table_check_ptes_set()
To provide support for powerpc platforms, provide an addr parameter to the __page_table_check_ptes_set() and page_table_check_ptes_set() routines. This parameter is needed on some powerpc platforms which do not encode whether a mapping is for user or kernel in the pte. On such platforms, this can be inferred from the addr parameter. [ajd@linux.ibm.com: rebase on arm64 + riscv changes, update commit message] Link: https://lkml.kernel.org/r/20251219-pgtable_check_v18rebase-v18-5-755bc151a50b@linux.ibm.com Signed-off-by: Rohan McLure <rmclure@linux.ibm.com> Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com> Acked-by: Alexandre Ghiti <alexghiti@rivosinc.com> # riscv Signed-off-by: Andrew Donnellan <ajd@linux.ibm.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Alexandre Ghiti <alex@ghiti.fr> Cc: Alistair Popple <apopple@nvidia.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: "Christophe Leroy (CS GROUP)" <chleroy@kernel.org> Cc: David Hildenbrand <david@kernel.org> Cc: Donet Tom <donettom@linux.ibm.com> Cc: Guo Weikang <guoweikang.kernel@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Kevin Brodsky <kevin.brodsky@arm.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Magnus Lindholm <linmag7@gmail.com> Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Miehlbradt <nicholas@linux.ibm.com> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Paul Mackerras <paulus@ozlabs.org> Cc: Qi Zheng <zhengqi.arch@bytedance.com> Cc: "Ritesh Harjani (IBM)" <ritesh.list@gmail.com> Cc: Sweet Tea Dorminy <sweettea-kernel@dorminy.me> Cc: Thomas Huth <thuth@redhat.com> Cc: "Vishal Moola (Oracle)" <vishal.moola@gmail.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
||
|
|
6e2d8f9fc4 |
mm/page_table_check: reinstate address parameter in [__]page_table_check_pmd[s]_set()
This reverts commit
|
||
|
|
c4a0c5ff85 |
mm/page_table_check: reinstate address parameter in [__]page_table_check_pud[s]_set()
This reverts commit
|
||
|
|
540de7ade1 |
riscv/mm: update write protect to work on shadow stacks
'fork' implements copy-on-write (COW) by making pages readonly in both child and parent. ptep_set_wrprotect() and pte_wrprotect() clear _PAGE_WRITE in PTE. The assumption is that the page is readable and, on a fault, copy-on-write happens. To implement COW on shadow stack pages, clearing the W bit makes them XWR = 000. This will result in the wrong PTE setting, which allows no permissions, but with V=1 and the PFN field pointing to the final page. Instead, the desired behavior is to turn it into a readable page, take an access (load/store) fault on sspush/sspop (shadow stack) and then perform COW on such pages. This way regular reads would still be allowed and not lead to COW maintaining current behavior of COW on non-shadow stack but writeable memory. On the other hand, this doesn't interfere with existing COW for read-write memory. The assumption is always that _PAGE_READ must have been set, and thus, setting _PAGE_READ is harmless. Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com> Reviewed-by: Zong Li <zong.li@sifive.com> Signed-off-by: Deepak Gupta <debug@rivosinc.com> Tested-by: Andreas Korb <andreas.korb@aisec.fraunhofer.de> # QEMU, custom CVA6 Tested-by: Valentin Haudiquet <valentin.haudiquet@canonical.com> Link: https://patch.msgid.link/20251112-v5_user_cfi_series-v23-9-b55691eacf4f@rivosinc.com [pjw@kernel.org: clarify patch description] Signed-off-by: Paul Walmsley <pjw@kernel.org> |
||
|
|
c68c2ef9d6 |
riscv/mm: teach pte_mkwrite to manufacture shadow stack PTEs
pte_mkwrite() creates PTEs with WRITE encodings for the underlying architecture. The underlying architecture can have two types of writeable mappings: one that can be written using regular store instructions, and another one that can only be written using specialized store instructions (like shadow stack stores). pte_mkwrite can select write PTE encoding based on VMA range (i.e. VM_SHADOW_STACK) Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com> Reviewed-by: Zong Li <zong.li@sifive.com> Signed-off-by: Deepak Gupta <debug@rivosinc.com> Tested-by: Andreas Korb <andreas.korb@aisec.fraunhofer.de> # QEMU, custom CVA6 Tested-by: Valentin Haudiquet <valentin.haudiquet@canonical.com> Link: https://patch.msgid.link/20251112-v5_user_cfi_series-v23-8-b55691eacf4f@rivosinc.com [pjw@kernel.org: cleaned up patch description] Signed-off-by: Paul Walmsley <pjw@kernel.org> |
||
|
|
f56ffb8ada |
riscv/mm: manufacture shadow stack ptes
This patch implements the creation of a shadow stack pte on riscv. Creating shadow stack PTE on riscv means clearing RWX and then setting W=1. Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com> Reviewed-by: Zong Li <zong.li@sifive.com> Signed-off-by: Deepak Gupta <debug@rivosinc.com> Tested-by: Andreas Korb <andreas.korb@aisec.fraunhofer.de> # QEMU, custom CVA6 Tested-by: Valentin Haudiquet <valentin.haudiquet@canonical.com> Link: https://patch.msgid.link/20251112-v5_user_cfi_series-v23-7-b55691eacf4f@rivosinc.com [pjw@kernel.org: cleaned up patch description] Signed-off-by: Paul Walmsley <pjw@kernel.org> |
||
|
|
6c7559f22b |
riscv/mm: ensure PROT_WRITE leads to VM_READ | VM_WRITE
'arch_calc_vm_prot_bits' is implemented on risc-v to return VM_READ | VM_WRITE if PROT_WRITE is specified. Similarly 'riscv_sys_mmap' is updated to convert all incoming PROT_WRITE to (PROT_WRITE | PROT_READ). This is to make sure that any existing apps using PROT_WRITE still work. Earlier 'protection_map[VM_WRITE]' used to pick read-write PTE encodings. Now 'protection_map[VM_WRITE]' will always pick PAGE_SHADOWSTACK PTE encodings for shadow stack. The above changes ensure that existing apps continue to work because underneath, the kernel will be picking 'protection_map[VM_WRITE|VM_READ]' PTE encodings. Reviewed-by: Zong Li <zong.li@sifive.com> Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com> Signed-off-by: Deepak Gupta <debug@rivosinc.com> Tested-by: Andreas Korb <andreas.korb@aisec.fraunhofer.de> # QEMU, custom CVA6 Tested-by: Valentin Haudiquet <valentin.haudiquet@canonical.com> Link: https://patch.msgid.link/20251112-v5_user_cfi_series-v23-6-b55691eacf4f@rivosinc.com Signed-off-by: Paul Walmsley <pjw@kernel.org> |
||
|
|
79dd4f2f40 |
riscv: Add usercfi state for task and save/restore of CSR_SSP on trap entry/exit
Carve out space in the RISC-V architecture-specific thread struct for cfi status and shadow stack in usermode. This patch: - defines a new structure cfi_status with status bit for cfi feature - defines shadow stack pointer, base and size in cfi_status structure - defines offsets to new member fields in thread in asm-offsets.c - saves and restores shadow stack pointer on trap entry (U --> S) and exit (S --> U) Shadow stack save/restore is gated on feature availability and is implemented using alternatives. CSR_SSP can be context-switched in 'switch_to' as well, but as soon as kernel shadow stack support gets rolled in, the shadow stack pointer will need to be switched at trap entry/exit point (much like 'sp'). It can be argued that a kernel using a shadow stack deployment scenario may not be as prevalent as user mode using this feature. But even if there is some minimal deployment of kernel shadow stack, that means that it needs to be supported. Thus save/restore of shadow stack pointer is implemented in entry.S instead of in 'switch_to.h'. Reviewed-by: Charlie Jenkins <charlie@rivosinc.com> Reviewed-by: Zong Li <zong.li@sifive.com> Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com> Signed-off-by: Deepak Gupta <debug@rivosinc.com> Tested-by: Andreas Korb <andreas.korb@aisec.fraunhofer.de> # QEMU, custom CVA6 Tested-by: Valentin Haudiquet <valentin.haudiquet@canonical.com> Link: https://patch.msgid.link/20251112-v5_user_cfi_series-v23-5-b55691eacf4f@rivosinc.com [pjw@kernel.org: cleaned up patch description] Signed-off-by: Paul Walmsley <pjw@kernel.org> |
||
|
|
41a2452c99 |
riscv: add Zicfiss / Zicfilp extension CSR and bit definitions
The Zicfiss and Zicfilp extensions are enabled via b3 and b2 in *envcfg CSRs. menvcfg controls enabling for S/HS mode. henvcfg controls enabling for VS. senvcfg controls enabling for U/VU mode. The Zicfilp extension extends *status CSRs to hold an 'expected landing pad' bit. A trap or interrupt can occur between an indirect jmp/call and target instruction. The 'expected landing pad' bit from the CPU is recorded into the xstatus CSR so that when the supervisor performs xret, the 'expected landing pad' state of the CPU can be restored. Zicfiss adds one new CSR, CSR_SSP, which contains the current shadow stack pointer. Signed-off-by: Deepak Gupta <debug@rivosinc.com> Reviewed-by: Charlie Jenkins <charlie@rivosinc.com> Tested-by: Andreas Korb <andreas.korb@aisec.fraunhofer.de> # QEMU, custom CVA6 Tested-by: Valentin Haudiquet <valentin.haudiquet@canonical.com> Link: https://patch.msgid.link/20251112-v5_user_cfi_series-v23-4-b55691eacf4f@rivosinc.com [pjw@kernel.org: grouped CSR_SSP macro with the other CSR macros; clarified patch description] Signed-off-by: Paul Walmsley <pjw@kernel.org> |
||
|
|
df11708566 |
riscv: zicfiss / zicfilp enumeration
This patch adds support for detecting the RISC-V ISA extensions Zicfiss and Zicfilp. Zicfiss and Zicfilp stand for the unprivileged integer spec extensions for shadow stack and indirect branch tracking, respectively. This patch looks for Zicfiss and Zicfilp in the device tree and accordingly lights up the corresponding bits in the cpu feature bitmap. Furthermore this patch adds detection utility functions to return whether shadow stack or landing pads are supported by the cpu. Reviewed-by: Zong Li <zong.li@sifive.com> Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com> Signed-off-by: Deepak Gupta <debug@rivosinc.com> Tested-by: Andreas Korb <andreas.korb@aisec.fraunhofer.de> # QEMU, custom CVA6 Tested-by: Valentin Haudiquet <valentin.haudiquet@canonical.com> Link: https://patch.msgid.link/20251112-v5_user_cfi_series-v23-3-b55691eacf4f@rivosinc.com [pjw@kernel.org: updated to apply; cleaned up patch description] Signed-off-by: Paul Walmsley <pjw@kernel.org> |
||
|
|
9f77b4c5c3 |
riscv: mm: define copy_user_page() as copy_page()
Currently, the implementation of copy_user_page() is identical to copy_page(). Align riscv with other architectures (alpha, arc, arm64, hexagon, longarch, m68k, openrisc, s390, um, xtensa) and map copy_user_page() to copy_page() given that their implementation is identical. In addition to following a common pattern, this centralizes the implementation. Any changes to the underlying page copy logic (e.g., for CHERI) will now automatically propagate to copy_user_page(). Signed-off-by: Florian Schmaus <florian.schmaus@codasip.com> Link: https://patch.msgid.link/20260113134025.905627-1-florian.schmaus@codasip.com Signed-off-by: Paul Walmsley <pjw@kernel.org> |
||
|
|
494d4a051c |
riscv: fix minor typo in syscall.h comment
Some developers may be confused because RISC-V does not have a register named r0. Also, orig_r0 is not available in pt_regs structure, which is specific to riscv. So we had better fix this minor typo. Signed-off-by: Austin Kim <austin.kim@lge.com> Link: https://patch.msgid.link/aW3Z4zTBvGJpk7a7@adminpc-PowerEdge-R7525 Signed-off-by: Paul Walmsley <pjw@kernel.org> |
||
|
|
841e47d56c |
riscv: Add intermediate cast to 'unsigned long' in __get_user_asm
After commit |
||
|
|
8e38607aa4 |
treewide: provide a generic clear_user_page() variant
Patch series "mm: folio_zero_user: clear page ranges", v11.
This series adds clearing of contiguous page ranges for hugepages.
The series improves on the current discontiguous clearing approach in two
ways:
- clear pages in a contiguous fashion.
- use batched clearing via clear_pages() wherever exposed.
The first is useful because it allows us to make much better use of
hardware prefetchers.
The second, enables advertising the real extent to the processor. Where
specific instructions support it (ex. string instructions on x86; "mops"
on arm64 etc), a processor can optimize based on this because, instead of
seeing a sequence of 8-byte stores, or a sequence of 4KB pages, it sees a
larger unit being operated on.
For instance, AMD Zen uarchs (for extents larger than LLC-size) switch to
a mode where they start eliding cacheline allocation. This is helpful not
just because it results in higher bandwidth, but also because now the
cache is not evicting useful cachelines and replacing them with zeroes.
Demand faulting a 64GB region shows performance improvement:
$ perf bench mem mmap -p $pg-sz -f demand -s 64GB -l 5
baseline +series
(GBps +- %stdev) (GBps +- %stdev)
pg-sz=2MB 11.76 +- 1.10% 25.34 +- 1.18% [*] +115.47% preempt=*
pg-sz=1GB 24.85 +- 2.41% 39.22 +- 2.32% + 57.82% preempt=none|voluntary
pg-sz=1GB (similar) 52.73 +- 0.20% [#] +112.19% preempt=full|lazy
[*] This improvement is because switching to sequential clearing
allows the hardware prefetchers to do a much better job.
[#] For pg-sz=1GB a large part of the improvement is because of the
cacheline elision mentioned above. preempt=full|lazy improves upon
that because, not needing explicit invocations of cond_resched() to
ensure reasonable preemption latency, it can clear the full extent
as a single unit. In comparison the maximum extent used for
preempt=none|voluntary is PROCESS_PAGES_NON_PREEMPT_BATCH (32MB).
When provided the full extent the processor forgoes allocating
cachelines on this path almost entirely.
(The hope is that eventually, in the fullness of time, the lazy
preemption model will be able to do the same job that none or
voluntary models are used for, allowing us to do away with
cond_resched().)
Raghavendra also tested previous version of the series on AMD Genoa and
sees similar improvement [1] with preempt=lazy.
$ perf bench mem map -p $page-size -f populate -s 64GB -l 10
base patched change
pg-sz=2MB 12.731939 GB/sec 26.304263 GB/sec 106.6%
pg-sz=1GB 26.232423 GB/sec 61.174836 GB/sec 133.2%
This patch (of 8):
Let's drop all variants that effectively map to clear_page() and provide
it in a generic variant instead.
We'll use the macro clear_user_page to indicate whether an architecture
provides it's own variant.
Also, clear_user_page() is only called from the generic variant of
clear_user_highpage(), so define it only if the architecture does not
provide a clear_user_highpage(). And, for simplicity define it in
linux/highmem.h.
Note that for parisc, clear_page() and clear_user_page() map to
clear_page_asm(), so we can just get rid of the custom clear_user_page()
implementation. There is a clear_user_page_asm() function on parisc, that
seems to be unused. Not sure what's up with that.
Link: https://lkml.kernel.org/r/20260107072009.1615991-1-ankur.a.arora@oracle.com
Link: https://lkml.kernel.org/r/20260107072009.1615991-2-ankur.a.arora@oracle.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Co-developed-by: Ankur Arora <ankur.a.arora@oracle.com>
Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ankur Arora <ankur.a.arora@oracle.com>
Cc: "Borislav Petkov (AMD)" <bp@alien8.de>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: David Hildenbrand <david@kernel.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Konrad Rzessutek Wilk <konrad.wilk@oracle.com>
Cc: Lance Yang <ioworker0@gmail.com>
Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com>
Cc: Li Zhe <lizhe.67@bytedance.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Mateusz Guzik <mjguzik@gmail.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Raghavendra K T <raghavendra.kt@amd.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
||
|
|
bdce162f2e |
riscv: Use 64-bit variable for output in __get_user_asm
After commit
|
||
|
|
ee9ffcf99f |
riscv/paravirt: Use common code for paravirt_steal_clock()
Remove the arch specific variant of paravirt_steal_clock() and use the common one instead. Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://patch.msgid.link/20260105110520.21356-11-jgross@suse.com |
||
|
|
e6b2aa6d40 |
sched: Move clock related paravirt code to kernel/sched
Paravirt clock related functions are available in multiple archs. In order to share the common parts, move the common static keys to kernel/sched/ and remove them from the arch specific files. Make a common paravirt_steal_clock() implementation available in kernel/sched/cputime.c, guarding it with a new config option CONFIG_HAVE_PV_STEAL_CLOCK_GEN, which can be selected by an arch in case it wants to use that common variant. Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://patch.msgid.link/20260105110520.21356-7-jgross@suse.com |
||
|
|
68b10fd40d |
paravirt: Remove asm/paravirt_api_clock.h
All architectures supporting CONFIG_PARAVIRT share the same contents of asm/paravirt_api_clock.h: #include <asm/paravirt.h> So remove all incarnations of asm/paravirt_api_clock.h and remove the only place where it is included, as there asm/paravirt.h is included anyway. Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Shrikanth Hegde <sshegde@linux.ibm.com> # powerpc, scheduler bits Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://patch.msgid.link/20260105110520.21356-6-jgross@suse.com |
||
|
|
957afeb99b |
riscv: remove irqflags.h inclusion in asm/bitops.h
The arch/riscv/include/asm/bitops.h does not functionally require
including /linux/irqflags.h. Additionally, adding
arch/riscv/include/asm/percpu.h causes a circular inclusion:
kernel/bounds.c
->include/linux/log2.h
->include/linux/bitops.h
->arch/riscv/include/asm/bitops.h
->include/linux/irqflags.h
->include/linux/find.h
->return val ? __ffs(val) : size;
->arch/riscv/include/asm/bitops.h
The compilation log is as follows:
CC kernel/bounds.s
In file included from ./include/linux/bitmap.h:11,
from ./include/linux/cpumask.h:12,
from ./arch/riscv/include/asm/processor.h:55,
from ./arch/riscv/include/asm/thread_info.h:42,
from ./include/linux/thread_info.h:60,
from ./include/asm-generic/preempt.h:5,
from ./arch/riscv/include/generated/asm/preempt.h:1,
from ./include/linux/preempt.h:79,
from ./arch/riscv/include/asm/percpu.h:8,
from ./include/linux/irqflags.h:19,
from ./arch/riscv/include/asm/bitops.h:14,
from ./include/linux/bitops.h:68,
from ./include/linux/log2.h:12,
from kernel/bounds.c:13:
./include/linux/find.h: In function 'find_next_bit':
./include/linux/find.h:66:30: error: implicit declaration of function '__ffs' [-Wimplicit-function-declaration]
66 | return val ? __ffs(val) : size;
| ^~~~~
Signed-off-by: Yunhui Cui <cuiyunhui@bytedance.com>
Acked-by: Yury Norov (NVIDIA) <yury.norov@gmail.com>
Link: https://patch.msgid.link/20251216014721.42262-2-cuiyunhui@bytedance.com
Signed-off-by: Paul Walmsley <pjw@kernel.org>
|
||
|
|
5e5be092ff |
riscv: pgtable: Cleanup useless VA_USER_XXX definitions
These marcos are not used after commit |
||
|
|
5efaf92da4 |
riscv: Add SBI debug trigger extension and function ids
Debug trigger extension is an SBI extension to support native debugging in S-mode and VS-mode. This patch adds the extension and the function IDs defined by the extension. Signed-off-by: Himanshu Chauhan <hchauhan@ventanamicro.com> Link: https://patch.msgid.link/20250710125231.653967-2-hchauhan@ventanamicro.com [pjw@kernel.org: updated to apply] Signed-off-by: Paul Walmsley <pjw@kernel.org> |
||
|
|
f02dd25472 |
riscv/atomic.h: use RISCV_FULL_BARRIER in _arch_atomic* function.
Replace the same code with the pre-defined macro RISCV_FULL_BARRIER to simplify the code. Signed-off-by: Zongmin Zhou <zhouzongmin@kylinos.cn> Link: https://patch.msgid.link/20251120095831.64211-1-min_halo@163.com Signed-off-by: Paul Walmsley <pjw@kernel.org> |
||
|
|
3f0cbfb8a1 |
riscv: add ISA extension parsing for Zilsd and Zclsd
Add parsing for Zilsd and Zclsd ISA extensions which were ratified in
commit f88abf1 ("Integrating load/store pair for RV32 with the
main manual") of the riscv-isa-manual.
Signed-off-by: Pincheng Wang <pincheng.plct@isrc.iscas.ac.cn>
Reviewed-by: Nutty Liu <nutty.liu@hotmail.com>
Link: https://patch.msgid.link/20250826162939.1494021-3-pincheng.plct@isrc.iscas.ac.cn
[pjw@kernel.org: cleaned up checkpatch issues, whitespace; updated to apply]
Signed-off-by: Paul Walmsley <pjw@kernel.org>
|
||
|
|
e0e51a0de0 |
riscv: mm: use xchg() on non-atomic_long_t variables, not atomic_long_xchg()
Let's not call atomic_long_xchg() on something that's not an
atomic_long_t, and just use xchg() instead. Continues the cleanup
from commit
|
||
|
|
425cc087fb |
riscv: mm: ptep_get_and_clear(): avoid atomic ops when !CONFIG_SMP
When !CONFIG_SMP, there's no need for atomic operations in ptep_get_and_clear(), so, similar to x86, let's not use atomics in this case. Cc: Alexandre Ghiti <alex@ghiti.fr> Signed-off-by: Paul Walmsley <pjw@kernel.org> |
||
|
|
1e6084d5c4 |
riscv: mm: pmdp_huge_get_and_clear(): avoid atomic ops when !CONFIG_SMP
When !CONFIG_SMP, there's no need for atomic operations in
pmdp_huge_get_and_clear(), so, similar to what x86 does, let's not use
atomics in this case. See also commit
|
||
|
|
818d78ba1b |
riscv: signal: abstract header saving for setup_sigcontext
The function save_v_state() served two purposes. First, it saved extension context into the signal stack. Then, it constructed the extension header if there was no fault. The second part is independent of the extension itself. As a result, we can pull that part out, so future extensions may reuse it. This patch adds arch_ext_list and makes setup_sigcontext() go through all possible extensions' save() callback. The callback returns a positive value indicating the size of the successfully saved extension. Then the kernel proceeds to construct the header for that extension. The kernel skips an extension if it does not exist, or if the saving fails for some reasons. The error code is propagated out on the later case. This patch does not introduce any functional changes. Signed-off-by: Andy Chiu <andybnac@gmail.com> Link: https://patch.msgid.link/20251112-v5_user_cfi_series-v23-16-b55691eacf4f@rivosinc.com Signed-off-by: Paul Walmsley <pjw@kernel.org> |
||
|
|
51d90a15fe |
ARM:
- Support for userspace handling of synchronous external aborts (SEAs),
allowing the VMM to potentially handle the abort in a non-fatal
manner.
- Large rework of the VGIC's list register handling with the goal of
supporting more active/pending IRQs than available list registers in
hardware. In addition, the VGIC now supports EOImode==1 style
deactivations for IRQs which may occur on a separate vCPU than the
one that acked the IRQ.
- Support for FEAT_XNX (user / privileged execute permissions) and
FEAT_HAF (hardware update to the Access Flag) in the software page
table walkers and shadow MMU.
- Allow page table destruction to reschedule, fixing long need_resched
latencies observed when destroying a large VM.
- Minor fixes to KVM and selftests
Loongarch:
- Get VM PMU capability from HW GCFG register.
- Add AVEC basic support.
- Use 64-bit register definition for EIOINTC.
- Add KVM timer test cases for tools/selftests.
RISC/V:
- SBI message passing (MPXY) support for KVM guest
- Give a new, more specific error subcode for the case when in-kernel
AIA virtualization fails to allocate IMSIC VS-file
- Support KVM_DIRTY_LOG_INITIALLY_SET, enabling dirty log gradually
in small chunks
- Fix guest page fault within HLV* instructions
- Flush VS-stage TLB after VCPU migration for Andes cores
s390:
- Always allocate ESCA (Extended System Control Area), instead of
starting with the basic SCA and converting to ESCA with the
addition of the 65th vCPU. The price is increased number of
exits (and worse performance) on z10 and earlier processor;
ESCA was introduced by z114/z196 in 2010.
- VIRT_XFER_TO_GUEST_WORK support
- Operation exception forwarding support
- Cleanups
x86:
- Skip the costly "zap all SPTEs" on an MMIO generation wrap if MMIO SPTE
caching is disabled, as there can't be any relevant SPTEs to zap.
- Relocate a misplaced export.
- Fix an async #PF bug where KVM would clear the completion queue when the
guest transitioned in and out of paging mode, e.g. when handling an SMI and
then returning to paged mode via RSM.
- Leave KVM's user-return notifier registered even when disabling
virtualization, as long as kvm.ko is loaded. On reboot/shutdown, keeping
the notifier registered is ok; the kernel does not use the MSRs and the
callback will run cleanly and restore host MSRs if the CPU manages to
return to userspace before the system goes down.
- Use the checked version of {get,put}_user().
- Fix a long-lurking bug where KVM's lack of catch-up logic for periodic APIC
timers can result in a hard lockup in the host.
- Revert the periodic kvmclock sync logic now that KVM doesn't use a
clocksource that's subject to NTP corrections.
- Clean up KVM's handling of MMIO Stale Data and L1TF, and bury the latter
behind CONFIG_CPU_MITIGATIONS.
- Context switch XCR0, XSS, and PKRU outside of the entry/exit fast path;
the only reason they were handled in the fast path was to paper of a bug
in the core #MC code, and that has long since been fixed.
- Add emulator support for AVX MOV instructions, to play nice with emulated
devices whose guest drivers like to access PCI BARs with large multi-byte
instructions.
x86 (AMD):
- Fix a few missing "VMCB dirty" bugs.
- Fix the worst of KVM's lack of EFER.LMSLE emulation.
- Add AVIC support for addressing 4k vCPUs in x2AVIC mode.
- Fix incorrect handling of selective CR0 writes when checking intercepts
during emulation of L2 instructions.
- Fix a currently-benign bug where KVM would clobber SPEC_CTRL[63:32] on
VMRUN and #VMEXIT.
- Fix a bug where KVM corrupt the guest code stream when re-injecting a soft
interrupt if the guest patched the underlying code after the VM-Exit, e.g.
when Linux patches code with a temporary INT3.
- Add KVM_X86_SNP_POLICY_BITS to advertise supported SNP policy bits to
userspace, and extend KVM "support" to all policy bits that don't require
any actual support from KVM.
x86 (Intel):
- Use the root role from kvm_mmu_page to construct EPTPs instead of the
current vCPU state, partly as worthwhile cleanup, but mostly to pave the
way for tracking per-root TLB flushes, and elide EPT flushes on pCPU
migration if the root is clean from a previous flush.
- Add a few missing nested consistency checks.
- Rip out support for doing "early" consistency checks via hardware as the
functionality hasn't been used in years and is no longer useful in general;
replace it with an off-by-default module param to WARN if hardware fails
a check that KVM does not perform.
- Fix a currently-benign bug where KVM would drop the guest's SPEC_CTRL[63:32]
on VM-Enter.
- Misc cleanups.
- Overhaul the TDX code to address systemic races where KVM (acting on behalf
of userspace) could inadvertantly trigger lock contention in the TDX-Module;
KVM was either working around these in weird, ugly ways, or was simply
oblivious to them (though even Yan's devilish selftests could only break
individual VMs, not the host kernel)
- Fix a bug where KVM could corrupt a vCPU's cpu_list when freeing a TDX vCPU,
if creating said vCPU failed partway through.
- Fix a few sparse warnings (bad annotation, 0 != NULL).
- Use struct_size() to simplify copying TDX capabilities to userspace.
- Fix a bug where TDX would effectively corrupt user-return MSR values if the
TDX Module rejects VP.ENTER and thus doesn't clobber host MSRs as expected.
Selftests:
- Fix a math goof in mmu_stress_test when running on a single-CPU system/VM.
- Forcefully override ARCH from x86_64 to x86 to play nice with specifying
ARCH=x86_64 on the command line.
- Extend a bunch of nested VMX to validate nested SVM as well.
- Add support for LA57 in the core VM_MODE_xxx macro, and add a test to
verify KVM can save/restore nested VMX state when L1 is using 5-level
paging, but L2 is not.
- Clean up the guest paging code in anticipation of sharing the core logic for
nested EPT and nested NPT.
guest_memfd:
- Add NUMA mempolicy support for guest_memfd, and clean up a variety of
rough edges in guest_memfd along the way.
- Define a CLASS to automatically handle get+put when grabbing a guest_memfd
from a memslot to make it harder to leak references.
- Enhance KVM selftests to make it easer to develop and debug selftests like
those added for guest_memfd NUMA support, e.g. where test and/or KVM bugs
often result in hard-to-debug SIGBUS errors.
- Misc cleanups.
Generic:
- Use the recently-added WQ_PERCPU when creating the per-CPU workqueue for
irqfd cleanup.
- Fix a goof in the dirty ring documentation.
- Fix choice of target for directed yield across different calls to
kvm_vcpu_on_spin(); the function was always starting from the first
vCPU instead of continuing the round-robin search.
-----BEGIN PGP SIGNATURE-----
iQFIBAABCgAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmkvMa8UHHBib256aW5p
QHJlZGhhdC5jb20ACgkQv/vSX3jHroMlFwf+Ow7zOYUuELSQ+Jn+hOYXiCNrdBDx
ZamvMU8kLPr7XX0Zog6HgcMm//qyA6k5nSfqCjfsQZrIhRA/gWJ61jz1OX/Jxq18
pJ9Vz6epnEPYiOtBwz+v8OS8MqDqVNzj2i6W1/cLPQE50c1Hhw64HWS5CSxDQiHW
A7PVfl5YU12lW1vG3uE0sNESDt4Eh/spNM17iddXdF4ZUOGublserjDGjbc17E7H
8BX3DkC2plqkJKwtjg0ae62hREkITZZc7RqsnftUkEhn0N0H9+rb6NKUyzIVh9NZ
bCtCjtrKN9zfZ0Mujnms3ugBOVqNIputu/DtPnnFKXtXWSrHrgGSNv5ewA==
=PEcw
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM updates from Paolo Bonzini:
"ARM:
- Support for userspace handling of synchronous external aborts
(SEAs), allowing the VMM to potentially handle the abort in a
non-fatal manner
- Large rework of the VGIC's list register handling with the goal of
supporting more active/pending IRQs than available list registers
in hardware. In addition, the VGIC now supports EOImode==1 style
deactivations for IRQs which may occur on a separate vCPU than the
one that acked the IRQ
- Support for FEAT_XNX (user / privileged execute permissions) and
FEAT_HAF (hardware update to the Access Flag) in the software page
table walkers and shadow MMU
- Allow page table destruction to reschedule, fixing long
need_resched latencies observed when destroying a large VM
- Minor fixes to KVM and selftests
Loongarch:
- Get VM PMU capability from HW GCFG register
- Add AVEC basic support
- Use 64-bit register definition for EIOINTC
- Add KVM timer test cases for tools/selftests
RISC/V:
- SBI message passing (MPXY) support for KVM guest
- Give a new, more specific error subcode for the case when in-kernel
AIA virtualization fails to allocate IMSIC VS-file
- Support KVM_DIRTY_LOG_INITIALLY_SET, enabling dirty log gradually
in small chunks
- Fix guest page fault within HLV* instructions
- Flush VS-stage TLB after VCPU migration for Andes cores
s390:
- Always allocate ESCA (Extended System Control Area), instead of
starting with the basic SCA and converting to ESCA with the
addition of the 65th vCPU. The price is increased number of exits
(and worse performance) on z10 and earlier processor; ESCA was
introduced by z114/z196 in 2010
- VIRT_XFER_TO_GUEST_WORK support
- Operation exception forwarding support
- Cleanups
x86:
- Skip the costly "zap all SPTEs" on an MMIO generation wrap if MMIO
SPTE caching is disabled, as there can't be any relevant SPTEs to
zap
- Relocate a misplaced export
- Fix an async #PF bug where KVM would clear the completion queue
when the guest transitioned in and out of paging mode, e.g. when
handling an SMI and then returning to paged mode via RSM
- Leave KVM's user-return notifier registered even when disabling
virtualization, as long as kvm.ko is loaded. On reboot/shutdown,
keeping the notifier registered is ok; the kernel does not use the
MSRs and the callback will run cleanly and restore host MSRs if the
CPU manages to return to userspace before the system goes down
- Use the checked version of {get,put}_user()
- Fix a long-lurking bug where KVM's lack of catch-up logic for
periodic APIC timers can result in a hard lockup in the host
- Revert the periodic kvmclock sync logic now that KVM doesn't use a
clocksource that's subject to NTP corrections
- Clean up KVM's handling of MMIO Stale Data and L1TF, and bury the
latter behind CONFIG_CPU_MITIGATIONS
- Context switch XCR0, XSS, and PKRU outside of the entry/exit fast
path; the only reason they were handled in the fast path was to
paper of a bug in the core #MC code, and that has long since been
fixed
- Add emulator support for AVX MOV instructions, to play nice with
emulated devices whose guest drivers like to access PCI BARs with
large multi-byte instructions
x86 (AMD):
- Fix a few missing "VMCB dirty" bugs
- Fix the worst of KVM's lack of EFER.LMSLE emulation
- Add AVIC support for addressing 4k vCPUs in x2AVIC mode
- Fix incorrect handling of selective CR0 writes when checking
intercepts during emulation of L2 instructions
- Fix a currently-benign bug where KVM would clobber SPEC_CTRL[63:32]
on VMRUN and #VMEXIT
- Fix a bug where KVM corrupt the guest code stream when re-injecting
a soft interrupt if the guest patched the underlying code after the
VM-Exit, e.g. when Linux patches code with a temporary INT3
- Add KVM_X86_SNP_POLICY_BITS to advertise supported SNP policy bits
to userspace, and extend KVM "support" to all policy bits that
don't require any actual support from KVM
x86 (Intel):
- Use the root role from kvm_mmu_page to construct EPTPs instead of
the current vCPU state, partly as worthwhile cleanup, but mostly to
pave the way for tracking per-root TLB flushes, and elide EPT
flushes on pCPU migration if the root is clean from a previous
flush
- Add a few missing nested consistency checks
- Rip out support for doing "early" consistency checks via hardware
as the functionality hasn't been used in years and is no longer
useful in general; replace it with an off-by-default module param
to WARN if hardware fails a check that KVM does not perform
- Fix a currently-benign bug where KVM would drop the guest's
SPEC_CTRL[63:32] on VM-Enter
- Misc cleanups
- Overhaul the TDX code to address systemic races where KVM (acting
on behalf of userspace) could inadvertantly trigger lock contention
in the TDX-Module; KVM was either working around these in weird,
ugly ways, or was simply oblivious to them (though even Yan's
devilish selftests could only break individual VMs, not the host
kernel)
- Fix a bug where KVM could corrupt a vCPU's cpu_list when freeing a
TDX vCPU, if creating said vCPU failed partway through
- Fix a few sparse warnings (bad annotation, 0 != NULL)
- Use struct_size() to simplify copying TDX capabilities to userspace
- Fix a bug where TDX would effectively corrupt user-return MSR
values if the TDX Module rejects VP.ENTER and thus doesn't clobber
host MSRs as expected
Selftests:
- Fix a math goof in mmu_stress_test when running on a single-CPU
system/VM
- Forcefully override ARCH from x86_64 to x86 to play nice with
specifying ARCH=x86_64 on the command line
- Extend a bunch of nested VMX to validate nested SVM as well
- Add support for LA57 in the core VM_MODE_xxx macro, and add a test
to verify KVM can save/restore nested VMX state when L1 is using
5-level paging, but L2 is not
- Clean up the guest paging code in anticipation of sharing the core
logic for nested EPT and nested NPT
guest_memfd:
- Add NUMA mempolicy support for guest_memfd, and clean up a variety
of rough edges in guest_memfd along the way
- Define a CLASS to automatically handle get+put when grabbing a
guest_memfd from a memslot to make it harder to leak references
- Enhance KVM selftests to make it easer to develop and debug
selftests like those added for guest_memfd NUMA support, e.g. where
test and/or KVM bugs often result in hard-to-debug SIGBUS errors
- Misc cleanups
Generic:
- Use the recently-added WQ_PERCPU when creating the per-CPU
workqueue for irqfd cleanup
- Fix a goof in the dirty ring documentation
- Fix choice of target for directed yield across different calls to
kvm_vcpu_on_spin(); the function was always starting from the first
vCPU instead of continuing the round-robin search"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (260 commits)
KVM: arm64: at: Update AF on software walk only if VM has FEAT_HAFDBS
KVM: arm64: at: Use correct HA bit in TCR_EL2 when regime is EL2
KVM: arm64: Document KVM_PGTABLE_PROT_{UX,PX}
KVM: arm64: Fix spelling mistake "Unexpeced" -> "Unexpected"
KVM: arm64: Add break to default case in kvm_pgtable_stage2_pte_prot()
KVM: arm64: Add endian casting to kvm_swap_s[12]_desc()
KVM: arm64: Fix compilation when CONFIG_ARM64_USE_LSE_ATOMICS=n
KVM: arm64: selftests: Add test for AT emulation
KVM: arm64: nv: Expose hardware access flag management to NV guests
KVM: arm64: nv: Implement HW access flag management in stage-2 SW PTW
KVM: arm64: Implement HW access flag management in stage-1 SW PTW
KVM: arm64: Propagate PTW errors up to AT emulation
KVM: arm64: Add helper for swapping guest descriptor
KVM: arm64: nv: Use pgtable definitions in stage-2 walk
KVM: arm64: Handle endianness in read helper for emulated PTW
KVM: arm64: nv: Stop passing vCPU through void ptr in S2 PTW
KVM: arm64: Call helper for reading descriptors directly
KVM: arm64: nv: Advertise support for FEAT_XNX
KVM: arm64: Teach ptdump about FEAT_XNX permissions
KVM: s390: Use generic VIRT_XFER_TO_GUEST_WORK functions
...
|
||
|
|
07025b51c1 |
First set of RISC-V updates for v6.19-rc1
- Enable parallel hotplug for RISC-V
- Optimize vector regset allocation for ptrace()
- Add a kernel selftest for the vector ptrace interface
- Enable the userspace RAID6 test to build and run using RISC-V
vectors
- Add initial support for the Zalasr RISC-V ratified ISA extension
- For the Zicbop RISC-V ratified ISA extension to userspace, expose
hardware and kernel support to userspace and add a kselftest for
Zicbop
- Convert open-coded instances of 'asm goto's that are controlled by
runtime ALTERNATIVEs to use riscv_has_extension_{un,}likely(),
following arm64's alternative_has_cap_{un,}likely()
- Remove an unnecessary mask in the GFP flags used in some calls to
pagetable_alloc()
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEElRDoIDdEz9/svf2Kx4+xDQu9KksFAmkx2zkACgkQx4+xDQu9
KkvagQ/+Iv94J0S8VWS4vvRMtvPMkEHvPADCmEOrMrVsQ+hZdasKsSFBKY4DHZt6
baGKEQoyJ0NDrIt51uNWzmR503/PGPwVYSDfrgrpS87IkcO9OWe/HiMSEAu0aCyp
dhKGYnjWUtawWzFQg1GQ2n1vLOm5cQ2u1vTptISqU1yg9XlXUN0LNzIqNB8GpQv3
Hm7qqNFyxZOyAC9/ZFGbX2/0KpKQh1xkXSEONxzRGADJ/bqHKKz3hdaOj3aVcoMM
zObvHBm+I0U6AnGSgiZm71dvO0vlijg1RsuD/wd1DlcGO9QAuaPHX+RKqmJUJFe8
d0JjOon+d6n/pBKxoPnZyfB1IHxFNb3kX3LjmKdP6NYeOmdHZ7LI3dzB+LFyPXZe
mkC948+9GExEqmHQx5ZqyCWDwKtknIQPA45ZjBi8e5YOU8nJQiapShYWQu/6ybKV
GFHRqjDIGfhpjZGVdxnUX2Iok1XcwaGKrI6/6P6WlQ/zHKGurtEEtGiI7XHDYS8m
FSGxYay4GIWUsNbNSBWcZp5QaGH0jW7qiZle3DR+8gDLe1DJzbIo6pWMiArm5v7x
fUmroR4FcF9x2X7A01IB2tEUGf/0nfuHgVfMNPNoHBGsiRs9mXzLMngJjbFXOGyF
EP61/W+K+eaJ0jjkfmnscBQEy8URLSNoACnwQLT9SQcAIC4xWTs=
=J1nv
-----END PGP SIGNATURE-----
Merge tag 'riscv-for-linus-6.19-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull RISC-V updates from Paul Walmsley:
- Enable parallel hotplug for RISC-V
- Optimize vector regset allocation for ptrace()
- Add a kernel selftest for the vector ptrace interface
- Enable the userspace RAID6 test to build and run using RISC-V vectors
- Add initial support for the Zalasr RISC-V ratified ISA extension
- For the Zicbop RISC-V ratified ISA extension to userspace, expose
hardware and kernel support to userspace and add a kselftest for
Zicbop
- Convert open-coded instances of 'asm goto's that are controlled by
runtime ALTERNATIVEs to use riscv_has_extension_{un,}likely(),
following arm64's alternative_has_cap_{un,}likely()
- Remove an unnecessary mask in the GFP flags used in some calls to
pagetable_alloc()
* tag 'riscv-for-linus-6.19-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
selftests/riscv: Add Zicbop prefetch test
riscv: hwprobe: Expose Zicbop extension and its block size
riscv: Introduce Zalasr instructions
riscv: hwprobe: Export Zalasr extension
dt-bindings: riscv: Add Zalasr ISA extension description
riscv: Add ISA extension parsing for Zalasr
selftests: riscv: Add test for the Vector ptrace interface
riscv: ptrace: Optimize the allocation of vector regset
raid6: test: Add support for RISC-V
raid6: riscv: Allow code to be compiled in userspace
raid6: riscv: Prevent compiler from breaking inline vector assembly code
riscv: cmpxchg: Use riscv_has_extension_likely
riscv: bitops: Use riscv_has_extension_likely
riscv: hweight: Use riscv_has_extension_likely
riscv: checksum: Use riscv_has_extension_likely
riscv: pgtable: Use riscv_has_extension_unlikely
riscv: Remove __GFP_HIGHMEM masking
RISC-V: Enable HOTPLUG_PARALLEL for secondary CPUs
|
||
|
|
7203ca412f |
Significant patch series in this merge are as follows:
- The 10 patch series "__vmalloc()/kvmalloc() and no-block support" from
Uladzislau Rezki reworks the vmalloc() code to support non-blocking
allocations (GFP_ATOIC, GFP_NOWAIT).
- The 2 patch series "ksm: fix exec/fork inheritance" from xu xin fixes
a rare case where the KSM MMF_VM_MERGE_ANY prctl state is not inherited
across fork/exec.
- The 4 patch series "mm/zswap: misc cleanup of code and documentations"
from SeongJae Park does some light maintenance work on the zswap code.
- The 5 patch series "mm/page_owner: add debugfs files 'show_handles'
and 'show_stacks_handles'" from Mauricio Faria de Oliveira enhances the
/sys/kernel/debug/page_owner debug feature. It adds unique identifiers
to differentiate the various stack traces so that userspace monitoring
tools can better match stack traces over time.
- The 2 patch series "mm/page_alloc: pcp->batch cleanups" from Joshua
Hahn makes some minor alterations to the page allocator's per-cpu-pages
feature.
- The 2 patch series "Improve UFFDIO_MOVE scalability by removing
anon_vma lock" from Lokesh Gidra addresses a scalability issue in
userfaultfd's UFFDIO_MOVE operation.
- The 2 patch series "kasan: cleanups for kasan_enabled() checks" from
Sabyrzhan Tasbolatov performs some cleanup in the KASAN code.
- The 2 patch series "drivers/base/node: fold node register and
unregister functions" from Donet Tom cleans up the NUMA node handling
code a little.
- The 4 patch series "mm: some optimizations for prot numa" from Kefeng
Wang provides some cleanups and small optimizations to the NUMA
allocation hinting code.
- The 5 patch series "mm/page_alloc: Batch callers of
free_pcppages_bulk" from Joshua Hahn addresses long lock hold times at
boot on large machines. These were causing (harmless) softlockup
warnings.
- The 2 patch series "optimize the logic for handling dirty file folios
during reclaim" from Baolin Wang removes some now-unnecessary work from
page reclaim.
- The 10 patch series "mm/damon: allow DAMOS auto-tuned for per-memcg
per-node memory usage" from SeongJae Park enhances the DAMOS auto-tuning
feature.
- The 2 patch series "mm/damon: fixes for address alignment issues in
DAMON_LRU_SORT and DAMON_RECLAIM" from Quanmin Yan fixes DAMON_LRU_SORT
and DAMON_RECLAIM with certain userspace configuration.
- The 15 patch series "expand mmap_prepare functionality, port more
users" from Lorenzo Stoakes enhances the new(ish)
file_operations.mmap_prepare() method and ports additional callsites
from the old ->mmap() over to ->mmap_prepare().
- The 8 patch series "Fix stale IOTLB entries for kernel address space"
from Lu Baolu fixes a bug (and possible security issue on non-x86) in
the IOMMU code. In some situations the IOMMU could be left hanging onto
a stale kernel pagetable entry.
- The 4 patch series "mm/huge_memory: cleanup __split_unmapped_folio()"
from Wei Yang cleans up and optimizes the folio splitting code.
- The 5 patch series "mm, swap: misc cleanup and bugfix" from Kairui
Song implements some cleanups and a minor fix in the swap discard code.
- The 8 patch series "mm/damon: misc documentation fixups" from SeongJae
Park does as advertised.
- The 9 patch series "mm/damon: support pin-point targets removal" from
SeongJae Park permits userspace to remove a specific monitoring target
in the middle of the current targets list.
- The 2 patch series "mm: MISC follow-up patches for linux/pgalloc.h"
from Harry Yoo implements a couple of cleanups related to mm header file
inclusion.
- The 2 patch series "mm/swapfile.c: select swap devices of default
priority round robin" from Baoquan He improves the selection of swap
devices for NUMA machines.
- The 3 patch series "mm: Convert memory block states (MEM_*) macros to
enums" from Israel Batista changes the memory block labels from macros
to enums so they will appear in kernel debug info.
- The 3 patch series "ksm: perform a range-walk to jump over holes in
break_ksm" from Pedro Demarchi Gomes addresses an inefficiency when KSM
unmerges an address range.
- The 22 patch series "mm/damon/tests: fix memory bugs in kunit tests"
from SeongJae Park fixes leaks and unhandled malloc() failures in DAMON
userspace unit tests.
- The 2 patch series "some cleanups for pageout()" from Baolin Wang
cleans up a couple of minor things in the page scanner's
writeback-for-eviction code.
- The 2 patch series "mm/hugetlb: refactor sysfs/sysctl interfaces" from
Hui Zhu moves hugetlb's sysfs/sysctl handling code into a new file.
- The 9 patch series "introduce VM_MAYBE_GUARD and make it sticky" from
Lorenzo Stoakes makes the VMA guard regions available in /proc/pid/smaps
and improves the mergeability of guarded VMAs.
- The 2 patch series "mm: perform guard region install/remove under VMA
lock" from Lorenzo Stoakes reduces mmap lock contention for callers
performing VMA guard region operations.
- The 2 patch series "vma_start_write_killable" from Matthew Wilcox
starts work in permitting applications to be killed when they are
waiting on a read_lock on the VMA lock.
- The 11 patch series "mm/damon/tests: add more tests for online
parameters commit" from SeongJae Park adds additional userspace testing
of DAMON's "commit" feature.
- The 9 patch series "mm/damon: misc cleanups" from SeongJae Park does
that.
- The 2 patch series "make VM_SOFTDIRTY a sticky VMA flag" from Lorenzo
Stoakes addresses the possible loss of a VMA's VM_SOFTDIRTY flag when
that VMA is merged with another.
- The 16 patch series "mm: support device-private THP" from Balbir Singh
introduces support for Transparent Huge Page (THP) migration in zone
device-private memory.
- The 3 patch series "Optimize folio split in memory failure" from Zi
Yan optimizes folio split operations in the memory failure code.
- The 2 patch series "mm/huge_memory: Define split_type and consolidate
split support checks" from Wei Yang provides some more cleanups in the
folio splitting code.
- The 16 patch series "mm: remove is_swap_[pte, pmd]() + non-swap
entries, introduce leaf entries" from Lorenzo Stoakes cleans up our
handling of pagetable leaf entries by introducing the concept of
'software leaf entries', of type softleaf_t.
- The 4 patch series "reparent the THP split queue" from Muchun Song
reparents the THP split queue to its parent memcg. This is in
preparation for addressing the long-standing "dying memcg" problem,
wherein dead memcg's linger for too long, consuming memory resources.
- The 3 patch series "unify PMD scan results and remove redundant
cleanup" from Wei Yang does a little cleanup in the hugepage collapse
code.
- The 6 patch series "zram: introduce writeback bio batching" from
Sergey Senozhatsky improves zram writeback efficiency by introducing
batched bio writeback support.
- The 4 patch series "memcg: cleanup the memcg stats interfaces" from
Shakeel Butt cleans up our handling of the interrupt safety of some
memcg stats.
- The 4 patch series "make vmalloc gfp flags usage more apparent" from
Vishal Moola cleans up vmalloc's handling of incoming GFP flags.
- The 6 patch series "mm: Add soft-dirty and uffd-wp support for RISC-V"
from Chunyan Zhang teches soft dirty and userfaultfd write protect
tracking to use RISC-V's Svrsw60t59b extension.
- The 5 patch series "mm: swap: small fixes and comment cleanups" from
Youngjun Park fixes a small bug and cleans up some of the swap code.
- The 4 patch series "initial work on making VMA flags a bitmap" from
Lorenzo Stoakes starts work on converting the vma struct's flags to a
bitmap, so we stop running out of them, especially on 32-bit.
- The 2 patch series "mm/swapfile: fix and cleanup swap list iterations"
from Youngjun Park addresses a possible bug in the swap discard code and
cleans things up a little.
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCaTEb0wAKCRDdBJ7gKXxA
jjfIAP94W4EkCCwNOupnChoG+YWw/JW21anXt5NN+i5svn1yugEAwzvv6A+cAFng
o+ug/fyrfPZG7PLp2R8WFyGIP0YoBA4=
=IUzS
-----END PGP SIGNATURE-----
Merge tag 'mm-stable-2025-12-03-21-26' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM updates from Andrew Morton:
"__vmalloc()/kvmalloc() and no-block support" (Uladzislau Rezki)
Rework the vmalloc() code to support non-blocking allocations
(GFP_ATOIC, GFP_NOWAIT)
"ksm: fix exec/fork inheritance" (xu xin)
Fix a rare case where the KSM MMF_VM_MERGE_ANY prctl state is not
inherited across fork/exec
"mm/zswap: misc cleanup of code and documentations" (SeongJae Park)
Some light maintenance work on the zswap code
"mm/page_owner: add debugfs files 'show_handles' and 'show_stacks_handles'" (Mauricio Faria de Oliveira)
Enhance the /sys/kernel/debug/page_owner debug feature by adding
unique identifiers to differentiate the various stack traces so
that userspace monitoring tools can better match stack traces over
time
"mm/page_alloc: pcp->batch cleanups" (Joshua Hahn)
Minor alterations to the page allocator's per-cpu-pages feature
"Improve UFFDIO_MOVE scalability by removing anon_vma lock" (Lokesh Gidra)
Address a scalability issue in userfaultfd's UFFDIO_MOVE operation
"kasan: cleanups for kasan_enabled() checks" (Sabyrzhan Tasbolatov)
"drivers/base/node: fold node register and unregister functions" (Donet Tom)
Clean up the NUMA node handling code a little
"mm: some optimizations for prot numa" (Kefeng Wang)
Cleanups and small optimizations to the NUMA allocation hinting
code
"mm/page_alloc: Batch callers of free_pcppages_bulk" (Joshua Hahn)
Address long lock hold times at boot on large machines. These were
causing (harmless) softlockup warnings
"optimize the logic for handling dirty file folios during reclaim" (Baolin Wang)
Remove some now-unnecessary work from page reclaim
"mm/damon: allow DAMOS auto-tuned for per-memcg per-node memory usage" (SeongJae Park)
Enhance the DAMOS auto-tuning feature
"mm/damon: fixes for address alignment issues in DAMON_LRU_SORT and DAMON_RECLAIM" (Quanmin Yan)
Fix DAMON_LRU_SORT and DAMON_RECLAIM with certain userspace
configuration
"expand mmap_prepare functionality, port more users" (Lorenzo Stoakes)
Enhance the new(ish) file_operations.mmap_prepare() method and port
additional callsites from the old ->mmap() over to ->mmap_prepare()
"Fix stale IOTLB entries for kernel address space" (Lu Baolu)
Fix a bug (and possible security issue on non-x86) in the IOMMU
code. In some situations the IOMMU could be left hanging onto a
stale kernel pagetable entry
"mm/huge_memory: cleanup __split_unmapped_folio()" (Wei Yang)
Clean up and optimize the folio splitting code
"mm, swap: misc cleanup and bugfix" (Kairui Song)
Some cleanups and a minor fix in the swap discard code
"mm/damon: misc documentation fixups" (SeongJae Park)
"mm/damon: support pin-point targets removal" (SeongJae Park)
Permit userspace to remove a specific monitoring target in the
middle of the current targets list
"mm: MISC follow-up patches for linux/pgalloc.h" (Harry Yoo)
A couple of cleanups related to mm header file inclusion
"mm/swapfile.c: select swap devices of default priority round robin" (Baoquan He)
improve the selection of swap devices for NUMA machines
"mm: Convert memory block states (MEM_*) macros to enums" (Israel Batista)
Change the memory block labels from macros to enums so they will
appear in kernel debug info
"ksm: perform a range-walk to jump over holes in break_ksm" (Pedro Demarchi Gomes)
Address an inefficiency when KSM unmerges an address range
"mm/damon/tests: fix memory bugs in kunit tests" (SeongJae Park)
Fix leaks and unhandled malloc() failures in DAMON userspace unit
tests
"some cleanups for pageout()" (Baolin Wang)
Clean up a couple of minor things in the page scanner's
writeback-for-eviction code
"mm/hugetlb: refactor sysfs/sysctl interfaces" (Hui Zhu)
Move hugetlb's sysfs/sysctl handling code into a new file
"introduce VM_MAYBE_GUARD and make it sticky" (Lorenzo Stoakes)
Make the VMA guard regions available in /proc/pid/smaps and
improves the mergeability of guarded VMAs
"mm: perform guard region install/remove under VMA lock" (Lorenzo Stoakes)
Reduce mmap lock contention for callers performing VMA guard region
operations
"vma_start_write_killable" (Matthew Wilcox)
Start work on permitting applications to be killed when they are
waiting on a read_lock on the VMA lock
"mm/damon/tests: add more tests for online parameters commit" (SeongJae Park)
Add additional userspace testing of DAMON's "commit" feature
"mm/damon: misc cleanups" (SeongJae Park)
"make VM_SOFTDIRTY a sticky VMA flag" (Lorenzo Stoakes)
Address the possible loss of a VMA's VM_SOFTDIRTY flag when that
VMA is merged with another
"mm: support device-private THP" (Balbir Singh)
Introduce support for Transparent Huge Page (THP) migration in zone
device-private memory
"Optimize folio split in memory failure" (Zi Yan)
"mm/huge_memory: Define split_type and consolidate split support checks" (Wei Yang)
Some more cleanups in the folio splitting code
"mm: remove is_swap_[pte, pmd]() + non-swap entries, introduce leaf entries" (Lorenzo Stoakes)
Clean up our handling of pagetable leaf entries by introducing the
concept of 'software leaf entries', of type softleaf_t
"reparent the THP split queue" (Muchun Song)
Reparent the THP split queue to its parent memcg. This is in
preparation for addressing the long-standing "dying memcg" problem,
wherein dead memcg's linger for too long, consuming memory
resources
"unify PMD scan results and remove redundant cleanup" (Wei Yang)
A little cleanup in the hugepage collapse code
"zram: introduce writeback bio batching" (Sergey Senozhatsky)
Improve zram writeback efficiency by introducing batched bio
writeback support
"memcg: cleanup the memcg stats interfaces" (Shakeel Butt)
Clean up our handling of the interrupt safety of some memcg stats
"make vmalloc gfp flags usage more apparent" (Vishal Moola)
Clean up vmalloc's handling of incoming GFP flags
"mm: Add soft-dirty and uffd-wp support for RISC-V" (Chunyan Zhang)
Teach soft dirty and userfaultfd write protect tracking to use
RISC-V's Svrsw60t59b extension
"mm: swap: small fixes and comment cleanups" (Youngjun Park)
Fix a small bug and clean up some of the swap code
"initial work on making VMA flags a bitmap" (Lorenzo Stoakes)
Start work on converting the vma struct's flags to a bitmap, so we
stop running out of them, especially on 32-bit
"mm/swapfile: fix and cleanup swap list iterations" (Youngjun Park)
Address a possible bug in the swap discard code and clean things
up a little
[ This merge also reverts commit
|
||
|
|
1dce50698a |
Scoped user mode access and related changes:
- Implement the missing u64 user access function on ARM when
CONFIG_CPU_SPECTRE=n. This makes it possible to access a 64bit value in
generic code with [unsafe_]get_user(). All other architectures and ARM
variants provide the relevant accessors already.
- Ensure that ASM GOTO jump label usage in the user mode access helpers
always goes through a local C scope label indirection inside the
helpers. This is required because compilers are not supporting that a
ASM GOTO target leaves a auto cleanup scope. GCC silently fails to emit
the cleanup invocation and CLANG fails the build.
This provides generic wrapper macros and the conversion of affected
architecture code to use them.
- Scoped user mode access with auto cleanup
Access to user mode memory can be required in hot code paths, but if it
has to be done with user controlled pointers, the access is shielded
with a speculation barrier, so that the CPU cannot speculate around the
address range check. Those speculation barriers impact performance quite
significantly. This can be avoided by "masking" the provided pointer so
it is guaranteed to be in the valid user memory access range and
otherwise to point to a guaranteed unpopulated address space. This has
to be done without branches so it creates an address dependency for the
access, which the CPU cannot speculate ahead.
This results in repeating and error prone programming patterns:
if (can_do_masked_user_access())
from = masked_user_read_access_begin((from));
else if (!user_read_access_begin(from, sizeof(*from)))
return -EFAULT;
unsafe_get_user(val, from, Efault);
user_read_access_end();
return 0;
Efault:
user_read_access_end();
return -EFAULT;
which can be replaced with scopes and automatic cleanup:
scoped_user_read_access(from, Efault)
unsafe_get_user(val, from, Efault);
return 0;
Efault:
return -EFAULT;
- Convert code which implements the above pattern over to
scope_user.*.access(). This also corrects a couple of imbalanced
masked_*_begin() instances which are harmless on most architectures, but
prevent PowerPC from implementing the masking optimization.
- Add a missing speculation barrier in copy_from_user_iter()
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmksRfITHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYoVhBEACEySjWcyCrD1e0ZFMFAOJZFI2BShav
reotzCzmHYQdpVukDRxc64BgM2vN4yB04xnyMhi2o4hSTiIJhz1NzbKggsQJhVoA
psYz+xEI161HuLZnUBUBuF9RRko/HVsbGqO2JFCuOKor4GCycvjVgupR3EIN9h5T
HZEWGIgaTmN7MBj0QRrJgJkaaSTnPKOwWaNMV/F9pfk27zuB7vuV8WM9P3FaJYG+
JGa9td7VGaBpWavxgMJqfdvXWBCVDDfZ1dunWx8tPTnLxKZZZD6HlfQXhZTr2n1e
rtJpGgfVBx5Uqxn4RrhS0I7QeK1b9rrt3IU7EkFoaa3Z8LU5B7cHlm7KyicyoHhy
SzFFUszssznT/0OhA5fmgPRlqI295HynW2p1L4Xy9hC0EZ2vXJPG5rO6X3x6QwSR
asjRB7x/6JzWQUzE7/nhXd9KcB66wvQxhnjp7GqulF74aPBCtIdXXDD68YEDYkbi
dPC3NRBr0ePbsGVGWbYvYIPWcvo1u814C2io1zKwmVbiN6lCYURgQK861vfAZUP8
oP5D2a6ENgezDKoJo6eJ82inuDu64qZy7OOkU/aO3cbOuWGVyY9CjYD11x85Nr0k
UNabSOfvcmhmobtYUiAgLLrjX1grQUG3F74ZQTw513mwgMObuDAAoS11GPjY6HL6
b99WUJRv8jP66A==
=6no0
-----END PGP SIGNATURE-----
Merge tag 'core-uaccess-2025-11-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scoped user access updates from Thomas Gleixner:
"Scoped user mode access and related changes:
- Implement the missing u64 user access function on ARM when
CONFIG_CPU_SPECTRE=n.
This makes it possible to access a 64bit value in generic code with
[unsafe_]get_user(). All other architectures and ARM variants
provide the relevant accessors already.
- Ensure that ASM GOTO jump label usage in the user mode access
helpers always goes through a local C scope label indirection
inside the helpers.
This is required because compilers are not supporting that a ASM
GOTO target leaves a auto cleanup scope. GCC silently fails to emit
the cleanup invocation and CLANG fails the build.
[ Editor's note: gcc-16 will have fixed the code generation issue
in commit f68fe3ddda4 ("eh: Invoke cleanups/destructors in asm
goto jumps [PR122835]"). But we obviously have to deal with clang
and older versions of gcc, so.. - Linus ]
This provides generic wrapper macros and the conversion of affected
architecture code to use them.
- Scoped user mode access with auto cleanup
Access to user mode memory can be required in hot code paths, but
if it has to be done with user controlled pointers, the access is
shielded with a speculation barrier, so that the CPU cannot
speculate around the address range check. Those speculation
barriers impact performance quite significantly.
This cost can be avoided by "masking" the provided pointer so it is
guaranteed to be in the valid user memory access range and
otherwise to point to a guaranteed unpopulated address space. This
has to be done without branches so it creates an address dependency
for the access, which the CPU cannot speculate ahead.
This results in repeating and error prone programming patterns:
if (can_do_masked_user_access())
from = masked_user_read_access_begin((from));
else if (!user_read_access_begin(from, sizeof(*from)))
return -EFAULT;
unsafe_get_user(val, from, Efault);
user_read_access_end();
return 0;
Efault:
user_read_access_end();
return -EFAULT;
which can be replaced with scopes and automatic cleanup:
scoped_user_read_access(from, Efault)
unsafe_get_user(val, from, Efault);
return 0;
Efault:
return -EFAULT;
- Convert code which implements the above pattern over to
scope_user.*.access(). This also corrects a couple of imbalanced
masked_*_begin() instances which are harmless on most
architectures, but prevent PowerPC from implementing the masking
optimization.
- Add a missing speculation barrier in copy_from_user_iter()"
* tag 'core-uaccess-2025-11-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
lib/strn*,uaccess: Use masked_user_{read/write}_access_begin when required
scm: Convert put_cmsg() to scoped user access
iov_iter: Add missing speculation barrier to copy_from_user_iter()
iov_iter: Convert copy_from_user_iter() to masked user access
select: Convert to scoped user access
x86/futex: Convert to scoped user access
futex: Convert to get/put_user_inline()
uaccess: Provide put/get_user_inline()
uaccess: Provide scoped user access regions
arm64: uaccess: Use unsafe wrappers for ASM GOTO
s390/uaccess: Use unsafe wrappers for ASM GOTO
riscv/uaccess: Use unsafe wrappers for ASM GOTO
powerpc/uaccess: Use unsafe wrappers for ASM GOTO
x86/uaccess: Use unsafe wrappers for ASM GOTO
uaccess: Provide ASM GOTO safe wrappers for unsafe_*_user()
ARM: uaccess: Implement missing __get_user_asm_dword()
|
||
|
|
4a26e7032d |
Core kernel bug handling infrastructure changes for v6.19:
- Improve WARN(), which has vararg printf like arguments,
to work with the x86 #UD based WARN-optimizing infrastructure
by hiding the format in the bug_table and replacing this
first argument with the address of the bug-table entry,
while making the actual function that's called a UD1 instruction.
(Peter Zijlstra)
- Introduce the CONFIG_DEBUG_BUGVERBOSE_DETAILED Kconfig switch
(Ingo Molnar, s390 support by Heiko Carstens)
Fixes and cleanups:
- bugs/s390: Remove private WARN_ON() implementation (Heiko Carstens)
- <asm/bugs.h>: Make i386 use GENERIC_BUG_RELATIVE_POINTERS
(Peter Zijlstra)
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmktffcRHG1pbmdvQGtl
cm5lbC5vcmcACgkQEnMQ0APhK1gLCQ//ZMHTpv6UF+xY4Kq3rdVFJWRR7Prs2hDn
4V/cFHu7Xb2lRwpJDBNWnbiEkLbGR5qlWD+0CpAMK4mGuL9VjnoeflEzXxOPP9W1
7SWw1HDc6NkjHM3BNnvkPQbzWF4RL4ZGs4Lbb1nv7pjoTSdBMXNrD5RL+iNmfHHd
QfOG9ZiRyD5A/b07bfyjIXNaeC/Hot+FeVXTMfD7/vCfc2ywhL2Sm5G/igY/shTY
Una7q8sbFDD/bFFtWSR2eFQeHQQfT6c/Pu39ZcAIdTlLk1FtZ+A2wcRtwYv/FdPV
6KDOAxZK7fLMHoCdNTswsg+LbazhABOb/V1J7TaHq2tF/PeyN+B+ucxVY2KFcxQJ
V5eG5crMrUIL22QO/UaT3dPWxGbJYkNlAWl416tAKdgXA52W4OsPd+k4DGjeP569
KogAy3SY0D/80v1QN2HcFEJMvr7W2SukxtErqdtA5OKt/ZevB49lGqZoBecqASDO
QjI1K0yLKnb+erbMIpCpNj+o/Fr1JQgWbYVpwipL2GON4an6vrowimGTsUG7qqxN
Hwb7IaTNnWQ/4iyCkVV44q6Ln1gyP2hz5Lnzo7QJINUdzp98UjJLAEK4NRG44T4L
p0t5NMxnvREpAv35rwy03xhmITYeQMOqzN9JEBBvegQyld5ghbThxI7g9BDzcXyL
Bd0mF7WOV9M=
=bMUL
-----END PGP SIGNATURE-----
Merge tag 'core-bugs-2025-12-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull bug handling infrastructure updates from Ingo Molnar:
"Core updates:
- Improve WARN(), which has vararg printf like arguments, to work
with the x86 #UD based WARN-optimizing infrastructure by hiding the
format in the bug_table and replacing this first argument with the
address of the bug-table entry, while making the actual function
that's called a UD1 instruction (Peter Zijlstra)
- Introduce the CONFIG_DEBUG_BUGVERBOSE_DETAILED Kconfig switch (Ingo
Molnar, s390 support by Heiko Carstens)
Fixes and cleanups:
- bugs/s390: Remove private WARN_ON() implementation (Heiko Carstens)
- <asm/bugs.h>: Make i386 use GENERIC_BUG_RELATIVE_POINTERS (Peter
Zijlstra)"
* tag 'core-bugs-2025-12-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (31 commits)
x86/bugs: Make i386 use GENERIC_BUG_RELATIVE_POINTERS
x86/bug: Fix BUG_FORMAT vs KASLR
x86_64/bug: Inline the UD1
x86/bug: Implement WARN_ONCE()
x86_64/bug: Implement __WARN_printf()
x86/bug: Use BUG_FORMAT for DEBUG_BUGVERBOSE_DETAILED
x86/bug: Add BUG_FORMAT basics
bug: Allow architectures to provide __WARN_printf()
bug: Implement WARN_ON() using __WARN_FLAGS()
bug: Add report_bug_entry()
bug: Add BUG_FORMAT_ARGS infrastructure
bug: Clean up CONFIG_GENERIC_BUG_RELATIVE_POINTERS
bug: Add BUG_FORMAT infrastructure
x86: Rework __bug_table helpers
bugs/s390: Remove private WARN_ON() implementation
bugs/core: Reorganize fields in the first line of WARNING output, add ->comm[] output
bugs/sh: Concatenate 'cond_str' with '__FILE__' in __WARN_FLAGS(), to extend WARN_ON/BUG_ON output
bugs/parisc: Concatenate 'cond_str' with '__FILE__' in __WARN_FLAGS(), to extend WARN_ON/BUG_ON output
bugs/riscv: Concatenate 'cond_str' with '__FILE__' in __BUG_FLAGS(), to extend WARN_ON/BUG_ON output
bugs/riscv: Pass in 'cond_str' to __BUG_FLAGS()
...
|
||
|
|
c64da3950c |
riscv: mm: add userfaultfd write-protect support
The Svrsw60t59b extension allows to free the PTE reserved bits 60 and 59 for software, this patch uses bit 60 for uffd-wp tracking Additionally for tracking the uffd-wp state as a PTE swap bit, we borrow bit 4 which is not involved into swap entry computation. Link: https://lkml.kernel.org/r/20251113072806.795029-6-zhangchunyan@iscas.ac.cn Signed-off-by: Chunyan Zhang <zhangchunyan@iscas.ac.cn> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Alexandre Ghiti <alex@ghiti.fr> Cc: Alexandre Ghiti <alexghiti@rivosinc.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andrew Jones <ajones@ventanamicro.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Conor Dooley <conor.dooley@microchip.com> Cc: Conor Dooley <conor@kernel.org> Cc: David Hildenbrand <david@redhat.com> Cc: Deepak Gupta <debug@rivosinc.com> Cc: Jan Kara <jack@suse.cz> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Peter Xu <peterx@redhat.com> Cc: Rob Herring <robh@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Yuanchu Xie <yuanchu@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
||
|
|
2a3ebad4db |
riscv: mm: add soft-dirty page tracking support
The Svrsw60t59b extension allows to free the PTE reserved bits 60 and 59 for software, this patch uses bit 59 for soft-dirty. To add swap PTE soft-dirty tracking, we borrow bit 3 which is available for swap PTEs on RISC-V systems. Link: https://lkml.kernel.org/r/20251113072806.795029-5-zhangchunyan@iscas.ac.cn Signed-off-by: Chunyan Zhang <zhangchunyan@iscas.ac.cn> Reviewed-by: Deepak Gupta <debug@rivosinc.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Alexandre Ghiti <alex@ghiti.fr> Cc: Alexandre Ghiti <alexghiti@rivosinc.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andrew Jones <ajones@ventanamicro.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Conor Dooley <conor.dooley@microchip.com> Cc: Conor Dooley <conor@kernel.org> Cc: David Hildenbrand <david@redhat.com> Cc: Jan Kara <jack@suse.cz> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Peter Xu <peterx@redhat.com> Cc: Rob Herring <robh@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Yuanchu Xie <yuanchu@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
||
|
|
59f6acb4be |
riscv: add RISC-V Svrsw60t59b extension support
The Svrsw60t59b extension allows to free the PTE reserved bits 60 and 59 for software to use. Link: https://lkml.kernel.org/r/20251113072806.795029-4-zhangchunyan@iscas.ac.cn Signed-off-by: Chunyan Zhang <zhangchunyan@iscas.ac.cn> Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Reviewed-by: Deepak Gupta <debug@rivosinc.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Alexandre Ghiti <alex@ghiti.fr> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Conor Dooley <conor.dooley@microchip.com> Cc: Conor Dooley <conor@kernel.org> Cc: David Hildenbrand <david@redhat.com> Cc: Jan Kara <jack@suse.cz> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Peter Xu <peterx@redhat.com> Cc: Rob Herring <robh@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Yuanchu Xie <yuanchu@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
||
|
|
3239c52fd2 |
RISC-V: KVM: Flush VS-stage TLB after VCPU migration for Andes cores
Most implementations cache the combined result of two-stage translation, but some, like Andes cores, use split TLBs that store VS-stage and G-stage entries separately. On such systems, when a VCPU migrates to another CPU, an additional HFENCE.VVMA is required to avoid using stale VS-stage entries, which could otherwise cause guest faults. Introduce a static key to identify CPUs with split two-stage TLBs. When enabled, KVM issues an extra HFENCE.VVMA on VCPU migration to prevent stale VS-stage mappings. Signed-off-by: Hui Min Mina Chou <minachou@andestech.com> Signed-off-by: Ben Zong-You Xie <ben717@andestech.com> Reviewed-by: Radim Krčmář <rkrcmar@ventanamicro.com> Reviewed-by: Nutty Liu <nutty.liu@hotmail.com> Link: https://lore.kernel.org/r/20251117084555.157642-1-minachou@andestech.com Signed-off-by: Anup Patel <anup@brainfault.org> |
||
|
|
df60cb2e67 |
KVM: riscv: Support enabling dirty log gradually in small chunks
There is already support of enabling dirty log gradually in small chunks for x86 in commit |
||
|
|
7050f1d79f |
RISC-V: KVM: Add SBI MPXY extension support for Guest
The SBI MPXY extension is a platform-level functionality so KVM only needs to forward SBI MPXY calls to KVM user-space. Signed-off-by: Anup Patel <apatel@ventanamicro.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Link: https://lore.kernel.org/r/20251017155925.361560-4-apatel@ventanamicro.com Signed-off-by: Anup Patel <anup@brainfault.org> |
||
|
|
e2f3e2d37b |
RISC-V: KVM: Convert kvm_riscv_vcpu_sbi_forward() into extension handler
All uses of kvm_riscv_vcpu_sbi_forward() also updates retdata->uexit so to further reduce code duplication move retdata->uexit assignment to kvm_riscv_vcpu_sbi_forward() and convert it into SBI extension handler. Signed-off-by: Anup Patel <apatel@ventanamicro.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Link: https://lore.kernel.org/r/20251017155925.361560-2-apatel@ventanamicro.com Signed-off-by: Anup Patel <anup@brainfault.org> |
||
|
|
2ace527183 |
Merge branch 'objtool/core'
Bring in the UDB and objtool data annotations to avoid conflicts while further extending the bug exceptions. Signed-off-by: Peter Zijlstra <peterz@infradead.org> |
||
|
|
e0a504984a |
riscv: hwprobe: Expose Zicbop extension and its block size
- Add `RISCV_HWPROBE_EXT_ZICBOP` to report the presence of the Zicbop extension. - Add `RISCV_HWPROBE_KEY_ZICBOP_BLOCK_SIZE` to expose the block size (in bytes) when Zicbop is supported. - Update hwprobe.rst to document the new extension bit and block size key, following the existing Zicbom/Zicboz style. Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Signed-off-by: Yao Zihong <zihong.plct@isrc.iscas.ac.cn> Link: https://patch.msgid.link/20251118162436.15485-2-zihong.plct@isrc.iscas.ac.cn [pjw@kernel.org: updated to apply] Signed-off-by: Paul Walmsley <pjw@kernel.org> |
||
|
|
ad1bb4b852 |
riscv: Introduce Zalasr instructions
Introduce l{b|h|w|d}.{aq|aqrl} and s{b|h|w|d}.{rl|aqrl} instruction
encodings.
Signed-off-by: Xu Lu <luxu.kernel@bytedance.com>
Reviewed-by: Guo Ren <guoren@kernel.org>
Link: https://patch.msgid.link/20251020042056.30283-5-luxu.kernel@bytedance.com
Signed-off-by: Paul Walmsley <pjw@kernel.org>
|
||
|
|
c9651fbc60 |
riscv: Add ISA extension parsing for Zalasr
Add parsing for Zalasr ISA extension. Signed-off-by: Xu Lu <luxu.kernel@bytedance.com> Link: https://patch.msgid.link/20251020042056.30283-2-luxu.kernel@bytedance.com [pjw@kernel.org: updated to apply] Signed-off-by: Paul Walmsley <pjw@kernel.org> |
||
|
|
6efb1a9462 |
riscv: ptrace: Optimize the allocation of vector regset
The vector regset uses the maximum possible vlen value to estimate the .n field. But not all the hardwares support the maximum vlen. Linux might wastes time to prepare a large memory buffer(about 2^6 pages) for the vector regset. The regset can only copy vector registers when the process are using vector. Add .active callback and determine the n field of vector regset in riscv_v_setup_ctx_cache() doesn't affect the ptrace syscall and coredump. It can avoid oversized allocations and better matches real hardware limits. Signed-off-by: Yong-Xuan Wang <yongxuan.wang@sifive.com> Reviewed-by: Greentime Hu <greentime.hu@sifive.com> Reviewed-by: Andy Chiu <andybnac@gmail.com> Tested-by: Andy Chiu <andybnac@gmail.com> Link: https://patch.msgid.link/20251013091318.467864-2-yongxuan.wang@sifive.com Signed-off-by: Paul Walmsley <pjw@kernel.org> |
||
|
|
724c694479 |
riscv: cmpxchg: Use riscv_has_extension_likely
Use riscv_has_extension_likely() to check for RISCV_ISA_EXT_ZAWRS,
replacing the use of asm goto with ALTERNATIVE.
The "likely" variant is used to match the behavior of the original
implementation using ALTERNATIVE("j %l[no_zawrs]", "nop", ...).
Signed-off-by: Vivian Wang <wangruikang@iscas.ac.cn>
Link: https://patch.msgid.link/20251020-riscv-altn-helper-wip-v4-5-ef941c87669a@iscas.ac.cn
Signed-off-by: Paul Walmsley <pjw@kernel.org>
|
||
|
|
6b85e9ac4a |
riscv: bitops: Use riscv_has_extension_likely
Use riscv_has_extension_likely() to check for RISCV_ISA_EXT_ZBB,
replacing the use of asm goto with ALTERNATIVE.
The "likely" variant is used to match the behavior of the original
implementation using ALTERNATIVE("j %l[legacy]", "nop", ...).
Signed-off-by: Vivian Wang <wangruikang@iscas.ac.cn>
Link: https://patch.msgid.link/20251020-riscv-altn-helper-wip-v4-4-ef941c87669a@iscas.ac.cn
Signed-off-by: Paul Walmsley <pjw@kernel.org>
|
||
|
|
8261a9d167 |
riscv: hweight: Use riscv_has_extension_likely
Use riscv_has_extension_likely() to check for RISCV_ISA_EXT_ZBB,
replacing the use of asm goto with ALTERNATIVE.
The "likely" variant is used to match the behavior of the original
implementation using ALTERNATIVE("j %l[legacy]", "nop", ...).
Signed-off-by: Vivian Wang <wangruikang@iscas.ac.cn>
Link: https://patch.msgid.link/20251020-riscv-altn-helper-wip-v4-3-ef941c87669a@iscas.ac.cn
Signed-off-by: Paul Walmsley <pjw@kernel.org>
|
||
|
|
1c7d491d86 |
riscv: checksum: Use riscv_has_extension_likely
Use riscv_has_extension_likely() to check for RISCV_ISA_EXT_ZBB,
replacing the use of asm goto with ALTERNATIVE.
The "likely" variant is used to match the behavior of the original
implementation using ALTERNATIVE("j %l[no_zbb]", "nop", ...).
While we're at it, also remove bogus comment about Zbb being likely
available. We have to choose between "likely" and "unlikely" due to
limitations of the asm goto feature, but that does not mean we should
put a bad comment on why we pick "likely" over "unlikely".
Signed-off-by: Vivian Wang <wangruikang@iscas.ac.cn>
Link: https://patch.msgid.link/20251020-riscv-altn-helper-wip-v4-2-ef941c87669a@iscas.ac.cn
Signed-off-by: Paul Walmsley <pjw@kernel.org>
|
||
|
|
0a067ae21b |
riscv: pgtable: Use riscv_has_extension_unlikely
Use riscv_has_extension_unlikely() to check for RISCV_ISA_EXT_SVVPTC,
replacing the use of asm goto with ALTERNATIVE.
The "unlikely" variant is used to match the behavior of the original
implementation using ALTERNATIVE("nop", "j %l[svvptc]", ...).
Note that this makes the check for RISCV_ISA_EXT_SVVPTC a runtime one if
RISCV_ALTERNATIVE=n, but it should still be worthwhile to do so given
that TLB flushes are relatively slow.
Signed-off-by: Vivian Wang <wangruikang@iscas.ac.cn>
Link: https://patch.msgid.link/20251020-riscv-altn-helper-wip-v4-1-ef941c87669a@iscas.ac.cn
Signed-off-by: Paul Walmsley <pjw@kernel.org>
|
||
|
|
91f815b707 |
riscv: Update MIPS vendor id to 0x127
[1] defines MIPS vendor id as 0x127. All previous MIPS RISC-V patches
were tested on QEMU, also modified to use 0x722 as MIPS_VENDOR_ID. This
new value should reflect real hardware.
[1] https://mips.com/wp-content/uploads/2025/06/P8700_Programmers_Reference_Manual_Rev1.84_5-31-2025.pdf
Fixes:
|
||
|
|
0988ea18c6 |
riscv/uaccess: Use unsafe wrappers for ASM GOTO
ASM GOTO is miscompiled by GCC when it is used inside a auto cleanup scope:
bool foo(u32 __user *p, u32 val)
{
scoped_guard(pagefault)
unsafe_put_user(val, p, efault);
return true;
efault:
return false;
}
It ends up leaking the pagefault disable counter in the fault path. clang
at least fails the build.
Rename unsafe_*_user() to arch_unsafe_*_user() which makes the generic
uaccess header wrap it with a local label that makes both compilers emit
correct code. Same for the kernel_nofault() variants.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://patch.msgid.link/20251027083745.419351819@linutronix.de
|
||
|
|
44aa25c000 |
riscv: asm: use .insn for making custom instructions
The assembler has .insn for building custom instructions now, so change the .4byte to .insn. This ensures the output is marked as an instruction and not as data which may confuse both debuggers and anything else that relies on this sort of marking. Add an ASM_INSN_I() wrapper in asm.h to allow the selecting of how this is output so older assemblers are still good. Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk> Link: https://lore.kernel.org/r/20251024171640.65232-1-ben.dooks@codethink.co.uk Signed-off-by: Paul Walmsley <pjw@kernel.org> |
||
|
|
5d15d2ad36 |
riscv: hwprobe: Fix stale vDSO data for late-initialized keys at boot
The hwprobe vDSO data for some keys, like MISALIGNED_VECTOR_PERF,
is determined by an asynchronous kthread. This can create a race
condition where the kthread finishes after the vDSO data has
already been populated, causing userspace to read stale values.
To fix this race, a new 'ready' flag is added to the vDSO data,
initialized to 'false' during arch_initcall_sync. This flag is
checked by both the vDSO's user-space code and the riscv_hwprobe
syscall. The syscall serves as a one-time gate, using a completion
to wait for any pending probes before populating the data and
setting the flag to 'true', thus ensuring userspace reads fresh
values on its first request.
Reported-by: Tsukasa OI <research_trasio@irq.a4lg.com>
Closes: https://lore.kernel.org/linux-riscv/760d637b-b13b-4518-b6bf-883d55d44e7f@irq.a4lg.com/
Fixes:
|
||
|
|
492c513ec6 |
riscv: add a forward declaration for cpuinfo_op
Add a forward declaration for cpuinfo_op to resolve a sparse warning. Link: https://lore.kernel.org/r/b831f349-5d0c-f7ac-8362-acb20bc6221a@kernel.org Signed-off-by: Paul Walmsley <pjw@kernel.org> |
||
|
|
768e054de0 |
riscv: Remove the PER_CPU_OFFSET_SHIFT macro
__per_cpu_offset is an array of unsigned long, so we can reuse the existing RISCV_LGPTR macro. Signed-off-by: Samuel Holland <samuel.holland@sifive.com> Link: https://lore.kernel.org/r/20251015225604.3860409-1-samuel.holland@sifive.com Signed-off-by: Paul Walmsley <pjw@kernel.org> |
||
|
|
5898fc01ff |
riscv: mm: Define MAX_POSSIBLE_PHYSMEM_BITS for zsmalloc
This definition is used by zsmalloc to optimize memory allocation. On riscv64, it is the same as MAX_PHYSMEM_BITS from asm/sparsemem.h, but that definition depends on CONFIG_SPARSEMEM. The correct definition is already provided for riscv32. Signed-off-by: Samuel Holland <samuel.holland@sifive.com> Link: https://lore.kernel.org/r/20251015233327.3885003-1-samuel.holland@sifive.com Signed-off-by: Paul Walmsley <pjw@kernel.org> |
||
|
|
ca525d53f9 |
RISC-V: Define pgprot_dmacoherent() for non-coherent devices
The pgprot_dmacoherent() is used when allocating memory for
non-coherent devices and by default pgprot_dmacoherent() is
same as pgprot_noncached() unless architecture overrides it.
Currently, there is no pgprot_dmacoherent() definition for
RISC-V hence non-coherent device memory is being mapped as
IO thereby making CPU access to such memory slow.
Define pgprot_dmacoherent() to be same as pgprot_writecombine()
for RISC-V so that CPU access non-coherent device memory as
NOCACHE which is better than accessing it as IO.
Fixes:
|
||
|
|
781380d2cd |
riscv: kgdb: Ensure that BUFMAX > NUMREGBYTES
The current value of BUFMAX is similar as in other architectures, but as per documentation on KGDB (see 'Documentation/process/debugging/kgdb.rst'), BUFMAX has to be larger than NUMREGBYTES. Some NUMREGBYTES architectures (e.g. powerpc or hexagon) actually define BUFMAX in relation to NUMREGBYTES, and thus this condition is always guaranteed. Since 2048 is a value that is generally accepted on all architectures, and that is larger than the current value of NUMREGBYTES, we can keep this value in arch/riscv, but we can at least add an 'static_assert' as an extra measure just in case NUMREGBYTES changes in the future for some unforseen reason. Signed-off-by: Miquel Sabaté Solà <mikisabate@gmail.com> Link: https://lore.kernel.org/r/20250915143252.154955-1-mikisabate@gmail.com Signed-off-by: Paul Walmsley <pjw@kernel.org> |
||
|
|
86bcf7be1e |
RISC-V updates for the v6.18 merge window (part two)
Second set of RISC-V updates for the v6.18 merge window, consisting
of:
- Support for the RISC-V-standardized RPMI interface.
RPMI is a platform management communication mechanism between OSes
running on application processors, and a remote platform management
processor. Similar to ARM SCMI, TI SCI, etc. This includes irqchip,
mailbox, and clk changes.
- Support for the RISC-V-standardized MPXY SBI extension.
MPXY is a RISC-V-specific standard implementing a shared memory
mailbox between S-mode operating systems (e.g., Linux) and M-mode
firmware (e.g., OpenSBI). It is part of this PR since one of its
use cases is to enable M-mode firmware to act as a single RPMI client
for all RPMI activity on a core (including S-mode RPMI activity).
Includes a mailbox driver.
- Some ACPI-related updates to enable the use of RPMI and MPXY.
- The addition of Linux-wide memcpy_{from,to}_le32() static inline
functions, for RPMI use.
- An ACPI Kconfig change to enable boot logos on any ACPI-using
architecture (including RISC-V)
- A RISC-V defconfig change to add GPIO keyboard and event device
support, for front panel shutdown or reboot buttons
This PR also includes a recent, one-line Kconfig patch from Geert to
keep non-RISC-V users from being asked about building the RPMI virtual
clock driver when !COMPILE_TEST. THere's nothing preventing
non-RISC-V SoCs from implementing RPMI, but until some users show up,
let's not annoy others with it.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEElRDoIDdEz9/svf2Kx4+xDQu9KksFAmjfNccACgkQx4+xDQu9
KkugVBAAsa9Oc6iEV1rwj1clWDrvz/bFJRxr/oH3PX0yJ4JJcZaBTydeW3fyJTJ8
CzmRClGJ18YWjapJXYhMp1AS/VfsULu2AVfUzZUqHTjVZyxgt8xFgygpI+BHIyUN
vD26iKJz/JvnytRmUi7mMtS0O48nTzdMiiOmu4Ved68YMJCRJKFZw8+rWVcAzrwb
FHZlIE5Fcb1PRaUDg/45Baj0nEr+NRGKDLsR1rbocbmCmRMnz3ufPTcXk128+3gC
VB1rQplcMBf2RpCl7p4LW2N746hcbg/RogfpjFy7KLlnEH+Xoh2nCxcWHaiEgR9q
6JPsYBeekA54ZZsdoNBg1i5rGk3j/G1XGaV1bo7HDLTvShSByhaYrhAedQZEbw//
xC3Eb7EQ6rNYUUjXiX0y5nhvl+nVlu/FmcsZmcP30ppOV4MQasTZ0zqfso23xhjL
2e06PwTqsmXDeDNDQ4ruBKrpu8tkA7ZZvjCMq1rvSWjTPObzuGBe/ENrdBUOBb2E
6UUeAGCZpQm1IxTcKHHxaIDT5ami745kqaBrXanIMKPX1JdCs7ahUqqWzC0LEgSy
qB/T12bYg5O/yKXdXJuAuTHFb3TOPn6l8aNxRJve+uFwv4r1XXptdal9Yg2xoBWo
EoGktm8KAp5Ndn5BntXI4xG4Ia3HOsj9YA7y4Iep4EO94JZk3Fk=
=Ys1m
-----END PGP SIGNATURE-----
Merge tag 'riscv-for-linus-6.18-mw2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull more RISC-V updates from Paul Walmsley:
- Support for the RISC-V-standardized RPMI interface.
RPMI is a platform management communication mechanism between OSes
running on application processors, and a remote platform management
processor. Similar to ARM SCMI, TI SCI, etc. This includes irqchip,
mailbox, and clk changes.
- Support for the RISC-V-standardized MPXY SBI extension.
MPXY is a RISC-V-specific standard implementing a shared memory
mailbox between S-mode operating systems (e.g., Linux) and M-mode
firmware (e.g., OpenSBI). It is part of this PR since one of its use
cases is to enable M-mode firmware to act as a single RPMI client for
all RPMI activity on a core (including S-mode RPMI activity).
Includes a mailbox driver.
- Some ACPI-related updates to enable the use of RPMI and MPXY.
- The addition of Linux-wide memcpy_{from,to}_le32() static inline
functions, for RPMI use.
- An ACPI Kconfig change to enable boot logos on any ACPI-using
architecture (including RISC-V)
- A RISC-V defconfig change to add GPIO keyboard and event device
support, for front panel shutdown or reboot buttons
* tag 'riscv-for-linus-6.18-mw2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (26 commits)
clk: COMMON_CLK_RPMI should depend on RISCV
ACPI: support BGRT table on RISC-V
MAINTAINERS: Add entry for RISC-V RPMI and MPXY drivers
RISC-V: Enable GPIO keyboard and event device in RV64 defconfig
irqchip/riscv-rpmi-sysmsi: Add ACPI support
mailbox/riscv-sbi-mpxy: Add ACPI support
irqchip/irq-riscv-imsic-early: Export imsic_acpi_get_fwnode()
ACPI: RISC-V: Add RPMI System MSI to GSI mapping
ACPI: RISC-V: Add support to update gsi range
ACPI: RISC-V: Create interrupt controller list in sorted order
ACPI: scan: Update honor list for RPMI System MSI
ACPI: Add support for nargs_prop in acpi_fwnode_get_reference_args()
ACPI: property: Refactor acpi_fwnode_get_reference_args() to support nargs_prop
irqchip: Add driver for the RPMI system MSI service group
dt-bindings: Add RPMI system MSI interrupt controller bindings
dt-bindings: Add RPMI system MSI message proxy bindings
clk: Add clock driver for the RISC-V RPMI clock service group
dt-bindings: clock: Add RPMI clock service controller bindings
dt-bindings: clock: Add RPMI clock service message proxy bindings
mailbox: Add RISC-V SBI message proxy (MPXY) based mailbox driver
...
|
||
|
|
f3826aa996 |
guest_memfd:
* Add support for host userspace mapping of guest_memfd-backed memory for VM types that do NOT use support KVM_MEMORY_ATTRIBUTE_PRIVATE (which isn't precisely the same thing as CoCo VMs, since x86's SEV-MEM and SEV-ES have no way to detect private vs. shared). This lays the groundwork for removal of guest memory from the kernel direct map, as well as for limited mmap() for guest_memfd-backed memory. For more information see: * |
||
|
|
8804d970fa |
Summary of significant series in this pull request:
- The 3 patch series "mm, swap: improve cluster scan strategy" from Kairui Song improves performance and reduces the failure rate of swap cluster allocation. - The 4 patch series "support large align and nid in Rust allocators" from Vitaly Wool permits Rust allocators to set NUMA node and large alignment when perforning slub and vmalloc reallocs. - The 2 patch series "mm/damon/vaddr: support stat-purpose DAMOS" from Yueyang Pan extend DAMOS_STAT's handling of the DAMON operations sets for virtual address spaces for ops-level DAMOS filters. - The 3 patch series "execute PROCMAP_QUERY ioctl under per-vma lock" from Suren Baghdasaryan reduces mmap_lock contention during reads of /proc/pid/maps. - The 2 patch series "mm/mincore: minor clean up for swap cache checking" from Kairui Song performs some cleanup in the swap code. - The 11 patch series "mm: vm_normal_page*() improvements" from David Hildenbrand provides code cleanup in the pagemap code. - The 5 patch series "add persistent huge zero folio support" from Pankaj Raghav provides a block layer speedup by optionalls making the huge_zero_pagepersistent, instead of releasing it when its refcount falls to zero. - The 3 patch series "kho: fixes and cleanups" from Mike Rapoport adds a few touchups to the recently added Kexec Handover feature. - The 10 patch series "mm: make mm->flags a bitmap and 64-bit on all arches" from Lorenzo Stoakes turns mm_struct.flags into a bitmap. To end the constant struggle with space shortage on 32-bit conflicting with 64-bit's needs. - The 2 patch series "mm/swapfile.c and swap.h cleanup" from Chris Li cleans up some swap code. - The 7 patch series "selftests/mm: Fix false positives and skip unsupported tests" from Donet Tom fixes a few things in our selftests code. - The 7 patch series "prctl: extend PR_SET_THP_DISABLE to only provide THPs when advised" from David Hildenbrand "allows individual processes to opt-out of THP=always into THP=madvise, without affecting other workloads on the system". It's a long story - the [1/N] changelog spells out the considerations. - The 11 patch series "Add and use memdesc_flags_t" from Matthew Wilcox gets us started on the memdesc project. Please see https://kernelnewbies.org/MatthewWilcox/Memdescs and https://blogs.oracle.com/linux/post/introducing-memdesc. - The 3 patch series "Tiny optimization for large read operations" from Chi Zhiling improves the efficiency of the pagecache read path. - The 5 patch series "Better split_huge_page_test result check" from Zi Yan improves our folio splitting selftest code. - The 2 patch series "test that rmap behaves as expected" from Wei Yang adds some rmap selftests. - The 3 patch series "remove write_cache_pages()" from Christoph Hellwig removes that function and converts its two remaining callers. - The 2 patch series "selftests/mm: uffd-stress fixes" from Dev Jain fixes some UFFD selftests issues. - The 3 patch series "introduce kernel file mapped folios" from Boris Burkov introduces the concept of "kernel file pages". Using these permits btrfs to account its metadata pages to the root cgroup, rather than to the cgroups of random inappropriate tasks. - The 2 patch series "mm/pageblock: improve readability of some pageblock handling" from Wei Yang provides some readability improvements to the page allocator code. - The 11 patch series "mm/damon: support ARM32 with LPAE" from SeongJae Park teaches DAMON to understand arm32 highmem. - The 4 patch series "tools: testing: Use existing atomic.h for vma/maple tests" from Brendan Jackman performs some code cleanups and deduplication under tools/testing/. - The 2 patch series "maple_tree: Fix testing for 32bit compiles" from Liam Howlett fixes a couple of 32-bit issues in tools/testing/radix-tree.c. - The 2 patch series "kasan: unify kasan_enabled() and remove arch-specific implementations" from Sabyrzhan Tasbolatov moves KASAN arch-specific initialization code into a common arch-neutral implementation. - The 3 patch series "mm: remove zpool" from Johannes Weiner removes zspool - an indirection layer which now only redirects to a single thing (zsmalloc). - The 2 patch series "mm: task_stack: Stack handling cleanups" from Pasha Tatashin makes a couple of cleanups in the fork code. - The 37 patch series "mm: remove nth_page()" from David Hildenbrand makes rather a lot of adjustments at various nth_page() callsites, eventually permitting the removal of that undesirable helper function. - The 2 patch series "introduce kasan.write_only option in hw-tags" from Yeoreum Yun creates a KASAN read-only mode for ARM, using that architecture's memory tagging feature. It is felt that a read-only mode KASAN is suitable for use in production systems rather than debug-only. - The 3 patch series "mm: hugetlb: cleanup hugetlb folio allocation" from Kefeng Wang does some tidying in the hugetlb folio allocation code. - The 12 patch series "mm: establish const-correctness for pointer parameters" from Max Kellermann makes quite a number of the MM API functions more accurate about the constness of their arguments. This was getting in the way of subsystems (in this case CEPH) when they attempt to improving their own const/non-const accuracy. - The 7 patch series "Cleanup free_pages() misuse" from Vishal Moola fixes a number of code sites which were confused over when to use free_pages() vs __free_pages(). - The 3 patch series "Add Rust abstraction for Maple Trees" from Alice Ryhl makes the mapletree code accessible to Rust. Required by nouveau and by its forthcoming successor: the new Rust Nova driver. - The 2 patch series "selftests/mm: split_huge_page_test: split_pte_mapped_thp improvements" from David Hildenbrand adds a fix and some cleanups to the thp selftesting code. - The 14 patch series "mm, swap: introduce swap table as swap cache (phase I)" from Chris Li and Kairui Song is the first step along the path to implementing "swap tables" - a new approach to swap allocation and state tracking which is expected to yield speed and space improvements. This patchset itself yields a 5-20% performance benefit in some situations. - The 3 patch series "Some ptdesc cleanups" from Matthew Wilcox utilizes the new memdesc layer to clean up the ptdesc code a little. - The 3 patch series "Fix va_high_addr_switch.sh test failure" from Chunyu Hu fixes some issues in our 5-level pagetable selftesting code. - The 2 patch series "Minor fixes for memory allocation profiling" from Suren Baghdasaryan addresses a couple of minor issues in relatively new memory allocation profiling feature. - The 3 patch series "Small cleanups" from Matthew Wilcox has a few cleanups in preparation for more memdesc work. - The 2 patch series "mm/damon: add addr_unit for DAMON_LRU_SORT and DAMON_RECLAIM" from Quanmin Yan makes some changes to DAMON in furtherance of supporting arm highmem. - The 2 patch series "selftests/mm: Add -Wunreachable-code and fix warnings" from Muhammad Anjum adds that compiler check to selftests code and fixes the fallout, by removing dead code. - The 10 patch series "Improvements to Victim Process Thawing and OOM Reaper Traversal Order" from zhongjinji makes a number of improvements in the OOM killer: mainly thawing a more appropriate group of victim threads so they can release resources. - The 5 patch series "mm/damon: misc fixups and improvements for 6.18" from SeongJae Park is a bunch of small and unrelated fixups for DAMON. - The 7 patch series "mm/damon: define and use DAMON initialization check function" from SeongJae Park implement reliability and maintainability improvements to a recently-added bug fix. - The 2 patch series "mm/damon/stat: expose auto-tuned intervals and non-idle ages" from SeongJae Park provides additional transparency to userspace clients of the DAMON_STAT information. - The 2 patch series "Expand scope of khugepaged anonymous collapse" from Dev Jain removes some constraints on khubepaged's collapsing of anon VMAs. It also increases the success rate of MADV_COLLAPSE against an anon vma. - The 2 patch series "mm: do not assume file == vma->vm_file in compat_vma_mmap_prepare()" from Lorenzo Stoakes moves us further towards removal of file_operations.mmap(). This patchset concentrates upon clearing up the treatment of stacked filesystems. - The 6 patch series "mm: Improve mlock tracking for large folios" from Kiryl Shutsemau provides some fixes and improvements to mlock's tracking of large folios. /proc/meminfo's "Mlocked" field became more accurate. - The 2 patch series "mm/ksm: Fix incorrect accounting of KSM counters during fork" from Donet Tom fixes several user-visible KSM stats inaccuracies across forks and adds selftest code to verify these counters. - The 2 patch series "mm_slot: fix the usage of mm_slot_entry" from Wei Yang addresses some potential but presently benign issues in KSM's mm_slot handling. -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCaN3cywAKCRDdBJ7gKXxA jtaPAQDmIuIu7+XnVUK5V11hsQ/5QtsUeLHV3OsAn4yW5/3dEQD/UddRU08ePN+1 2VRB0EwkLAdfMWW7TfiNZ+yhuoiL/AA= =4mhY -----END PGP SIGNATURE----- Merge tag 'mm-stable-2025-10-01-19-00' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: - "mm, swap: improve cluster scan strategy" from Kairui Song improves performance and reduces the failure rate of swap cluster allocation - "support large align and nid in Rust allocators" from Vitaly Wool permits Rust allocators to set NUMA node and large alignment when perforning slub and vmalloc reallocs - "mm/damon/vaddr: support stat-purpose DAMOS" from Yueyang Pan extend DAMOS_STAT's handling of the DAMON operations sets for virtual address spaces for ops-level DAMOS filters - "execute PROCMAP_QUERY ioctl under per-vma lock" from Suren Baghdasaryan reduces mmap_lock contention during reads of /proc/pid/maps - "mm/mincore: minor clean up for swap cache checking" from Kairui Song performs some cleanup in the swap code - "mm: vm_normal_page*() improvements" from David Hildenbrand provides code cleanup in the pagemap code - "add persistent huge zero folio support" from Pankaj Raghav provides a block layer speedup by optionalls making the huge_zero_pagepersistent, instead of releasing it when its refcount falls to zero - "kho: fixes and cleanups" from Mike Rapoport adds a few touchups to the recently added Kexec Handover feature - "mm: make mm->flags a bitmap and 64-bit on all arches" from Lorenzo Stoakes turns mm_struct.flags into a bitmap. To end the constant struggle with space shortage on 32-bit conflicting with 64-bit's needs - "mm/swapfile.c and swap.h cleanup" from Chris Li cleans up some swap code - "selftests/mm: Fix false positives and skip unsupported tests" from Donet Tom fixes a few things in our selftests code - "prctl: extend PR_SET_THP_DISABLE to only provide THPs when advised" from David Hildenbrand "allows individual processes to opt-out of THP=always into THP=madvise, without affecting other workloads on the system". It's a long story - the [1/N] changelog spells out the considerations - "Add and use memdesc_flags_t" from Matthew Wilcox gets us started on the memdesc project. Please see https://kernelnewbies.org/MatthewWilcox/Memdescs and https://blogs.oracle.com/linux/post/introducing-memdesc - "Tiny optimization for large read operations" from Chi Zhiling improves the efficiency of the pagecache read path - "Better split_huge_page_test result check" from Zi Yan improves our folio splitting selftest code - "test that rmap behaves as expected" from Wei Yang adds some rmap selftests - "remove write_cache_pages()" from Christoph Hellwig removes that function and converts its two remaining callers - "selftests/mm: uffd-stress fixes" from Dev Jain fixes some UFFD selftests issues - "introduce kernel file mapped folios" from Boris Burkov introduces the concept of "kernel file pages". Using these permits btrfs to account its metadata pages to the root cgroup, rather than to the cgroups of random inappropriate tasks - "mm/pageblock: improve readability of some pageblock handling" from Wei Yang provides some readability improvements to the page allocator code - "mm/damon: support ARM32 with LPAE" from SeongJae Park teaches DAMON to understand arm32 highmem - "tools: testing: Use existing atomic.h for vma/maple tests" from Brendan Jackman performs some code cleanups and deduplication under tools/testing/ - "maple_tree: Fix testing for 32bit compiles" from Liam Howlett fixes a couple of 32-bit issues in tools/testing/radix-tree.c - "kasan: unify kasan_enabled() and remove arch-specific implementations" from Sabyrzhan Tasbolatov moves KASAN arch-specific initialization code into a common arch-neutral implementation - "mm: remove zpool" from Johannes Weiner removes zspool - an indirection layer which now only redirects to a single thing (zsmalloc) - "mm: task_stack: Stack handling cleanups" from Pasha Tatashin makes a couple of cleanups in the fork code - "mm: remove nth_page()" from David Hildenbrand makes rather a lot of adjustments at various nth_page() callsites, eventually permitting the removal of that undesirable helper function - "introduce kasan.write_only option in hw-tags" from Yeoreum Yun creates a KASAN read-only mode for ARM, using that architecture's memory tagging feature. It is felt that a read-only mode KASAN is suitable for use in production systems rather than debug-only - "mm: hugetlb: cleanup hugetlb folio allocation" from Kefeng Wang does some tidying in the hugetlb folio allocation code - "mm: establish const-correctness for pointer parameters" from Max Kellermann makes quite a number of the MM API functions more accurate about the constness of their arguments. This was getting in the way of subsystems (in this case CEPH) when they attempt to improving their own const/non-const accuracy - "Cleanup free_pages() misuse" from Vishal Moola fixes a number of code sites which were confused over when to use free_pages() vs __free_pages() - "Add Rust abstraction for Maple Trees" from Alice Ryhl makes the mapletree code accessible to Rust. Required by nouveau and by its forthcoming successor: the new Rust Nova driver - "selftests/mm: split_huge_page_test: split_pte_mapped_thp improvements" from David Hildenbrand adds a fix and some cleanups to the thp selftesting code - "mm, swap: introduce swap table as swap cache (phase I)" from Chris Li and Kairui Song is the first step along the path to implementing "swap tables" - a new approach to swap allocation and state tracking which is expected to yield speed and space improvements. This patchset itself yields a 5-20% performance benefit in some situations - "Some ptdesc cleanups" from Matthew Wilcox utilizes the new memdesc layer to clean up the ptdesc code a little - "Fix va_high_addr_switch.sh test failure" from Chunyu Hu fixes some issues in our 5-level pagetable selftesting code - "Minor fixes for memory allocation profiling" from Suren Baghdasaryan addresses a couple of minor issues in relatively new memory allocation profiling feature - "Small cleanups" from Matthew Wilcox has a few cleanups in preparation for more memdesc work - "mm/damon: add addr_unit for DAMON_LRU_SORT and DAMON_RECLAIM" from Quanmin Yan makes some changes to DAMON in furtherance of supporting arm highmem - "selftests/mm: Add -Wunreachable-code and fix warnings" from Muhammad Anjum adds that compiler check to selftests code and fixes the fallout, by removing dead code - "Improvements to Victim Process Thawing and OOM Reaper Traversal Order" from zhongjinji makes a number of improvements in the OOM killer: mainly thawing a more appropriate group of victim threads so they can release resources - "mm/damon: misc fixups and improvements for 6.18" from SeongJae Park is a bunch of small and unrelated fixups for DAMON - "mm/damon: define and use DAMON initialization check function" from SeongJae Park implement reliability and maintainability improvements to a recently-added bug fix - "mm/damon/stat: expose auto-tuned intervals and non-idle ages" from SeongJae Park provides additional transparency to userspace clients of the DAMON_STAT information - "Expand scope of khugepaged anonymous collapse" from Dev Jain removes some constraints on khubepaged's collapsing of anon VMAs. It also increases the success rate of MADV_COLLAPSE against an anon vma - "mm: do not assume file == vma->vm_file in compat_vma_mmap_prepare()" from Lorenzo Stoakes moves us further towards removal of file_operations.mmap(). This patchset concentrates upon clearing up the treatment of stacked filesystems - "mm: Improve mlock tracking for large folios" from Kiryl Shutsemau provides some fixes and improvements to mlock's tracking of large folios. /proc/meminfo's "Mlocked" field became more accurate - "mm/ksm: Fix incorrect accounting of KSM counters during fork" from Donet Tom fixes several user-visible KSM stats inaccuracies across forks and adds selftest code to verify these counters - "mm_slot: fix the usage of mm_slot_entry" from Wei Yang addresses some potential but presently benign issues in KSM's mm_slot handling * tag 'mm-stable-2025-10-01-19-00' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (372 commits) mm: swap: check for stable address space before operating on the VMA mm: convert folio_page() back to a macro mm/khugepaged: use start_addr/addr for improved readability hugetlbfs: skip VMAs without shareable locks in hugetlb_vmdelete_list alloc_tag: fix boot failure due to NULL pointer dereference mm: silence data-race in update_hiwater_rss mm/memory-failure: don't select MEMORY_ISOLATION mm/khugepaged: remove definition of struct khugepaged_mm_slot mm/ksm: get mm_slot by mm_slot_entry() when slot is !NULL hugetlb: increase number of reserving hugepages via cmdline selftests/mm: add fork inheritance test for ksm_merging_pages counter mm/ksm: fix incorrect KSM counter handling in mm_struct during fork drivers/base/node: fix double free in register_one_node() mm: remove PMD alignment constraint in execmem_vmalloc() mm/memory_hotplug: fix typo 'esecially' -> 'especially' mm/rmap: improve mlock tracking for large folios mm/filemap: map entire large folio faultaround mm/fault: try to map the entire file folio in finish_fault() mm/rmap: mlock large folios in try_to_unmap_one() mm/rmap: fix a mlock race condition in folio_referenced_one() ... |
||
|
|
ae28ed4578 |
bpf-next-6.18
-----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEE+soXsSLHKoYyzcli6rmadz2vbToFAmjZH40ACgkQ6rmadz2v bTrG7w//X/5CyDoKIYJCqynYRdMtfqYuCe8Jhud4p5++iBVqkDyS6Y8EFLqZVyg/ UHTqaSE4Nz8/pma0WSjhUYn6Chs1AeH+Rw/g109SovE/YGkek2KNwY3o2hDrtPMX +oD0my8qF2HLKgEyteXXyZ5Ju+AaF92JFiGko4/wNTX8O99F9nyz2pTkrctS9Vl9 VwuTxrEXpmhqrhP3WCxkfNfcbs9HP+AALpgOXZKdMI6T4KI0N1gnJ0ZWJbiXZ8oT tug0MTPkNRidYMl0wHY2LZ6ZG8Q3a7Sgc+M0xFzaHGvGlJbBg1HjsDMtT6j34CrG TIVJ/O8F6EJzAnQ5Hio0FJk8IIgMRgvng5Kd5GXidU+mE6zokTyHIHOXitYkBQNH Hk+lGA7+E2cYqUqKvB5PFoyo+jlucuIH7YwrQlyGfqz+98n65xCgZKcmdVXr0hdB 9v3WmwJFtVIoPErUvBC3KRANQYhFk4eVk1eiGV/20+eIVyUuNbX6wqSWSA9uEXLy n5fm/vlk4RjZmrPZHxcJ0dsl9LTF1VvQQHkgoC1Sz/Cc+jA6k4I+ECVHAqEbk36p 1TUF52yPOD2ViaJKkj+962JaaaXlUn6+Dq7f1GMP6VuyHjz4gsI3mOo4XarqNdWd c7TnYmlGO/cGwqd4DdbmWiF1DDsrBcBzdbC8+FgffxQHLPXGzUg= =LeQi -----END PGP SIGNATURE----- Merge tag 'bpf-next-6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Pull bpf updates from Alexei Starovoitov: - Support pulling non-linear xdp data with bpf_xdp_pull_data() kfunc (Amery Hung) Applied as a stable branch in bpf-next and net-next trees. - Support reading skb metadata via bpf_dynptr (Jakub Sitnicki) Also a stable branch in bpf-next and net-next trees. - Enforce expected_attach_type for tailcall compatibility (Daniel Borkmann) - Replace path-sensitive with path-insensitive live stack analysis in the verifier (Eduard Zingerman) This is a significant change in the verification logic. More details, motivation, long term plans are in the cover letter/merge commit. - Support signed BPF programs (KP Singh) This is another major feature that took years to materialize. Algorithm details are in the cover letter/marge commit - Add support for may_goto instruction to s390 JIT (Ilya Leoshkevich) - Add support for may_goto instruction to arm64 JIT (Puranjay Mohan) - Fix USDT SIB argument handling in libbpf (Jiawei Zhao) - Allow uprobe-bpf program to change context registers (Jiri Olsa) - Support signed loads from BPF arena (Kumar Kartikeya Dwivedi and Puranjay Mohan) - Allow access to union arguments in tracing programs (Leon Hwang) - Optimize rcu_read_lock() + migrate_disable() combination where it's used in BPF subsystem (Menglong Dong) - Introduce bpf_task_work_schedule*() kfuncs to schedule deferred execution of BPF callback in the context of a specific task using the kernel’s task_work infrastructure (Mykyta Yatsenko) - Enforce RCU protection for KF_RCU_PROTECTED kfuncs (Kumar Kartikeya Dwivedi) - Add stress test for rqspinlock in NMI (Kumar Kartikeya Dwivedi) - Improve the precision of tnum multiplier verifier operation (Nandakumar Edamana) - Use tnums to improve is_branch_taken() logic (Paul Chaignon) - Add support for atomic operations in arena in riscv JIT (Pu Lehui) - Report arena faults to BPF error stream (Puranjay Mohan) - Search for tracefs at /sys/kernel/tracing first in bpftool (Quentin Monnet) - Add bpf_strcasecmp() kfunc (Rong Tao) - Support lookup_and_delete_elem command in BPF_MAP_STACK_TRACE (Tao Chen) * tag 'bpf-next-6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (197 commits) libbpf: Replace AF_ALG with open coded SHA-256 selftests/bpf: Add stress test for rqspinlock in NMI selftests/bpf: Add test case for different expected_attach_type bpf: Enforce expected_attach_type for tailcall compatibility bpftool: Remove duplicate string.h header bpf: Remove duplicate crypto/sha2.h header libbpf: Fix error when st-prefix_ops and ops from differ btf selftests/bpf: Test changing packet data from kfunc selftests/bpf: Add stacktrace map lookup_and_delete_elem test case selftests/bpf: Refactor stacktrace_map case with skeleton bpf: Add lookup_and_delete_elem for BPF_MAP_STACK_TRACE selftests/bpf: Fix flaky bpf_cookie selftest selftests/bpf: Test changing packet data from global functions with a kfunc bpf: Emit struct bpf_xdp_sock type in vmlinux BTF selftests/bpf: Task_work selftest cleanup fixes MAINTAINERS: Delete inactive maintainers from AF_XDP bpf: Mark kfuncs as __noclone selftests/bpf: Add kprobe multi write ctx attach test selftests/bpf: Add kprobe write ctx attach test selftests/bpf: Add uprobe context ip register change test ... |
||
|
|
7601d18be0 |
A set of changes to consolidate the generic TIF bits accross architectures
All architectures define the same set of generic TIF bits. This makes it pointlessly hard to add a new generic TIF bit or to change an existing one. Provide a generic variant and convert the architectures which utilize the generic entry code over to use it. The TIF space is divided into 16 generic bits and 16 architecture specific bits, which turned out to provide enough space on both sides. -----BEGIN PGP SIGNATURE----- iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmjaP78THHRnbHhAbGlu dXRyb25peC5kZQAKCRCmGPVMDXSYoTKyD/9NEg3DiN6aM959o1MhvhDk7jNdECXm QOvZ85wJYhCv3Pb7RSgMxJL/dR9aWV2b24TSzEeki0awbR4nPeUz82vExmgS7rWX 9rQLPrOZsrYd76IcAVoV5Ua/g48c+9TM8kLzoZFa9JEYMPmyTEiA3gy4bgab/aov L2b903ZrCNkWRKM1Wz5V8xdyPEzhE+cLhbWoPbeCqfzxqbv4+WWKQlPmqamQw+yq /61Xhq0tmx3+4hn1IB/Rc8yMTbAK0EwN7SHM+l7yaJ3ijlkGSele4S3mKAHv2I3c vODIFwdQ8pRbC1C5eMBnUKRm7Cmf+8CB3m+OIA5ghj10TPFiTUzQ6iG6nUSVniJm QB21LHYSWroeQBRibnT5k7RiW5QjtTQmcsjvO7S2rZ/7CkMr7LMu6kN1P0TSiOvc SKxs3MO72KaRU/JrEjLqvT2tvdpg2hpffg69U0jA1xCeFULE1jrqo3GwL8dPDk7z zKbC73JNg4QJDdi+hIn5nl0fRGVszLzkDum5eyCpCLY/W7BSiQ7q/ayzt9upsSOm uc7sqeIgelQMRDMoMNcQUsMnApT0JHOS74WQ03SfahZESj8eFoOb8pr7vaqu4lfi 6LV4fpwZPBTMDcQ36r2JLuUTqHNHNtWn4xXjQ72ngovIVUL9A2H0DqK+1JhLJ6yX tknZ9WVmWVf+JQ== =6L98 -----END PGP SIGNATURE----- Merge tag 'core-core-2025-09-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull TIF bit unification updates from Thomas Gleixner: "A set of changes to consolidate the generic TIF (thread info flag) bits accross architectures. All architectures define the same set of generic TIF bits. This makes it pointlessly hard to add a new generic TIF bit or to change an existing one. Provide a generic variant and convert the architectures which utilize the generic entry code over to use it. The TIF space is divided into 16 generic bits and 16 architecture specific bits, which turned out to provide enough space on both sides" * tag 'core-core-2025-09-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: LoongArch: Fix bitflag conflict for TIF_FIXADE riscv: Use generic TIF bits loongarch: Use generic TIF bits s390/entry: Remove unused TIF flags s390: Use generic TIF bits x86: Use generic TIF bits asm-generic: Provide generic TIF infrastructure |
||
|
|
cb7e3669c6 |
RISC-V updates for the v6.18 merge window (part one)
First set of RISC-V updates for the v6.18 merge window, including: - Replacement of __ASSEMBLY__ with __ASSEMBLER__ in header files (other architectures have already merged this type of cleanup) - The introduction of ioremap_wc() for RISC-V - Cleanup of the RISC-V kprobes code to use mostly-extant macros rather than open code - A RISC-V kprobes unit test - An architecture-specific endianness swap macro set implementation, leveraging some dedicated RISC-V instructions for this purpose if they are available - The ability to identity and communicate to userspace the presence of a MIPS P8700-specific ISA extension, and to leverage its MIPS-specific PAUSE implementation in cpu_relax() - Several other miscellaneous cleanups -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEElRDoIDdEz9/svf2Kx4+xDQu9KksFAmjaMVIACgkQx4+xDQu9 Kkva4g/9GE4gzwM+KHlc4e1sfsaTo4oXe9eV+Hj3gUJdM+g8dCtNFchPRCKFjHYb X9lm2YVL9Q3cZ7yWWy/DJtZ66Yz9foQ4laX7uYbjsdbdjbsAXLXhjNp6uk4nBrqb uW7Uq+Qel8qq66J/B4Z/U0UzF3e5MaptQ6sLuNRONY9OJUxG76zTSJVKwiVNsGaX 8W59b7ALFlCIwCVyXGm/KO1EELjl8FKDENWXFE1v6T8XvfXfYuhcvUMp84ebbzHV D4kQrO3nxKQVgKEdCtW8xxt0/aWkYQ8zbQg8bD0gDDzMYb5uiJDcMFaa8ZuEYiLg EcAJX8LmE5GGlTcJf8/jxvaA87hisNsGFvNPXX1OuI26w2TUNC2u80wE6Q6a3aNu 74oUtZEaPhdFuB31A0rALC8Zb2zalnwwjAL7xRlyAozUuye8Ej7mE7w97WTjIfWz 7ZL19/+C1uawljzYLn+FJBIcl1wTyvOgx6T4TJlmOsa4OFnCreFx+a0Az6+Rx1GC XGsRyDkGPSabIbNVGxUCyTr4w+VpW1WlDjrjwyLC0DQTFW8wyP44W9/K2scp+CqJ bSCcAz8QtGAeZ5UlSmXYTOV69xXqZPaom7fVk5RFHoy24en5DSo7kj1NJ3ChupRD 8ACpALcIgw/VEo0Tyqqy3dyVhPDuYaZMY3WgGGj9Cz18U37e/Ho= =nK3D -----END PGP SIGNATURE----- Merge tag 'riscv-for-linus-6.18-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux Pull RISC-V updates from Paul Walmsley - Replacement of __ASSEMBLY__ with __ASSEMBLER__ in header files (other architectures have already merged this type of cleanup) - The introduction of ioremap_wc() for RISC-V - Cleanup of the RISC-V kprobes code to use mostly-extant macros rather than open code - A RISC-V kprobes unit test - An architecture-specific endianness swap macro set implementation, leveraging some dedicated RISC-V instructions for this purpose if they are available - The ability to identity and communicate to userspace the presence of a MIPS P8700-specific ISA extension, and to leverage its MIPS-specific PAUSE implementation in cpu_relax() - Several other miscellaneous cleanups * tag 'riscv-for-linus-6.18-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (39 commits) riscv: errata: Fix the PAUSE Opcode for MIPS P8700 riscv: hwprobe: Document MIPS xmipsexectl vendor extension riscv: hwprobe: Add MIPS vendor extension probing riscv: Add xmipsexectl instructions riscv: Add xmipsexectl as a vendor extension dt-bindings: riscv: Add xmipsexectl ISA extension description riscv: cpufeature: add validation for zfa, zfh and zfhmin perf: riscv: skip empty batches in counter start selftests: riscv: Add README for RISC-V KSelfTest riscv: sbi: Switch to new sys-off handler API riscv: Move vendor errata definitions to new header RISC-V: ACPI: enable parsing the BGRT table riscv: Enable ARCH_HAVE_NMI_SAFE_CMPXCHG riscv: pi: use 'targets' instead of extra-y in Makefile riscv: introduce asm/swab.h riscv: mmap(): use unsigned offset type in riscv_sys_mmap drivers/perf: riscv: Remove redundant ternary operators riscv: mm: Use mmu-type from FDT to limit SATP mode riscv: mm: Return intended SATP mode for noXlvl options riscv: kprobes: Remove duplication of RV_EXTRACT_ITYPE_IMM ... |
||
|
|
a5ba183bde |
hardening updates for v6.18-rc1
- Clean up usage of TRAILING_OVERLAP() (Gustavo A. R. Silva)
- lkdtm: fortify: Fix potential NULL dereference on kmalloc failure
(Junjie Cao)
- Add str_assert_deassert() helper (Lad Prabhakar)
- gcc-plugins: Remove TODO_verify_il for GCC >= 16
- kconfig: Fix BrokenPipeError warnings in selftests
- kconfig: Add transitional symbol attribute for migration support
- kcfi: Rename CONFIG_CFI_CLANG to CONFIG_CFI
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRSPkdeREjth1dHnSE2KwveOeQkuwUCaNraNQAKCRA2KwveOeQk
u/DkAPwKPP5BSmVR2wkdpQaXIr3PGA+cbBYp34DMJNujZ9piIwD/WZ+HfGTLoERy
+2Q6HLj9hUdd+Rx3IZ8/w1QmnhUIUAU=
=AwV9
-----END PGP SIGNATURE-----
Merge tag 'hardening-v6.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull hardening updates from Kees Cook:
"One notable addition is the creation of the 'transitional' keyword for
kconfig so CONFIG renaming can go more smoothly.
This has been a long-standing deficiency, and with the renaming of
CONFIG_CFI_CLANG to CONFIG_CFI (since GCC will soon have KCFI
support), this came up again.
The breadth of the diffstat is mainly this renaming.
- Clean up usage of TRAILING_OVERLAP() (Gustavo A. R. Silva)
- lkdtm: fortify: Fix potential NULL dereference on kmalloc failure
(Junjie Cao)
- Add str_assert_deassert() helper (Lad Prabhakar)
- gcc-plugins: Remove TODO_verify_il for GCC >= 16
- kconfig: Fix BrokenPipeError warnings in selftests
- kconfig: Add transitional symbol attribute for migration support
- kcfi: Rename CONFIG_CFI_CLANG to CONFIG_CFI"
* tag 'hardening-v6.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
lib/string_choices: Add str_assert_deassert() helper
kcfi: Rename CONFIG_CFI_CLANG to CONFIG_CFI
kconfig: Add transitional symbol attribute for migration support
kconfig: Fix BrokenPipeError warnings in selftests
gcc-plugins: Remove TODO_verify_il for GCC >= 16
stddef: Introduce __TRAILING_OVERLAP()
stddef: Remove token-pasting in TRAILING_OVERLAP()
lkdtm: fortify: Fix potential NULL dereference on kmalloc failure
|
||
|
|
8c1ed30218 |
ffs-const update for v6.18-rc1
- PCI: Fix theoretical underflow in use of ffs().
- Universally apply __attribute_const__ to all architecture's ffs()-family
of functions.
- Add KUnit tests for ffs() behavior and const-ness.
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRSPkdeREjth1dHnSE2KwveOeQkuwUCaNrWngAKCRA2KwveOeQk
u3ZGAPwJTscARU4MspnqpbuAV601dG1TNoJG+8JYH84r+R2jjQEAlmBZB0jaHbC2
qFWjHivD/0ofvihKfAPFgxlakyV1XAg=
=diXF
-----END PGP SIGNATURE-----
Merge tag 'ffs-const-v6.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull ffs const-attribute cleanups from Kees Cook:
"While working on various hardening refactoring a while back we
encountered inconsistencies in the application of __attribute_const__
on the ffs() family of functions.
This series fixes this across all archs and adds KUnit tests.
Notably, this found a theoretical underflow in PCI (also fixed here)
and uncovered an inefficiency in ARC (fixed in the ARC arch PR). I
kept the series separate from the general hardening PR since it is a
stand-alone "topic".
- PCI: Fix theoretical underflow in use of ffs().
- Universally apply __attribute_const__ to all architecture's
ffs()-family of functions.
- Add KUnit tests for ffs() behavior and const-ness"
* tag 'ffs-const-v6.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
KUnit: ffs: Validate all the __attribute_const__ annotations
sparc: Add __attribute_const__ to ffs()-family implementations
xtensa: Add __attribute_const__ to ffs()-family implementations
s390: Add __attribute_const__ to ffs()-family implementations
parisc: Add __attribute_const__ to ffs()-family implementations
mips: Add __attribute_const__ to ffs()-family implementations
m68k: Add __attribute_const__ to ffs()-family implementations
openrisc: Add __attribute_const__ to ffs()-family implementations
riscv: Add __attribute_const__ to ffs()-family implementations
hexagon: Add __attribute_const__ to ffs()-family implementations
alpha: Add __attribute_const__ to ffs()-family implementations
sh: Add __attribute_const__ to ffs()-family implementations
powerpc: Add __attribute_const__ to ffs()-family implementations
x86: Add __attribute_const__ to ffs()-family implementations
csky: Add __attribute_const__ to ffs()-family implementations
bitops: Add __attribute_const__ to generic ffs()-family implementations
KUnit: Introduce ffs()-family tests
PCI: Test for bit underflow in pcie_set_readrq()
|
||
|
|
bb96fb5a79 |
ACPI: RISC-V: Add RPMI System MSI to GSI mapping
The RPMI System MSI device will provide GSIs to downstream devices (such as GED) so add it to the RISC-V GSI to fwnode mapping. Signed-off-by: Sunil V L <sunilvl@ventanamicro.com> Signed-off-by: Anup Patel <apatel@ventanamicro.com> Acked-by: Jassi Brar <jassisinghbrar@gmail.com> Link: https://lore.kernel.org/r/20250818040920.272664-20-apatel@ventanamicro.com Signed-off-by: Paul Walmsley <pjw@kernel.org> |
||
|
|
4d185fdeef |
ACPI: RISC-V: Add support to update gsi range
Some RISC-V interrupt controllers like RPMI based system MSI interrupt controllers do not have MADT entry defined. These interrupt controllers exist only in the namespace. ACPI spec defines _GSB method to get the GSI base of the interrupt controller, However, there is no such standard method to get the GSI range. To support such interrupt controllers, set the GSI range of such interrupt controllers to non-overlapping range and provide API for interrupt controller driver to update it with proper value. Signed-off-by: Sunil V L <sunilvl@ventanamicro.com> Signed-off-by: Anup Patel <apatel@ventanamicro.com> Acked-by: Jassi Brar <jassisinghbrar@gmail.com> Link: https://lore.kernel.org/r/20250818040920.272664-19-apatel@ventanamicro.com Signed-off-by: Paul Walmsley <pjw@kernel.org> |
||
|
|
508da38677 |
RISC-V: Add defines for the SBI message proxy extension
Add defines for the new SBI message proxy extension which is part of the SBI v3.0 specification. Reviewed-by: Atish Patra <atishp@rivosinc.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Co-developed-by: Rahul Pathak <rpathak@ventanamicro.com> Signed-off-by: Rahul Pathak <rpathak@ventanamicro.com> Signed-off-by: Anup Patel <apatel@ventanamicro.com> Link: https://lore.kernel.org/r/20250818040920.272664-4-apatel@ventanamicro.com Signed-off-by: Paul Walmsley <pjw@kernel.org> |
||
|
|
23ef9d4397 |
kcfi: Rename CONFIG_CFI_CLANG to CONFIG_CFI
The kernel's CFI implementation uses the KCFI ABI specifically, and is not strictly tied to a particular compiler. In preparation for GCC supporting KCFI, rename CONFIG_CFI_CLANG to CONFIG_CFI (along with associated options). Use new "transitional" Kconfig option for old CONFIG_CFI_CLANG that will enable CONFIG_CFI during olddefconfig. Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Reviewed-by: Nathan Chancellor <nathan@kernel.org> Link: https://lore.kernel.org/r/20250923213422.1105654-3-kees@kernel.org Signed-off-by: Kees Cook <kees@kernel.org> |
||
|
|
546e42c8c6 |
riscv: Use an atomic xchg in pudp_huge_get_and_clear()
Make sure we return the right pud value and not a value that could
have been overwritten in between by a different core.
Fixes:
|
||
|
|
0b0ca959d2 |
riscv: errata: Fix the PAUSE Opcode for MIPS P8700
Add ERRATA_MIPS and ERRATA_MIPS_P8700_PAUSE_OPCODE configs. Handle errata for the MIPS PAUSE instruction. Signed-off-by: Djordje Todorovic <djordje.todorovic@htecgroup.com> Signed-off-by: Aleksandar Rikalo <arikalo@gmail.com> Signed-off-by: Raj Vishwanathan4 <rvishwanathan@mips.com> Signed-off-by: Aleksa Paunovic <aleksa.paunovic@htecgroup.com> Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com> Link: https://lore.kernel.org/r/20250724-p8700-pause-v5-7-a6cbbe1c3412@htecgroup.com [pjw@kernel.org: updated to apply and compile; fixed a checkpatch issue] Signed-off-by: Paul Walmsley <pjw@kernel.org> |
||
|
|
bb4b0f8a1b |
riscv: hwprobe: Add MIPS vendor extension probing
Add a new hwprobe key "RISCV_HWPROBE_KEY_VENDOR_EXT_MIPS_0" which allows userspace to probe for the new xmipsexectl vendor extension. Signed-off-by: Aleksa Paunovic <aleksa.paunovic@htecgroup.com> Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com> Link: https://lore.kernel.org/r/20250724-p8700-pause-v5-4-a6cbbe1c3412@htecgroup.com [pjw@kernel.org: fixed some checkpatch issues] Signed-off-by: Paul Walmsley <pjw@kernel.org> |
||
|
|
1d4ce63e33 |
riscv: Add xmipsexectl instructions
Add xmipsexectl instruction opcodes. This includes the MIPS.PAUSE, MIPS.EHB, and MIPS.IHB instructions. Signed-off-by: Aleksa Paunovic <aleksa.paunovic@htecgroup.com> Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com> Link: https://lore.kernel.org/r/20250724-p8700-pause-v5-3-a6cbbe1c3412@htecgroup.com Signed-off-by: Paul Walmsley <pjw@kernel.org> |
||
|
|
a8fed1bc03 |
riscv: Add xmipsexectl as a vendor extension
Add support for MIPS vendor extensions. Add support for the xmipsexectl vendor extension. Signed-off-by: Aleksa Paunovic <aleksa.paunovic@htecgroup.com> Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com> Link: https://lore.kernel.org/r/20250724-p8700-pause-v5-2-a6cbbe1c3412@htecgroup.com [pjw@kernel.org: added the MIPS vendor ID from another patch to fix the build] Signed-off-by: Paul Walmsley <pjw@kernel.org> |
||
|
|
16d18e3eaf |
riscv: Move vendor errata definitions to new header
Move vendor errata definitions into errata_list_vendors.h. Signed-off-by: Guo Ren (Alibaba DAMO Academy) <guoren@kernel.org> Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com> Tested-by: Han Gao <rabenda.cn@gmail.com> Link: https://lore.kernel.org/r/20250713155321.2064856-2-guoren@kernel.org [pjw@kernel.org: updated to apply and to make the whitespace consistent] Signed-off-by: Paul Walmsley <pjw@kernel.org> |
||
|
|
cc2294d3f9 |
riscv: introduce asm/swab.h
Implement endianness swap macros for RISC-V. Use the rev8 instruction when Zbb is available. Otherwise, rely on the default mask-and-shift implementation. Acked-by: Palmer Dabbelt <palmer@rivosinc.com> Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com> Tested-by: Alexandre Ghiti <alexghiti@rivosinc.com> Signed-off-by: Ignacio Encinas <ignacio@iencinas.com> Link: https://lore.kernel.org/r/20250723-riscv-swab-v6-1-fc11e9a2efc9@iencinas.com Signed-off-by: Paul Walmsley <pjw@kernel.org> |
||
|
|
41e871f2b6 |
riscv: Use generic TIF bits
No point in defining generic items and the upcoming RSEQ optimizations are only available with this _and_ the generic entry infrastructure, which is already used by RISCV. So no further action required here. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> |
||
|
|
518c550eeb |
riscv: kprobes: Move branch_funct3 to insn.h
Similar to other instruction-processing macros/functions, branch_funct3 should be in insn.h. Move it into insn.h as RV_EXTRACT_FUNCT3. This new name matches the style in insn.h. Signed-off-by: Nam Cao <namcao@linutronix.de> Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com> Link: https://lore.kernel.org/linux-riscv/200c29a26338f19d09963fa02562787e8cfa06f2.1747215274.git.namcao@linutronix.de/ [pjw@kernel.org: updated to use RV_X_MASK and to apply] Signed-off-by: Paul Walmsley <pjw@kernel.org> |
||
|
|
5fe5914027 |
riscv: kprobes: Move branch_rs2_idx to insn.h
Similar to other instruction-processing macros/functions, branch_rs2_idx should be in insn.h. Move it into insn.h as RV_EXTRACT_RS2_REG. This new name matches the style in insn.h. Signed-off-by: Nam Cao <namcao@linutronix.de> Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com> Link: https://lore.kernel.org/linux-riscv/107d4a6c1818bf169be2407b273a0483e6d55bbb.1747215274.git.namcao@linutronix.de/ [pjw@kernel.org: updated to use RV_X_MASK and to apply] Signed-off-by: Paul Walmsley <pjw@kernel.org> |
||
|
|
a601732236 |
riscv: Move all duplicate insn parsing macros into asm/insn.h
kernel/traps_misaligned.c and kvm/vcpu_insn.c define the same macros to extract information from the instructions. Let's move the definitions into asm/insn.h to avoid this duplication. Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com> Reviewed-by: Clément Léger <cleger@rivosinc.com> Link: https://lore.kernel.org/r/20250620-dev-alex-insn_duplicate_v5_manual-v5-3-d865dc9ad180@rivosinc.com [pjw@kernel.org: updated to apply] Signed-off-by: Paul Walmsley <pjw@kernel.org> |
||
|
|
833bbb0d91 |
riscv: Strengthen duplicate and inconsistent definition of RV_X()
RV_X() macro is defined in two different ways which is error prone. So harmonize its first definition and add another macro RV_X_MASK() for the second one. Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com> Link: https://lore.kernel.org/r/20250620-dev-alex-insn_duplicate_v5_manual-v5-2-d865dc9ad180@rivosinc.com [pjw@kernel.org: upcase the macro name to conform with previous practice] Signed-off-by: Paul Walmsley <pjw@kernel.org> |
||
|
|
932131fd3e |
riscv: Fix typo EXRACT -> EXTRACT
Simply fix a typo. Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com> Reviewed-by: Clément Léger <cleger@rivosinc.com> Link: https://lore.kernel.org/r/20250620-dev-alex-insn_duplicate_v5_manual-v5-1-d865dc9ad180@rivosinc.com Signed-off-by: Paul Walmsley <pjw@kernel.org> |
||
|
|
f811f58597 |
riscv: Replace __ASSEMBLY__ with __ASSEMBLER__ in non-uapi headers
While the GCC and Clang compilers already define __ASSEMBLER__ automatically when compiling assembly code, __ASSEMBLY__ is a macro that only gets defined by the Makefiles in the kernel. This can be very confusing when switching between userspace and kernelspace coding, or when dealing with uapi headers that rather should use __ASSEMBLER__ instead. So let's standardize on the __ASSEMBLER__ macro that is provided by the compilers now. This originally was a completely mechanical patch (done with a simple "sed -i" statement), with some manual fixups during rebasing of the patch later. Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Alexandre Ghiti <alex@ghiti.fr> Cc: linux-riscv@lists.infradead.org Signed-off-by: Thomas Huth <thuth@redhat.com> Link: https://lore.kernel.org/r/20250606070952.498274-3-thuth@redhat.com Signed-off-by: Paul Walmsley <pjw@kernel.org> |
||
|
|
3a8ee3a9f4 |
riscv: introduce ioremap_wc()
Compared with IO attributes, NC attributes can improve performance, specifically in these aspects: Relaxed Order, Gathering, Supports Read Speculation, Supports Unaligned Access. Signed-off-by: Yunhui Cui <cuiyunhui@bytedance.com> Signed-off-by: Qingfang Deng <qingfang.deng@siflower.com.cn> Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com> Link: https://lore.kernel.org/r/20250722091504.45974-2-cuiyunhui@bytedance.com Signed-off-by: Paul Walmsley <pjw@kernel.org> |
||
|
|
dbdadd943a |
RISC-V: KVM: Upgrade the supported SBI version to 3.0
Upgrade the SBI version to v3.0 so that corresponding features can be enabled in the guest. Reviewed-by: Anup Patel <anup@brainfault.org> Signed-off-by: Atish Patra <atishp@rivosinc.com> Acked-by: Paul Walmsley <pjw@kernel.org> Link: https://lore.kernel.org/r/20250909-pmu_event_info-v6-8-d8f80cacb884@rivosinc.com Signed-off-by: Anup Patel <anup@brainfault.org> |
||
|
|
e309fd113b |
RISC-V: KVM: Implement get event info function
The new get_event_info funciton allows the guest to query the presence of multiple events with single SBI call. Currently, the perf driver in linux guest invokes it for all the standard SBI PMU events. Support the SBI function implementation in KVM as well. Reviewed-by: Anup Patel <anup@brainfault.org> Signed-off-by: Atish Patra <atishp@rivosinc.com> Acked-by: Paul Walmsley <pjw@kernel.org> Link: https://lore.kernel.org/r/20250909-pmu_event_info-v6-7-d8f80cacb884@rivosinc.com Signed-off-by: Anup Patel <anup@brainfault.org> |
||
|
|
adffbd06d0 |
drivers/perf: riscv: Implement PMU event info function
With the new SBI PMU event info function, we can query the availability of the all standard SBI PMU events at boot time with a single ecall. This improves the bootime by avoiding making an SBI call for each standard PMU event. Since this function is defined only in SBI v3.0, invoke this only if the underlying SBI implementation is v3.0 or higher. Signed-off-by: Atish Patra <atishp@rivosinc.com> Reviewed-by: Anup Patel <anup@brainfault.org> Acked-by: Paul Walmsley <pjw@kernel.org> Link: https://lore.kernel.org/r/20250909-pmu_event_info-v6-4-d8f80cacb884@rivosinc.com Signed-off-by: Anup Patel <anup@brainfault.org> |
||
|
|
656ef2ea30 |
drivers/perf: riscv: Add raw event v2 support
SBI v3.0 introduced a new raw event type that allows wider mhpmeventX width to be programmed via CFG_MATCH. Use the raw event v2 if SBI v3.0 is available. Reviewed-by: Anup Patel <anup@brainfault.org> Signed-off-by: Atish Patra <atishp@rivosinc.com> Acked-by: Paul Walmsley <pjw@kernel.org> Link: https://lore.kernel.org/r/20250909-pmu_event_info-v6-2-d8f80cacb884@rivosinc.com Signed-off-by: Anup Patel <anup@brainfault.org> |
||
|
|
48d67106f4 |
RISC-V: KVM: Implement ONE_REG interface for SBI FWFT state
The KVM user-space needs a way to save/restore the state of SBI FWFT features so implement SBI extension ONE_REG callbacks. Signed-off-by: Anup Patel <apatel@ventanamicro.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Link: https://lore.kernel.org/r/20250823155947.1354229-6-apatel@ventanamicro.com Signed-off-by: Anup Patel <anup@brainfault.org> |
||
|
|
85e7850e0d |
RISC-V: KVM: Move copy_sbi_ext_reg_indices() to SBI implementation
The ONE_REG handling of SBI extension enable/disable registers and SBI extension state registers is already under SBI implementation. On similar lines, let's move copy_sbi_ext_reg_indices() under SBI implementation. Signed-off-by: Anup Patel <apatel@ventanamicro.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Link: https://lore.kernel.org/r/20250823155947.1354229-5-apatel@ventanamicro.com Signed-off-by: Anup Patel <anup@brainfault.org> |