linux

mirror of https://github.com/torvalds/linux.git synced 2026-05-31 10:33:41 +02:00

Author	SHA1	Message	Date
Tony Luck	0c00cfbcfc	ACPI: APEI: EINJ: Fix EINJV2 memory error injection Error types in EINJV2 use different bit positions for each flavor of injection from legacy EINJ. Two issues: 1) The address sanity checks in einj_error_inject() were skipped for EINJV2 injections. Noted by sashiko[1] 2) __einj_error_trigger() failed to drop the entry of the target physical address from the list of resources that need to be requested. Add a helper function that checks if an injection is to memory and use it to solve each of these issues. Note that the old test in __einj_error_trigger() checked that param2 was not zero. This isn't needed because the sanity checks in einj_error_inject() reject memory injections with param2 == 0. Fixes: `b47610296d` ("ACPI: APEI: EINJ: Enable EINJv2 error injections") Reported-by: sashiko <sashiko@sashiko.dev> Reported-by: Herman Li <herman.li@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Tested-by: "Lai, Yi1" <yi1.lai@intel.com> Link: https://sashiko.dev/#/patchset/20260415163620.12957-1-tony.luck%40intel.com # [1] Reviewed-by: Jiaqi Yan <jiaqiyan@google.com> Reviewed-by: Zaid Alali <zaidal@os.amperecomputing.com> Link: https://patch.msgid.link/20260421150216.11666-3-tony.luck@intel.com Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2026-04-27 21:42:31 +02:00
Kai-Heng Feng	d7610855b0	ACPI: APEI: GHES: Add NVIDIA vendor CPER record handler Add support for decoding NVIDIA-specific CPER sections delivered via the APEI GHES vendor record notifier chain. NVIDIA hardware generates vendor-specific CPER sections containing error signatures and diagnostic register dumps. This implementation registers a notifier_block with the GHES vendor record notifier and decodes these sections, printing error details via dev_info(). The driver binds to ACPI device NVDA2012, present on NVIDIA server platforms. The NVIDIA CPER section contains a fixed header with error metadata (signature, error type, severity, socket) followed by variable-length register address-value pairs for hardware diagnostics. This work is based on libcper [1]. Example output: nvidia-ghes NVDA2012:00: NVIDIA CPER section, error_data_length: 544 nvidia-ghes NVDA2012:00: signature: CMET-INFO nvidia-ghes NVDA2012:00: error_type: 0 nvidia-ghes NVDA2012:00: error_instance: 0 nvidia-ghes NVDA2012:00: severity: 3 nvidia-ghes NVDA2012:00: socket: 0 nvidia-ghes NVDA2012:00: number_regs: 32 nvidia-ghes NVDA2012:00: instance_base: 0x0000000000000000 nvidia-ghes NVDA2012:00: register[0]: address=0x8000000100000000 value=0x0000000100000000 https://github.com/openbmc/libcper/commit/683e055061ce [1] Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Signed-off-by: Kai-Heng Feng <kaihengf@nvidia.com> [ rjw: Changelog edits ] Link: https://patch.msgid.link/20260330094203.38022-4-kaihengf@nvidia.com Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2026-04-06 16:48:58 +02:00
Kai-Heng Feng	441fa10a5a	ACPI: APEI: GHES: Add devm_ghes_register_vendor_record_notifier() Add a device-managed wrapper around ghes_register_vendor_record_notifier() so drivers can avoid manual cleanup on device removal or probe failure. Signed-off-by: Kai-Heng Feng <kaihengf@nvidia.com> Reviewed-by: Breno Leitao <leitao@debian.org> Reviewed-by: Shiju Jose <shiju.jose@huawei.com> Reviewed-by: Shuai Xue <xueshuai@linux.alibaba.com> Link: https://patch.msgid.link/20260330094203.38022-2-kaihengf@nvidia.com Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2026-04-06 16:48:58 +02:00
Linus Torvalds	32a92f8c89	Convert more 'alloc_obj' cases to default GFP_KERNEL arguments This converts some of the visually simpler cases that have been split over multiple lines. I only did the ones that are easy to verify the resulting diff by having just that final GFP_KERNEL argument on the next line. Somebody should probably do a proper coccinelle script for this, but for me the trivial script actually resulted in an assertion failure in the middle of the script. I probably had made it a bit _too_ trivial. So after fighting that far a while I decided to just do some of the syntactically simpler cases with variations of the previous 'sed' scripts. The more syntactically complex multi-line cases would mostly really want whitespace cleanup anyway. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2026-02-21 20:03:00 -08:00
Linus Torvalds	bf4afc53b7	Convert 'alloc_obj' family to use the new default GFP_KERNEL argument This was done entirely with mindless brute force, using git grep -l '\<k[vmz]alloc_objs(., GFP_KERNEL)' \| xargs sed -i 's/\(alloc_objs(.*\), GFP_KERNEL)/\1)/' to convert the new alloc_obj() users that had a simple GFP_KERNEL argument to just drop that argument. Note that due to the extreme simplicity of the scripting, any slightly more complex cases spread over multiple lines would not be triggered: they definitely exist, but this covers the vast bulk of the cases, and the resulting diff is also then easier to check automatically. For the same reason the 'flex' versions will be done as a separate conversion. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2026-02-21 17:09:51 -08:00
Kees Cook	69050f8d6d	treewide: Replace kmalloc with kmalloc_obj for non-scalar types This is the result of running the Coccinelle script from scripts/coccinelle/api/kmalloc_objs.cocci. The script is designed to avoid scalar types (which need careful case-by-case checking), and instead replace kmalloc-family calls that allocate struct or union object instances: Single allocations: kmalloc(sizeof(TYPE), ...) are replaced with: kmalloc_obj(TYPE, ...) Array allocations: kmalloc_array(COUNT, sizeof(TYPE), ...) are replaced with: kmalloc_objs(TYPE, COUNT, ...) Flex array allocations: kmalloc(struct_size(PTR, FAM, COUNT), ...) are replaced with: kmalloc_flex(PTR, FAM, COUNT, ...) (where TYPE may also be VAR) The resulting allocations no longer return "void ", instead returning "TYPE ". Signed-off-by: Kees Cook <kees@kernel.org>	2026-02-21 01:02:28 -08:00
Tony W Wang-oc	57d5287b7e	ACPI: APEI: GHES: Add ghes_edac support for __ZX__ and _BYO_ systems Let ghes_edac be the preferred driver to load on __ZX__ and _BYO_ systems by extending the platform detection list in ghes.c Signed-off-by: Tony W Wang-oc <TonyWWang-oc@zhaoxin.com> Tested-by: Lyle Li <LyleLi@zhaoxin.com> Acked-by: Borislav Petkov (AMD) <bp@alien8.de> [ rjw: Subject and changelog edits ] Link: https://patch.msgid.link/20260128025216.12564-1-TonyWWang-oc@zhaoxin.com Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2026-01-28 21:39:26 +01:00
Nathan Chancellor	b584bfbd7e	ACPI: APEI: GHES: Disable KASAN instrumentation when compile testing with clang < 18 After a recent innocuous change to drivers/acpi/apei/ghes.c, building ARCH=arm64 allmodconfig with clang-17 or older (which has both CONFIG_KASAN=y and CONFIG_WERROR=y) fails with: drivers/acpi/apei/ghes.c:902:13: error: stack frame size (2768) exceeds limit (2048) in 'ghes_do_proc' [-Werror,-Wframe-larger-than] 902 \| static void ghes_do_proc(struct ghes *ghes, \| ^ A KASAN pass that removes unneeded stack instrumentation, enabled by default in clang-18 [1], drastically improves stack usage in this case. To avoid the warning in the common allmodconfig case when it can break the build, disable KASAN for ghes.o when compile testing with clang-17 and older. Disabling KASAN outright may hide legitimate runtime issues, so live with the warning in that case; the user can either increase the frame warning limit or disable -Werror, which they should probably do when debugging with KASAN anyways. Closes: https://github.com/ClangBuiltLinux/linux/issues/2148 Link: `51fbab1345` [1] Signed-off-by: Nathan Chancellor <nathan@kernel.org> Cc: All applicable <stable@vger.kernel.org> Link: https://patch.msgid.link/20260114-ghes-avoid-wflt-clang-older-than-18-v1-1-9c8248bfe4f4@kernel.org Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2026-01-28 21:35:30 +01:00
Fabio M. De Francesco	ba8af8e1f1	ACPI: APEI: GHES: Add helper to copy CPER CXL protocol error info to work struct Make a helper out of cxl_cper_post_prot_err() that checks the CXL agent type and copy the CPER CXL protocol errors information to a work data structure. Export the new symbol for reuse by ELOG. Reviewed-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Reviewed-by: Hanjun Guo <guohanjun@huawei.com> Signed-off-by: Fabio M. De Francesco <fabio.m.de.francesco@linux.intel.com> [ rjw: Subject tweak ] Link: https://patch.msgid.link/20260114101543.85926-5-fabio.m.de.francesco@linux.intel.com Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2026-01-14 17:09:34 +01:00
Fabio M. De Francesco	7020586968	ACPI: APEI: GHES: Add helper for CPER CXL protocol errors checks Move the CPER CXL protocol errors validity check out of cxl_cper_post_prot_err() to new cxl_cper_sec_prot_err_valid() and limit the serial number check only to CXL agents that are CXL devices (UEFI v2.10, Appendix N.2.13). Export the new symbol for reuse by ELOG. Reviewed-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Hanjun Guo <guohanjun@huawei.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Signed-off-by: Fabio M. De Francesco <fabio.m.de.francesco@linux.intel.com> [ rjw: Subject tweak ] Link: https://patch.msgid.link/20260114101543.85926-4-fabio.m.de.francesco@linux.intel.com Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2026-01-14 17:09:34 +01:00
Shuai Xue	b73cf7eaa6	ACPI: APEI: GHES: Improve ghes_notify_sea() status check Performance testing on ARMv8 systems shows significant overhead in error status handling in SEA error handling. - ghes_peek_estatus(): 8,138.3 ns (21,160 cycles). - ghes_clear_estatus(): 2,038.3 ns (5,300 cycles). Apply the same optimization used in ghes_notify_nmi() to ghes_notify_sea() by checking for active errors before processing, Tested-by: Tony Luck <tony.luck@intel.com> Reviewed-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com> Reviewed-by: Hanjun Guo <guohanjun@huawei.com> Link: https://patch.msgid.link/20260112032239.30023-4-xueshuai@linux.alibaba.com Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2026-01-14 17:05:05 +01:00
Shuai Xue	feb2d38013	ACPI: APEI: GHES: Extract helper functions for error status handling Refactors the GHES driver by extracting common functionality into reusable helper functions: 1. ghes_has_active_errors() - Checks if any error sources in a given list have active errors 2. ghes_map_error_status() - Maps error status address to virtual address 3. ghes_unmap_error_status() - Unmaps error status virtual address 4. Use `guard(rcu)()` instead of explicit `rcu_read_lock()`/`rcu_read_unlock()`. These helpers eliminate code duplication in the NMI path and prepare for similar usage in the SEA path in a subsequent patch. No functional change intended. Tested-by: Tony Luck <tony.luck@intel.com> Reviewed-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com> Reviewed-by: Breno Leitao <leitao@debian.org> Reviewed-by: Hanjun Guo <guohanjun@huawei.com> Link: https://patch.msgid.link/20260112032239.30023-3-xueshuai@linux.alibaba.com Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2026-01-14 17:05:05 +01:00
Tony Luck	f2edc1fb9c	ACPI: APEI: GHES: Improve ghes_notify_nmi() status check ghes_notify_nmi() is called for every NMI and must check whether the NMI was generated because an error was signalled by platform firmware. This check is very expensive as for each registered GHES NMI source it reads from the acpi generic address attached to this error source to get the physical address of the acpi_hest_generic_status block. It then checks the "block_status" to see if an error was logged. The ACPI/APEI code must create virtual mappings for each of those physical addresses, and tear them down afterwards. On an Icelake system this takes around 15,000 TSC cycles. Enough to disturb efforts to profile system performance. If that were not bad enough, there are some atomic accesses in the code path that will cause cache line bounces between CPUs. A problem that gets worse as the core count increases. But BIOS changes neither the acpi generic address nor the physical address of the acpi_hest_generic_status block. So this walk can be done once when the NMI is registered to save the virtual address (unmapping if the NMI is ever unregistered). The "block_status" can be checked directly in the NMI handler. This can be done without any atomic accesses. Resulting time to check that there is not an error record is around 900 cycles. Reported-by: Andi Kleen <andi.kleen@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Tested-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com> Reviewed-by: Hanjun Guo <guohanjun@huawei.com> Link: https://patch.msgid.link/20260112032239.30023-2-xueshuai@linux.alibaba.com Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2026-01-14 17:05:05 +01:00
Mauro Carvalho Chehab	fa2408a24f	APEI/GHES: ensure that won't go past CPER allocated record The logic at ghes_new() prevents allocating too large records, by checking if they're bigger than GHES_ESTATUS_MAX_SIZE (currently, 64KB). Yet, the allocation is done with the actual number of pages from the CPER bios table location, which can be smaller. Yet, a bad firmware could send data with a different size, which might be bigger than the allocated memory, causing an OOPS: Unable to handle kernel paging request at virtual address fff00000f9b40000 Mem abort info: ESR = 0x0000000096000007 EC = 0x25: DABT (current EL), IL = 32 bits SET = 0, FnV = 0 EA = 0, S1PTW = 0 FSC = 0x07: level 3 translation fault Data abort info: ISV = 0, ISS = 0x00000007, ISS2 = 0x00000000 CM = 0, WnR = 0, TnD = 0, TagAccess = 0 GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0 swapper pgtable: 4k pages, 52-bit VAs, pgdp=000000008ba16000 [fff00000f9b40000] pgd=180000013ffff403, p4d=180000013fffe403, pud=180000013f85b403, pmd=180000013f68d403, pte=0000000000000000 Internal error: Oops: 0000000096000007 [#1] SMP Modules linked in: CPU: 0 UID: 0 PID: 303 Comm: kworker/0:1 Not tainted 6.19.0-rc1-00002-gda407d200220 #34 PREEMPT Hardware name: QEMU QEMU Virtual Machine, BIOS unknown 02/02/2022 Workqueue: kacpi_notify acpi_os_execute_deferred pstate: 214020c5 (nzCv daIF +PAN -UAO -TCO +DIT -SSBS BTYPE=--) pc : hex_dump_to_buffer+0x30c/0x4a0 lr : hex_dump_to_buffer+0x328/0x4a0 sp : ffff800080e13880 x29: ffff800080e13880 x28: ffffac9aba86f6a8 x27: 0000000000000083 x26: fff00000f9b3fffc x25: 0000000000000004 x24: 0000000000000004 x23: ffff800080e13905 x22: 0000000000000010 x21: 0000000000000083 x20: 0000000000000001 x19: 0000000000000008 x18: 0000000000000010 x17: 0000000000000001 x16: 00000007c7f20fec x15: 0000000000000020 x14: 0000000000000008 x13: 0000000000081020 x12: 0000000000000008 x11: ffff800080e13905 x10: ffff800080e13988 x9 : 0000000000000000 x8 : 0000000000000000 x7 : 0000000000000001 x6 : 0000000000000020 x5 : 0000000000000030 x4 : 00000000fffffffe x3 : 0000000000000000 x2 : ffffac9aba78c1c8 x1 : ffffac9aba76d0a8 x0 : 0000000000000008 Call trace: hex_dump_to_buffer+0x30c/0x4a0 (P) print_hex_dump+0xac/0x170 cper_estatus_print_section+0x90c/0x968 cper_estatus_print+0xf0/0x158 __ghes_print_estatus+0xa0/0x148 ghes_proc+0x1bc/0x220 ghes_notify_hed+0x5c/0xb8 notifier_call_chain+0x78/0x148 blocking_notifier_call_chain+0x4c/0x80 acpi_hed_notify+0x28/0x40 acpi_ev_notify_dispatch+0x50/0x80 acpi_os_execute_deferred+0x24/0x48 process_one_work+0x15c/0x3b0 worker_thread+0x2d0/0x400 kthread+0x148/0x228 ret_from_fork+0x10/0x20 Code: 6b14033f 540001ad a94707e2 f100029f (b8747b44) ---[ end trace 0000000000000000 ]--- Prevent that by taking the actual allocated are into account when checking for CPER length. Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Acked-by: Ard Biesheuvel <ardb@kernel.org> Reviewed-by: Hanjun Guo <guohanjun@huawei.com> [ rjw: Subject tweaks ] Link: https://patch.msgid.link/4e70310a816577fabf37d94ed36cde4ad62b1e0a.1767871950.git.mchehab+huawei@kernel.org Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2026-01-14 17:04:33 +01:00
Mauro Carvalho Chehab	87880af2d2	APEI/GHES: ARM processor Error: don't go past allocated memory If the BIOS generates a very small ARM Processor Error, or an incomplete one, the current logic will fail to deferrence err->section_length and ctx_info->size Add checks to avoid that. With such changes, such GHESv2 records won't cause OOPSes like this: [ 1.492129] Internal error: Oops: 0000000096000005 [#1] SMP [ 1.495449] Modules linked in: [ 1.495820] CPU: 0 UID: 0 PID: 9 Comm: kworker/0:0 Not tainted 6.18.0-rc1-00017-gabadcc3553dd-dirty #18 PREEMPT [ 1.496125] Hardware name: QEMU QEMU Virtual Machine, BIOS unknown 02/02/2022 [ 1.496433] Workqueue: kacpi_notify acpi_os_execute_deferred [ 1.496967] pstate: 814000c5 (Nzcv daIF +PAN -UAO -TCO +DIT -SSBS BTYPE=--) [ 1.497199] pc : log_arm_hw_error+0x5c/0x200 [ 1.497380] lr : ghes_handle_arm_hw_error+0x94/0x220 0xffff8000811c5324 is in log_arm_hw_error (../drivers/ras/ras.c:75). 70 err_info = (struct cper_arm_err_info )(err + 1); 71 ctx_info = (struct cper_arm_ctx_info )(err_info + err->err_info_num); 72 ctx_err = (u8 )ctx_info; 73 74 for (n = 0; n < err->context_info_num; n++) { 75 sz = sizeof(struct cper_arm_ctx_info) + ctx_info->size; 76 ctx_info = (struct cper_arm_ctx_info )((long)ctx_info + sz); 77 ctx_len += sz; 78 } 79 and similar ones while trying to access section_length on an error dump with too small size. Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Acked-by: Ard Biesheuvel <ardb@kernel.org> Reviewed-by: Hanjun Guo <guohanjun@huawei.com> [ rjw: Subject tweaks ] Link: https://patch.msgid.link/7fd9f38413be05ee2d7cfdb0dc31ea2274cf1a54.1767871950.git.mchehab+huawei@kernel.org Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2026-01-14 17:03:29 +01:00
Colin Ian King	cae444e0e2	ACPI: APEI: EINJ: make read-only array non_mmio_desc static const Don't populate the read-only array non_mmio_desc on the stack at run time, instead make it static const. Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Link: https://patch.msgid.link/20251219215900.494211-1-colin.i.king@gmail.com Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2026-01-07 20:47:34 +01:00
Linus Torvalds	509d3f4584	Significant patch series in this pull request: - The 6 patch series "panic: sys_info: Refactor and fix a potential issue" from Andy Shevchenko fixes a build issue and does some cleanup in ib/sys_info.c. - The 9 patch series "Implement mul_u64_u64_div_u64_roundup()" from David Laight enhances the 64-bit math code on behalf of a PWM driver and beefs up the test module for these library functions. - The 2 patch series "scripts/gdb/symbols: make BPF debug info available to GDB" from Ilya Leoshkevich makes BPF symbol names, sizes, and line numbers available to the GDB debugger. - The 4 patch series "Enable hung_task and lockup cases to dump system info on demand" from Feng Tang adds a sysctl which can be used to cause additional info dumping when the hung-task and lockup detectors fire. - The 6 patch series "lib/base64: add generic encoder/decoder, migrate users" from Kuan-Wei Chiu adds a general base64 encoder/decoder to lib/ and migrates several users away from their private implementations. - The 2 patch series "rbree: inline rb_first() and rb_last()" from Eric Dumazet makes TCP a little faster. - The 9 patch series "liveupdate: Rework KHO for in-kernel users" from Pasha Tatashin reworks the KEXEC Handover interfaces in preparation for Live Update Orchestrator (LUO), and possibly for other future clients. - The 13 patch series "kho: simplify state machine and enable dynamic updates" from Pasha Tatashin increases the flexibility of KEXEC Handover. Also preparation for LUO. - The 18 patch series "Live Update Orchestrator" from Pasha Tatashin is a major new feature targeted at cloud environments. Quoting the [0/N]: This series introduces the Live Update Orchestrator, a kernel subsystem designed to facilitate live kernel updates using a kexec-based reboot. This capability is critical for cloud environments, allowing hypervisors to be updated with minimal downtime for running virtual machines. LUO achieves this by preserving the state of selected resources, such as memory, devices and their dependencies, across the kernel transition. As a key feature, this series includes support for preserving memfd file descriptors, which allows critical in-memory data, such as guest RAM or any other large memory region, to be maintained in RAM across the kexec reboot. Mike Rappaport merits a mention here, for his extensive review and testing work. - The 3 patch series "kexec: reorganize kexec and kdump sysfs" from Sourabh Jain moves the kexec and kdump sysfs entries from /sys/kernel/ to /sys/kernel/kexec/ and adds back-compatibility symlinks which can hopefully be removed one day. - The 2 patch series "kho: fixes for vmalloc restoration" from Mike Rapoport fixes a BUG which was being hit during KHO restoration of vmalloc() regions. -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCaTSAkQAKCRDdBJ7gKXxA jrkiAP9QKfsRv46XZaM5raScjY1ayjP+gqb2rgt6BQ/gZvb2+wD/cPAYOR6BiX52 n0pVpQmG5P/KyOmpLztn96ejL4heKwQ= =JY96 -----END PGP SIGNATURE----- Merge tag 'mm-nonmm-stable-2025-12-06-11-14' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull non-MM updates from Andrew Morton: - "panic: sys_info: Refactor and fix a potential issue" (Andy Shevchenko) fixes a build issue and does some cleanup in ib/sys_info.c - "Implement mul_u64_u64_div_u64_roundup()" (David Laight) enhances the 64-bit math code on behalf of a PWM driver and beefs up the test module for these library functions - "scripts/gdb/symbols: make BPF debug info available to GDB" (Ilya Leoshkevich) makes BPF symbol names, sizes, and line numbers available to the GDB debugger - "Enable hung_task and lockup cases to dump system info on demand" (Feng Tang) adds a sysctl which can be used to cause additional info dumping when the hung-task and lockup detectors fire - "lib/base64: add generic encoder/decoder, migrate users" (Kuan-Wei Chiu) adds a general base64 encoder/decoder to lib/ and migrates several users away from their private implementations - "rbree: inline rb_first() and rb_last()" (Eric Dumazet) makes TCP a little faster - "liveupdate: Rework KHO for in-kernel users" (Pasha Tatashin) reworks the KEXEC Handover interfaces in preparation for Live Update Orchestrator (LUO), and possibly for other future clients - "kho: simplify state machine and enable dynamic updates" (Pasha Tatashin) increases the flexibility of KEXEC Handover. Also preparation for LUO - "Live Update Orchestrator" (Pasha Tatashin) is a major new feature targeted at cloud environments. Quoting the cover letter: This series introduces the Live Update Orchestrator, a kernel subsystem designed to facilitate live kernel updates using a kexec-based reboot. This capability is critical for cloud environments, allowing hypervisors to be updated with minimal downtime for running virtual machines. LUO achieves this by preserving the state of selected resources, such as memory, devices and their dependencies, across the kernel transition. As a key feature, this series includes support for preserving memfd file descriptors, which allows critical in-memory data, such as guest RAM or any other large memory region, to be maintained in RAM across the kexec reboot. Mike Rappaport merits a mention here, for his extensive review and testing work. - "kexec: reorganize kexec and kdump sysfs" (Sourabh Jain) moves the kexec and kdump sysfs entries from /sys/kernel/ to /sys/kernel/kexec/ and adds back-compatibility symlinks which can hopefully be removed one day - "kho: fixes for vmalloc restoration" (Mike Rapoport) fixes a BUG which was being hit during KHO restoration of vmalloc() regions * tag 'mm-nonmm-stable-2025-12-06-11-14' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (139 commits) calibrate: update header inclusion Reinstate "resource: avoid unnecessary lookups in find_next_iomem_res()" vmcoreinfo: track and log recoverable hardware errors kho: fix restoring of contiguous ranges of order-0 pages kho: kho_restore_vmalloc: fix initialization of pages array MAINTAINERS: TPM DEVICE DRIVER: update the W-tag init: replace simple_strtoul with kstrtoul to improve lpj_setup KHO: fix boot failure due to kmemleak access to non-PRESENT pages Documentation/ABI: new kexec and kdump sysfs interface Documentation/ABI: mark old kexec sysfs deprecated kexec: move sysfs entries to /sys/kernel/kexec test_kho: always print restore status kho: free chunks using free_page() instead of kfree() selftests/liveupdate: add kexec test for multiple and empty sessions selftests/liveupdate: add simple kexec-based selftest for LUO selftests/liveupdate: add userspace API selftests docs: add documentation for memfd preservation via LUO mm: memfd_luo: allow preserving memfd liveupdate: luo_file: add private argument to store runtime state mm: shmem: export some functions to internal.h ...	2025-12-06 14:01:20 -08:00
Linus Torvalds	7203ca412f	Significant patch series in this merge are as follows: - The 10 patch series "__vmalloc()/kvmalloc() and no-block support" from Uladzislau Rezki reworks the vmalloc() code to support non-blocking allocations (GFP_ATOIC, GFP_NOWAIT). - The 2 patch series "ksm: fix exec/fork inheritance" from xu xin fixes a rare case where the KSM MMF_VM_MERGE_ANY prctl state is not inherited across fork/exec. - The 4 patch series "mm/zswap: misc cleanup of code and documentations" from SeongJae Park does some light maintenance work on the zswap code. - The 5 patch series "mm/page_owner: add debugfs files 'show_handles' and 'show_stacks_handles'" from Mauricio Faria de Oliveira enhances the /sys/kernel/debug/page_owner debug feature. It adds unique identifiers to differentiate the various stack traces so that userspace monitoring tools can better match stack traces over time. - The 2 patch series "mm/page_alloc: pcp->batch cleanups" from Joshua Hahn makes some minor alterations to the page allocator's per-cpu-pages feature. - The 2 patch series "Improve UFFDIO_MOVE scalability by removing anon_vma lock" from Lokesh Gidra addresses a scalability issue in userfaultfd's UFFDIO_MOVE operation. - The 2 patch series "kasan: cleanups for kasan_enabled() checks" from Sabyrzhan Tasbolatov performs some cleanup in the KASAN code. - The 2 patch series "drivers/base/node: fold node register and unregister functions" from Donet Tom cleans up the NUMA node handling code a little. - The 4 patch series "mm: some optimizations for prot numa" from Kefeng Wang provides some cleanups and small optimizations to the NUMA allocation hinting code. - The 5 patch series "mm/page_alloc: Batch callers of free_pcppages_bulk" from Joshua Hahn addresses long lock hold times at boot on large machines. These were causing (harmless) softlockup warnings. - The 2 patch series "optimize the logic for handling dirty file folios during reclaim" from Baolin Wang removes some now-unnecessary work from page reclaim. - The 10 patch series "mm/damon: allow DAMOS auto-tuned for per-memcg per-node memory usage" from SeongJae Park enhances the DAMOS auto-tuning feature. - The 2 patch series "mm/damon: fixes for address alignment issues in DAMON_LRU_SORT and DAMON_RECLAIM" from Quanmin Yan fixes DAMON_LRU_SORT and DAMON_RECLAIM with certain userspace configuration. - The 15 patch series "expand mmap_prepare functionality, port more users" from Lorenzo Stoakes enhances the new(ish) file_operations.mmap_prepare() method and ports additional callsites from the old ->mmap() over to ->mmap_prepare(). - The 8 patch series "Fix stale IOTLB entries for kernel address space" from Lu Baolu fixes a bug (and possible security issue on non-x86) in the IOMMU code. In some situations the IOMMU could be left hanging onto a stale kernel pagetable entry. - The 4 patch series "mm/huge_memory: cleanup __split_unmapped_folio()" from Wei Yang cleans up and optimizes the folio splitting code. - The 5 patch series "mm, swap: misc cleanup and bugfix" from Kairui Song implements some cleanups and a minor fix in the swap discard code. - The 8 patch series "mm/damon: misc documentation fixups" from SeongJae Park does as advertised. - The 9 patch series "mm/damon: support pin-point targets removal" from SeongJae Park permits userspace to remove a specific monitoring target in the middle of the current targets list. - The 2 patch series "mm: MISC follow-up patches for linux/pgalloc.h" from Harry Yoo implements a couple of cleanups related to mm header file inclusion. - The 2 patch series "mm/swapfile.c: select swap devices of default priority round robin" from Baoquan He improves the selection of swap devices for NUMA machines. - The 3 patch series "mm: Convert memory block states (MEM_) macros to enums" from Israel Batista changes the memory block labels from macros to enums so they will appear in kernel debug info. - The 3 patch series "ksm: perform a range-walk to jump over holes in break_ksm" from Pedro Demarchi Gomes addresses an inefficiency when KSM unmerges an address range. - The 22 patch series "mm/damon/tests: fix memory bugs in kunit tests" from SeongJae Park fixes leaks and unhandled malloc() failures in DAMON userspace unit tests. - The 2 patch series "some cleanups for pageout()" from Baolin Wang cleans up a couple of minor things in the page scanner's writeback-for-eviction code. - The 2 patch series "mm/hugetlb: refactor sysfs/sysctl interfaces" from Hui Zhu moves hugetlb's sysfs/sysctl handling code into a new file. - The 9 patch series "introduce VM_MAYBE_GUARD and make it sticky" from Lorenzo Stoakes makes the VMA guard regions available in /proc/pid/smaps and improves the mergeability of guarded VMAs. - The 2 patch series "mm: perform guard region install/remove under VMA lock" from Lorenzo Stoakes reduces mmap lock contention for callers performing VMA guard region operations. - The 2 patch series "vma_start_write_killable" from Matthew Wilcox starts work in permitting applications to be killed when they are waiting on a read_lock on the VMA lock. - The 11 patch series "mm/damon/tests: add more tests for online parameters commit" from SeongJae Park adds additional userspace testing of DAMON's "commit" feature. - The 9 patch series "mm/damon: misc cleanups" from SeongJae Park does that. - The 2 patch series "make VM_SOFTDIRTY a sticky VMA flag" from Lorenzo Stoakes addresses the possible loss of a VMA's VM_SOFTDIRTY flag when that VMA is merged with another. - The 16 patch series "mm: support device-private THP" from Balbir Singh introduces support for Transparent Huge Page (THP) migration in zone device-private memory. - The 3 patch series "Optimize folio split in memory failure" from Zi Yan optimizes folio split operations in the memory failure code. - The 2 patch series "mm/huge_memory: Define split_type and consolidate split support checks" from Wei Yang provides some more cleanups in the folio splitting code. - The 16 patch series "mm: remove is_swap_[pte, pmd]() + non-swap entries, introduce leaf entries" from Lorenzo Stoakes cleans up our handling of pagetable leaf entries by introducing the concept of 'software leaf entries', of type softleaf_t. - The 4 patch series "reparent the THP split queue" from Muchun Song reparents the THP split queue to its parent memcg. This is in preparation for addressing the long-standing "dying memcg" problem, wherein dead memcg's linger for too long, consuming memory resources. - The 3 patch series "unify PMD scan results and remove redundant cleanup" from Wei Yang does a little cleanup in the hugepage collapse code. - The 6 patch series "zram: introduce writeback bio batching" from Sergey Senozhatsky improves zram writeback efficiency by introducing batched bio writeback support. - The 4 patch series "memcg: cleanup the memcg stats interfaces" from Shakeel Butt cleans up our handling of the interrupt safety of some memcg stats. - The 4 patch series "make vmalloc gfp flags usage more apparent" from Vishal Moola cleans up vmalloc's handling of incoming GFP flags. - The 6 patch series "mm: Add soft-dirty and uffd-wp support for RISC-V" from Chunyan Zhang teches soft dirty and userfaultfd write protect tracking to use RISC-V's Svrsw60t59b extension. - The 5 patch series "mm: swap: small fixes and comment cleanups" from Youngjun Park fixes a small bug and cleans up some of the swap code. - The 4 patch series "initial work on making VMA flags a bitmap" from Lorenzo Stoakes starts work on converting the vma struct's flags to a bitmap, so we stop running out of them, especially on 32-bit. - The 2 patch series "mm/swapfile: fix and cleanup swap list iterations" from Youngjun Park addresses a possible bug in the swap discard code and cleans things up a little. -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCaTEb0wAKCRDdBJ7gKXxA jjfIAP94W4EkCCwNOupnChoG+YWw/JW21anXt5NN+i5svn1yugEAwzvv6A+cAFng o+ug/fyrfPZG7PLp2R8WFyGIP0YoBA4= =IUzS -----END PGP SIGNATURE----- Merge tag 'mm-stable-2025-12-03-21-26' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: "__vmalloc()/kvmalloc() and no-block support" (Uladzislau Rezki) Rework the vmalloc() code to support non-blocking allocations (GFP_ATOIC, GFP_NOWAIT) "ksm: fix exec/fork inheritance" (xu xin) Fix a rare case where the KSM MMF_VM_MERGE_ANY prctl state is not inherited across fork/exec "mm/zswap: misc cleanup of code and documentations" (SeongJae Park) Some light maintenance work on the zswap code "mm/page_owner: add debugfs files 'show_handles' and 'show_stacks_handles'" (Mauricio Faria de Oliveira) Enhance the /sys/kernel/debug/page_owner debug feature by adding unique identifiers to differentiate the various stack traces so that userspace monitoring tools can better match stack traces over time "mm/page_alloc: pcp->batch cleanups" (Joshua Hahn) Minor alterations to the page allocator's per-cpu-pages feature "Improve UFFDIO_MOVE scalability by removing anon_vma lock" (Lokesh Gidra) Address a scalability issue in userfaultfd's UFFDIO_MOVE operation "kasan: cleanups for kasan_enabled() checks" (Sabyrzhan Tasbolatov) "drivers/base/node: fold node register and unregister functions" (Donet Tom) Clean up the NUMA node handling code a little "mm: some optimizations for prot numa" (Kefeng Wang) Cleanups and small optimizations to the NUMA allocation hinting code "mm/page_alloc: Batch callers of free_pcppages_bulk" (Joshua Hahn) Address long lock hold times at boot on large machines. These were causing (harmless) softlockup warnings "optimize the logic for handling dirty file folios during reclaim" (Baolin Wang) Remove some now-unnecessary work from page reclaim "mm/damon: allow DAMOS auto-tuned for per-memcg per-node memory usage" (SeongJae Park) Enhance the DAMOS auto-tuning feature "mm/damon: fixes for address alignment issues in DAMON_LRU_SORT and DAMON_RECLAIM" (Quanmin Yan) Fix DAMON_LRU_SORT and DAMON_RECLAIM with certain userspace configuration "expand mmap_prepare functionality, port more users" (Lorenzo Stoakes) Enhance the new(ish) file_operations.mmap_prepare() method and port additional callsites from the old ->mmap() over to ->mmap_prepare() "Fix stale IOTLB entries for kernel address space" (Lu Baolu) Fix a bug (and possible security issue on non-x86) in the IOMMU code. In some situations the IOMMU could be left hanging onto a stale kernel pagetable entry "mm/huge_memory: cleanup __split_unmapped_folio()" (Wei Yang) Clean up and optimize the folio splitting code "mm, swap: misc cleanup and bugfix" (Kairui Song) Some cleanups and a minor fix in the swap discard code "mm/damon: misc documentation fixups" (SeongJae Park) "mm/damon: support pin-point targets removal" (SeongJae Park) Permit userspace to remove a specific monitoring target in the middle of the current targets list "mm: MISC follow-up patches for linux/pgalloc.h" (Harry Yoo) A couple of cleanups related to mm header file inclusion "mm/swapfile.c: select swap devices of default priority round robin" (Baoquan He) improve the selection of swap devices for NUMA machines "mm: Convert memory block states (MEM_) macros to enums" (Israel Batista) Change the memory block labels from macros to enums so they will appear in kernel debug info "ksm: perform a range-walk to jump over holes in break_ksm" (Pedro Demarchi Gomes) Address an inefficiency when KSM unmerges an address range "mm/damon/tests: fix memory bugs in kunit tests" (SeongJae Park) Fix leaks and unhandled malloc() failures in DAMON userspace unit tests "some cleanups for pageout()" (Baolin Wang) Clean up a couple of minor things in the page scanner's writeback-for-eviction code "mm/hugetlb: refactor sysfs/sysctl interfaces" (Hui Zhu) Move hugetlb's sysfs/sysctl handling code into a new file "introduce VM_MAYBE_GUARD and make it sticky" (Lorenzo Stoakes) Make the VMA guard regions available in /proc/pid/smaps and improves the mergeability of guarded VMAs "mm: perform guard region install/remove under VMA lock" (Lorenzo Stoakes) Reduce mmap lock contention for callers performing VMA guard region operations "vma_start_write_killable" (Matthew Wilcox) Start work on permitting applications to be killed when they are waiting on a read_lock on the VMA lock "mm/damon/tests: add more tests for online parameters commit" (SeongJae Park) Add additional userspace testing of DAMON's "commit" feature "mm/damon: misc cleanups" (SeongJae Park) "make VM_SOFTDIRTY a sticky VMA flag" (Lorenzo Stoakes) Address the possible loss of a VMA's VM_SOFTDIRTY flag when that VMA is merged with another "mm: support device-private THP" (Balbir Singh) Introduce support for Transparent Huge Page (THP) migration in zone device-private memory "Optimize folio split in memory failure" (Zi Yan) "mm/huge_memory: Define split_type and consolidate split support checks" (Wei Yang) Some more cleanups in the folio splitting code "mm: remove is_swap_[pte, pmd]() + non-swap entries, introduce leaf entries" (Lorenzo Stoakes) Clean up our handling of pagetable leaf entries by introducing the concept of 'software leaf entries', of type softleaf_t "reparent the THP split queue" (Muchun Song) Reparent the THP split queue to its parent memcg. This is in preparation for addressing the long-standing "dying memcg" problem, wherein dead memcg's linger for too long, consuming memory resources "unify PMD scan results and remove redundant cleanup" (Wei Yang) A little cleanup in the hugepage collapse code "zram: introduce writeback bio batching" (Sergey Senozhatsky) Improve zram writeback efficiency by introducing batched bio writeback support "memcg: cleanup the memcg stats interfaces" (Shakeel Butt) Clean up our handling of the interrupt safety of some memcg stats "make vmalloc gfp flags usage more apparent" (Vishal Moola) Clean up vmalloc's handling of incoming GFP flags "mm: Add soft-dirty and uffd-wp support for RISC-V" (Chunyan Zhang) Teach soft dirty and userfaultfd write protect tracking to use RISC-V's Svrsw60t59b extension "mm: swap: small fixes and comment cleanups" (Youngjun Park) Fix a small bug and clean up some of the swap code "initial work on making VMA flags a bitmap" (Lorenzo Stoakes) Start work on converting the vma struct's flags to a bitmap, so we stop running out of them, especially on 32-bit "mm/swapfile: fix and cleanup swap list iterations" (Youngjun Park) Address a possible bug in the swap discard code and clean things up a little [ This merge also reverts commit `ebb9aeb980` ("vfio/nvgrace-gpu: register device memory for poison handling") because it looks broken to me, I've asked for clarification - Linus ] * tag 'mm-stable-2025-12-03-21-26' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (321 commits) mm: fix vma_start_write_killable() signal handling mm/swapfile: use plist_for_each_entry in __folio_throttle_swaprate mm/swapfile: fix list iteration when next node is removed during discard fs/proc/task_mmu.c: fix make_uffd_wp_huge_pte() huge pte handling mm/kfence: add reboot notifier to disable KFENCE on shutdown memcg: remove inc/dec_lruvec_kmem_state helpers selftests/mm/uffd: initialize char variable to Null mm: fix DEBUG_RODATA_TEST indentation in Kconfig mm: introduce VMA flags bitmap type tools/testing/vma: eliminate dependency on vma->__vm_flags mm: simplify and rename mm flags function for clarity mm: declare VMA flags by bit zram: fix a spelling mistake mm/page_alloc: optimize lowmem_reserve max lookup using its semantic monotonicity mm/vmscan: skip increasing kswapd_failures when reclaim was boosted pagemap: update BUDDY flag documentation mm: swap: remove scan_swap_map_slots() references from comments mm: swap: change swap_alloc_slow() to void mm, swap: remove redundant comment for read_swap_cache_async mm, swap: use SWP_SOLIDSTATE to determine if swap is rotational ...	2025-12-05 13:52:43 -08:00
Linus Torvalds	b1dd1e2f3e	EFI updates for v6.19: - Parse SMBIOS tables in memory directly on Macbooks that do not implement the EFI SMBIOS protocol - Obtain EDID information from the primary display while running in the EFI stub, and expose it via bootparams on x86 (generic method is in the works, and will likely land during the next cycle) - Bring CPER handling for ARM systems up to data with the latest EFI spec changes. - Various cosmetic changes. -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQQQm/3uucuRGn1Dmh0wbglWLn0tXAUCaS6z+QAKCRAwbglWLn0t XCs8AQCL2Ebzq/FzMB0DEzcqwp9GV6upRReqBIrZcFQuZ+8IcQD/V+N4u3h1m1nJ ofl4KQckZTPPV+iwDDUb7scn5fwgpA4= =2tk+ -----END PGP SIGNATURE----- Merge tag 'efi-next-for-v6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi Pull EFI updates from Ard Biesheuvel: "The usual trickle of EFI contributions: - Parse SMBIOS tables in memory directly on Macbooks that do not implement the EFI SMBIOS protocol - Obtain EDID information from the primary display while running in the EFI stub, and expose it via bootparams on x86 (generic method is in the works, and will likely land during the next cycle) - Bring CPER handling for ARM systems up to data with the latest EFI spec changes - Various cosmetic changes" * tag 'efi-next-for-v6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi: docs: efi: add CPER functions to driver-api efi/cper: align ARM CPER type with UEFI 2.9A/2.10 specs efi/cper: Add a new helper function to print bitmasks efi/cper: Adjust infopfx size to accept an extra space RAS: Report all ARM processor CPER information to userspace efi/libstub: x86: Store EDID in boot_params efi/libstub: gop: Add support for reading EDID efi/libstub: gop: Initialize screen_info in helper function efi/libstub: gop: Find GOP handle instead of GOP data efi: Fix trailing whitespace in header file efi/memattr: Convert efi_memattr_init() return type to void efi: stmm: fix kernel-doc "bad line" warnings efi/riscv: Remove the useless failure return message print efistub/x86: Add fallback for SMBIOS record lookup	2025-12-04 17:10:08 -08:00
Breno Leitao	3fa805c37d	vmcoreinfo: track and log recoverable hardware errors Introduce a generic infrastructure for tracking recoverable hardware errors (HW errors that are visible to the OS but does not cause a panic) and record them for vmcore consumption. This aids post-mortem crash analysis tools by preserving a count and timestamp for the last occurrence of such errors. On the other side, correctable errors, which the OS typically remains unaware of because the underlying hardware handles them transparently, are less relevant for crash dump and therefore are NOT tracked in this infrastructure. Add centralized logging for sources of recoverable hardware errors based on the subsystem it has been notified. hwerror_data is write-only at kernel runtime, and it is meant to be read from vmcore using tools like crash/drgn. For example, this is how it looks like when opening the crashdump from drgn. >>> prog['hwerror_data'] (struct hwerror_info[1]){ { .count = (int)844, .timestamp = (time64_t)1752852018, }, ... This helps fleet operators quickly triage whether a crash may be influenced by hardware recoverable errors (which executes a uncommon code path in the kernel), especially when recoverable errors occurred shortly before a panic, such as the bug fixed by commit `ee62ce7a1d` ("page_pool: Track DMA-mapped pages and unmap them when destroying the pool") This is not intended to replace full hardware diagnostics but provides a fast way to correlate hardware events with kernel panics quickly. Rare machine check exceptions—like those indicated by mce_flags.p5 or mce_flags.winchip—are not accounted for in this method, as they fall outside the intended usage scope for this feature's user base. [leitao@debian.org: add hw-recoverable-errors to toctree] Link: https://lkml.kernel.org/r/20251127-vmcoreinfo_fix-v1-1-26f5b1c43da9@debian.org Link: https://lkml.kernel.org/r/20251010-vmcore_hw_error-v5-1-636ede3efe44@debian.org Signed-off-by: Breno Leitao <leitao@debian.org> Suggested-by: Tony Luck <tony.luck@intel.com> Suggested-by: Shuai Xue <xueshuai@linux.alibaba.com> Reviewed-by: Shuai Xue <xueshuai@linux.alibaba.com> Reviewed-by: Hanjun Guo <guohanjun@huawei.com> [APEI] Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Bob Moore <robert.moore@intel.com> Cc: Borislav Betkov <bp@alien8.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Morse <james.morse@arm.com> Cc: Konrad Rzessutek Wilk <konrad.wilk@oracle.com> Cc: Len Brown <lenb@kernel.org> Cc: Mahesh Salgaonkar <mahesh@linux.ibm.com> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: "Oliver O'Halloran" <oohall@gmail.com> Cc: Omar Sandoval <osandov@osandov.com> Cc: Thomas Gleinxer <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-11-27 14:24:44 -08:00
Mauro Carvalho Chehab	96b010536e	efi/cper: align ARM CPER type with UEFI 2.9A/2.10 specs Up to UEFI spec 2.9, the type byte of CPER struct for ARM processor was defined simply as: Type at byte offset 4: - Cache error - TLB Error - Bus Error - Micro-architectural Error All other values are reserved Yet, there was no information about how this would be encoded. Spec 2.9A errata corrected it by defining: - Bit 1 - Cache Error - Bit 2 - TLB Error - Bit 3 - Bus Error - Bit 4 - Micro-architectural Error All other values are reserved That actually aligns with the values already defined on older versions at N.2.4.1. Generic Processor Error Section. Spec 2.10 also preserve the same encoding as 2.9A. Adjust CPER and GHES handling code for both generic and ARM processors to properly handle UEFI 2.9A and 2.10 encoding. Link: https://uefi.org/specs/UEFI/2.10/Apx_N_Common_Platform_Error_Record.html#arm-processor-error-information Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Acked-by: Borislav Petkov (AMD) <bp@alien8.de> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>	2025-11-21 09:42:03 +01:00
Jason Tian	05954511b7	RAS: Report all ARM processor CPER information to userspace The ARM processor CPER record was added in UEFI v2.6 and remained unchanged up to v2.10. Yet, the original arm_event trace code added by `e9279e83ad` ("trace, ras: add ARM processor error trace event") is incomplete, as it only traces some fields of UAPI 2.6 table N.16, not exporting any information from tables N.17 to N.29 of the record. This is not enough for the user to be able to figure out what has exactly happened or to take appropriate action. According to the UEFI v2.9 specification chapter N2.4.4, the ARM processor error section includes: - several (ERR_INFO_NUM) ARM processor error information structures (Tables N.17 to N.20); - several (CONTEXT_INFO_NUM) ARM processor context information structures (Tables N.21 to N.29); - several vendor specific error information structures. The size is given by Section Length minus the size of the other fields. In addition, it also exports two fields that are parsed by the GHES driver when firmware reports it, e.g.: - error severity - CPU logical index Report all of these information to userspace via a the ARM tracepoint so that userspace can properly record the error and take decisions related to CPU core isolation according to error severity and other info. The updated ARM trace event now contains the following fields: ====================================== ============================= UEFI field on table N.16 ARM Processor trace fields ====================================== ============================= Validation handled when filling data for affinity MPIDR and running state. ERR_INFO_NUM pei_len CONTEXT_INFO_NUM ctx_len Section Length indirectly reported by pei_len, ctx_len and oem_len Error affinity level affinity MPIDR_EL1 mpidr MIDR_EL1 midr Running State running_state PSCI State psci_state Processor Error Information Structure pei_err - count at pei_len Processor Context ctx_err- count at ctx_len Vendor Specific Error Info oem - count at oem_len ====================================== ============================= It should be noted that decoding of tables N.17 to N.29, if needed, will be handled in userspace. That gives more flexibility, as there won't be any need to flood the kernel with micro-architecture specific error decoding. Also, decoding the other fields require a complex logic, and should be done for each of the several values inside the record field. So, let userspace daemons like rasdaemon decode them, parsing such tables and having vendor-specific micro-architecture-specific decoders. [mchehab: modified description, solved merge conflicts and fixed coding style] Signed-off-by: Jason Tian <jason@os.amperecomputing.com> Co-developed-by: Shengwei Luo <luoshengwei@huawei.com> Signed-off-by: Shengwei Luo <luoshengwei@huawei.com> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Signed-off-by: Daniel Ferguson <danielf@os.amperecomputing.com> # rebased Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Tested-by: Shiju Jose <shiju.jose@huawei.com> Acked-by: Borislav Petkov (AMD) <bp@alien8.de> Fixes: `e9279e83ad` ("trace, ras: add ARM processor error trace event") Link: https://uefi.org/specs/UEFI/2.10/Apx_N_Common_Platform_Error_Record.html#arm-processor-error-section Signed-off-by: Ard Biesheuvel <ardb@kernel.org>	2025-11-21 09:42:02 +01:00
Tony Luck	d2932a59c2	ACPI: APEI: EINJ: Fix EINJV2 initialization and injection ACPI 6.6 specification for EINJV2 appends an extra structure to the end of the existing struct set_error_type_with_address. Several issues showed up in testing. 1) Initialization was broken by an earlier fix [1] since is_v2 is only set while performing an injection, not during initialization. 2) A buggy BIOS provided invalid "revision" and "length" for the extension structure. Add several sanity checks. 3) When injecting legacy error types on an EINJV2 capable system, don't copy the component arrays. Fixes: `6c70585149` ("ACPI: APEI: EINJ: Check if user asked for EINJV2 injection") # [1] Fixes: `b47610296d` ("ACPI: APEI: EINJ: Enable EINJv2 error injections") Signed-off-by: Tony Luck <tony.luck@intel.com> [ rjw: Changelog edits ] Cc: 6.17+ <stable@vger.kernel.org> # 6.17+ Link: https://patch.msgid.link/20251119012712.178715-1-tony.luck@intel.com Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2025-11-19 13:36:29 +01:00
Ankit Agrawal	30d0a12910	mm: change ghes code to allow poison of non-struct pfn Poison (or ECC) errors can be very common on a large size cluster. The kernel MM currently handles ECC errors / poison only on memory page backed by struct page. The handling is currently missing for the PFNMAP memory that does not have struct pages. The series adds such support. Implement a new ECC handling for memory without struct pages. Kernel MM expose registration APIs to allow modules that are managing the device to register its device memory region. MM then tracks such regions using interval tree. The mechanism is largely similar to that of ECC on pfn with struct pages. If there is an ECC error on a pfn, all the mapping to it are identified and a SIGBUS is sent to the user space processes owning those mappings. Note that there is one primary difference versus the handling of the poison on struct pages, which is to skip unmapping to the faulty PFN. This is done to handle the huge PFNMAP support added recently [1] that enables VM_PFNMAP vmas to map at PMD or PUD level. A poison to a PFN mapped in such as way would need breaking the PMD/PUD mapping into PTEs that will get mirrored into the S2. This can greatly increase the cost of table walks and have a major performance impact. nvgrace-gpu-vfio-pci module maps the device memory to user VA (Qemu) using remap_pfn_range without being added to the kernel [2]. These device memory PFNs are not backed by struct page. So make nvgrace-gpu-vfio-pci module make use of the mechanism to get poison handling support on the device memory. This patch (of 3): The GHES code allows calling of memory_failure() on the PFNs that pass the pfn_valid() check. This contract is broken for the remapped PFNs which fails the check and ghes_do_memory_failure() returns without triggering memory_failure(). Update code to allow memory_failure() call on PFNs failing pfn_valid(). Link: https://lkml.kernel.org/r/20251102184434.2406-1-ankita@nvidia.com Link: https://lkml.kernel.org/r/20251102184434.2406-2-ankita@nvidia.com Signed-off-by: Ankit Agrawal <ankita@nvidia.com> Reviewed-by: Shuai Xue <xueshuai@linux.alibaba.com> Cc: Aniket Agashe <aniketa@nvidia.com> Cc: Ankit Agrawal <ankita@nvidia.com> Cc: Borislav Betkov <bp@alien8.de> Cc: David Hildenbrand <david@redhat.com> Cc: Hanjun Guo <guohanjun@huawei.com> Cc: Ira Weiny <ira.weiny@intel.com> Cc: Jason Gunthorpe <jgg@nvidia.com> Cc: Joanthan Cameron <Jonathan.Cameron@huawei.com> Cc: Kevin Tian <kevin.tian@intel.com> Cc: Kirti Wankhede <kwankhede@nvidia.com> Cc: Len Brown <lenb@kernel.org> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Matthew R. Ochs <mochs@nvidia.com> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Naoya Horiguchi <nao.horiguchi@gmail.com> Cc: Neo Jia <cjia@nvidia.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Smita Koralahalli Channabasappa <smita.koralahallichannabasappa@amd.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Tarun Gupta <targupta@nvidia.com> Cc: Uwe Kleine-König <u.kleine-koenig@baylibre.com> Cc: Vikram Sethi <vsethi@nvidia.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Zhi Wang <zhiw@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-11-16 17:28:29 -08:00
pengdonglin	c7bc7e9070	ACPI: APEI: Remove redundant rcu_read_lock/unlock() under spinlock Since commit `a8bb74acd8` ("rcu: Consolidate RCU-sched update-side function definitions") there is no difference between rcu_read_lock(), rcu_read_lock_bh() and rcu_read_lock_sched() in terms of RCU read section and the relevant grace period. That means that spin_lock(), which implies rcu_read_lock_sched(), also implies rcu_read_lock(). There is no need no explicitly start a RCU read section if one has already been started implicitly by spin_lock(). Simplify the code and remove the inner rcu_read_lock() invocation. Signed-off-by: pengdonglin <pengdonglin@xiaomi.com> Signed-off-by: pengdonglin <dolinux.peng@gmail.com> Reviewed-by: Hanjun Guo <guohanjun@huawei.com> Link: https://patch.msgid.link/20250916044735.2316171-2-dolinux.peng@gmail.com [ rjw: Subject adjustment ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2025-09-28 12:31:54 +02:00
Jiaqi Yan	7d444f5099	ACPI: APEI: EINJ: Allow more types of addresses except MMIO EINJ driver today only allows injection request to go through for two kinds of IORESOURCE_MEM: IORES_DESC_PERSISTENT_MEMORY and IORES_DESC_SOFT_RESERVED. This check prevents user of EINJ to test memory corrupted in many interesting areas: - Legacy persistent memory - Memory claimed to be used by ACPI tables or NV storage - Kernel crash memory and others There is need to test how kernel behaves when something consumes memory errors in these memory regions. For example, if certain ACPI table is corrupted, does kernel crash gracefully to prevent "silent data corruption". For another example, legacy persistent memory, when managed by Device DAX, does support recovering from Machine Check Exception raised by memory failure, hence worth to be tested. However, attempt to inject memory error via EINJ to legacy persistent memory or ACPI owned memory fails with -EINVAL. Allow EINJ to inject at address except it is MMIO. Leave it to the BIOS or firmware to decide what is a legitimate injection target. In addition to the test done in [1], on a machine having the following iomem resources: ... 01000000-08ffffff : Crash kernel 768f0098-768f00a7 : APEI EINJ ... 768f4000-77323fff : ACPI Non-volatile Storage 77324000-777fefff : ACPI Tables 777ff000-777fffff : System RAM 77800000-7fffffff : Reserved 80000000-8fffffff : PCI MMCONFIG 0000 [bus 00-ff] 90040000-957fffff : PCI Bus 0000:00 ... 300000000-3ffffffff : Persistent Memory (legacy) ... I commented __einj_error_inject during the test and just tested when injecting a memory error at each start address shown above: - 0x80000000 and 0x90040000 both failed with EINVAL - request passed through for all other addresses Link: https://lore.kernel.org/linux-acpi/20250825223348.3780279-1-jiaqiyan@google.com [1] Signed-off-by: Jiaqi Yan <jiaqiyan@google.com> Reviewed-by: Hanjun Guo <guohanjun@huawei.com> Link: https://patch.msgid.link/20250910044531.264043-1-jiaqiyan@google.com [ rjw: Adding a link tag for the [1] reference ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2025-09-15 21:32:29 +02:00
Thorsten Blum	e06722a9df	ACPI: APEI: Remove redundant assignments in erst_dbg_{ioctl\|write}() Use the result of copy_from_user() directly instead of assigning it to the local variable 'rc' and then overwriting it in erst_dbg_write() or immediately returning from erst_dbg_ioctl(). No intentional functional changes. Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev> Reviewed-by: Shuai Xue <xueshuai@linux.alibaba.com> Link: https://patch.msgid.link/20250903224913.242928-2-thorsten.blum@linux.dev [ rjw: Changelog edit ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2025-09-15 21:21:57 +02:00
Uwe Kleine-König	b21d1fbb97	ACPI: APEI: EINJ: Fix resource leak by remove callback in .exit.text The .remove() callback is also used during error handling in faux_probe(). As einj_remove() was marked with __exit it's not linked into the kernel if the driver is built-in, potentially resulting in resource leaks. Also remove the comment justifying the __exit annotation which doesn't apply any more since the driver was converted to the faux device interface. Fixes: `6cb9441bfe` ("ACPI: APEI: EINJ: Transition to the faux device interface") Signed-off-by: Uwe Kleine-König <u.kleine-koenig@baylibre.com> Cc: 6.16+ <stable@vger.kernel.org> # 6.16+ Link: https://patch.msgid.link/20250814051157.35867-2-u.kleine-koenig@baylibre.com Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2025-08-18 18:18:33 +02:00
Charles Han	7459e87ae1	ACPI: APEI: EINJ: fix potential NULL dereference in __einj_error_inject() The __einj_error_inject() function allocates memory via kmalloc() without checking for allocation failure, which could lead to a NULL pointer dereference. Return -ENOMEM in case allocation fails. Fixes: `b47610296d` ("ACPI: APEI: EINJ: Enable EINJv2 error injections") Signed-off-by: Charles Han <hanchunchao@inspur.com> Reviewed-by: Tony Luck <tony.luck@intel.com> Reviewed-by: Hanjun Guo <guohanjun@huawei.com> Link: https://patch.msgid.link/20250815024207.3038-1-hanchunchao@inspur.com Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2025-08-18 18:10:15 +02:00
Tony Luck	6c70585149	ACPI: APEI: EINJ: Check if user asked for EINJV2 injection On an EINJV2 capable system, users may still use the old injection interface but einj_get_parameter_address() takes the EINJV2 path to map the parameter structure. This results in the address the user supplied being stored to the wrong location and the BIOS injecting based on an uninitialized field (0x0 in the reported case). Check the version of the request when mapping the EINJ parameter structure in BIOS reserved memory. Fixes: `691a0f0a55` ("ACPI: APEI: EINJ: Discover EINJv2 parameters") Reported-by: Lai, Yi1 <yi1.lai@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Reviewed-by: Zaid Alali <zaidal@os.amperecomputing.com> Reviewed-by: Hanjun Guo <gouhanjun@huawei.com> Link: https://patch.msgid.link/20250814161706.4489-1-tony.luck@intel.com Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2025-08-18 18:07:38 +02:00
Shuai Xue	c1f1fda141	ACPI: APEI: handle synchronous exceptions in task work The memory uncorrected error could be signaled by asynchronous interrupt (specifically, SPI in arm64 platform), e.g. when an error is detected by a background scrubber, or signaled by synchronous exception (specifically, data abort exception in arm64 platform), e.g. when a CPU tries to access a poisoned cache line. Currently, both synchronous and asynchronous errors use memory_failure_queue() to schedule memory_failure() to exectute in a kworker context. As a result, when a user-space process is accessing a poisoned data, a data abort is taken and the memory_failure() is executed in the kworker context, which: - will send wrong si_code by SIGBUS signal in early_kill mode, and - can not kill the user-space in some cases resulting a synchronous error infinite loop Issue 1: send wrong si_code in early_kill mode Since commit `a70297d221` ("ACPI: APEI: set memory failure flags as MF_ACTION_REQUIRED on synchronous events")', the flag MF_ACTION_REQUIRED could be used to determine whether a synchronous exception occurs on ARM64 platform. When a synchronous exception is detected, the kernel is expected to terminate the current process which has accessed a poisoned page. This is done by sending a SIGBUS signal with error code BUS_MCEERR_AR, indicating an action-required machine check error on read. However, when kill_proc() is called to terminate the processes who has the poisoned page mapped, it sends the incorrect SIGBUS error code BUS_MCEERR_AO because the context in which it operates is not the one where the error was triggered. To reproduce this problem: #sysctl -w vm.memory_failure_early_kill=1 vm.memory_failure_early_kill = 1 # STEP2: inject an UCE error and consume it to trigger a synchronous error #einj_mem_uc single 0: single vaddr = 0xffffb0d75400 paddr = 4092d55b400 injecting ... triggering ... signal 7 code 5 addr 0xffffb0d75000 page not present Test passed The si_code (code 5) from einj_mem_uc indicates that it is BUS_MCEERR_AO error and it is not factually correct. After this change: # STEP1: enable early kill mode #sysctl -w vm.memory_failure_early_kill=1 vm.memory_failure_early_kill = 1 # STEP2: inject an UCE error and consume it to trigger a synchronous error #einj_mem_uc single 0: single vaddr = 0xffffb0d75400 paddr = 4092d55b400 injecting ... triggering ... signal 7 code 4 addr 0xffffb0d75000 page not present Test passed The si_code (code 4) from einj_mem_uc indicates that it is a BUS_MCEERR_AR error as expected. Issue 2: a synchronous error infinite loop If a user-space process, e.g. devmem, accesses a poisoned page for which the HWPoison flag is set, kill_accessing_process() is called to send SIGBUS to current processs with error info. Since the memory_failure() is executed in the kworker context, it will just do nothing but return EFAULT. So, devmem will access the posioned page and trigger an exception again, resulting in a synchronous error infinite loop. Such exception loop may cause platform firmware to exceed some threshold and reboot when Linux could have recovered from this error. To reproduce this problem: # STEP 1: inject an UCE error, and kernel will set HWPosion flag for related page #einj_mem_uc single 0: single vaddr = 0xffffb0d75400 paddr = 4092d55b400 injecting ... triggering ... signal 7 code 4 addr 0xffffb0d75000 page not present Test passed # STEP 2: access the same page and it will trigger a synchronous error infinite loop devmem 0x4092d55b400 To fix above two issues, queue memory_failure() as a task_work so that it runs in the context of the process that is actually consuming the poisoned data. Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com> Tested-by: Ma Wupeng <mawupeng1@huawei.com> Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com> Reviewed-by: Xiaofei Tan <tanxiaofei@huawei.com> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com> Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Jane Chu <jane.chu@oracle.com> Reviewed-by: Yazen Ghannam <yazen.ghannam@amd.com> Reviewed-by: Hanjun Guo <guohanjun@huawei.com> Link: https://patch.msgid.link/20250714114212.31660-3-xueshuai@linux.alibaba.com [ rjw: Changelog edits ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2025-07-16 21:08:04 +02:00
Shuai Xue	79a5ae3c4c	ACPI: APEI: send SIGBUS to current task if synchronous memory error not recovered If a synchronous error is detected as a result of user-space process triggering a 2-bit uncorrected error, the CPU will take a synchronous error exception such as Synchronous External Abort (SEA) on Arm64. The kernel will queue a memory_failure() work which poisons the related page, unmaps the page, and then sends a SIGBUS to the process, so that a system wide panic can be avoided. However, no memory_failure() work will be queued when abnormal synchronous errors occur. These errors can include situations like invalid PA, unexpected severity, no memory failure config support, invalid GUID section, etc. In such a case, the user-space process will trigger SEA again. This loop can potentially exceed the platform firmware threshold or even trigger a kernel hard lockup, leading to a system reboot. Fix it by performing a force kill if no memory_failure() work is queued for synchronous errors. Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com> Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Yazen Ghannam <yazen.ghannam@amd.com> Reviewed-by: Jane Chu <jane.chu@oracle.com> Reviewed-by: Hanjun Guo <guohanjun@huawei.com> Link: https://patch.msgid.link/20250714114212.31660-2-xueshuai@linux.alibaba.com [ rjw: Changelog edits ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2025-07-16 21:08:04 +02:00
Tony Luck	c8aea83c73	ACPI: APEI: EINJ: Fix trigger actions The trigger events are in BIOS memory immediately following the acpi_einj_trigger structure. These were not copied to regular kernel memory for use by apei_exec_ctx_init() so injections in "notrigger=0" mode failed with a message like this: APEI: Invalid action table, unknown instruction type: 123 Fix by allocating a "table_size" block of memory and copying the whole table for use in the rest of the trigger flow. Fixes: `1a35c88302` ("ACPI: APEI: EINJ: Fix kernel test sparse warnings") Reported-by: Yi1 Lai <yi1.lai@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Link: https://patch.msgid.link/20250703200421.28012-1-tony.luck@intel.com Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2025-07-07 18:23:17 +02:00
Breno Leitao	4734c8b46b	ACPI: APEI: GHES: add TAINT_MACHINE_CHECK on GHES panic path When a GHES (Generic Hardware Error Source) triggers a panic, add the TAINT_MACHINE_CHECK taint flag to the kernel. This explicitly marks the kernel as tainted due to a machine check event, improving diagnostics and post-mortem analysis. The taint is set with LOCKDEP_STILL_OK to indicate lockdep remains valid. At large scale deployment, this helps to quickly determine panics that are coming due to hardware failures. Signed-off-by: Breno Leitao <leitao@debian.org> Reviewed-by: Tony Luck <tony.luck@intel.com> Link: https://patch.msgid.link/20250702-add_tain-v1-1-9187b10914b9@debian.org Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2025-07-03 15:23:14 +02:00
Colin Ian King	0fd0541b67	ACPI: APEI: EINJ: Fix check and iounmap of uninitialized pointer p In the case where a request_mem_region call fails and pointer r is null the error exit path via label 'out' will check for a non-null pointer p and try to iounmap it. However, pointer p has not been assigned a value at this point, so it may potentially contain any garbage value. Fix this by ensuring pointer p is initialized to NULL. Fixes: `1a35c88302` ("ACPI: APEI: EINJ: Fix kernel test sparse warnings") Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Link: https://patch.msgid.link/20250624202937.523013-1-colin.i.king@gmail.com Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2025-06-26 20:52:26 +02:00
Colin Ian King	c13d38bc9b	ACPI: APEI: EINJ: Fix less than zero comparison on a size_t variable The check for c < 0 is always false because variable c is a size_t which is not a signed type. Fix this by making c a ssize_t. Fixes: `90711f7bdf` ("ACPI: APEI: EINJ: Create debugfs files to enter device id and syndrome") Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Link: https://patch.msgid.link/20250624201032.522168-1-colin.i.king@gmail.com Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2025-06-26 20:49:37 +02:00
Dan Carpenter	80744a3bed	ACPI: APEI: EINJ: prevent memory corruption in error_type_set() The "einj_buf" buffer is 32 chars. If "count" is larger than that it results in memory corruption. Cap it at 31 so that we leave the last character as a NUL terminator. By the way, the highest reasonable value for "count" is 24. Fixes: `0c6176e1e1` ("ACPI: APEI: EINJ: Enable the discovery of EINJv2 capabilities") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Link: https://patch.msgid.link/ae6286cf-4d73-4b97-8c0f-0782a65b8f51@sabinyo.mountain Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2025-06-26 20:46:13 +02:00
Zaid Alali	b47610296d	ACPI: APEI: EINJ: Enable EINJv2 error injections Enable injection using EINJv2 mode of operation. [Tony: Mostly Zaid's original code. I just changed how the error ID and syndrome bits are implemented. Also swapped out some camelcase variable names] Co-developed-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Zaid Alali <zaidal@os.amperecomputing.com> Link: https://patch.msgid.link/20250617193026.637510-7-zaidal@os.amperecomputing.com Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2025-06-18 20:49:32 +02:00
Tony Luck	90711f7bdf	ACPI: APEI: EINJ: Create debugfs files to enter device id and syndrome EINJv2 allows users to inject multiple errors at the same time by specifying the device id and syndrome bits for each error in a flex array. Create files in the einj debugfs directory to enter data for each device id and syndrome value. Note that the specification says these are 128-bit little-endian values. Linux doesn't have a handy helper to manage objects of this type. Signed-off-by: Tony Luck <tony.luck@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Zaid Alali <zaidal@os.amperecomputing.com> Link: https://patch.msgid.link/20250617193026.637510-6-zaidal@os.amperecomputing.com Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2025-06-18 20:49:32 +02:00
Zaid Alali	691a0f0a55	ACPI: APEI: EINJ: Discover EINJv2 parameters The EINJv2 set_error_type_with_address structure has a flex array to hold the component IDs and syndrome values used when injecting multiple errors at once. Discover the size of this array by taking the address from the ACPI_EINJ_SET_ERROR_TYPE_WITH_ADDRESS entry in the EINJ table and reading the BIOS copy of the structure. Derive the maximum number of components from the length field in the einjv2_extension_struct at the end of the BIOS copy. Map the whole of the structure into kernel memory (and unmap on module unload). [Tony: Code unchanged from Zaid's original. New commit message] Reviewed-by: Tony Luck <tony.luck@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Zaid Alali <zaidal@os.amperecomputing.com> Link: https://patch.msgid.link/20250617193026.637510-5-zaidal@os.amperecomputing.com Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2025-06-18 20:49:31 +02:00
Zaid Alali	21cd921b1a	ACPI: APEI: EINJ: Add einjv2 extension struct Add einjv2 extension struct and EINJv2 error types to prepare the driver for EINJv2 support. ACPI specifications[1] enables EINJv2 by extending set_error_type_with_address struct. Link: https://uefi.org/specs/ACPI/6.6/18_Platform_Error_Interfaces.html#einjv2-extension-structure [1] Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Zaid Alali <zaidal@os.amperecomputing.com> Link: https://patch.msgid.link/20250617193026.637510-4-zaidal@os.amperecomputing.com Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2025-06-18 20:49:31 +02:00
Zaid Alali	0c6176e1e1	ACPI: APEI: EINJ: Enable the discovery of EINJv2 capabilities Enable the driver to show all supported error injections for EINJ and EINJv2 at the same time. EINJv2 capabilities can be discovered by checking the return value of get_error_type, where bit 30 set indicates EINJv2 support. Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Tony Luck <tony.luck@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Zaid Alali <zaidal@os.amperecomputing.com> Link: https://patch.msgid.link/20250617193026.637510-3-zaidal@os.amperecomputing.com Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2025-06-18 20:49:31 +02:00
Zaid Alali	1a35c88302	ACPI: APEI: EINJ: Fix kernel test sparse warnings This patch fixes the kernel test robot warning reported here: Link: https://lore.kernel.org/all/202410241620.oApALow5-lkp@intel.com/ Use pointers annotated with the __iomem marker for all iomem map calls, and creates a local copy of the mapped IO memory for future access in the code. memcpy_fromio() and memcpy_toio() are used to read/write data from/to mapped IO memory. Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Zaid Alali <zaidal@os.amperecomputing.com> Link: https://patch.msgid.link/20250617193026.637510-2-zaidal@os.amperecomputing.com Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2025-06-18 20:49:31 +02:00
Dan Williams	162457f585	ACPI: APEI: EINJ: Do not fail einj_init() on faux_device_create() failure CXL has a symbol dependency on einj_core.ko, so if einj_init() fails then cxl_core.ko fails to load. Prior to the faux_device_create() conversion, einj_probe() failures were tracked by the einj_initialized flag without failing einj_init(). Revert to that behavior and always succeed einj_init() given there is no way, and no pressing need, to discern faux device-create vs device-probe failures. This situation arose because CXL knows proper kernel named objects to trigger errors against, but acpi-einj knows how to perform the error injection. The injection mechanism is shared with non-CXL use cases. The result is CXL now has a module dependency on einj-core.ko, and init/probe failures are handled at runtime. Fixes: `6cb9441bfe` ("ACPI: APEI: EINJ: Transition to the faux device interface") Signed-off-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Ben Cheatham <benjamin.cheatham@amd.com> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Link: https://patch.msgid.link/20250607033228.1475625-4-dan.j.williams@intel.com Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2025-06-10 19:23:25 +02:00
Ingo Molnar	41cb08555c	treewide, timers: Rename from_timer() to timer_container_of() Move this API to the canonical timer_*() namespace. [ tglx: Redone against pre rc1 ] Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/aB2X0jCKQO56WdMt@gmail.com	2025-06-08 09:07:37 +02:00
Linus Torvalds	1fbbb62945	ACPI fixes for 6.16-rc1 - Unbreak acpi_ut_safe_strncpy() by restoring its previous behavior changed incorrectly by a recent update (Ahmed Salem). - Make a new static checker warning in the recently introduced ACPI MRRM table parser go away (Dan Carpenter). - Fix ACPI table referece leak in error path of einj_probe() (Dan Carpenter). -----BEGIN PGP SIGNATURE----- iQFGBAABCAAwFiEEcM8Aw/RY0dgsiRUR7l+9nS/U47UFAmg588ESHHJqd0Byand5 c29ja2kubmV0AAoJEO5fvZ0v1OO1GcEH/icD26Lx5BJGrA2ulYpESAMJRyxyDQ8Q zsNpFr7dLQGG8C+XfIp77ktQobUpHadUVV5zNOPwHSdRICNoqKk+aiPzAy4keNVx XgiyNq5lAw1LOSo/9IrCdNzOvcGzRTya2oABE6XXjMakomVJ7urZIKrRVvrgANjh 62bx0SwbosAdHLYpoVPZlefEM+itYQeloREMvgkxIPNTx+YzNtUFA2FAvdrQ4VdA YCFtMkOP/SwnwKp0laHGqTfPIuaro/WQA2sXfi91zFk2PWVN/mc5qrcy3mOvZ9mD Y8GjA3TL/9Th/qcCWWRk/ByIGzxSkqow1CL1IWrQK6J3pl5KwcTZJxY= =xsjw -----END PGP SIGNATURE----- Merge tag 'acpi-6.16-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull ACPI fixes from Rafael Wysocki: "These address issues introduced by recent ACPI changes merged previously: - Unbreak acpi_ut_safe_strncpy() by restoring its previous behavior changed incorrectly by a recent update (Ahmed Salem) - Make a new static checker warning in the recently introduced ACPI MRRM table parser go away (Dan Carpenter) - Fix ACPI table referece leak in error path of einj_probe() (Dan Carpenter)" * tag 'acpi-6.16-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: ACPICA: Switch back to using strncpy() in acpi_ut_safe_strncpy() ACPI: MRRM: Silence error code static checker warning ACPI: APEI: EINJ: Clean up on error in einj_probe()	2025-05-30 12:11:46 -07:00
Linus Torvalds	47cf96fbe3	arm64 updates for 6.16 ACPI, EFI and PSCI: - Decouple Arm's "Software Delegated Exception Interface" (SDEI) support from the ACPI GHES code so that it can be used by platforms booted with device-tree. - Remove unnecessary per-CPU tracking of the FPSIMD state across EFI runtime calls. - Fix a node refcount imbalance in the PSCI device-tree code. CPU Features: - Ensure register sanitisation is applied to fields in ID_AA64MMFR4. - Expose AIDR_EL1 to userspace via sysfs, primarily so that KVM guests can reliably query the underlying CPU types from the VMM. - Re-enabling of SME support (CONFIG_ARM64_SME) as a result of fixes to our context-switching, signal handling and ptrace code. Entry code: - Hook up TIF_NEED_RESCHED_LAZY so that CONFIG_PREEMPT_LAZY can be selected. Memory management: - Prevent BSS exports from being used by the early PI code. - Propagate level and stride information to the low-level TLB invalidation routines when operating on hugetlb entries. - Use the page-table contiguous hint for vmap() mappings with VM_ALLOW_HUGE_VMAP where possible. - Optimise vmalloc()/vmap() page-table updates to use "lazy MMU mode" and hook this up on arm64 so that the trailing DSB (used to publish the updates to the hardware walker) can be deferred until the end of the mapping operation. - Extend mmap() randomisation for 52-bit virtual addresses (on par with 48-bit addressing) and remove limited support for randomisation of the linear map. Perf and PMUs: - Add support for probing the CMN-S3 driver using ACPI. - Minor driver fixes to the CMN, Arm-NI and amlogic PMU drivers. Selftests: - Fix FPSIMD and SME tests to align with the freshly re-enabled SME support. - Fix default setting of the OUTPUT variable so that tests are installed in the right location. vDSO: - Replace raw counter access from inline assembly code with a call to the the __arch_counter_get_cntvct() helper function. Miscellaneous: - Add some missing header inclusions to the CCA headers. - Rework rendering of /proc/cpuinfo to follow the x86-approach and avoid repeated buffer expansion (the user-visible format remains identical). - Remove redundant selection of CONFIG_CRC32 - Extend early error message when failing to map the device-tree blob. -----BEGIN PGP SIGNATURE----- iQFEBAABCgAuFiEEPxTL6PPUbjXGY88ct6xw3ITBYzQFAmg1uTgQHHdpbGxAa2Vy bmVsLm9yZwAKCRC3rHDchMFjNFv2CAC9S5OW0btOAo7V/LFBpLhJM3hdIV6Sn6N1 d/K5znuqPBG6VPfBrshaZltEl/C3U8KG4H8xrlX5cSo7CRuf3DgVBw3kiZ6ERZj6 1gnKR54juA1oWhcroPl0s76ETWj3N4gO036u2qOhWNAYflDunh1+bCIGJkG4H/yP wqtWn974YUbad/zQJSbG3IMO1yvxZ/PsNpVF8HjyQ0/ZPWsYTscrhNQ0hWro17sR CTcUaGxH4GrXW24EGNgkLB9aq67X2rtGGtaIlp5oFl8FuLklc7TYbPwJp8cPCTNm 0Sp0mpuR9M675pYIKoCI9m5twc46znRIKmbXi5LvPd77418y3jTf =03N4 -----END PGP SIGNATURE----- Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 updates from Will Deacon: "The headline feature is the re-enablement of support for Arm's Scalable Matrix Extension (SME) thanks to a bumper crop of fixes from Mark Rutland. If matrices aren't your thing, then Ryan's page-table optimisation work is much more interesting. Summary: ACPI, EFI and PSCI: - Decouple Arm's "Software Delegated Exception Interface" (SDEI) support from the ACPI GHES code so that it can be used by platforms booted with device-tree - Remove unnecessary per-CPU tracking of the FPSIMD state across EFI runtime calls - Fix a node refcount imbalance in the PSCI device-tree code CPU Features: - Ensure register sanitisation is applied to fields in ID_AA64MMFR4 - Expose AIDR_EL1 to userspace via sysfs, primarily so that KVM guests can reliably query the underlying CPU types from the VMM - Re-enabling of SME support (CONFIG_ARM64_SME) as a result of fixes to our context-switching, signal handling and ptrace code Entry code: - Hook up TIF_NEED_RESCHED_LAZY so that CONFIG_PREEMPT_LAZY can be selected Memory management: - Prevent BSS exports from being used by the early PI code - Propagate level and stride information to the low-level TLB invalidation routines when operating on hugetlb entries - Use the page-table contiguous hint for vmap() mappings with VM_ALLOW_HUGE_VMAP where possible - Optimise vmalloc()/vmap() page-table updates to use "lazy MMU mode" and hook this up on arm64 so that the trailing DSB (used to publish the updates to the hardware walker) can be deferred until the end of the mapping operation - Extend mmap() randomisation for 52-bit virtual addresses (on par with 48-bit addressing) and remove limited support for randomisation of the linear map Perf and PMUs: - Add support for probing the CMN-S3 driver using ACPI - Minor driver fixes to the CMN, Arm-NI and amlogic PMU drivers Selftests: - Fix FPSIMD and SME tests to align with the freshly re-enabled SME support - Fix default setting of the OUTPUT variable so that tests are installed in the right location vDSO: - Replace raw counter access from inline assembly code with a call to the the __arch_counter_get_cntvct() helper function Miscellaneous: - Add some missing header inclusions to the CCA headers - Rework rendering of /proc/cpuinfo to follow the x86-approach and avoid repeated buffer expansion (the user-visible format remains identical) - Remove redundant selection of CONFIG_CRC32 - Extend early error message when failing to map the device-tree blob" * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (83 commits) arm64: cputype: Add cputype definition for HIP12 arm64: el2_setup.h: Make __init_el2_fgt labels consistent, again perf/arm-cmn: Add CMN S3 ACPI binding arm64/boot: Disallow BSS exports to startup code arm64/boot: Move global CPU override variables out of BSS arm64/boot: Move init_pgdir[] and init_idmap_pgdir[] into __pi_ namespace perf/arm-cmn: Initialise cmn->cpu earlier kselftest/arm64: Set default OUTPUT path when undefined arm64: Update comment regarding values in __boot_cpu_mode arm64: mm: Drop redundant check in pmd_trans_huge() arm64/mm: Re-organise setting up FEAT_S1PIE registers PIRE0_EL1 and PIR_EL1 arm64/mm: Permit lazy_mmu_mode to be nested arm64/mm: Disable barrier batching in interrupt contexts arm64/cpuinfo: only show one cpu's info in c_show() arm64/mm: Batch barriers when updating kernel mappings mm/vmalloc: Enter lazy mmu mode while manipulating vmalloc ptes arm64/mm: Support huge pte-mapped pages in vmap mm/vmalloc: Gracefully unmap huge ptes mm/vmalloc: Warn on improper use of vunmap_range() arm64/mm: Hoist barriers out of set_ptes_anysz() loop ...	2025-05-28 14:55:35 -07:00
Dan Carpenter	ce57cc1269	ACPI: APEI: EINJ: Clean up on error in einj_probe() Call acpi_put_table() before returning the error code. Fixes: `e54b1dc1c4` ("ACPI: APEI: EINJ: Remove redundant calls to einj_get_available_error_type()") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Link: https://patch.msgid.link/aDVRBok33LZhXcId@stanley.mountain Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2025-05-27 15:19:36 +02:00
Zaid Alali	e54b1dc1c4	ACPI: APEI: EINJ: Remove redundant calls to einj_get_available_error_type() A single call to einj_get_available_error_type() in init function is sufficient to save the return value in a global variable to be used later in various places in the code. This change has no functional impact, but only removes unnecessary redundant function calls. Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Zaid Alali <zaidal@os.amperecomputing.com> Link: https://patch.msgid.link/20250506213814.2365788-5-zaidal@os.amperecomputing.com [ rjw: Subject and changelog edits ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2025-05-09 21:24:28 +02:00
Huang Yiwei	59529bbe64	firmware: SDEI: Allow sdei initialization without ACPI_APEI_GHES SDEI usually initialize with the ACPI table, but on platforms where ACPI is not used, the SDEI feature can still be used to handle specific firmware calls or other customized purposes. Therefore, it is not necessary for ARM_SDE_INTERFACE to depend on ACPI_APEI_GHES. In commit `dc4e8c07e9` ("ACPI: APEI: explicit init of HEST and GHES in acpi_init()"), to make APEI ready earlier, sdei_init was moved into acpi_ghes_init instead of being a standalone initcall, adding ACPI_APEI_GHES dependency to ARM_SDE_INTERFACE. This restricts the flexibility and usability of SDEI. This patch corrects the dependency in Kconfig and splits sdei_init() into two separate functions: sdei_init() and acpi_sdei_init(). sdei_init() will be called by arch_initcall and will only initialize the platform driver, while acpi_sdei_init() will initialize the device from acpi_ghes_init() when ACPI is ready. This allows the initialization of SDEI without ACPI_APEI_GHES enabled. Fixes: `dc4e8c07e9` ("ACPI: APEI: explicit init of HEST and GHES in apci_init()") Cc: Shuai Xue <xueshuai@linux.alibaba.com> Signed-off-by: Huang Yiwei <quic_hyiwei@quicinc.com> Reviewed-by: Shuai Xue <xueshuai@linux.alibaba.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Link: https://lore.kernel.org/r/20250507045757.2658795-1-quic_hyiwei@quicinc.com Signed-off-by: Will Deacon <will@kernel.org>	2025-05-08 13:35:22 +01:00

1 2 3 4 5 ...

438 Commits