From d64b0372760e09de6d18a0616d7bc652c8c6891d Mon Sep 17 00:00:00 2001 From: George Guo Date: Sat, 9 May 2026 10:44:15 +0800 Subject: [PATCH 1/2] kho: fix KHO_TREE_MAX_DEPTH for non-4KB page sizes KHO_TREE_MAX_DEPTH is calculated as: DIV_ROUND_UP(KHO_ORDER_0_LOG2 - KHO_BITMAP_SIZE_LOG2, KHO_TABLE_SIZE_LOG2) + 1 For systems with 16KB pages (e.g. arm64 with CONFIG_ARM64_16K_PAGES=y or LoongArch), this gives a depth of 4. Since levels are 0 based, with depth = 4 the effective top level is 3 and the top-level shift at bit 39. PAGE_SHIFT = 14 KHO_BITMAP_SIZE_LOG2 = PAGE_SHIFT + 3 = 17 KHO_TABLE_SIZE_LOG2 = log(2; (1 << PAGE_SHIFT) / 8) = 11 shift = ((3 - 1) * KHO_TABLE_SIZE_LOG2) + KHO_BITMAP_SIZE_LOG2 = 39 The order-0 bit sits at bit 50 (KHO_ORDER_0_LOG2 = 64 - PAGE_SHIFT = 50). When inserting or reading a key, the index extracted at the top level is: (1 << 50) >> 39 = 2048 2048 is exactly the table size (PAGE_SIZE / sizeof(phys_addr_t) = 2048 for 16KB pages), so it wraps to 0, aliasing the order bit to index 0 and losing it silently. On the second kernel, kho_radix_decode_key() sees a key without the order bit, calls fls64() on the wrong bit, computes a wrong order and thus a garbage physical address. phys_to_page() of that address faults in kho_preserved_memory_reserve(), causing a kernel panic early in boot. Fix by adding +1 to the DIV_ROUND_UP numerator so the formula accounts for the order bit itself, giving depth 5 for 16KB pages. The top-level shift becomes 50, and (1 << 50) >> 50 = 1, which is nonzero and unambiguous. For 4KB and 64KB page sizes the depth is unchanged. Link: https://patch.msgid.link/20260509024415.33190-1-dongtai.guo@linux.dev Fixes: 3f2ad90060f6 ("kho: adopt radix tree for preserved memory tracking") Tested-by: Kexin Liu Signed-off-by: George Guo Reviewed-by: Pasha Tatashin [rppt: added actual math to the changelog] Signed-off-by: Mike Rapoport (Microsoft) --- include/linux/kho/abi/kexec_handover.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/include/linux/kho/abi/kexec_handover.h b/include/linux/kho/abi/kexec_handover.h index 7e847a2339b0..db9bda6dd310 100644 --- a/include/linux/kho/abi/kexec_handover.h +++ b/include/linux/kho/abi/kexec_handover.h @@ -274,7 +274,7 @@ enum kho_radix_consts { * and 1 bitmap level. */ KHO_TREE_MAX_DEPTH = - DIV_ROUND_UP(KHO_ORDER_0_LOG2 - KHO_BITMAP_SIZE_LOG2, + DIV_ROUND_UP(KHO_ORDER_0_LOG2 - KHO_BITMAP_SIZE_LOG2 + 1, KHO_TABLE_SIZE_LOG2) + 1, }; From 8fd2f26fa2a33cfe8ac043f976137ecf5b567f03 Mon Sep 17 00:00:00 2001 From: "Pratyush Yadav (Google)" Date: Tue, 19 May 2026 15:33:30 +0200 Subject: [PATCH 2/2] kho: fix order calculation for kho_unpreserve_pages() Commit 91e74fa8b1bc ("kho: make sure preservations do not span multiple NUMA nodes") made sure preservations from kho_preserve_pages() do not span multiple NUMA nodes. If they do, the order is reduced and tried again. The same logic was not implemented for kho_unpreserve_pages(). This can result in unpreserve calculating a different order than preserve, and thus not actually unpreserving the pages. Fix this by moving the order calculation logic to __kho_preserve_pages_order() and use it from both preserve and unpreserve paths. Move __kho_unpreserve() down to avoid having a forward declaration. Its users are further down in the file anyway. Also, it results in grouping for all the page-level preservation and unpreservation functions. This unfortunately makes the diff hard to read, but the main change in __kho_unpreserve() is to call __kho_preserve_pages_order() instead of open-coding the order calculation. Fixes: 91e74fa8b1bc ("kho: make sure preservations do not span multiple NUMA nodes") Cc: stable@vger.kernel.org Signed-off-by: Pratyush Yadav (Google) Reviewed-by: Samiullah Khawaja Reviewed-by: Pasha Tatashin Link: https://patch.msgid.link/20260519133332.2498092-1-pratyush@kernel.org Signed-off-by: Mike Rapoport (Microsoft) --- kernel/liveupdate/kexec_handover.c | 56 +++++++++++++++++------------- 1 file changed, 32 insertions(+), 24 deletions(-) diff --git a/kernel/liveupdate/kexec_handover.c b/kernel/liveupdate/kexec_handover.c index 2592f7ca16e2..1b592d86dc48 100644 --- a/kernel/liveupdate/kexec_handover.c +++ b/kernel/liveupdate/kexec_handover.c @@ -357,20 +357,6 @@ int kho_radix_walk_tree(struct kho_radix_tree *tree, } EXPORT_SYMBOL_GPL(kho_radix_walk_tree); -static void __kho_unpreserve(struct kho_radix_tree *tree, - unsigned long pfn, unsigned long end_pfn) -{ - unsigned int order; - - while (pfn < end_pfn) { - order = min(count_trailing_zeros(pfn), ilog2(end_pfn - pfn)); - - kho_radix_del_page(tree, pfn, order); - - pfn += 1 << order; - } -} - /* For physically contiguous 0-order pages. */ static void kho_init_pages(struct page *page, unsigned long nr_pages) { @@ -860,6 +846,37 @@ void kho_unpreserve_folio(struct folio *folio) } EXPORT_SYMBOL_GPL(kho_unpreserve_folio); +static unsigned int __kho_preserve_pages_order(unsigned long start_pfn, + unsigned long end_pfn) +{ + unsigned int order = min(count_trailing_zeros(start_pfn), + ilog2(end_pfn - start_pfn)); + + /* + * Make sure all the pages in a single preservation are in the same NUMA + * node. The restore machinery can not cope with a preservation spanning + * multiple NUMA nodes. + */ + while (pfn_to_nid(start_pfn) != pfn_to_nid(start_pfn + (1UL << order) - 1)) + order--; + + return order; +} + +static void __kho_unpreserve(struct kho_radix_tree *tree, + unsigned long pfn, unsigned long end_pfn) +{ + unsigned int order; + + while (pfn < end_pfn) { + order = __kho_preserve_pages_order(pfn, end_pfn); + + kho_radix_del_page(tree, pfn, order); + + pfn += 1 << order; + } +} + /** * kho_preserve_pages - preserve contiguous pages across kexec * @page: first page in the list. @@ -885,16 +902,7 @@ int kho_preserve_pages(struct page *page, unsigned long nr_pages) } while (pfn < end_pfn) { - unsigned int order = - min(count_trailing_zeros(pfn), ilog2(end_pfn - pfn)); - - /* - * Make sure all the pages in a single preservation are in the - * same NUMA node. The restore machinery can not cope with a - * preservation spanning multiple NUMA nodes. - */ - while (pfn_to_nid(pfn) != pfn_to_nid(pfn + (1UL << order) - 1)) - order--; + unsigned int order = __kho_preserve_pages_order(pfn, end_pfn); err = kho_radix_add_page(tree, pfn, order); if (err) {