linux/mm
Roman Gushchin 66edc3a589 mm: page_alloc: move mlocked flag clearance into free_pages_prepare()
Syzbot reported a bad page state problem caused by a page being freed
using free_page() still having a mlocked flag at free_pages_prepare()
stage:

  BUG: Bad page state in process syz.5.504  pfn:61f45
  page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x61f45
  flags: 0xfff00000080204(referenced|workingset|mlocked|node=0|zone=1|lastcpupid=0x7ff)
  raw: 00fff00000080204 0000000000000000 dead000000000122 0000000000000000
  raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
  page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set
  page_owner tracks the page as allocated
  page last allocated via order 0, migratetype Unmovable, gfp_mask 0x400dc0(GFP_KERNEL_ACCOUNT|__GFP_ZERO), pid 8443, tgid 8442 (syz.5.504), ts 201884660643, free_ts 201499827394
   set_page_owner include/linux/page_owner.h:32 [inline]
   post_alloc_hook+0x1f3/0x230 mm/page_alloc.c:1537
   prep_new_page mm/page_alloc.c:1545 [inline]
   get_page_from_freelist+0x303f/0x3190 mm/page_alloc.c:3457
   __alloc_pages_noprof+0x292/0x710 mm/page_alloc.c:4733
   alloc_pages_mpol_noprof+0x3e8/0x680 mm/mempolicy.c:2265
   kvm_coalesced_mmio_init+0x1f/0xf0 virt/kvm/coalesced_mmio.c:99
   kvm_create_vm virt/kvm/kvm_main.c:1235 [inline]
   kvm_dev_ioctl_create_vm virt/kvm/kvm_main.c:5488 [inline]
   kvm_dev_ioctl+0x12dc/0x2240 virt/kvm/kvm_main.c:5530
   __do_compat_sys_ioctl fs/ioctl.c:1007 [inline]
   __se_compat_sys_ioctl+0x510/0xc90 fs/ioctl.c:950
   do_syscall_32_irqs_on arch/x86/entry/common.c:165 [inline]
   __do_fast_syscall_32+0xb4/0x110 arch/x86/entry/common.c:386
   do_fast_syscall_32+0x34/0x80 arch/x86/entry/common.c:411
   entry_SYSENTER_compat_after_hwframe+0x84/0x8e
  page last free pid 8399 tgid 8399 stack trace:
   reset_page_owner include/linux/page_owner.h:25 [inline]
   free_pages_prepare mm/page_alloc.c:1108 [inline]
   free_unref_folios+0xf12/0x18d0 mm/page_alloc.c:2686
   folios_put_refs+0x76c/0x860 mm/swap.c:1007
   free_pages_and_swap_cache+0x5c8/0x690 mm/swap_state.c:335
   __tlb_batch_free_encoded_pages mm/mmu_gather.c:136 [inline]
   tlb_batch_pages_flush mm/mmu_gather.c:149 [inline]
   tlb_flush_mmu_free mm/mmu_gather.c:366 [inline]
   tlb_flush_mmu+0x3a3/0x680 mm/mmu_gather.c:373
   tlb_finish_mmu+0xd4/0x200 mm/mmu_gather.c:465
   exit_mmap+0x496/0xc40 mm/mmap.c:1926
   __mmput+0x115/0x390 kernel/fork.c:1348
   exit_mm+0x220/0x310 kernel/exit.c:571
   do_exit+0x9b2/0x28e0 kernel/exit.c:926
   do_group_exit+0x207/0x2c0 kernel/exit.c:1088
   __do_sys_exit_group kernel/exit.c:1099 [inline]
   __se_sys_exit_group kernel/exit.c:1097 [inline]
   __x64_sys_exit_group+0x3f/0x40 kernel/exit.c:1097
   x64_sys_call+0x2634/0x2640 arch/x86/include/generated/asm/syscalls_64.h:232
   do_syscall_x64 arch/x86/entry/common.c:52 [inline]
   do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
   entry_SYSCALL_64_after_hwframe+0x77/0x7f
  Modules linked in:
  CPU: 0 UID: 0 PID: 8442 Comm: syz.5.504 Not tainted 6.12.0-rc6-syzkaller #0
  Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/13/2024
  Call Trace:
   <TASK>
   __dump_stack lib/dump_stack.c:94 [inline]
   dump_stack_lvl+0x241/0x360 lib/dump_stack.c:120
   bad_page+0x176/0x1d0 mm/page_alloc.c:501
   free_page_is_bad mm/page_alloc.c:918 [inline]
   free_pages_prepare mm/page_alloc.c:1100 [inline]
   free_unref_page+0xed0/0xf20 mm/page_alloc.c:2638
   kvm_destroy_vm virt/kvm/kvm_main.c:1327 [inline]
   kvm_put_kvm+0xc75/0x1350 virt/kvm/kvm_main.c:1386
   kvm_vcpu_release+0x54/0x60 virt/kvm/kvm_main.c:4143
   __fput+0x23f/0x880 fs/file_table.c:431
   task_work_run+0x24f/0x310 kernel/task_work.c:239
   exit_task_work include/linux/task_work.h:43 [inline]
   do_exit+0xa2f/0x28e0 kernel/exit.c:939
   do_group_exit+0x207/0x2c0 kernel/exit.c:1088
   __do_sys_exit_group kernel/exit.c:1099 [inline]
   __se_sys_exit_group kernel/exit.c:1097 [inline]
   __ia32_sys_exit_group+0x3f/0x40 kernel/exit.c:1097
   ia32_sys_call+0x2624/0x2630 arch/x86/include/generated/asm/syscalls_32.h:253
   do_syscall_32_irqs_on arch/x86/entry/common.c:165 [inline]
   __do_fast_syscall_32+0xb4/0x110 arch/x86/entry/common.c:386
   do_fast_syscall_32+0x34/0x80 arch/x86/entry/common.c:411
   entry_SYSENTER_compat_after_hwframe+0x84/0x8e
  RIP: 0023:0xf745d579
  Code: Unable to access opcode bytes at 0xf745d54f.
  RSP: 002b:00000000f75afd6c EFLAGS: 00000206 ORIG_RAX: 00000000000000fc
  RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 0000000000000000
  RDX: 0000000000000000 RSI: 00000000ffffff9c RDI: 00000000f744cff4
  RBP: 00000000f717ae61 R08: 0000000000000000 R09: 0000000000000000
  R10: 0000000000000000 R11: 0000000000000206 R12: 0000000000000000
  R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
   </TASK>

The problem was originally introduced by commit b109b87050 ("mm/munlock:
replace clear_page_mlock() by final clearance"): it was focused on
handling pagecache and anonymous memory and wasn't suitable for lower
level get_page()/free_page() API's used for example by KVM, as with this
reproducer.

Fix it by moving the mlocked flag clearance down to free_page_prepare().

The bug itself if fairly old and harmless (aside from generating these
warnings), aside from a small memory leak - "bad" pages are stopped from
being allocated again.

Link: https://lkml.kernel.org/r/20241106195354.270757-1-roman.gushchin@linux.dev
Fixes: b109b87050 ("mm/munlock: replace clear_page_mlock() by final clearance")
Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev>
Reported-by: syzbot+e985d3026c4fd041578e@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/6729f475.050a0220.701a.0019.GAE@google.com
Acked-by: Hugh Dickins <hughd@google.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-11-11 17:20:23 -08:00
..
damon mm/damon/core: avoid overflow in damon_feed_loop_next_input() 2024-11-07 14:14:59 -08:00
kasan kasan: remove vmalloc_percpu test 2024-10-30 20:14:11 -07:00
kfence mm: kfence: fix elapsed time for allocated/freed track 2024-09-26 14:01:44 -07:00
kmsan kmsan: do not pass NULL pointers as 0 2024-07-03 19:30:26 -07:00
backing-dev.c writeback: support retrieving per group debug writeback stats of bdi 2024-05-05 17:53:51 -07:00
balloon_compaction.c mm: remove MIGRATE_SYNC_NO_COPY mode 2024-07-03 19:30:00 -07:00
bootmem_info.c
cma_debug.c
cma_sysfs.c mm/cma: add sysfs file 'release_pages_success' 2024-02-22 10:24:57 -08:00
cma.c mm/cma: add cma_{alloc,free}_folio() 2024-09-03 21:15:36 -07:00
cma.h mm/cma: add sysfs file 'release_pages_success' 2024-02-22 10:24:57 -08:00
compaction.c mm:page_alloc: fix the NULL ac->nodemask in __alloc_pages_slowpath() 2024-09-03 21:15:47 -07:00
debug_page_alloc.c mm: page_alloc: consolidate free page accounting 2024-04-25 20:56:04 -07:00
debug_page_ref.c
debug_vm_pgtable.c mm/debug_vm_pgtable: Use pxdp_get() for accessing page table entries 2024-09-17 01:07:01 -07:00
debug.c mm: support only one page_type per page 2024-09-03 21:15:43 -07:00
dmapool_test.c mm/dmapool: add MODULE_DESCRIPTION() 2024-07-03 19:29:58 -07:00
dmapool.c
early_ioremap.c
execmem.c mm/execmem, arch: convert remaining overrides of module_alloc to execmem 2024-05-14 00:31:43 -07:00
fadvise.c introduce fd_file(), convert all accessors to it. 2024-08-12 22:00:43 -04:00
fail_page_alloc.c fault-inject: improve build for CONFIG_FAULT_INJECTION=n 2024-09-01 20:43:33 -07:00
failslab.c fault-inject: improve build for CONFIG_FAULT_INJECTION=n 2024-09-01 20:43:33 -07:00
filemap.c mm/filemap: fix filemap_get_folios_contig THP panic 2024-09-26 14:01:43 -07:00
folio-compat.c mm: remove putback_lru_page() 2024-09-09 16:38:59 -07:00
gup_test.c
gup_test.h
gup.c mm/gup: stop leaking pinned pages in low memory conditions 2024-10-30 20:14:10 -07:00
highmem.c mm/highmem: make nr_free_highpages() return "unsigned long" 2024-07-03 19:30:06 -07:00
hmm.c mm: provide mm_struct and address to huge_ptep_get() 2024-07-12 15:52:15 -07:00
huge_memory.c mm/thp: fix deferred split unqueue naming and locking 2024-11-05 16:49:54 -08:00
hugetlb_cgroup.c mm: memcg: don't call propagate_protected_usage() needlessly 2024-09-01 20:25:50 -07:00
hugetlb_vmemmap.c mm/hugetlb_vmemmap: don't synchronize_rcu() without HVO 2024-09-01 20:25:45 -07:00
hugetlb_vmemmap.h
hugetlb.c mm/hugetlb: fix memfd_pin_folios resv_huge_pages leak 2024-09-26 14:01:43 -07:00
hwpoison-inject.c mm/hwpoison: add MODULE_DESCRIPTION() 2024-07-03 19:29:58 -07:00
init-mm.c mm: Deprecate pasid field 2023-12-12 10:11:32 +01:00
internal.h mm: unconditionally close VMAs on error 2024-11-05 16:49:55 -08:00
interval_tree.c
io-mapping.c
ioremap.c
Kconfig resource: remove dependency on SPARSEMEM from GET_FREE_REGION 2024-10-28 21:40:39 -07:00
Kconfig.debug slub: Introduce CONFIG_SLUB_RCU_DEBUG 2024-08-27 14:12:51 +02:00
khugepaged.c mm: khugepaged: fix the incorrect statistics when collapsing large file folios 2024-10-17 00:28:10 -07:00
kmemleak.c mm/kmemleak: use IS_ERR_PCPU() for pointer in the percpu address space 2024-09-03 21:15:38 -07:00
ksm.c mm: remove PageSwapCache 2024-09-03 21:15:44 -07:00
list_lru.c mm: list_lru: fix UAF for memory cgroup 2024-08-07 18:33:56 -07:00
maccess.c
madvise.c ALong with the usual shower of singleton patches, notable patch series in 2024-09-21 07:29:05 -07:00
Makefile mm: introduce numa_emulation 2024-09-03 21:15:31 -07:00
mapping_dirty_helpers.c
memblock.c memblock: updates for 6.12-rc1 2024-09-25 11:35:19 -07:00
memcontrol-v1.c mm/thp: fix deferred split unqueue naming and locking 2024-11-05 16:49:54 -08:00
memcontrol-v1.h memcg: cleanup with !CONFIG_MEMCG_V1 2024-09-17 01:07:00 -07:00
memcontrol.c mm: count zeromap read and set for swapout and swapin 2024-11-11 00:00:37 -08:00
memfd.c mm/hugetlb: simplify refs in memfd_alloc_folio 2024-09-26 14:01:44 -07:00
memory_hotplug.c ALong with the usual shower of singleton patches, notable patch series in 2024-09-21 07:29:05 -07:00
memory-failure.c mm: migrate: add isolate_folio_to_list() 2024-09-03 21:15:59 -07:00
memory-tiers.c memory tiers: use default_dram_perf_ref_source in log message 2024-09-26 14:01:44 -07:00
memory.c mm: avoid unconditional one-tick sleep when swapcache_prepare fails 2024-10-28 21:40:41 -07:00
mempolicy.c mm,memcg: provide per-cgroup counters for NUMA balancing operations 2024-09-03 21:15:36 -07:00
mempool.c mm: fix xyz_noprof functions calling profiled functions 2024-06-05 19:19:26 -07:00
memremap.c mm: convert put_devmap_managed_page_refs() to put_devmap_managed_folio_refs() 2024-05-05 17:53:49 -07:00
memtest.c memtest: use {READ,WRITE}_ONCE in memory scanning 2024-03-13 12:12:21 -07:00
migrate_device.c mm: remap unused subpages to shared zeropage when splitting isolated thp 2024-09-09 16:39:03 -07:00
migrate.c mm/thp: fix deferred split unqueue naming and locking 2024-11-05 16:49:54 -08:00
mincore.c mm: provide mm_struct and address to huge_ptep_get() 2024-07-12 15:52:15 -07:00
mlock.c mm/mlock: set the correct prev on failure 2024-11-07 14:14:58 -08:00
mm_init.c mm: drop CONFIG_HAVE_ARCH_NODEDATA_EXTENSION 2024-09-03 21:15:28 -07:00
mm_slot.h
mmap_lock.c mm: mmap_lock: replace get_memcg_path_buf() with on-stack buffer 2024-07-03 19:30:26 -07:00
mmap.c mm: resolve faulty mmap_region() error path behaviour 2024-11-05 16:49:55 -08:00
mmu_gather.c mm/mmu_gather: improve cond_resched() handling with large folios and expensive page freeing 2024-02-22 15:27:17 -08:00
mmu_notifier.c mm: move internal core VMA manipulation functions to own file 2024-09-01 20:25:54 -07:00
mmzone.c mm: improve code consistency with zonelist_* helper functions 2024-09-01 20:25:55 -07:00
mprotect.c mm: refactor map_deny_write_exec() 2024-11-05 16:49:55 -08:00
mremap.c mm/mremap: fix move_normal_pmd/retract_page_tables race 2024-10-17 00:28:07 -07:00
mseal.c ALong with the usual shower of singleton patches, notable patch series in 2024-09-21 07:29:05 -07:00
msync.c
nommu.c mm: refactor arch_calc_vm_flag_bits() and arm64 MTE handling 2024-11-05 16:49:55 -08:00
numa_emulation.c mm: introduce numa_emulation 2024-09-03 21:15:31 -07:00
numa_memblks.c mm: numa_clear_kernel_node_hotplug: Add NUMA_NO_NODE check for node id 2024-10-28 21:40:40 -07:00
numa.c mm: make range-to-target_node lookup facility a part of numa_memblks 2024-09-03 21:15:32 -07:00
oom_kill.c memory: remove the now superfluous sentinel element from ctl_table array 2024-04-25 20:56:32 -07:00
page_alloc.c mm: page_alloc: move mlocked flag clearance into free_pages_prepare() 2024-11-11 17:20:23 -08:00
page_counter.c mm, memcg: cg2 memory{.swap,}.peak write handlers 2024-09-01 20:25:53 -07:00
page_ext.c mm: don't account memmap per-node 2024-08-15 22:16:14 -07:00
page_idle.c
page_io.c mm: count zeromap read and set for swapout and swapin 2024-11-11 00:00:37 -08:00
page_isolation.c mm: remove migration for HugePage in isolate_single_pageblock() 2024-09-03 21:15:40 -07:00
page_owner.c mm/page-owner: use gfp_nested_mask() instead of open coded masking 2024-05-19 14:40:44 -07:00
page_poison.c mm/page_poison: replace kmap_atomic() with kmap_local_page() 2023-12-10 16:51:50 -08:00
page_reporting.c mm, treewide: rename MAX_ORDER to MAX_PAGE_ORDER 2024-01-08 15:27:15 -08:00
page_reporting.h
page_table_check.c mm/page_table_check: fix crash on ZONE_DEVICE 2024-06-15 10:43:04 -07:00
page_vma_mapped.c mm: make page_mapped_in_vma conditional on CONFIG_MEMORY_FAILURE 2024-05-05 17:53:45 -07:00
page-writeback.c ALong with the usual shower of singleton patches, notable patch series in 2024-09-21 07:29:05 -07:00
pagewalk.c mm/pagewalk: fix usage of pmd_leaf()/pud_leaf() without present check 2024-10-28 21:40:38 -07:00
percpu-internal.h mm: remove CONFIG_MEMCG_KMEM 2024-07-10 12:14:54 -07:00
percpu-km.c
percpu-stats.c
percpu-vm.c percpu: clean up all mappings when pcpu_map_pages() fails 2024-04-25 20:55:49 -07:00
percpu.c percpu: remove pcpu_alloc_size() 2024-09-01 20:26:04 -07:00
pgalloc-track.h
pgtable-generic.c mm: fix race between __split_huge_pmd_locked() and GUP-fast 2024-05-07 10:37:00 -07:00
process_vm_access.c
ptdump.c mm: ptdump: add check_wx_pages debugfs attribute 2024-02-22 10:24:47 -08:00
readahead.c struct fd layout change (and conversion to accessor helpers) 2024-09-23 09:35:36 -07:00
rmap.c mm: multi-gen LRU: use {ptep,pmdp}_clear_young_notify() 2024-11-03 10:47:03 -08:00
rodata_test.c
secretmem.c secretmem: disable memfd_secret() if arch cannot set direct map 2024-10-09 12:47:19 -07:00
shmem_quota.c shmem_quota: build the object file conditionally to the config option 2024-09-01 20:25:45 -07:00
shmem.c mm: refactor arch_calc_vm_flag_bits() and arm64 MTE handling 2024-11-05 16:49:55 -08:00
show_mem.c mm/show_mem.c: report alloc tags in human readable units 2024-09-17 01:07:00 -07:00
shrinker_debug.c mm: shrinker: use min() to improve shrinker_debugfs_scan_write() 2024-09-03 21:15:40 -07:00
shrinker.c mm: shrinker: avoid memleak in alloc_shrinker_info 2024-10-31 20:27:04 -07:00
shuffle.c
shuffle.h mm, treewide: rename MAX_ORDER to MAX_PAGE_ORDER 2024-01-08 15:27:15 -08:00
slab_common.c mm: krealloc: Fix MTE false alarm in __do_krealloc 2024-10-29 10:40:53 +01:00
slab.h mm, slab: suppress warnings in test_leak_destroy kunit test 2024-10-02 16:28:46 +02:00
slub.c mm, slab: suppress warnings in test_leak_destroy kunit test 2024-10-02 16:28:46 +02:00
sparse-vmemmap.c LoongArch: Set initial pte entry with PAGE_GLOBAL for kernel space 2024-10-21 22:11:19 +08:00
sparse.c A set of X86 fixes: 2024-09-01 14:43:08 -07:00
swap_cgroup.c mm: attempt to batch free swap entries for zap_pte_range() 2024-09-03 21:15:33 -07:00
swap_slots.c mm: swap: update get_swap_pages() to take folio order 2024-04-25 20:56:37 -07:00
swap_state.c mm: add nr argument in mem_cgroup_swapin_uncharge_swap() helper to support large folios 2024-09-17 01:07:01 -07:00
swap.c mm: page_alloc: move mlocked flag clearance into free_pages_prepare() 2024-11-11 17:20:23 -08:00
swap.h mm: fix swap_read_folio_zeromap() for large folios with partial zeromap 2024-09-17 01:07:01 -07:00
swapfile.c mm, swap: avoid over reclaim of full clusters 2024-10-30 20:14:11 -07:00
truncate.c mm: Fix missing folio invalidation calls during truncation 2024-08-24 16:09:16 +02:00
usercopy.c
userfaultfd.c ALong with the usual shower of singleton patches, notable patch series in 2024-09-21 07:29:05 -07:00
util.c mm: only enforce minimum stack gap size if it's sensible 2024-09-01 20:26:02 -07:00
vma_internal.h mm: remove duplicated include in vma_internal.h 2024-09-01 20:26:02 -07:00
vma.c mm: unconditionally close VMAs on error 2024-11-05 16:49:55 -08:00
vma.h mm: refactor map_deny_write_exec() 2024-11-05 16:49:55 -08:00
vmalloc.c ALong with the usual shower of singleton patches, notable patch series in 2024-09-21 07:29:05 -07:00
vmpressure.c
vmscan.c mm/thp: fix deferred split unqueue naming and locking 2024-11-05 16:49:54 -08:00
vmstat.c mm: count zeromap read and set for swapout and swapin 2024-11-11 00:00:37 -08:00
workingset.c cachestat: do not flush stats in recency check 2024-07-03 22:40:37 -07:00
z3fold.c mm/z3fold: add __percpu annotation to *unbuddied pointer in struct z3fold_pool 2024-09-01 20:25:56 -07:00
zbud.c mm: zpool: return pool size in pages 2024-04-25 20:55:48 -07:00
zpool.c mm: zpool: return pool size in pages 2024-04-25 20:55:48 -07:00
zsmalloc.c ALong with the usual shower of singleton patches, notable patch series in 2024-09-21 07:29:05 -07:00
zswap.c mm: count zeromap read and set for swapout and swapin 2024-11-11 00:00:37 -08:00