linux/mm
Yongseok Koh f2fa92b29d vmalloc: remove BUG_ON due to racy counting of VM_LAZY_FREE
commit 88f5004430 upstream.

In free_unmap_area_noflush(), va->flags is marked as VM_LAZY_FREE first, and
then vmap_lazy_nr is increased atomically.

But in __purge_vmap_area_lazy(), while traversing vmap_area_list, nr is
counted by checking whether VM_LAZY_FREE is set in va->flags.  After
counting nr, the kernel reads vmap_lazy_nr atomically and checks a
BUG_ON condition (whether nr is greater than vmap_lazy_nr) to prevent
vmap_lazy_nr from going negative.

The problem is that, if the CPU is interrupted right after marking
VM_LAZY_FREE, the increment of vmap_lazy_nr can be delayed.  Consequently,
the BUG_ON condition can be met because nr ends up larger than
vmap_lazy_nr.

This is highly probable when vmalloc/vfree are called frequently.  The
scenario has been verified by adding a delay between marking VM_LAZY_FREE
and increasing vmap_lazy_nr in free_unmap_area_noflush().
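
The interleaving described above can be replayed deterministically in
user space.  The following is only a minimal sketch, not kernel code: the
sim_* names and structures are hypothetical stand-ins, the counter is a
plain int rather than an atomic, and each area counts as one unit (a
simplification).  The free path is split into its two steps so that a
purge-side check can be run in the window between them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define VM_LAZY_FREE 0x01UL

/* Hypothetical, simplified stand-in for a vmap area. */
struct sim_vmap_area {
	unsigned long flags;
};

/* Hypothetical global state: a few areas plus the lazy counter. */
struct sim_state {
	struct sim_vmap_area areas[4];
	size_t nr_areas;
	int lazy_nr;	/* stands in for the atomic vmap_lazy_nr */
};

/* Step 1 of the free path: mark the area as lazily freed. */
static void sim_mark_lazy_free(struct sim_vmap_area *va)
{
	assert(va != NULL);
	va->flags |= VM_LAZY_FREE;
}

/* Step 2 of the free path: account for the area in the counter. */
static void sim_account_lazy_free(struct sim_state *s)
{
	assert(s != NULL);
	s->lazy_nr += 1;
}

/*
 * Purge-side check: count the flagged areas and report whether the
 * old BUG_ON(nr > vmap_lazy_nr) condition would have fired.
 */
static bool sim_purge_would_bug(const struct sim_state *s)
{
	int nr = 0;

	for (size_t i = 0; i < s->nr_areas; i++)
		if (s->areas[i].flags & VM_LAZY_FREE)
			nr++;

	return nr > s->lazy_nr;
}
```

Calling sim_mark_lazy_free(), then sim_purge_would_bug(), and only then
sim_account_lazy_free() corresponds to the interrupted case: the check
sees one flagged area while the counter is still zero.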

Even though vmap_lazy_nr is used for checking the high watermark, it is
never a strict watermark.  Although the BUG_ON condition is there to
prevent vmap_lazy_nr from going negative, vmap_lazy_nr is a signed
variable, so it can temporarily drop to a negative value.

Consequently, removing the BUG_ON condition is the proper fix.
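
To illustrate why dropping the check is harmless, here is a rough
user-space sketch of a purge pass without the BUG_ON.  The demo_* names
are hypothetical, the counter is a plain signed int instead of an atomic,
and each area counts as one unit; the real function does considerably
more work.  The point is only that the signed counter may dip below zero
and is restored once the delayed increment lands.

```c
#include <assert.h>
#include <stddef.h>

#define VM_LAZY_FREE 0x01UL

/* Hypothetical, simplified stand-in for a vmap area. */
struct demo_area {
	unsigned long flags;
};

/*
 * Purge pass without the BUG_ON: count the flagged areas, clear the
 * flag, and subtract the count from the signed counter.  Returns the
 * number of areas purged.
 */
static int demo_purge(struct demo_area *areas, size_t n, int *lazy_nr)
{
	int nr = 0;

	assert(areas != NULL && lazy_nr != NULL);

	for (size_t i = 0; i < n; i++) {
		if (areas[i].flags & VM_LAZY_FREE) {
			areas[i].flags &= ~VM_LAZY_FREE;
			nr++;
		}
	}

	if (nr)
		*lazy_nr -= nr;	/* may go negative temporarily */

	return nr;
}
```

If one area is flagged while the counter is still zero, demo_purge()
leaves the counter at -1; the pending increment from the free path then
brings it back to zero.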

A possible BUG_ON message looks like the one below.

   kernel BUG at mm/vmalloc.c:517!
   invalid opcode: 0000 [#1] SMP
   EIP: 0060:[<c04824a4>] EFLAGS: 00010297 CPU: 3
   EIP is at __purge_vmap_area_lazy+0x144/0x150
   EAX: ee8a8818 EBX: c08e77d4 ECX: e7c7ae40 EDX: c08e77ec
   ESI: 000081fe EDI: e7c7ae60 EBP: e7c7ae64 ESP: e7c7ae3c
   DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
   Call Trace:
   [<c0482ad9>] free_unmap_vmap_area_noflush+0x69/0x70
   [<c0482b02>] remove_vm_area+0x22/0x70
   [<c0482c15>] __vunmap+0x45/0xe0
   [<c04831ec>] vmalloc+0x2c/0x30
   Code: 8d 59 e0 eb 04 66 90 89 cb 89 d0 e8 87 fe ff ff 8b 43 20 89 da 8d 48 e0 8d 43 20 3b 04 24 75 e7 fe 05 a8 a5 a3 c0 e9 78 ff ff ff <0f> 0b eb fe 90 8d b4 26 00 00 00 00 56 89 c6 b8 ac a5 a3 c0 31
   EIP: [<c04824a4>] __purge_vmap_area_lazy+0x144/0x150 SS:ESP 0068:e7c7ae3c

[ See also http://marc.info/?l=linux-kernel&m=126335856228090&w=2 ]

Signed-off-by: Yongseok Koh <yongseok.koh@samsung.com>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
Cc: Nick Piggin <npiggin@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-01-25 10:49:43 -08:00
allocpercpu.c percpu: use dynamic percpu allocator as the default percpu allocator 2009-06-24 15:13:35 +09:00
backing-dev.c Thaw refrigerated bdi flusher threads before invoking kthread_stop on them 2009-11-12 13:08:11 +01:00
bootmem.c kmemleak: Do not report alloc_bootmem blocks as leaks 2009-08-27 14:29:17 +01:00
bounce.c block: remove some includings of blktrace_api.h 2009-06-16 11:19:36 +02:00
debug-pagealloc.c generic debug pagealloc 2009-04-01 08:59:13 -07:00
dmapool.c dmapools: protect page_list walk in show_pools() 2009-06-30 18:56:00 -07:00
fadvise.c readahead: move max_sane_readahead() calls into force_page_cache_readahead() 2009-06-16 19:47:28 -07:00
failslab.c kmemtrace, mm: fix slab.h dependency problem in mm/failslab.c 2009-04-03 12:23:01 +02:00
filemap_xip.c const: mark struct vm_struct_operations 2009-09-27 11:39:25 -07:00
filemap.c const: mark struct vm_struct_operations 2009-09-27 11:39:25 -07:00
fremap.c Do not account for the address space used by hugetlbfs using VM_ACCOUNT 2009-02-10 10:48:42 -08:00
highmem.c highmem: Fix debug_kmap_atomic() to also handle KM_IRQ_PTE, KM_NMI, and KM_NMI_PTE 2009-11-10 04:15:47 +01:00
hugetlb.c const: mark struct vm_struct_operations 2009-09-27 11:39:25 -07:00
hwpoison-inject.c HWPOISON: Add simple debugfs interface to inject hwpoison on arbitary PFNs 2009-09-16 11:50:17 +02:00
init-mm.c mm: consolidate init_mm definition 2009-06-16 19:47:28 -07:00
internal.h ksm: fix mlockfreed to munlocked 2010-01-06 15:05:22 -08:00
Kconfig NOMMU: Optimise away the {dac_,}mmap_min_addr tests 2010-01-06 15:04:30 -08:00
Kconfig.debug trivial: improve help text for mm debug config options 2009-09-21 15:14:57 +02:00
kmemcheck.c kmemcheck: add hooks for the page allocator 2009-06-15 15:48:33 +02:00
kmemleak-test.c percpu: clean up percpu variable definitions 2009-06-24 15:13:48 +09:00
kmemleak.c kmemleak: Check for NULL pointer returned by create_object() 2009-10-09 13:28:47 -07:00
ksm.c ksm: fix mlockfreed to munlocked 2010-01-06 15:05:22 -08:00
maccess.c [S390] maccess: add weak attribute to probe_kernel_write 2009-06-12 10:27:37 +02:00
madvise.c Merge branch 'hwpoison' of git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-mce-2.6 2009-09-24 07:53:22 -07:00
Makefile procfs: disable per-task stack usage on NOMMU 2009-09-24 17:11:24 -07:00
memcontrol.c memcg: ensure list is empty at rmdir 2010-01-22 15:18:01 -08:00
memory_hotplug.c mm: allow memory hotplug and hibernation in the same kernel 2009-11-17 17:40:33 -08:00
memory-failure.c hwpoison: fix oops on ksm pages 2009-10-29 07:39:24 -07:00
memory.c mm: sigbus instead of abusing oom 2009-12-18 14:05:51 -08:00
mempolicy.c do_mbind(): fix memory leak 2009-10-29 07:39:29 -07:00
mempool.c mm: remove broken 'kzalloc' mempool 2009-09-22 07:17:35 -07:00
migrate.c memcg: fix wrong pointer initialization at page migration when memcg is disabled. 2009-11-12 07:25:56 -08:00
mincore.c mm: hugetlb: fix hugepage memory leak in mincore() 2009-12-18 14:04:29 -08:00
mlock.c ksm: fix mlockfreed to munlocked 2010-01-06 15:05:22 -08:00
mm_init.c mm: mminit_loglevel cannot be __meminitdata anymore 2008-08-20 15:40:30 -07:00
mmap.c untangle the do_mremap() mess 2010-01-18 10:19:11 -08:00
mmu_context.c mm: reduce atomic use on use_mm fast path 2009-09-22 07:17:42 -07:00
mmu_notifier.c ksm: add mmu_notifier set_pte_at_notify() 2009-09-22 07:17:31 -07:00
mmzone.c [ARM] Double check memmap is actually valid with a memmap has unexpected holes V2 2009-05-18 11:22:24 +01:00
mprotect.c perf: Do the big rename: Performance Counters -> Performance Events 2009-09-21 14:28:04 +02:00
mremap.c untangle the do_mremap() mess 2010-01-18 10:19:11 -08:00
msync.c [CVE-2009-0029] System call wrappers part 13 2009-01-14 14:15:23 +01:00
nommu.c NOMMU: Don't pass NULL pointers to fput() in do_mmap_pgoff() 2009-10-31 12:11:37 -07:00
oom_kill.c memcg: avoid oom-killing innocent task in case of use_hierarchy 2010-01-06 15:04:37 -08:00
page_alloc.c page allocator: update NR_FREE_PAGES only when necessary 2010-01-22 15:18:12 -08:00
page_cgroup.c memory hotplug: alloc page from other node in memory online 2009-09-22 07:17:26 -07:00
page_io.c mm: remove file argument from swap_readpage() 2009-06-16 19:47:44 -07:00
page_isolation.c memory hotplug: fix page_zone() calculation in test_pages_isolated() 2008-11-06 15:41:19 -08:00
page-writeback.c writeback: account IO throttling wait as iowait 2009-10-09 12:40:42 +02:00
pagewalk.c mm: hugetlb: fix hugepage memory leak in walk_page_range() 2009-12-18 14:04:30 -08:00
percpu.c percpu: restructure pcpu_extend_area_map() to fix bugs and improve readability 2009-11-13 00:55:35 +09:00
prio_tree.c spelling fixes: mm/ 2007-10-20 01:27:18 +02:00
quicklist.c cpumask: use new-style cpumask ops in mm/quicklist. 2009-09-24 09:34:52 +09:30
readahead.c readahead: introduce context readahead algorithm 2009-06-16 19:47:30 -07:00
rmap.c mm/rmap.c: fix comment 2009-10-01 16:11:12 -07:00
shmem_acl.c shmfs: use 'check_acl' instead of 'permission' 2009-09-08 11:08:46 -07:00
shmem.c const: mark struct vm_struct_operations 2009-09-27 11:39:25 -07:00
slab.c mm: replace various uses of num_physpages by totalram_pages 2009-09-22 07:17:38 -07:00
slob.c slab: remove duplicate kmem_cache_init_late() declarations 2009-08-06 11:36:25 +03:00
slub.c mm: kmem_cache_create(): make it easier to catch NULL cache names 2009-09-22 07:17:33 -07:00
sparse-vmemmap.c memory hotplug: alloc page from other node in memory online 2009-09-22 07:17:26 -07:00
sparse.c memory hotplug: alloc page from other node in memory online 2009-09-22 07:17:26 -07:00
swap_state.c mm: add_to_swap_cache() does not return -EEXIST 2009-09-22 07:17:35 -07:00
swap.c mm: replace various uses of num_physpages by totalram_pages 2009-09-22 07:17:38 -07:00
swapfile.c mm: remove incorrect swap_count() from try_to_unuse() 2009-11-02 09:44:41 -08:00
thrash.c mm: pass mm to grab_swap_token 2009-06-23 12:50:05 -07:00
truncate.c vfs: Fix vmtruncate() regression 2010-01-22 15:18:41 -08:00
util.c untangle the do_mremap() mess 2010-01-18 10:19:11 -08:00
vmalloc.c vmalloc: remove BUG_ON due to racy counting of VM_LAZY_FREE 2010-01-25 10:49:43 -08:00
vmscan.c vmscan: do not evict inactive pages when skipping an active list scan 2010-01-06 15:05:21 -08:00
vmstat.c mm: vmstat: add isolate pages 2009-09-22 07:17:29 -07:00