linux/mm
Mel Gorman beb9576530 mm: mempolicy: Let vma_merge and vma_split handle vma->vm_policy linkages
commit 05f144a0d5 upstream.

Dave Jones' system call fuzz testing tool "trinity" triggered the
following bug error with slab debugging enabled

    =============================================================================
    BUG numa_policy (Not tainted): Poison overwritten
    -----------------------------------------------------------------------------

    INFO: 0xffff880146498250-0xffff880146498250. First byte 0x6a instead of 0x6b
    INFO: Allocated in mpol_new+0xa3/0x140 age=46310 cpu=6 pid=32154
     __slab_alloc+0x3d3/0x445
     kmem_cache_alloc+0x29d/0x2b0
     mpol_new+0xa3/0x140
     sys_mbind+0x142/0x620
     system_call_fastpath+0x16/0x1b
    INFO: Freed in __mpol_put+0x27/0x30 age=46268 cpu=6 pid=32154
     __slab_free+0x2e/0x1de
     kmem_cache_free+0x25a/0x260
     __mpol_put+0x27/0x30
     remove_vma+0x68/0x90
     exit_mmap+0x118/0x140
     mmput+0x73/0x110
     exit_mm+0x108/0x130
     do_exit+0x162/0xb90
     do_group_exit+0x4f/0xc0
     sys_exit_group+0x17/0x20
     system_call_fastpath+0x16/0x1b
    INFO: Slab 0xffffea0005192600 objects=27 used=27 fp=0x          (null) flags=0x20000000004080
    INFO: Object 0xffff880146498250 @offset=592 fp=0xffff88014649b9d0

This implied a reference counting bug and the problem happened during
mbind().

mbind() applies a new memory policy to a range and uses mbind_range() to
merge existing VMAs or split them as necessary.  In the event of splits,
mpol_dup() will allocate a new struct mempolicy and maintain existing
reference counts whose rules are documented in
Documentation/vm/numa_memory_policy.txt .

The problem occurs with shared memory policies.  The vm_op->set_policy
increments the reference count if necessary and split_vma() and
vma_merge() have already handled the existing reference counts.
However, policy_vma() screws it up by replacing an existing
vma->vm_policy with one that potentially has the wrong reference count
leading to a premature free.  This patch removes the damage caused by
policy_vma().

With this patch applied Dave's trinity tool runs an mbind test for 5
minutes without error.  /proc/slabinfo reported that there are no
numa_policy or shared_policy_node objects allocated after the test
completed and the shared memory region was deleted.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Cc: Dave Jones <davej@redhat.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Stephen Wilson <wilsons@start.ca>
Cc: Christoph Lameter <cl@linux.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-06-01 15:12:56 +08:00
..
backing-dev.c backing-dev: ensure wakeup_timer is deleted 2011-11-21 14:31:25 -08:00
bootmem.c bootmem/sparsemem: remove limit constraint in alloc_bootmem_section 2012-04-02 09:27:11 -07:00
bounce.c bounce: call flush_dcache_page() after bounce_copy_vec() 2010-09-09 18:57:25 -07:00
cleancache.c mm: cleancache core ops functions and config 2011-05-26 10:01:36 -06:00
compaction.c mm: compaction: check for overlapping nodes during isolation for migration 2012-02-13 11:06:11 -08:00
debug-pagealloc.c generic debug pagealloc 2009-04-01 08:59:13 -07:00
dmapool.c mm/dmapool.c: use TASK_UNINTERRUPTIBLE in dma_pool_alloc() 2011-01-13 17:32:48 -08:00
fadvise.c readahead: introduce FMODE_RANDOM for POSIX_FADV_RANDOM 2010-03-06 11:26:25 -08:00
failslab.c include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h 2010-03-30 22:02:32 +09:00
filemap_xip.c mm/filemap_xip.c: fix race condition in xip_file_fault() 2012-02-13 11:06:07 -08:00
filemap.c readahead: fix pipeline break caused by block plug 2012-02-13 11:06:04 -08:00
fremap.c mm: don't access vm_flags as 'int' 2011-05-26 09:20:31 -07:00
highmem.c mm,x86: fix kmap_atomic_push vs ioremap_32.c 2010-10-27 18:03:05 -07:00
huge_memory.c mm: thp: fix BUG on mm->nr_ptes 2012-03-12 10:32:56 -07:00
hugetlb.c hugetlb: prevent BUG_ON in hugetlb_fault() -> hugetlb_cow() 2012-05-21 09:40:02 -07:00
hwpoison-inject.c Fix common misspellings 2011-03-31 11:26:23 -03:00
init-mm.c mm: convert mm->cpu_vm_cpumask into cpumask_var_t 2011-05-25 08:39:21 -07:00
internal.h mm: thp: tail page refcounting fix 2011-11-11 09:36:29 -08:00
Kconfig mm: cleancache core ops functions and config 2011-05-26 10:01:36 -06:00
Kconfig.debug mm: debug-pagealloc: fix kconfig dependency warning 2011-03-22 17:44:02 -07:00
kmemcheck.c kmemcheck: Fix build errors due to missing slab.h 2010-03-30 22:02:32 +09:00
kmemleak-test.c kmemleak: remove memset by using kzalloc 2011-01-27 18:31:51 +00:00
kmemleak.c kmemleak: Do not return a pointer to an object that kmemleak did not get 2011-05-19 17:35:28 +01:00
ksm.c ksm: fix NULL pointer dereference in scan_get_next_rmap_item() 2011-06-15 20:04:02 -07:00
maccess.c maccess,probe_kernel: Make write/read src const void * 2011-05-25 19:56:23 -04:00
madvise.c thp: khugepaged: make khugepaged aware about madvise 2011-01-13 17:32:47 -08:00
Makefile mm: cleancache core ops functions and config 2011-05-26 10:01:36 -06:00
memblock.c mm/memblock: properly handle overlaps and fix error path 2011-03-22 17:44:09 -07:00
memcontrol.c memcg: free spare array to avoid memory leak 2012-05-21 09:40:04 -07:00
memory_hotplug.c mm, hotplug: protect zonelist building with zonelists_mutex 2011-06-22 21:06:48 -07:00
memory-failure.c mm/memory-failure.c: fix spinlock vs mutex order 2011-06-27 18:00:13 -07:00
memory.c mm: thp: fix pmd_bad() triggering in code paths holding mmap_sem read mode 2012-04-02 09:27:10 -07:00
mempolicy.c mm: mempolicy: Let vma_merge and vma_split handle vma->vm_policy linkages 2012-06-01 15:12:56 +08:00
mempool.c mm: remove broken 'kzalloc' mempool 2009-09-22 07:17:35 -07:00
migrate.c mm: fix race between mremap and removing migration entry 2011-10-25 07:10:17 +02:00
mincore.c mm: thp: fix pmd_bad() triggering in code paths holding mmap_sem read mode 2012-04-02 09:27:10 -07:00
mlock.c mm: don't access vm_flags as 'int' 2011-05-26 09:20:31 -07:00
mm_init.c mm: mminit_loglevel cannot be __meminitdata anymore 2008-08-20 15:40:30 -07:00
mmap.c mm: get rid of the most spurious find_vma_prev() users 2011-06-16 00:35:09 -07:00
mmu_context.c exit: fix oops in sync_mm_rss 2010-03-24 16:31:21 -07:00
mmu_notifier.c thp: mmu_notifier_test_young 2011-01-13 17:32:46 -08:00
mmzone.c mm: page allocator: adjust the per-cpu counter threshold when memory is low 2011-01-13 17:32:31 -08:00
mprotect.c thp: mprotect: transparent huge page support 2011-01-13 17:32:44 -08:00
mremap.c mm: Convert i_mmap_lock to a mutex 2011-05-25 08:39:18 -07:00
msync.c sanitize vfs_fsync calling conventions 2010-05-21 18:31:21 -04:00
nobootmem.c mm: nobootmem: fix sign extend problem in __free_pages_memory() 2012-05-21 09:40:02 -07:00
nommu.c NOMMU: Don't need to clear vm_mm when deleting a VMA 2012-03-12 10:32:56 -07:00
oom_kill.c oom: fix integer overflow of points in oom_badness 2012-01-06 14:13:51 -08:00
page_alloc.c mm: fix NULL ptr dereference in __count_immobile_pages 2012-01-25 17:25:05 -08:00
page_cgroup.c memcg: fix init_page_cgroup nid with sparsemem 2011-06-15 20:04:01 -07:00
page_io.c block: kill off REQ_UNPLUG 2011-03-10 08:52:27 +01:00
page_isolation.c mm: page_isolation: codeclean fix comment and rm unneeded val init 2010-10-26 16:52:11 -07:00
page-writeback.c writeback: introduce .tagged_writepages for the WB_SYNC_NONE sync stage 2011-10-03 11:40:43 -07:00
pagewalk.c mm: thp: fix pmd_bad() triggering in code paths holding mmap_sem read mode 2012-04-02 09:27:10 -07:00
percpu-km.c percpu: clear memory allocated with the km allocator 2010-10-02 10:28:42 +03:00
percpu-vm.c percpu: fix chunk range calculation 2011-12-21 12:57:37 -08:00
percpu.c percpu: pcpu_embed_first_chunk() should free unused parts after all allocs are complete 2012-05-21 09:40:02 -07:00
pgtable-generic.c mm/pgtable-generic.c: fix CONFIG_SWAP=n build 2011-01-26 10:49:58 +10:00
prio_tree.c sanitize <linux/prefetch.h> usage 2011-05-20 12:50:29 -07:00
quicklist.c include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h 2010-03-30 22:02:32 +09:00
readahead.c readahead: readahead page allocations are OK to fail 2011-05-25 08:39:25 -07:00
rmap.c mm/memory-failure.c: fix spinlock vs mutex order 2011-06-27 18:00:13 -07:00
shmem.c tmpfs: add shmem_read_mapping_page_gfp 2011-06-27 18:00:12 -07:00
slab.c SLAB: Record actual last user of freed objects. 2011-06-03 19:33:50 +03:00
slob.c mm: Remove support for kmem_cache_name() 2011-01-23 21:00:05 +02:00
slub.c slub: Do not hold slub_lock when calling sysfs_slab_add() 2012-04-02 09:27:20 -07:00
sparse-vmemmap.c tree-wide: fix comment/printk typos 2010-11-01 15:38:34 -04:00
sparse.c bootmem/sparsemem: remove limit constraint in alloc_bootmem_section 2012-04-02 09:27:11 -07:00
swap_state.c mm: fix s390 BUG by __set_page_dirty_no_writeback on swap 2012-04-27 09:51:07 -07:00
swap.c mm: fix UP THP spin_is_locked BUGs 2012-02-13 11:06:11 -08:00
swapfile.c mm: thp: fix pmd_bad() triggering in code paths holding mmap_sem read mode 2012-04-02 09:27:10 -07:00
thrash.c vmscan: implement swap token priority aging 2011-06-15 20:03:59 -07:00
truncate.c mm: fix assertion mapping->nrpages == 0 in end_writeback() 2011-06-27 18:00:13 -07:00
util.c mm: nommu: sort mm->mmap list properly 2011-05-25 08:39:05 -07:00
vmalloc.c mm: vmalloc: check for page allocation failure before vmlist insertion 2011-12-21 12:57:36 -08:00
vmscan.c memcg: fix vmscan count in small memcgs 2011-10-03 11:41:08 -07:00
vmstat.c mm, mem-hotplug: update pcp->stat_threshold when memory hotplug occur 2011-05-25 08:39:09 -07:00