linux/mm
Zhang Yi a42efb79d5 futex: Take hugepages into account when generating futex_key
commit 13d60f4b6a upstream.

The futex_keys of process shared futexes are generated from the page
offset, the mapping host and the mapping index of the futex user space
address. This should result in a unique identifier for each futex.

However, this does not hold when futexes are located in different
subpages of a hugepage: the mapping index for all those futexes
evaluates to the index of the base page of the hugetlbfs mapping. So a
futex at offset 0 of the hugepage mapping and another one at offset
PAGE_SIZE of the same hugepage mapping have identical futex_keys. This
happens because the futex code blindly uses page->index.
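
To make the collision concrete, here is a small user-space model of a shared futex key. It is a hypothetical sketch, not kernel code: struct model_futex_key, old_key(), key_equal(), and the 4 KiB base page / order-9 hugepage sizes are all assumptions chosen for illustration. The key is built from the three inputs named above, and, like the pre-fix code, it uses the head page's index for every subpage of the hugepage.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed sizes for illustration: 4 KiB base pages, order-9 (2 MiB) hugepages. */
#define MODEL_PAGE_SIZE   4096UL
#define MODEL_HPAGE_ORDER 9U

/* Hypothetical, simplified shared futex key: mapping host, mapping index,
 * and offset within the base page. */
struct model_futex_key {
    const void *host;      /* stands in for page->mapping->host */
    unsigned long pgoff;   /* mapping index used for the key */
    unsigned long offset;  /* futex address % PAGE_SIZE */
};

/* Models the pre-fix behavior: every subpage reports the head page's
 * index, so pgoff ignores which subpage the futex lives in. */
static struct model_futex_key old_key(const void *host,
                                      unsigned long head_index,
                                      unsigned long offset_in_hugepage)
{
    /* The futex must lie inside one hugepage. */
    assert(offset_in_hugepage < (MODEL_PAGE_SIZE << MODEL_HPAGE_ORDER));

    struct model_futex_key k = {
        .host = host,
        .pgoff = head_index,                       /* subpage is lost here */
        .offset = offset_in_hugepage % MODEL_PAGE_SIZE,
    };
    return k;
}

/* Two futexes are treated as the same futex when all key fields match. */
static bool key_equal(struct model_futex_key a, struct model_futex_key b)
{
    return a.host == b.host && a.pgoff == b.pgoff && a.offset == b.offset;
}
```

In this model, old_key(&inode, 0, 0) and old_key(&inode, 0, MODEL_PAGE_SIZE) compare equal. That is the collision between mutex1 and mutex2 in the reproduction steps.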

Steps to reproduce the bug:

1. Map a file from hugetlbfs. Initialize pthread mutex1 at offset 0
   and pthread mutex2 at offset PAGE_SIZE of the hugetlbfs
   mapping.

   The mutexes must be initialized as PTHREAD_PROCESS_SHARED because
   PTHREAD_PROCESS_PRIVATE mutexes are not affected by this issue as
   their keys solely depend on the user space address.

2. Lock mutex1 and mutex2.

3. Create thread1 and in the thread function lock mutex1, which
   results in thread1 blocking on the locked mutex1.

4. Create thread2 and in the thread function lock mutex2, which
   results in thread2 blocking on the locked mutex2.

5. Unlock mutex2. Although mutex2 has been unlocked, thread2 still
   blocks on mutex2 because its futex_key is identical to mutex1's.

To solve this issue, we need to take into account the normal page index
of the page which contains the futex if the futex is in a hugetlbfs
mapping. In other words, we calculate the normal page mapping index of
the subpage in the hugetlbfs mapping.

Mappings which are not based on hugetlbfs are not affected and still
use page->index.
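
Here is a minimal sketch of the corrected index calculation as a user-space model. For illustration it assumes that a hugetlbfs head page's index counts in huge-page units. Under that assumption, the normal page index of a subpage is the head index shifted left by the compound order, plus the subpage's position within the hugepage. struct model_page and model_basepage_index() are hypothetical names, not the kernel's helpers.

```c
#include <assert.h>

/* Hypothetical model of the page a futex lives in. */
struct model_page {
    unsigned long index;  /* page->index; in huge-page units for a hugetlbfs head page (assumed) */
    unsigned int order;   /* compound order; 0 for a normal page */
    int hugetlb;          /* nonzero if the page belongs to a hugetlbfs mapping */
};

/* Returns the normal (base) page mapping index of the given subpage.
 * Non-hugetlbfs pages are unaffected and keep using page->index. */
static unsigned long model_basepage_index(const struct model_page *head,
                                          unsigned long subpage)
{
    if (!head->hugetlb) {
        assert(subpage == 0);  /* a normal page has no subpages */
        return head->index;
    }

    /* Subpage must lie within the hugepage. */
    assert(subpage < (1UL << head->order));

    /* Convert the huge-page index to base-page units, then add the
     * position of the subpage inside the hugepage. */
    return (head->index << head->order) + subpage;
}
```

Assume order 9. The futexes at offset 0 and offset PAGE_SIZE of the first hugepage now get indices 0 and 1, so their keys differ. Subpage 3 of the second hugepage (head index 1) maps to 515. A normal page simply keeps its page->index.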

Thanks to Mel Gorman who provided a patch for adding proper evaluation
functions to the hugetlbfs code to avoid exposing hugetlbfs specific
details to the futex code.

[ tglx: Massaged changelog ]

Signed-off-by: Zhang Yi <zhang.yi20@zte.com.cn>
Reviewed-by: Jiang Biao <jiang.biao2@zte.com.cn>
Tested-by: Ma Chenggong <ma.chenggong@zte.com.cn>
Reviewed-by: 'Mel Gorman' <mgorman@suse.de>
Acked-by: 'Darren Hart' <dvhart@linux.intel.com>
Cc: 'Peter Zijlstra' <peterz@infradead.org>
Link: http://lkml.kernel.org/r/000101ce71a6%24a83c5880%24f8b50980%24@com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Mike Galbraith <mgalbraith@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-08-20 08:26:28 -07:00
backing-dev.c backing-dev: fix wakeup timer races with bdi_unregister() 2012-02-01 16:52:49 +08:00
bootmem.c mm: sparse: fix usemap allocation above node descriptor section 2012-10-02 10:30:36 -07:00
bounce.c mm: remove the second argument of k[un]map_atomic() 2012-03-20 21:48:27 +08:00
cleancache.c mm: cleancache: Use __read_mostly as appropiate. 2012-01-23 16:08:09 -05:00
compaction.c mm: compaction: fix echo 1 > compact_memory return error issue 2013-01-17 08:50:43 -08:00
debug-pagealloc.c mm, x86: Remove debug_pagealloc_enabled 2011-12-06 09:24:07 +01:00
dmapool.c mm: dmapool: use provided gfp flags for all dma_alloc_coherent() calls 2012-12-17 10:37:44 -08:00
fadvise.c mm/fadvise.c: drain all pagevecs if POSIX_FADV_DONTNEED fails to discard all pages 2013-02-28 06:59:01 -08:00
failslab.c switch debugfs to umode_t 2012-01-03 22:54:56 -05:00
filemap_xip.c mm/filemap_xip.c: fix race condition in xip_file_fault() 2012-02-03 16:16:41 -08:00
filemap.c radix-tree: use iterators in find_get_pages* functions 2012-03-28 17:14:37 -07:00
fremap.c mm: delete various needless include <linux/module.h> 2011-10-31 09:20:11 -04:00
highmem.c Merge branch 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux 2011-11-06 19:44:47 -08:00
huge_memory.c mm/THP: use pmd_populate() to update the pmd with pgtable_t pointer 2013-06-07 12:49:29 -07:00
hugetlb.c futex: Take hugepages into account when generating futex_key 2013-08-20 08:26:28 -07:00
hwpoison-inject.c HWPOISON: Clean up memory_failure() vs. __memory_failure() 2012-01-03 12:06:32 -08:00
init-mm.c atomic: use <linux/atomic.h> 2011-07-26 16:49:47 -07:00
internal.h mm: thp: tail page refcounting fix 2011-11-02 16:06:57 -07:00
Kconfig Merge branch 'master' into x86/memblock 2011-11-28 09:46:22 -08:00
Kconfig.debug mm: more intensive memory corruption debugging 2012-01-10 16:30:42 -08:00
kmemcheck.c
kmemleak-test.c
kmemleak.c kmemleak: Disable early logging when kmemleak is off by default 2012-01-20 16:57:05 +00:00
ksm.c ksm: cleanup: introduce find_mergeable_vma() 2012-03-21 17:54:59 -07:00
maccess.c mm: Map most files to use export.h instead of module.h 2011-10-31 09:20:12 -04:00
madvise.c mm: Hold a file reference in madvise_remove 2012-07-16 09:04:43 -07:00
Makefile Cross Memory Attach 2011-10-31 17:30:44 -07:00
memblock.c x86, mm: Trim memory in memblock to be page aligned 2012-10-31 10:02:56 -07:00
memcontrol.c memcg: oom: fix totalpages calculation for memory.swappiness==0 2012-11-26 11:37:45 -08:00
memory_hotplug.c memory hotplug: fix section info double registration bug 2012-10-02 10:30:06 -07:00
memory-failure.c mm: soft offline: split thp at the beginning of soft_offline_page() 2012-12-10 10:59:39 -08:00
memory.c vm: add vm_iomap_memory() helper function 2013-04-25 21:19:56 -07:00
mempolicy.c tmpfs mempolicy: fix /proc/mounts corrupting memory 2013-01-11 09:06:49 -08:00
mempool.c mempool: fix first round failure behavior 2012-01-10 16:30:45 -08:00
migrate.c mm: migration: add migrate_entry_wait_huge() 2013-06-20 11:58:46 -07:00
mincore.c mm: thp: fix pmd_bad() triggering in code paths holding mmap_sem read mode 2012-03-21 17:54:54 -07:00
mlock.c vm: avoid using find_vma_prev() unnecessarily 2012-03-06 18:23:36 -08:00
mm_init.c mm: Map most files to use export.h instead of module.h 2011-10-31 09:20:12 -04:00
mmap.c hugetlbfs: fix mmap failure in unaligned size request 2013-05-19 10:54:48 -07:00
mmu_context.c mm, counters: remove task argument to sync_mm_rss() and __sync_task_rss_stat() 2012-03-21 17:54:59 -07:00
mmu_notifier.c mm: mmu_notifier: re-fix freed page still mapped in secondary MMU 2013-06-07 12:49:25 -07:00
mmzone.c mm: delete various needless include <linux/module.h> 2011-10-31 09:20:11 -04:00
mprotect.c Merge branch 'akpm' (Andrew's patch-bomb) 2012-03-22 09:04:48 -07:00
mremap.c mm: collapse security_vm_enough_memory() variants into a single function 2012-02-14 10:45:39 +11:00
msync.c
nobootmem.c memblock: free allocated memblock_reserved_regions later 2012-07-16 09:04:45 -07:00
nommu.c vm: add no-mmu vm_iomap_memory() stub 2013-08-20 08:26:27 -07:00
oom_kill.c signal: oom_kill_task: use SEND_SIG_FORCED instead of force_sig() 2012-03-23 16:58:41 -07:00
page_alloc.c mm/memory-hotplug: fix lowmem count overflow when offline pages 2013-08-04 16:26:07 +08:00
page_cgroup.c page_cgroup: fix horrid swap accounting regression 2012-03-06 08:18:23 -08:00
page_io.c
page_isolation.c
page-writeback.c mm: fix calculation of dirtyable memory 2013-01-11 09:06:48 -08:00
pagewalk.c mm/pagewalk.c: walk_page_range should avoid VM_PFNMAP areas 2013-06-07 12:49:28 -07:00
percpu-km.c
percpu-vm.c percpu: use bitmap_clear 2012-01-20 09:23:16 -08:00
percpu.c kmemleak: Fix the kmemleak tracking of the percpu areas with !SMP 2012-05-09 10:13:29 -07:00
pgtable-generic.c thp: add HPAGE_PMD_* definitions for !CONFIG_TRANSPARENT_HUGEPAGE 2012-03-21 17:55:02 -07:00
prio_tree.c sanitize <linux/prefetch.h> usage 2011-05-20 12:50:29 -07:00
process_vm_access.c Fix: compat_rw_copy_check_uvector() misuse in aio, readv, writev, and security keys 2013-03-14 11:29:51 -07:00
quicklist.c mm: delete various needless include <linux/module.h> 2011-10-31 09:20:11 -04:00
readahead.c mm: Map most files to use export.h instead of module.h 2011-10-31 09:20:12 -04:00
rmap.c mm: fix XFS oops due to dirty pages without buffers on s390 2012-10-31 10:02:56 -07:00
shmem.c tmpfs: fix use-after-free of mempolicy object 2013-02-28 06:59:01 -08:00
slab.c slab: fix the DEADLOCK issue on l3 alien lock 2012-10-13 05:38:37 +09:00
slob.c mm: Map most files to use export.h instead of module.h 2011-10-31 09:20:12 -04:00
slub.c slub: fix a memory leak in get_partial_node() 2012-06-10 00:36:11 +09:00
sparse-vmemmap.c mm: delete various needless include <linux/module.h> 2011-10-31 09:20:11 -04:00
sparse.c mm/vmemmap: fix wrong use of virt_to_page 2012-12-10 10:59:39 -08:00
swap_state.c swap: avoid read_swap_cache_async() race to deadlock while waiting on discard I/O completion 2013-06-20 11:58:45 -07:00
swap.c mm: drain percpu lru add/rotate page-vectors on cpu hot-unplug 2012-03-21 17:54:58 -07:00
swapfile.c swap: fix shmem swapping when more than 8 areas 2012-06-22 11:36:55 -07:00
thrash.c mm/thrash.c: quiet sparse noise 2011-10-31 17:30:50 -07:00
truncate.c mm: fix invalidate_complete_page2() lock ordering 2012-10-13 05:38:51 +09:00
util.c procfs: mark thread stack correctly in proc/<pid>/maps 2012-03-21 17:54:58 -07:00
vmalloc.c mm: fix faulty initialization in vmalloc_init() 2012-06-10 00:36:06 +09:00
vmscan.c mm: bugfix: set current->reclaim_state to NULL while returning from kswapd() 2012-11-26 11:37:19 -08:00
vmstat.c mm: fix up the vmscan stat in vmstat 2012-04-25 21:26:33 -07:00