mirror of
https://github.com/torvalds/linux.git
synced 2026-06-08 06:25:52 +02:00
Kmem accounting of memcg is unusable now, because it lacks slab shrinker
support. That means when we hit the limit we will get ENOMEM w/o any
chance to recover. What we should do then is to call shrink_slab, which
would reclaim old inode/dentry caches from this cgroup. This is what
this patch set is intended to do.
Basically, it does two things. First, it introduces the notion of
per-memcg slab shrinker. A shrinker that wants to reclaim objects per
cgroup should mark itself as SHRINKER_MEMCG_AWARE. Then it will be
passed the memory cgroup to scan from in shrink_control->memcg. For
such shrinkers shrink_slab iterates over the whole cgroup subtree under
the target cgroup and calls the shrinker for each kmem-active memory
cgroup.
Secondly, this patch set makes the list_lru structure per-memcg. It's
done transparently to list_lru users - everything they have to do is to
tell list_lru_init that they want memcg-aware list_lru. Then the
list_lru will automatically distribute objects among per-memcg lists
basing on which cgroup the object is accounted to. This way to make FS
shrinkers (icache, dcache) memcg-aware we only need to make them use
memcg-aware list_lru, and this is what this patch set does.
As before, this patch set only enables per-memcg kmem reclaim when the
pressure goes from memory.limit, not from memory.kmem.limit. Handling
memory.kmem.limit is going to be tricky due to GFP_NOFS allocations, and
it is still unclear whether we will have this knob in the unified
hierarchy.
This patch (of 9):
NUMA aware slab shrinkers use the list_lru structure to distribute
objects coming from different NUMA nodes to different lists. Whenever
such a shrinker needs to count or scan objects from a particular node,
it issues commands like this:
count = list_lru_count_node(lru, sc->nid);
freed = list_lru_walk_node(lru, sc->nid, isolate_func,
isolate_arg, &sc->nr_to_scan);
where sc is an instance of the shrink_control structure passed to it
from vmscan.
To simplify this, let's add special list_lru functions to be used by
shrinkers, list_lru_shrink_count() and list_lru_shrink_walk(), which
consolidate the nid and nr_to_scan arguments in the shrink_control
structure.
This will also allow us to avoid patching shrinkers that use list_lru
when we make shrink_slab() per-memcg - all we will have to do is extend
the shrink_control structure to include the target memcg and make
list_lru_shrink_{count,walk} handle this appropriately.
Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
Suggested-by: Dave Chinner <david@fromorbit.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Greg Thelen <gthelen@google.com>
Cc: Glauber Costa <glommer@gmail.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
||
|---|---|---|
| .. | ||
| backing-dev.c | ||
| balloon_compaction.c | ||
| bootmem.c | ||
| cleancache.c | ||
| cma.c | ||
| compaction.c | ||
| debug-pagealloc.c | ||
| debug.c | ||
| dmapool.c | ||
| early_ioremap.c | ||
| fadvise.c | ||
| failslab.c | ||
| filemap_xip.c | ||
| filemap.c | ||
| frontswap.c | ||
| gup.c | ||
| highmem.c | ||
| huge_memory.c | ||
| hugetlb_cgroup.c | ||
| hugetlb.c | ||
| hwpoison-inject.c | ||
| init-mm.c | ||
| internal.h | ||
| interval_tree.c | ||
| iov_iter.c | ||
| Kconfig | ||
| Kconfig.debug | ||
| kmemcheck.c | ||
| kmemleak-test.c | ||
| kmemleak.c | ||
| ksm.c | ||
| list_lru.c | ||
| maccess.c | ||
| madvise.c | ||
| Makefile | ||
| memblock.c | ||
| memcontrol.c | ||
| memory_hotplug.c | ||
| memory-failure.c | ||
| memory.c | ||
| mempolicy.c | ||
| mempool.c | ||
| migrate.c | ||
| mincore.c | ||
| mlock.c | ||
| mm_init.c | ||
| mmap.c | ||
| mmu_context.c | ||
| mmu_notifier.c | ||
| mmzone.c | ||
| mprotect.c | ||
| mremap.c | ||
| msync.c | ||
| nobootmem.c | ||
| nommu.c | ||
| oom_kill.c | ||
| page_alloc.c | ||
| page_counter.c | ||
| page_ext.c | ||
| page_io.c | ||
| page_isolation.c | ||
| page_owner.c | ||
| page-writeback.c | ||
| pagewalk.c | ||
| percpu-km.c | ||
| percpu-vm.c | ||
| percpu.c | ||
| pgtable-generic.c | ||
| process_vm_access.c | ||
| quicklist.c | ||
| readahead.c | ||
| rmap.c | ||
| shmem.c | ||
| slab_common.c | ||
| slab.c | ||
| slab.h | ||
| slob.c | ||
| slub.c | ||
| sparse-vmemmap.c | ||
| sparse.c | ||
| swap_cgroup.c | ||
| swap_state.c | ||
| swap.c | ||
| swapfile.c | ||
| truncate.c | ||
| util.c | ||
| vmacache.c | ||
| vmalloc.c | ||
| vmpressure.c | ||
| vmscan.c | ||
| vmstat.c | ||
| workingset.c | ||
| zbud.c | ||
| zpool.c | ||
| zsmalloc.c | ||
| zswap.c | ||