mirror of
https://github.com/torvalds/linux.git
synced 2026-06-20 21:23:06 +02:00
Down, down in the deepest depths of GFP_NOIO page reclaim, we have shrink_page_list() calling __remove_mapping() calling __delete_from_ swap_cache() or __delete_from_page_cache(). You would not expect those to need much stack, but in fact they call radix_tree_delete(): which declares a 192-byte radix_tree_path array on its stack (to record the node,offsets it visits when descending, in case it needs to ascend to update them). And if any tag is still set [1], that calls radix_tree_tag_clear(), which declares a further such 192-byte radix_tree_path array on the stack. (At least we have interrupts disabled here, so won't then be pushing registers too.) That was probably a good choice when most users were 32-bit (array of half the size), and adding fields to radix_tree_node would have bloated it unnecessarily. But nowadays many are 64-bit, and each radix_tree_node contains a struct rcu_head, which is only used when freeing; whereas the radix_tree_path info is only used for updating the tree (deleting, clearing tags or setting tags if tagged) when a lock must be held, of no interest when accessing the tree locklessly. So add a parent pointer to the radix_tree_node, in union with the rcu_head, and remove all uses of the radix_tree_path. There would be space in that union to save the offset when descending as before (we can argue that a lock must already be held to exclude other users), but recalculating it when ascending is both easy (a constant shift and a constant mask) and uncommon, so it seems better just to do that. Two little optimizations: no need to decrement height when descending, adjusting shift is enough; and once radix_tree_tag_if_tagged() has set tag on a node and its ancestors, it need not ascend from that node again. perf on the radix tree test harness reports radix_tree_insert() as 2% slower (now having to set parent), but radix_tree_delete() 24% faster. Surely that's an exaggeration from rtth's artificially low map shift 3, but forcing it back to 6 still rates radix_tree_delete() 8% faster. [1] Can a pagecache tag (dirty, writeback or towrite) actually still be set at the time of radix_tree_delete()? Perhaps not if the filesystem is well-behaved. But although I've not tracked any stack overflow down to this cause, I have observed a curious case in which a dirty tag is set and left set on tmpfs: page migration's migrate_page_copy() happens to use __set_page_dirty_nobuffers() to set PageDirty on the newpage, and that sets PAGECACHE_TAG_DIRTY as a side-effect - harmless to a filesystem which doesn't use tags, except for this stack depth issue. Signed-off-by: Hugh Dickins <hughd@google.com> Cc: Jan Kara <jack@suse.cz> Cc: Dave Chinner <david@fromorbit.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Nai Xia <nai.xia@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|---|---|---|
| .. | ||
| lzo | ||
| mpi | ||
| raid6 | ||
| reed_solomon | ||
| xz | ||
| zlib_deflate | ||
| zlib_inflate | ||
| .gitignore | ||
| argv_split.c | ||
| atomic64_test.c | ||
| atomic64.c | ||
| audit.c | ||
| average.c | ||
| bcd.c | ||
| bch.c | ||
| bitmap.c | ||
| bitrev.c | ||
| bsearch.c | ||
| btree.c | ||
| bug.c | ||
| bust_spinlocks.c | ||
| check_signature.c | ||
| checksum.c | ||
| cmdline.c | ||
| cordic.c | ||
| cpu_rmap.c | ||
| cpu-notifier-error-inject.c | ||
| cpumask.c | ||
| crc-ccitt.c | ||
| crc-itu-t.c | ||
| crc-t10dif.c | ||
| crc7.c | ||
| crc8.c | ||
| crc16.c | ||
| crc32.c | ||
| crc32defs.h | ||
| ctype.c | ||
| debug_locks.c | ||
| debugobjects.c | ||
| dec_and_lock.c | ||
| decompress_bunzip2.c | ||
| decompress_inflate.c | ||
| decompress_unlzma.c | ||
| decompress_unlzo.c | ||
| decompress_unxz.c | ||
| decompress.c | ||
| devres.c | ||
| digsig.c | ||
| div64.c | ||
| dma-debug.c | ||
| dump_stack.c | ||
| dynamic_debug.c | ||
| dynamic_queue_limits.c | ||
| extable.c | ||
| fault-inject.c | ||
| find_last_bit.c | ||
| find_next_bit.c | ||
| flex_array.c | ||
| gcd.c | ||
| gen_crc32table.c | ||
| genalloc.c | ||
| halfmd4.c | ||
| hexdump.c | ||
| hweight.c | ||
| idr.c | ||
| inflate.c | ||
| int_sqrt.c | ||
| iomap_copy.c | ||
| iomap.c | ||
| iommu-helper.c | ||
| ioremap.c | ||
| irq_regs.c | ||
| is_single_threaded.c | ||
| kasprintf.c | ||
| Kconfig | ||
| Kconfig.debug | ||
| Kconfig.kgdb | ||
| Kconfig.kmemcheck | ||
| klist.c | ||
| kobject_uevent.c | ||
| kobject.c | ||
| kstrtox.c | ||
| kstrtox.h | ||
| lcm.c | ||
| libcrc32c.c | ||
| list_debug.c | ||
| list_sort.c | ||
| llist.c | ||
| locking-selftest-hardirq.h | ||
| locking-selftest-mutex.h | ||
| locking-selftest-rlock-hardirq.h | ||
| locking-selftest-rlock-softirq.h | ||
| locking-selftest-rlock.h | ||
| locking-selftest-rsem.h | ||
| locking-selftest-softirq.h | ||
| locking-selftest-spin-hardirq.h | ||
| locking-selftest-spin-softirq.h | ||
| locking-selftest-spin.h | ||
| locking-selftest-wlock-hardirq.h | ||
| locking-selftest-wlock-softirq.h | ||
| locking-selftest-wlock.h | ||
| locking-selftest-wsem.h | ||
| locking-selftest.c | ||
| lru_cache.c | ||
| Makefile | ||
| md5.c | ||
| nlattr.c | ||
| parser.c | ||
| pci_iomap.c | ||
| percpu_counter.c | ||
| plist.c | ||
| prio_heap.c | ||
| prio_tree.c | ||
| proportions.c | ||
| radix-tree.c | ||
| random32.c | ||
| ratelimit.c | ||
| rational.c | ||
| rbtree.c | ||
| reciprocal_div.c | ||
| rwsem-spinlock.c | ||
| rwsem.c | ||
| scatterlist.c | ||
| sha1.c | ||
| show_mem.c | ||
| smp_processor_id.c | ||
| sort.c | ||
| spinlock_debug.c | ||
| string_helpers.c | ||
| string.c | ||
| swiotlb.c | ||
| syscall.c | ||
| test-kstrtox.c | ||
| textsearch.c | ||
| timerqueue.c | ||
| ts_bm.c | ||
| ts_fsm.c | ||
| ts_kmp.c | ||
| uuid.c | ||
| vsprintf.c | ||