linux

mirror of https://github.com/torvalds/linux.git synced 2026-05-15 18:12:22 +02:00

Arunpravin Paneer Selvam 493740d790 drm/buddy: Improve offset-aligned allocation handling

Large alignment requests previously forced the buddy allocator to search by
alignment order, which often caused higher-order free blocks to be split even
when a suitably aligned smaller region already existed within them. This led
to excessive fragmentation, especially for workloads requesting small sizes
with large alignment constraints.

This change prioritizes the requested allocation size during the search and
uses an augmented RB-tree field (subtree_max_alignment) to efficiently locate
free blocks that satisfy both size and offset-alignment requirements. As a
result, the allocator can directly select an aligned sub-region without
splitting larger blocks unnecessarily.

A practical example is the VKCTS test
dEQP-VK.memory.allocation.basic.size_8KiB.reverse.count_4000, which repeatedly
allocates 8 KiB buffers with a 256 KiB alignment. Previously, such allocations
caused large blocks to be split aggressively, despite smaller aligned regions
being sufficient. With this change, those aligned regions are reused directly,
significantly reducing fragmentation.

This improvement is visible in the amdgpu VRAM buddy allocator state
(/sys/kernel/debug/dri/1/amdgpu_vram_mm). After the change, higher-order blocks
are preserved and the number of low-order fragments is substantially reduced.

Before:
  order- 5 free: 1936 MiB, blocks: 15490
  order- 4 free:  967 MiB, blocks: 15486
  order- 3 free:  483 MiB, blocks: 15485
  order- 2 free:  241 MiB, blocks: 15486
  order- 1 free:  241 MiB, blocks: 30948

After:
  order- 5 free:  493 MiB, blocks:  3941
  order- 4 free:  246 MiB, blocks:  3943
  order- 3 free:  123 MiB, blocks:  4101
  order- 2 free:   61 MiB, blocks:  4101
  order- 1 free:   61 MiB, blocks:  8018

By avoiding unnecessary splits, this change improves allocator efficiency and
helps maintain larger contiguous free regions under heavy offset-aligned
allocation workloads.

v2:(Matthew)
  - Update augmented information along the path to the inserted node.

v3:
  - Move the patch to gpu/buddy.c file.

v4:(Matthew)
  - Use the helper instead of calling _ffs directly
  - Remove gpu_buddy_block_order(block) >= order check and drop order
  - Drop !node check as all callers handle this already
  - Return larger than any other possible alignment for __ffs64(0)
  - Replace __ffs with __ffs64

v5:(Matthew)
  - Drop subtree_max_alignment initialization at gpu_block_alloc()

Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
Suggested-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patch.msgid.link/20260306060155.2114-1-Arunpravin.PaneerSelvam@amd.com

2026-03-09 12:36:10 +05:30
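
The augmented-tree idea described in the commit message can be illustrated with a small userspace sketch. This is not the code from the patch: it uses a plain binary tree instead of the kernel RB-tree, and struct free_block, offset_alignment_order(), update_subtree_max() and find_aligned() are hypothetical names chosen for illustration. Only the field name subtree_max_alignment and the rule that offset 0 reports an alignment larger than any other come from the commit message. Each node caches the best offset alignment found anywhere in its subtree, so a search can skip whole subtrees that cannot satisfy the request.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical free-block node for one size order, keyed by offset. */
struct free_block {
	uint64_t offset;
	unsigned int subtree_max_alignment; /* best alignment order in this subtree */
	struct free_block *left, *right;
};

/* log2 of the largest power of two dividing offset (illustrative helper). */
static unsigned int offset_alignment_order(uint64_t offset)
{
	unsigned int order = 0;

	if (offset == 0)
		return 64; /* larger than any other possible alignment */

	while ((offset & 1) == 0) {
		offset >>= 1;
		order++;
	}
	return order;
}

/* Recompute the cached maximum from the node itself and its children. */
static void update_subtree_max(struct free_block *n)
{
	unsigned int m;

	assert(n != NULL);
	m = offset_alignment_order(n->offset);
	if (n->left && n->left->subtree_max_alignment > m)
		m = n->left->subtree_max_alignment;
	if (n->right && n->right->subtree_max_alignment > m)
		m = n->right->subtree_max_alignment;
	n->subtree_max_alignment = m;
}

/*
 * Return the lowest-offset block whose offset is aligned to at least
 * 2^want bytes, or NULL. Subtrees whose cached maximum is too small
 * are never descended into.
 */
static struct free_block *find_aligned(struct free_block *n, unsigned int want)
{
	while (n && n->subtree_max_alignment >= want) {
		if (n->left && n->left->subtree_max_alignment >= want) {
			n = n->left;
			continue;
		}
		if (offset_alignment_order(n->offset) >= want)
			return n;
		n = n->right;
	}
	return NULL;
}
```

With free 8 KiB blocks at offsets 0x2000, 0x42000 and 0x80000, a request for 256 KiB alignment (want = 18) would be served by the block at 0x80000 without touching any higher-order block. In this sketch the cached values must be refreshed bottom-up after every insertion or removal, which corresponds to the v2 note about updating augmented information along the path to the inserted node.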
..
acpi	mailbox: platform and core updates	2026-02-14 11:13:32 -08:00
asm-generic	hyperv-next for v7.0	2026-02-20 08:48:31 -08:00
clocksource
crypto	Networking changes for 7.0	2026-02-11 19:31:52 -08:00
cxl
drm	Merge drm/drm-next into drm-misc-next	2026-03-03 10:40:37 +01:00
dt-bindings	phy-for-7.0	2026-02-17 11:40:04 -08:00
hyperv	hyperv-next for v7.0	2026-02-20 08:48:31 -08:00
keys
kunit	treewide: Replace kmalloc with kmalloc_obj for non-scalar types	2026-02-21 01:02:28 -08:00
kvm
linux	drm/buddy: Improve offset-aligned allocation handling	2026-03-09 12:36:10 +05:30
math-emu
media	[GIT PULL for v7.0] media updates	2026-02-11 12:20:25 -08:00
memory
misc
net	Including fixes from IPsec, Bluetooth and netfilter	2026-02-26 08:00:13 -08:00
pcmcia
ras
rdma	RDMA/core: Check id_priv->restricted_node_type in cma_listen_on_dev()	2026-02-25 07:50:10 -05:00
rv	rv: Fix multiple definition of __pcpu_unique_da_mon_this	2026-02-20 13:12:00 +01:00
scsi	SCSI misc on 20260212	2026-02-12 15:43:02 -08:00
soc
sound
target
trace	drm-misc-next for v7.1:	2026-03-02 16:58:07 +10:00
uapi	Merge tag 'drm-xe-next-2026-03-02' of https://gitlab.freedesktop.org/drm/xe/kernel into drm-next	2026-03-03 10:37:29 +10:00
ufs
vdso
video
xen
Kbuild