selftests/damon/wss_estimation: test for up to 160 MiB working set size

DAMON reads and writes Accessed bits of page tables without manual TLB flush for two reasons. First, it minimizes the overhead. Second, real systems that need DAMON are expected to be memory intensive enough to cause periodic TLB flushes. For test setups that use small test workloads, however, the system's TLB could be big enough to cover whole or most accesses of the test workload. In this case, no page table walk happens and DAMON cannot show any access from the test workload.

The test workload for DAMON's working set size estimation selftest is such a case. It accesses only 10 MiB working set, and it turned out there are test setups that have TLBs large enough to cover the 10 MiB data accesses. As a result, the test fails depending on the test machine.

Make it more reliable by trying larger working sets up to 160 MiB when it fails.

Link: https://lkml.kernel.org/r/20260117020731.226785-3-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
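
As an illustration of the retry schedule described above, the following minimal sketch lists the working set sizes the test would try, assuming it starts at 10 MiB and doubles after each failed attempt until it exceeds 160 MiB. The helper name `candidate_region_sizes_mib` is hypothetical and is not part of the selftest.

```python
def candidate_region_sizes_mib(start_mib=10, max_mib=160):
    """Yield the working set sizes (in MiB) to try, doubling each time.

    Hypothetical helper for illustration only; the selftest itself uses
    a plain while loop in main().
    """
    size_mib = start_mib
    while size_mib <= max_mib:
        yield size_mib
        size_mib *= 2


sizes = list(candidate_region_sizes_mib())
print(sizes)  # [10, 20, 40, 80, 160]
```

Under these assumptions the test makes at most five attempts, and it stops at the first size for which the estimation passes.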
2026-06-01 19:13:47 +02:00 · 2026-01-16 18:07:25 -08:00 · 2026-01-16 18:07:25 -08:00 · 891d206e27
commit 891d206e27
parent 94a62284ed
1 changed files with 23 additions and 6 deletions
--- a/tools/testing/selftests/damon/sysfs_update_schemes_tried_regions_wss_estimation.py
+++ b/tools/testing/selftests/damon/sysfs_update_schemes_tried_regions_wss_estimation.py
@@ -6,9 +6,8 @@ import time

 import _damon_sysfs

-def main():
-    # access two 10 MiB memory regions, 2 second per each
-    sz_region = 10 * 1024 * 1024
+def pass_wss_estimation(sz_region):
+    # access two regions of given size, 2 seconds per each region
    proc = subprocess.Popen(['./access_memory', '2', '%d' % sz_region, '2000'])
    kdamonds = _damon_sysfs.Kdamonds([_damon_sysfs.Kdamond(
            contexts=[_damon_sysfs.DamonCtx(
@@ -36,20 +35,38 @@ def main():

        wss_collected.append(
                kdamonds.kdamonds[0].contexts[0].schemes[0].tried_bytes)
+    err = kdamonds.stop()
+    if err is not None:
+        print('kdamond stop failed: %s' % err)
+        exit(1)

    wss_collected.sort()
    acceptable_error_rate = 0.2
    for percentile in [50, 75]:
        sample = wss_collected[int(len(wss_collected) * percentile / 100)]
        error_rate = abs(sample - sz_region) / sz_region
-        print('%d-th percentile (%d) error %f' %
-                (percentile, sample, error_rate))
+        print('%d-th percentile error %f (expect %d, result %d)' %
+                (percentile, error_rate, sz_region, sample))
        if error_rate > acceptable_error_rate:
            print('the error rate is not acceptable (> %f)' %
                    acceptable_error_rate)
            print('samples are as below')
            print('\n'.join(['%d' % wss for wss in wss_collected]))
-            exit(1)
+            return False
+    return True
+
+def main():
+    # DAMON doesn't flush TLB.  If the system has large TLB that can cover
+    # whole test working set, DAMON cannot see the access.  Test up to 160 MiB
+    # test working set.
+    sz_region_mb = 10
+    max_sz_region_mb = 160
+    while sz_region_mb <= max_sz_region_mb:
+        test_pass = pass_wss_estimation(sz_region_mb * 1024 * 1024)
+        if test_pass is True:
+            exit(0)
+        sz_region_mb *= 2
+    exit(1)

 if __name__ == '__main__':
    main()
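
The acceptance check in `pass_wss_estimation()` can be read in isolation from the DAMON sysfs setup. The sketch below reproduces only that arithmetic: sort the collected samples, pick the 50th and 75th percentiles by index, and reject if either differs from the expected size by more than 20%. The function name `wss_estimation_ok` and the sample values are hypothetical and chosen for illustration; they are not measurements.

```python
MIB = 1024 * 1024


def wss_estimation_ok(wss_collected, sz_region,
                      acceptable_error_rate=0.2, percentiles=(50, 75)):
    """Return True if every checked percentile is close enough to sz_region.

    Hypothetical standalone version of the check in the selftest.
    Assumes wss_collected is non-empty.
    """
    samples = sorted(wss_collected)
    for percentile in percentiles:
        # Same index computation as the selftest.
        sample = samples[int(len(samples) * percentile / 100)]
        error_rate = abs(sample - sz_region) / sz_region
        if error_rate > acceptable_error_rate:
            return False
    return True


# Made-up samples: most snapshots see roughly the 10 MiB working set.
good = [0, 9 * MIB, 10 * MIB, 11 * MIB]
# Made-up samples: DAMON sees almost no accesses, as when the TLB covers
# the whole working set.
bad = [0, 0, 0, 10 * MIB]

print(wss_estimation_ok(good, 10 * MIB))  # True
print(wss_estimation_ok(bad, 10 * MIB))   # False
```

The second case corresponds to the failure mode described in the commit message: when most samples report zero bytes, the 50th percentile has an error rate of 1.0, so the attempt fails and the next larger working set is tried.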