Commit Graph

6 Commits

Author SHA1 Message Date
David Carlier
640c9dc72f tools/sched_ext: fix getopt not re-parsed on restart
After goto restart, optind retains its advanced position from the
previous getopt loop, causing getopt() to immediately return -1.
This silently drops all command-line options on the restarted skeleton.

Reset optind to 1 at the restart label so options are re-parsed.

Affected schedulers: scx_simple, scx_central, scx_flatcg, scx_pair,
scx_sdt, scx_cpu0.

Signed-off-by: David Carlier <devnexen@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2026-02-20 17:17:38 -10:00
David Carlier
cabd76bbc0 tools/sched_ext: scx_flatcg: fix potential stack overflow from VLA in fcg_read_stats
fcg_read_stats() had a VLA allocating 21 * nr_cpus * 8 bytes on the
stack, risking stack overflow on large CPU counts (nr_cpus can be up
to 512).

Fix by using a single heap allocation with the correct size, reusing
it across all stat indices, and freeing it at the end.

Signed-off-by: David Carlier <devnexen@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2026-02-16 21:01:18 -10:00
David Carlier
11fece49e9 tools/sched_ext: scx_flatcg: zero-initialize stats counter array
The local cnts array in read_stats() is not initialized before being
accumulated into per-CPU stats, which may lead to reading garbage
values. Zero it out with memset alongside the existing stats array
initialization.

Signed-off-by: David Carlier <devnexen@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2026-02-12 07:28:02 -10:00
Andrea Righi
de68c05189 tools/sched_ext: Receive updates from SCX repo
Receive tools/sched_ext updates form https://github.com/sched-ext/scx to
sync userspace bits:

 - basic BPF arena allocator abstractions,

 - additional process flags definitions,

 - fixed is_migration_disabled() helper,

 - separate out user_exit_info BPF and user space code.

This also fixes the following warning when building the selftests:

 tools/sched_ext/include/scx/common.bpf.h:550:9: warning: 'likely' macro redefined [-Wmacro-redefined]
  550 | #define likely(x) __builtin_expect(!!(x), 1)
      |         ^

Co-developed-by: Cheng-Yang Chou <yphbchou0911@gmail.com>
Signed-off-by: Andrea Righi <arighi@nvidia.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2025-08-11 08:21:57 -10:00
Tejun Heo
8da7bf2cee tools/sched_ext: Receive updates from SCX repo
Receive tools/sched_ext updates form https://github.com/sched-ext/scx to
sync userspace bits:

- scx_bpf_dump_header() added which can be used to print out basic scheduler
  info on dump.

- BPF possible/online CPU iterators added.

- CO-RE enums added. The enums are autogenerated from vmlinux.h. Include the
  generated artifacts in tools/sched_ext to keep the Makefile simpler.

- Other misc changes.

Signed-off-by: Tejun Heo <tj@kernel.org>
2024-12-12 16:16:57 -10:00
Tejun Heo
a4103eacc2 sched_ext: Add a cgroup scheduler which uses flattened hierarchy
This patch adds scx_flatcg example scheduler which implements hierarchical
weight-based cgroup CPU control by flattening the cgroup hierarchy into a
single layer by compounding the active weight share at each level.

This flattening of hierarchy can bring a substantial performance gain when
the cgroup hierarchy is nested multiple levels. in a simple benchmark using
wrk[8] on apache serving a CGI script calculating sha1sum of a small file,
it outperforms CFS by ~3% with CPU controller disabled and by ~10% with two
apache instances competing with 2:1 weight ratio nested four level deep.

However, the gain comes at the cost of not being able to properly handle
thundering herd of cgroups. For example, if many cgroups which are nested
behind a low priority parent cgroup wake up around the same time, they may
be able to consume more CPU cycles than they are entitled to. In many use
cases, this isn't a real concern especially given the performance gain.
Also, there are ways to mitigate the problem further by e.g. introducing an
extra scheduling layer on cgroup delegation boundaries.

v5: - Updated to specify SCX_OPS_HAS_CGROUP_WEIGHT instead of
      SCX_OPS_KNOB_CGROUP_WEIGHT.

v4: - Revert reference counted kptr for cgv_node as the change caused easily
      reproducible stalls.

v3: - Updated to reflect the core API changes including ops.init/exit_task()
      and direct dispatch from ops.select_cpu(). Fixes and improvements
      including additional statistics.

    - Use reference counted kptr for cgv_node instead of xchg'ing against
      stash location.

    - Dropped '-p' option.

v2: - Use SCX_BUG[_ON]() to simplify error handling.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: David Vernet <dvernet@meta.com>
Acked-by: Josh Don <joshdon@google.com>
Acked-by: Hao Luo <haoluo@google.com>
Acked-by: Barret Rhoden <brho@google.com>
2024-09-04 10:24:59 -10:00