linux

mirror of https://github.com/torvalds/linux.git synced 2026-05-30 01:53:29 +02:00

Author	SHA1	Message	Date
Zhao Mengmeng	d6edb15ad9	scx_central: Defer timer start to central dispatch to fix init error scx_central currently assumes that ops.init() runs on the selected central CPU and aborts otherwise. This is no longer true, as ops.init() is invoked from the scx_enable_helper thread, which can run on any CPU. As a result, sched_setaffinity() from userspace doesn't work, causing scx_central to fail when loading with: [ 1985.319942] sched_ext: central: scx_central.bpf.c:314: init from non-central CPU [ 1985.320317] scx_exit+0xa3/0xd0 [ 1985.320535] scx_bpf_error_bstr+0xbd/0x220 [ 1985.320840] bpf_prog_3a445a8163fa8149_central_init+0x103/0x1ba [ 1985.321073] bpf__sched_ext_ops_init+0x40/0xa8 [ 1985.321286] scx_root_enable_workfn+0x507/0x1650 [ 1985.321461] kthread_worker_fn+0x260/0x940 [ 1985.321745] kthread+0x303/0x3e0 [ 1985.321901] ret_from_fork+0x589/0x7d0 [ 1985.322065] ret_from_fork_asm+0x1a/0x30 DEBUG DUMP =================================================================== central: root scx_enable_help[134] triggered exit kind 1025: scx_bpf_error (scx_central.bpf.c:314: init from non-central CPU) Fix this by: - Defer bpf_timer_start() to the first dispatch on the central CPU. - Initialize the BPF timer in central_init() and kick the central CPU to guarantee entering the dispatch path on the central CPU immediately. - Remove the unnecessary sched_setaffinity() call in userspace. Suggested-by: Tejun Heo <tj@kernel.org> Signed-off-by: Zhao Mengmeng <zhaomengmeng@kylinos.cn> Signed-off-by: Tejun Heo <tj@kernel.org>	2026-03-27 07:33:00 -10:00
Cheng-Yang Chou	bd377af097	sched_ext: Fix incomplete help text usage strings Several demo schedulers and the selftest runner had usage strings that omitted options which are actually supported: - scx_central: add missing [-v] - scx_pair: add missing [-v] - scx_qmap: add missing [-S] and [-H] - scx_userland: add missing [-v] - scx_sdt: remove [-f] which no longer exists - runner.c: add missing [-s], [-l], [-q]; drop [-h] which none of the other sched_ext tools list in their usage lines Suggested-by: Tejun Heo <tj@kernel.org> Signed-off-by: Cheng-Yang Chou <yphbchou0911@gmail.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2026-03-11 11:02:57 -10:00
Cheng-Yang Chou	0c36a6f6f0	tools/sched_ext: scx_central: Remove unused '-p' option The '-p' option is defined in getopt() but not handled in the switch statement or documented in the help text. Providing '-p' currently triggers the default error path. Remove it to sync the optstring with the actual implementation. Signed-off-by: Cheng-Yang Chou <yphbchou0911@gmail.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2026-02-23 07:45:30 -10:00
David Carlier	640c9dc72f	tools/sched_ext: fix getopt not re-parsed on restart After goto restart, optind retains its advanced position from the previous getopt loop, causing getopt() to immediately return -1. This silently drops all command-line options on the restarted skeleton. Reset optind to 1 at the restart label so options are re-parsed. Affected schedulers: scx_simple, scx_central, scx_flatcg, scx_pair, scx_sdt, scx_cpu0. Signed-off-by: David Carlier <devnexen@gmail.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2026-02-20 17:17:38 -10:00
David Carlier	55a24d9203	tools/sched_ext: scx_central: fix CPU_SET and skeleton leak on early exit Use CPU_SET_S() instead of CPU_SET() on the dynamically allocated cpuset to avoid a potential out-of-bounds write when nr_cpu_ids exceeds CPU_SETSIZE. Also destroy the skeleton before returning on invalid central CPU ID to prevent a resource leak. Signed-off-by: David Carlier <devnexen@gmail.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2026-02-18 07:03:50 -10:00
David Carlier	988369d236	tools/sched_ext: scx_central: fix sched_setaffinity() call with the set size The cpu set is dynamically allocated for nr_cpu_ids using CPU_ALLOC(), so the size passed to sched_setaffinity() should be CPU_ALLOC_SIZE() rather than sizeof(cpu_set_t). Valgrind flagged this as accessing unaddressable bytes past the allocation. Signed-off-by: David Carlier <devnexen@gmail.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2026-02-12 07:30:17 -10:00
Andrea Righi	de68c05189	tools/sched_ext: Receive updates from SCX repo Receive tools/sched_ext updates from https://github.com/sched-ext/scx to sync userspace bits: - basic BPF arena allocator abstractions, - additional process flags definitions, - fixed is_migration_disabled() helper, - separate out user_exit_info BPF and user space code. This also fixes the following warning when building the selftests: tools/sched_ext/include/scx/common.bpf.h:550:9: warning: 'likely' macro redefined [-Wmacro-redefined] 550 | #define likely(x) __builtin_expect(!!(x), 1) | ^ Co-developed-by: Cheng-Yang Chou <yphbchou0911@gmail.com> Signed-off-by: Andrea Righi <arighi@nvidia.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2025-08-11 08:21:57 -10:00
Tejun Heo	f2c880fc81	tools/sched_ext: Sync with scx repo Synchronize with https://github.com/sched-ext/scx at d384453984a0 ("kernel: Sync at `ad3b301aa0` ("sched_ext: Provides a sysfs 'events' to expose core event counters")"). Signed-off-by: Tejun Heo <tj@kernel.org>	2025-02-14 08:46:20 -10:00
Tejun Heo	8da7bf2cee	tools/sched_ext: Receive updates from SCX repo Receive tools/sched_ext updates from https://github.com/sched-ext/scx to sync userspace bits: - scx_bpf_dump_header() added which can be used to print out basic scheduler info on dump. - BPF possible/online CPU iterators added. - CO-RE enums added. The enums are autogenerated from vmlinux.h. Include the generated artifacts in tools/sched_ext to keep the Makefile simpler. - Other misc changes. Signed-off-by: Tejun Heo <tj@kernel.org>	2024-12-12 16:16:57 -10:00
guanjing	f24d192985	sched_ext: fix application of sizeof to pointer sizeof when applied to a pointer typed expression gives the size of the pointer. The proper fix in this particular case is to code sizeof(*cpuset) instead of sizeof(cpuset). This issue was detected with the help of Coccinelle. Fixes: `22a920209a` ("sched_ext: Implement tickless support") Signed-off-by: guanjing <guanjing@cmss.chinamobile.com> Acked-by: Andrea Righi <arighi@nvidia.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2024-12-04 09:47:39 -10:00
Tejun Heo	60c27fb59f	sched_ext: Implement sched_ext_ops.cpu_online/offline() Add ops.cpu_online/offline() which are invoked when CPUs come online and offline respectively. As the enqueue path already automatically bypasses tasks to the local dsq on a deactivated CPU, BPF schedulers are guaranteed to see tasks only on CPUs which are between online() and offline(). If the BPF scheduler doesn't implement ops.cpu_online/offline(), the scheduler is automatically exited with SCX_ECODE_RESTART | SCX_ECODE_RSN_HOTPLUG. Userspace can implement CPU hotplug support trivially by simply reinitializing and reloading the scheduler. scx_qmap is updated to print out online CPUs on hotplug events. Other schedulers are updated to restart based on ecode. v3: - The previous implementation added @reason to sched_class.rq_on/offline() to distinguish between CPU hotplug events and topology updates. This was buggy and fragile as the methods are skipped if the current state equals the target state. Instead, add scx_rq_[de]activate() which are directly called from sched_cpu_de/activate(). This also allows ops.cpu_on/offline() to sleep which can be useful. - ops.dispatch() could be called on a CPU that the BPF scheduler was told to be offline. The dispatch path is updated to bypass in such cases. v2: - To accommodate lock ordering change between scx_cgroup_rwsem and cpus_read_lock(), CPU hotplug operations are put into its own SCX_OPI block and enabled earlier during scx_ope_enable() so that cpus_read_lock() can be dropped before acquiring scx_cgroup_rwsem. - Auto exit with ECODE added. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: David Vernet <dvernet@meta.com> Acked-by: Josh Don <joshdon@google.com> Acked-by: Hao Luo <haoluo@google.com> Acked-by: Barret Rhoden <brho@google.com>	2024-06-18 10:09:20 -10:00
Tejun Heo	22a920209a	sched_ext: Implement tickless support Allow BPF schedulers to indicate tickless operation by setting p->scx.slice to SCX_SLICE_INF. A CPU whose current task has infinite slice goes into tickless operation. scx_central is updated to use tickless operations for all tasks and instead use a BPF timer to expire slices. This also uses the SCX_ENQ_PREEMPT and task state tracking added by the previous patches. Currently, there is no way to pin the timer on the central CPU, so it may end up on one of the worker CPUs; however, outside of that, the worker CPUs can go tickless both while running sched_ext tasks and idling. With schbench running, scx_central shows: root@test ~# grep ^LOC /proc/interrupts; sleep 10; grep ^LOC /proc/interrupts LOC: 142024 656 664 449 Local timer interrupts LOC: 161663 663 665 449 Local timer interrupts Without it: root@test ~ [SIGINT]# grep ^LOC /proc/interrupts; sleep 10; grep ^LOC /proc/interrupts LOC: 188778 3142 3793 3993 Local timer interrupts LOC: 198993 5314 6323 6438 Local timer interrupts While scx_central itself is too barebone to be useful as a production scheduler, a more featureful central scheduler can be built using the same approach. Google's experience shows that such an approach can have significant benefits for certain applications such as VM hosting. v4: Allow operation even if BPF_F_TIMER_CPU_PIN is not available. v3: Pin the central scheduler's timer on the central_cpu using BPF_F_TIMER_CPU_PIN. v2: Convert to BPF inline iterators. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: David Vernet <dvernet@meta.com> Acked-by: Josh Don <joshdon@google.com> Acked-by: Hao Luo <haoluo@google.com> Acked-by: Barret Rhoden <brho@google.com>	2024-06-18 10:09:19 -10:00
Tejun Heo	037df2a314	sched_ext: Add a central scheduler which makes all scheduling decisions on one CPU This patch adds a new example scheduler, scx_central, which demonstrates central scheduling where one CPU is responsible for making all scheduling decisions in the system using scx_bpf_kick_cpu(). The central CPU makes scheduling decisions for all CPUs in the system, queues tasks on the appropriate local dsq's and preempts the worker CPUs. The worker CPUs in turn preempt the central CPU when it needs tasks to run. Currently, every CPU depends on its own tick to expire the current task. A follow-up patch implementing tickless support for sched_ext will allow the worker CPUs to go full tickless so that they can run completely undisturbed. v3: - Kumar fixed a bug where the dispatch path could overflow the dispatch buffer if too many are dispatched to the fallback DSQ. - Use the new SCX_KICK_IDLE to wake up non-central CPUs. - Dropped '-p' option. v2: - Use RESIZABLE_ARRAY() instead of fixed MAX_CPUS and use SCX_BUG[_ON]() to simplify error handling. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: David Vernet <dvernet@meta.com> Acked-by: Josh Don <joshdon@google.com> Acked-by: Hao Luo <haoluo@google.com> Acked-by: Barret Rhoden <brho@google.com> Cc: Kumar Kartikeya Dwivedi <memxor@gmail.com> Cc: Julia Lawall <julia.lawall@inria.fr>	2024-06-18 10:09:19 -10:00

13 Commits
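
Commits 55a24d9203, 988369d236 and f24d192985 above all concern sizing a dynamically allocated CPU set correctly. The sketch below shows how the glibc `CPU_ALLOC()` family is typically used together, assuming a glibc system with `_GNU_SOURCE`. The helper name `single_cpu_set()` is hypothetical and does not appear in the scheduler sources.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <stddef.h>

/*
 * Build a dynamically sized CPU set that contains only `cpu`.
 * Returns NULL if `cpu` is out of range or allocation fails. On success,
 * *sizep holds the byte size to pass to sched_setaffinity() and the
 * caller releases the set with CPU_FREE().
 */
static cpu_set_t *single_cpu_set(int nr_cpus, int cpu, size_t *sizep)
{
	cpu_set_t *set;
	size_t size;

	assert(sizep != NULL);

	if (nr_cpus <= 0 || cpu < 0 || cpu >= nr_cpus)
		return NULL;

	set = CPU_ALLOC(nr_cpus);
	if (!set)
		return NULL;

	/*
	 * The allocation may be smaller or larger than sizeof(cpu_set_t),
	 * so every later operation uses the _S variants with this size.
	 * sizeof(set) would only give the size of the pointer.
	 */
	size = CPU_ALLOC_SIZE(nr_cpus);
	CPU_ZERO_S(size, set);
	CPU_SET_S(cpu, size, set);

	*sizep = size;
	return set;
}
```

A caller would then pass the returned size, not `sizeof(cpu_set_t)`, to `sched_setaffinity(0, size, set)`. This matters once the number of CPU IDs exceeds `CPU_SETSIZE`.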