linux/kernel
Linus Torvalds 664f0f6be3 sched_ext: Fixes for v7.1-rc1
The merge window pulled in the cgroup sub-scheduler infrastructure, and
 new AI reviews are accelerating bug reporting and fixing - hence the
 larger than usual fixes batch.
 
 - Use-after-frees during scheduler load/unload. The disable path
   could free the BPF scheduler while deferred irq_work / kthread work
   was still in flight; cgroup setter callbacks read the active
   scheduler outside the rwsem that synchronizes against teardown.
   Fixed both, and reused the disable drain in the enable error paths
   so the BPF JIT page can't be freed under live callbacks.
 
 - Several BPF op invocations didn't tell the framework which runqueue
   was already locked, so helper kfuncs that re-acquire the runqueue
   by CPU could deadlock on the held lock. Fixed at the affected
   callsites, including recursive parent-into-child dispatch.
 
 - The hardlockup notifier ran from NMI but eventually took a
   non-NMI-safe lock. Bounced through irq_work.
 
 - A handful of bugs in the new sub-scheduler hierarchy: helper
   kfuncs hard-coded the root instead of resolving the caller's
   scheduler; the enable error path tried to disable per-task state
   that had never been initialized, and leaked cpus_read_lock on the
   way out; a sysfs object was leaked on every load/unload; the
   dispatch fast-path used the root scheduler instead of the task's;
   and a couple of CONFIG #ifdef guards were misclassified.
 
 - Verifier-time hardening: BPF programs of unrelated struct_ops
   types (e.g. tcp_congestion_ops) could call sched_ext kfuncs - a
   semantic bug and, once sub-sched was enabled, a KASAN
   out-of-bounds read. Now rejected at load. Plus a few NULL and
   cross-task argument checks on sched_ext kfuncs, and a selftest
   covering the new deny.
 
 - rhashtable (Herbert): restored the insecure_elasticity toggle and
   bounced the deferred-resize kick through irq_work to break a
   lock-order cycle observable from raw-spinlock callers. sched_ext's
   scheduler-instance hash is the first user of both.
 
 - The bypass-mode load balancer used file-scope cpumasks; with
   multiple scheduler instances now possible, those raced. Moved
   per-instance, plus a follow-up to skip tasks whose recorded CPU is
   stale relative to the new owning runqueue.
 
 - Smaller fixes: a dispatch queue's first-task tracking misbehaved
   when a parked iterator cursor sat in the list; the runqueue's
   next-class wasn't promoted on local-queue enqueue, leaving an SCX
   task behind RT in edge cases; the reference qmap scheduler stopped
   erroring on legitimate cross-scheduler task-storage misses.
 -----BEGIN PGP SIGNATURE-----
 
 iIQEABYKACwWIQTfIjM1kS57o3GsC/uxYfJx3gVYGQUCafEN/A4cdGpAa2VybmVs
 Lm9yZwAKCRCxYfJx3gVYGaydAQCxWrUqnZXhxF4LnztjxTF2tgv7p8P7TbpS6aU6
 etqRpAEA9RFmIXs7XrhwCm0n2BwSgjvrNxnWfPhWvuH0uN0GTAA=
 =wLna
 -----END PGP SIGNATURE-----

Merge tag 'sched_ext-for-7.1-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext

Pull sched_ext fixes from Tejun Heo:
 "The merge window pulled in the cgroup sub-scheduler infrastructure,
  and new AI reviews are accelerating bug reporting and fixing - hence
  the larger than usual fixes batch:

   - Use-after-frees during scheduler load/unload:
       - The disable path could free the BPF scheduler while deferred
         irq_work / kthread work was still in flight
       - cgroup setter callbacks read the active scheduler outside the
         rwsem that synchronizes against teardown
     Fix both, and reuse the disable drain in the enable error paths so
     the BPF JIT page can't be freed under live callbacks.

   - Several BPF op invocations didn't tell the framework which runqueue
     was already locked, so helper kfuncs that re-acquire the runqueue
     by CPU could deadlock on the held lock

     Fix the affected callsites, including recursive parent-into-child
     dispatch.

   - The hardlockup notifier ran from NMI but eventually took a
     non-NMI-safe lock. Bounce it through irq_work.

   - A handful of bugs in the new sub-scheduler hierarchy:
       - helper kfuncs hard-coded the root instead of resolving the
         caller's scheduler
       - the enable error path tried to disable per-task state that had
         never been initialized, and leaked cpus_read_lock on the way
         out
       - a sysfs object was leaked on every load/unload
       - the dispatch fast-path used the root scheduler instead of the
         task's
       - a couple of CONFIG #ifdef guards were misclassified

   - Verifier-time hardening: BPF programs of unrelated struct_ops types
     (e.g. tcp_congestion_ops) could call sched_ext kfuncs - a semantic
     bug and, once sub-sched was enabled, a KASAN out-of-bounds read.
     Now rejected at load. Plus a few NULL and cross-task argument
     checks on sched_ext kfuncs, and a selftest covering the new deny.

   - rhashtable (Herbert): restore the insecure_elasticity toggle and
     bounce the deferred-resize kick through irq_work to break a
     lock-order cycle observable from raw-spinlock callers. sched_ext's
     scheduler-instance hash is the first user of both.

   - The bypass-mode load balancer used file-scope cpumasks; with
     multiple scheduler instances now possible, those raced. Move to
     per-instance cpumasks, plus a follow-up to skip tasks whose
     recorded CPU is stale relative to the new owning runqueue.

   - Smaller fixes:
       - a dispatch queue's first-task tracking misbehaved when a parked
         iterator cursor sat in the list
       - the runqueue's next-class wasn't promoted on local-queue
         enqueue, leaving an SCX task behind RT in edge cases
       - the reference qmap scheduler stopped erroring on legitimate
         cross-scheduler task-storage misses"

* tag 'sched_ext-for-7.1-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext: (26 commits)
  sched_ext: Fix scx_flush_disable_work() UAF race
  sched_ext: Call wakeup_preempt() in local_dsq_post_enq()
  sched_ext: Release cpus_read_lock on scx_link_sched() failure in root enable
  sched_ext: Reject NULL-sch callers in scx_bpf_task_set_slice/dsq_vtime
  sched_ext: Refuse cross-task select_cpu_from_kfunc calls
  sched_ext: Align cgroup #ifdef guards with SUB_SCHED vs GROUP_SCHED
  sched_ext: Make bypass LB cpumasks per-scheduler
  sched_ext: Pass held rq to SCX_CALL_OP() for core_sched_before
  sched_ext: Pass held rq to SCX_CALL_OP() for dump_cpu/dump_task
  sched_ext: Save and restore scx_locked_rq across SCX_CALL_OP
  sched_ext: Use dsq->first_task instead of list_empty() in dispatch_enqueue() FIFO-tail
  sched_ext: Resolve caller's scheduler in scx_bpf_destroy_dsq() / scx_bpf_dsq_nr_queued()
  sched_ext: Read scx_root under scx_cgroup_ops_rwsem in cgroup setters
  sched_ext: Don't disable tasks in scx_sub_enable_workfn() abort path
  sched_ext: Skip tasks with stale task_rq in bypass_lb_cpu()
  sched_ext: Guard scx_dsq_move() against NULL kit->dsq after failed iter_new
  sched_ext: Unregister sub_kset on scheduler disable
  sched_ext: Defer scx_hardlockup() out of NMI
  sched_ext: sync disable_irq_work in bpf_scx_unreg()
  sched_ext: Fix local_dsq_post_enq() to use task's scheduler in sub-sched
  ...
2026-04-28 16:26:11 -07:00
..
bpf bpf-fixes 2026-04-17 15:58:22 -07:00
cgroup cgroup: Fixes for v7.1-rc1 2026-04-27 16:51:27 -07:00
configs Remove WARN_ALL_UNSEEDED_RANDOM kernel config option 2026-02-23 11:18:48 -08:00
debug kgdb: update outdated references to kgdb_wait() 2026-04-21 16:41:54 +01:00
dma memblock: updates for 7.0-rc1 2026-04-18 11:29:14 -07:00
entry arm64 updates for 7.1: 2026-04-14 16:48:56 -07:00
events mm.git review status for linus..mm-stable 2026-04-15 12:59:16 -07:00
futex Locking updates for v7.1: 2026-04-14 12:36:25 -07:00
gcov Convert more 'alloc_obj' cases to default GFP_KERNEL arguments 2026-02-21 20:03:00 -08:00
irq genirq/chip: Invoke add_interrupt_randomness() in handle_percpu_devid_irq() 2026-04-02 23:03:29 +02:00
kcsan kcsan: test: Adjust "expect" allocation type for kmalloc_obj 2026-02-26 09:54:08 -08:00
livepatch Convert 'alloc_obj' family to use the new default GFP_KERNEL argument 2026-02-21 17:09:51 -08:00
liveupdate mm.git review status for linus..mm-stable 2026-04-19 08:01:17 -07:00
locking locking/mutex: Fix ww_mutex wait_list operations 2026-04-23 10:05:49 +02:00
module module: Simplify warning on positive returns from module_init() 2026-04-04 00:04:48 +00:00
power Merge branches 'pm-cpuidle', 'pm-opp' and 'pm-sleep' 2026-04-10 12:37:27 +02:00
printk Merge branch 'rework/prb-fixes' into for-linus 2026-04-20 13:42:01 +02:00
rcu RCU changes for v7.1 2026-04-13 09:36:45 -07:00
sched sched_ext: Fixes for v7.1-rc1 2026-04-28 16:26:11 -07:00
time clockevents: Add missing resets of the next_event_forced flag 2026-04-16 21:22:04 +02:00
trace ring-buffer fix for 7.1: 2026-04-24 15:17:23 -07:00
unwind Convert more 'alloc_obj' cases to default GFP_KERNEL arguments 2026-02-21 20:03:00 -08:00
.gitignore kheaders: rebuild kheaders_data.tar.xz when a file is modified within a minute 2025-06-24 20:30:37 +09:00
acct.c vfs-7.1-rc1.misc 2026-04-13 14:20:11 -07:00
async.c Convert 'alloc_obj' family to use the new default GFP_KERNEL argument 2026-02-21 17:09:51 -08:00
audit_fsnotify.c audit: widen ino fields to u64 2026-03-06 14:31:26 +01:00
audit_tree.c Convert 'alloc_flex' family to use the new default GFP_KERNEL argument 2026-02-21 17:09:51 -08:00
audit_watch.c audit: widen ino fields to u64 2026-03-06 14:31:26 +01:00
audit.c audit: handle unknown status requests in audit_receive_msg() 2026-03-10 15:22:43 -04:00
audit.h audit: widen ino fields to u64 2026-03-06 14:31:26 +01:00
auditfilter.c audit: fix coding style issues 2026-03-05 22:16:08 -05:00
auditsc.c audit: widen ino fields to u64 2026-03-06 14:31:26 +01:00
backtracetest.c
bounds.c x86/asm: Remove ANNOTATE_DATA_SPECIAL usage 2025-12-03 16:53:19 +01:00
capability.c
cfi.c cfi: Move BPF CFI types and helpers to generic code 2025-07-31 18:23:53 -07:00
compat.c
configs.c
context_tracking.c context_tracking: Remove rcu_task_trace_heavyweight_{enter,exit}() 2026-01-01 16:39:46 +08:00
cpu_pm.c syscore: Pass context data to callbacks 2025-11-14 10:01:52 +01:00
cpu.c SPDX updates for 7.0-rc1 2026-02-17 09:46:03 -08:00
crash_core_test.c crash: add KUnit tests for crash_exclude_mem_range 2025-09-13 17:32:55 -07:00
crash_core.c kernel/crash: remove inclusion of crypto/sha1.h 2026-03-27 21:19:46 -07:00
crash_dump_dm_crypt.c crash_dump/dm-crypt: don't print in arch-specific code 2026-04-02 23:36:24 -07:00
crash_reserve.c kernel/crash: remove inclusion of crypto/sha1.h 2026-03-27 21:19:46 -07:00
cred.c cred: remove unused set_security_override_from_ctx() 2026-01-06 20:52:57 -05:00
delayacct.c delayacct: fix uapi timespec64 definition 2026-02-08 00:13:32 -08:00
dma.c
elfcorehdr.c
exec_domain.c
exit.c mm.git review status for linus..mm-nonmm-stable 2026-04-16 20:11:56 -07:00
exit.h
extable.c
fail_function.c Convert 'alloc_obj' family to use the new default GFP_KERNEL argument 2026-02-21 17:09:51 -08:00
fork.c mm.git review status for linus..mm-nonmm-stable 2026-04-16 20:11:56 -07:00
freezer.c freezer: Clarify that only cgroup1 freezer uses PM freezer 2025-10-30 20:10:27 +01:00
gen_kheaders.sh kheaders: make it possible to override TAR 2025-08-06 10:23:36 +09:00
groups.c treewide: Replace kmalloc with kmalloc_obj for non-scalar types 2026-02-21 01:02:28 -08:00
hung_task.c hung_task: explicitly report I/O wait state in log output 2026-03-27 21:19:40 -07:00
iomem.c
irq_work.c kernel: Use trace_call__##name() at guarded tracepoint call sites 2026-03-26 08:28:49 -04:00
jump_label.c jump_label: use ATOMIC_INIT() for initialization of .enabled 2026-03-16 13:16:48 +01:00
kallsyms_internal.h kallsyms: Get rid of kallsyms relative base 2026-01-22 15:58:22 -07:00
kallsyms_selftest.c Convert 'alloc_obj' family to use the new default GFP_KERNEL argument 2026-02-21 17:09:51 -08:00
kallsyms_selftest.h
kallsyms.c mm.git review status for linus..mm-nonmm-stable 2026-02-12 12:13:01 -08:00
kcmp.c
Kconfig.freezer
Kconfig.hz
Kconfig.kexec liveupdate: kho: move to kernel/liveupdate 2025-11-27 14:24:33 -08:00
Kconfig.locks
Kconfig.preempt sched: Further restrict the preemption modes 2026-01-08 12:43:57 +01:00
kcov.c Convert 'alloc_obj' family to use the new default GFP_KERNEL argument 2026-02-21 17:09:51 -08:00
kexec_core.c kernel/kexec: remove inclusion of crypto/hash.h 2026-03-27 21:19:46 -07:00
kexec_elf.c
kexec_file.c kexec: derive purgatory entry from symbol 2026-01-31 16:16:07 -08:00
kexec_internal.h kexec: enable CMA based contiguous allocation 2025-08-02 12:01:38 -07:00
kexec.c Convert 'alloc_obj' family to use the new default GFP_KERNEL argument 2026-02-21 17:09:51 -08:00
kheaders.c
kprobes.c kprobes: Remove unneeded warnings from __arm_kprobe_ftrace() 2026-03-13 23:15:26 +09:00
kstack_erase.c sysctl: remove __user qualifier from stack_erasing_sysctl buffer argument 2025-11-27 15:44:53 +01:00
ksyms_common.c
ksysfs.c kernel: ksysfs: initialize kernel_kobj earlier 2026-04-03 19:39:52 +02:00
kthread.c kthread: consolidate kthread exit paths to prevent use-after-free 2026-02-26 10:45:49 +01:00
latencytop.c
Makefile kcov: Enable context analysis 2026-01-05 16:43:34 +01:00
module_signature.c module: Give 'enum pkey_id_type' a more specific name 2026-03-24 21:42:37 +00:00
notifier.c
nscommon.c nsfs: tighten permission checks for ns iteration ioctls 2026-02-27 22:00:08 +01:00
nsproxy.c vfs-7.1-rc1.mount.v2 2026-04-14 19:59:25 -07:00
nstree.c nstree: tighten permission checks for listing 2026-02-27 22:00:11 +01:00
padata.c padata: Put CPU offline callback in ONLINE section to allow failure 2026-03-22 11:17:59 +09:00
panic.c kernel/panic: mark init_taint_buf as __initdata and panic instead of warning in alloc_taint_buf() 2026-03-27 21:19:33 -07:00
params.c module: Clean up parse_args() arguments 2026-03-18 21:43:18 +00:00
pid_namespace.c pid_namespace: allow opening pid_for_children before init was created 2026-03-20 14:44:26 +01:00
pid_sysctl.h
pid.c mm.git review status for linus..mm-nonmm-stable 2026-04-16 20:11:56 -07:00
profile.c
ptrace.c clone: add CLONE_AUTOREAP 2026-03-11 23:14:02 +01:00
range.c
reboot.c treewide: Replace kmalloc with kmalloc_obj for non-scalar types 2026-02-21 01:02:28 -08:00
regset.c
relay.c Convert 'alloc_obj' family to use the new default GFP_KERNEL argument 2026-02-21 17:09:51 -08:00
resource_kunit.c Convert 'alloc_obj' family to use the new default GFP_KERNEL argument 2026-02-21 17:09:51 -08:00
resource.c PCI: Align head space better 2026-03-27 10:19:08 -05:00
rseq.c rseq: slice ext: Ensure rseq feature size differs from original rseq size 2026-02-23 11:19:19 +01:00
scftorture.c Convert 'alloc_obj' family to use the new default GFP_KERNEL argument 2026-02-21 17:09:51 -08:00
scs.c scs: fix a wrong parameter in __scs_magic 2025-11-12 10:00:13 -08:00
seccomp.c Convert 'alloc_obj' family to use the new default GFP_KERNEL argument 2026-02-21 17:09:51 -08:00
signal.c mm.git review status for linus..mm-nonmm-stable 2026-04-16 20:11:56 -07:00
smp.c tracing updates for v7.1: 2026-04-17 09:43:12 -07:00
smpboot.c sched/smp: Use the SMP version of idle_thread_set_boot_cpu() 2025-06-13 08:47:20 +02:00
smpboot.h
softirq.c softirq: Prepare for deferred hrtimer rearming 2026-02-27 16:40:13 +01:00
stacktrace.c
static_call_inline.c Convert 'alloc_obj' family to use the new default GFP_KERNEL argument 2026-02-21 17:09:51 -08:00
static_call.c
stop_machine.c sched/core: Fix migrate_swap() vs. hotplug 2025-07-01 15:02:03 +02:00
sys_ni.c rseq: Implement sys_rseq_slice_yield() 2026-01-22 11:11:17 +01:00
sys.c prctl: cfi: change the branch landing pad prctl()s to be more descriptive 2026-04-04 18:40:58 -06:00
sysctl-test.c
sysctl.c sysctl: fix uninitialized variable in proc_do_large_bitmap 2026-03-26 09:32:19 +01:00
task_work.c task_work: Fix NMI race condition 2025-10-29 10:29:54 +01:00
taskstats.c taskstats: set version in TGID exit notifications 2026-04-15 02:15:02 -07:00
torture.c torture: Avoid modulo-zero error in torture_hrtimeout_ns() 2026-03-30 15:48:14 -04:00
tracepoint.c tracepoint: balance regfunc() on func_add() failure in tracepoint_add_func() 2026-04-14 05:17:02 -04:00
tsacct.c tsacct: skip all kernel threads 2026-01-26 19:07:13 -08:00
ucount.c Convert 'alloc_obj' family to use the new default GFP_KERNEL argument 2026-02-21 17:09:51 -08:00
uid16.c
uid16.h
umh.c treewide: Replace kmalloc with kmalloc_obj for non-scalar types 2026-02-21 01:02:28 -08:00
up.c
user_namespace.c Convert remaining multi-line kmalloc_obj/flex GFP_KERNEL uses 2026-02-22 08:26:33 -08:00
user-return-notifier.c
user.c ns: drop custom reference count initialization for initial namespaces 2025-11-11 10:01:32 +01:00
utsname_sysctl.c
utsname.c namespace-6.18-rc1 2025-09-29 11:20:29 -07:00
vhost_task.c Convert 'alloc_obj' family to use the new default GFP_KERNEL argument 2026-02-21 17:09:51 -08:00
vmcore_info.c mm.git review status for linus..mm-nonmm-stable 2026-04-16 20:11:56 -07:00
watch_queue.c Convert 'alloc_flex' family to use the new default GFP_KERNEL argument 2026-02-21 17:09:51 -08:00
watchdog_buddy.c watchdog/hardlockup: improve buddy system detection timeliness 2026-03-27 21:19:47 -07:00
watchdog_perf.c watchdog/hardlockup: simplify perf event probe and remove per-cpu dependency 2026-02-08 00:13:35 -08:00
watchdog.c watchdog/hardlockup: improve buddy system detection timeliness 2026-03-27 21:19:47 -07:00
workqueue_internal.h workqueue: Show in-flight work item duration in stall diagnostics 2026-03-05 07:27:48 -10:00
workqueue.c workqueue: Changes for v7.1 2026-04-15 10:32:08 -07:00