linux/kernel
Waiman Long ba8ba76885 cgroup: Make rebind_subsystems() disable v2 controllers all at once
[ Upstream commit 7ee285395b ]

It was found that the following warning was displayed when remounting
controllers from cgroup v2 to v1:

[ 8042.997778] WARNING: CPU: 88 PID: 80682 at kernel/cgroup/cgroup.c:3130 cgroup_apply_control_disable+0x158/0x190
   :
[ 8043.091109] RIP: 0010:cgroup_apply_control_disable+0x158/0x190
[ 8043.096946] Code: ff f6 45 54 01 74 39 48 8d 7d 10 48 c7 c6 e0 46 5a a4 e8 7b 67 33 00 e9 41 ff ff ff 49 8b 84 24 e8 01 00 00 0f b7 40 08 eb 95 <0f> 0b e9 5f ff ff ff 48 83 c4 08 5b 5d 41 5c 41 5d 41 5e 41 5f c3
[ 8043.115692] RSP: 0018:ffffba8a47c23d28 EFLAGS: 00010202
[ 8043.120916] RAX: 0000000000000036 RBX: ffffffffa624ce40 RCX: 000000000000181a
[ 8043.128047] RDX: ffffffffa63c43e0 RSI: ffffffffa63c43e0 RDI: ffff9d7284ee1000
[ 8043.135180] RBP: ffff9d72874c5800 R08: ffffffffa624b090 R09: 0000000000000004
[ 8043.142314] R10: ffffffffa624b080 R11: 0000000000002000 R12: ffff9d7284ee1000
[ 8043.149447] R13: ffff9d7284ee1000 R14: ffffffffa624ce70 R15: ffffffffa6269e20
[ 8043.156576] FS:  00007f7747cff740(0000) GS:ffff9d7a5fc00000(0000) knlGS:0000000000000000
[ 8043.164663] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 8043.170409] CR2: 00007f7747e96680 CR3: 0000000887d60001 CR4: 00000000007706e0
[ 8043.177539] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 8043.184673] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 8043.191804] PKRU: 55555554
[ 8043.194517] Call Trace:
[ 8043.196970]  rebind_subsystems+0x18c/0x470
[ 8043.201070]  cgroup_setup_root+0x16c/0x2f0
[ 8043.205177]  cgroup1_root_to_use+0x204/0x2a0
[ 8043.209456]  cgroup1_get_tree+0x3e/0x120
[ 8043.213384]  vfs_get_tree+0x22/0xb0
[ 8043.216883]  do_new_mount+0x176/0x2d0
[ 8043.220550]  __x64_sys_mount+0x103/0x140
[ 8043.224474]  do_syscall_64+0x38/0x90
[ 8043.228063]  entry_SYSCALL_64_after_hwframe+0x44/0xae

It was caused by the fact that rebind_subsystem() disables
controllers to be rebound one by one. If more than one disabled
controllers are originally from the default hierarchy, it means that
cgroup_apply_control_disable() will be called multiple times for the
same default hierarchy. A controller may be killed by css_kill() in
the first round. In the second round, the killed controller may not be
completely dead yet leading to the warning.

To avoid this problem, we collect all the ssid's of controllers that
needed to be disabled from the default hierarchy and then disable them
in one go instead of one by one.

Fixes: 334c3679ec ("cgroup: reimplement rebind_subsystems() using cgroup_apply_control() and friends")
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-11-18 14:04:02 +01:00
..
bpf bpf: Prevent increasing bpf_jit_limit above max 2021-11-18 14:03:42 +01:00
cgroup cgroup: Make rebind_subsystems() disable v2 controllers all at once 2021-11-18 14:04:02 +01:00
configs
debug kgdb: fix to kill breakpoints on initmem after boot 2021-03-04 11:38:46 +01:00
dma dma-debug: fix sg checks in debug_dma_map_sg() 2021-10-27 09:56:47 +02:00
entry KVM: rseq: Update rseq when processing NOTIFY_RESUME on xfer to KVM guest 2021-10-06 15:55:49 +02:00
events perf/core: fix userpage->time_enabled of inactive events 2021-10-17 10:43:33 +02:00
gcov gcov: re-fix clang-11+ support 2021-04-14 08:41:58 +02:00
irq genirq/timings: Fix error return code in irq_timings_test_irqs() 2021-09-15 09:50:29 +02:00
kcsan kcsan: Fix debugfs initcall return type 2021-05-26 12:06:54 +02:00
livepatch kernel/: fix repeated words in comments 2020-10-16 11:11:19 -07:00
locking lockdep: Let lock_is_held_type() detect recursive read as read 2021-11-18 14:04:02 +01:00
power PM: hibernate: Get block device exclusively in swsusp_check() 2021-11-18 14:03:59 +01:00
printk printk/console: Allow to disable console output by using console="" or console=null 2021-11-12 14:58:33 +01:00
rcu rcu: Fix existing exp request check in sync_sched_exp_online_cleanup() 2021-11-18 14:04:02 +01:00
sched sched/scs: Reset the shadow stack when idle_task_exit 2021-10-27 09:56:55 +02:00
time Revert "posix-cpu-timers: Force next expiration recalc after itimer reset" 2021-09-16 12:51:23 +02:00
trace tracing/cfi: Fix cmp_entries_* functions signature mismatch 2021-11-18 14:03:58 +01:00
.gitignore kbuild: update config_data.gz only when the content of .config is changed 2021-05-11 14:47:37 +02:00
acct.c kernel: acct.c: fix some kernel-doc nits 2020-10-16 11:11:19 -07:00
async.c
audit_fsnotify.c fsnotify: generalize handle_inode_event() 2020-12-30 11:54:18 +01:00
audit_tree.c audit: move put_tree() to avoid trim_trees refcount underflow and UAF 2021-09-03 10:09:31 +02:00
audit_watch.c fsnotify: generalize handle_inode_event() 2020-12-30 11:54:18 +01:00
audit.c
audit.h
auditfilter.c
auditsc.c audit: fix possible null-pointer dereference in audit_filter_rules 2021-10-27 09:56:52 +02:00
backtracetest.c
bounds.c
capability.c LSM: Signal to SafeSetID when setting group IDs 2020-10-13 09:17:34 -07:00
compat.c
configs.c
context_tracking.c
cpu_pm.c PM: cpu: Make notifier chain use a raw_spinlock_t 2021-09-15 09:50:40 +02:00
cpu.c cpu/hotplug: Cure the cpusets trainwreck 2021-07-19 09:44:59 +02:00
crash_core.c crash_core, vmcoreinfo: append 'SECTION_SIZE_BITS' to vmcoreinfo 2021-06-23 14:42:52 +02:00
crash_dump.c
cred.c Revert "Add a reference to ucounts for each cred" 2021-09-08 08:49:00 +02:00
delayacct.c
dma.c
exec_domain.c
exit.c kernel/io_uring: cancel io_uring before task works 2021-01-30 13:55:18 +01:00
extable.c
fail_function.c fail_function: Remove a redundant mutex unlock 2020-11-19 11:58:16 -08:00
fork.c mm/hugetlb: initialize hugetlb_usage in mm_init 2021-09-18 13:40:36 +02:00
freezer.c Revert "kernel: freezer should treat PF_IO_WORKER like PF_KTHREAD for freezing" 2021-04-07 15:00:14 +02:00
futex.c mm, futex: fix shared futex pgoff on shmem huge page 2021-06-30 08:47:29 -04:00
gen_kheaders.sh
groups.c LSM: Signal to SafeSetID when setting group IDs 2020-10-13 09:17:34 -07:00
hung_task.c kernel/hung_task.c: make type annotations consistent 2020-11-02 12:14:19 -08:00
iomem.c
irq_work.c
jump_label.c jump_label: Fix jump_label_text_reserved() vs __init 2021-07-20 16:05:58 +02:00
kallsyms.c treewide: Convert macro and uses of __section(foo) to __section("foo") 2020-10-25 14:51:49 -07:00
kcmp.c exec: Transform exec_update_mutex into a rw_semaphore 2021-01-09 13:46:24 +01:00
Kconfig.freezer
Kconfig.hz
Kconfig.locks
Kconfig.preempt
kcov.c
kexec_core.c kernel: kexec: remove the lock operation of system_transition_mutex 2021-02-03 23:28:37 +01:00
kexec_elf.c
kexec_file.c kernel: kexec_file: fix error return code of kexec_calculate_store_digests() 2021-05-19 10:13:09 +02:00
kexec_internal.h
kexec.c LSM: Introduce kernel_post_load_data() hook 2020-10-05 13:37:03 +02:00
kheaders.c
kmod.c
kprobes.c kprobes: Fix to delay the kprobes jump optimization 2021-03-04 11:38:35 +01:00
ksysfs.c
kthread.c kthread: Fix PF_KTHREAD vs to_kthread() race 2021-09-03 10:09:31 +02:00
latencytop.c
Makefile kbuild: update config_data.gz only when the content of .config is changed 2021-05-11 14:47:37 +02:00
module_signature.c module: harden ELF info handling 2021-03-25 09:04:11 +01:00
module_signing.c module: harden ELF info handling 2021-03-25 09:04:11 +01:00
module-internal.h
module.c module: limit enabling module.sig_enforce 2021-06-30 08:47:15 -04:00
notifier.c
nsproxy.c
padata.c
panic.c panic: don't dump stack twice on warn 2020-11-14 11:26:04 -08:00
params.c params: Replace zero-length array with flexible-array member 2020-10-29 17:22:59 -05:00
pid_namespace.c memcg: enable accounting for pids in nested pid namespaces 2021-09-18 13:40:36 +02:00
pid.c exec: Transform exec_update_mutex into a rw_semaphore 2021-01-09 13:46:24 +01:00
profile.c profiling: fix shift-out-of-bounds bugs 2021-09-26 14:08:58 +02:00
ptrace.c ptrace: make ptrace() fail if the tracee changed its pid unexpectedly 2021-05-26 12:06:49 +02:00
range.c kernel.h: split out min()/max() et al. helpers 2020-10-16 11:11:19 -07:00
reboot.c reboot: fix overflow parsing reboot cpu number 2020-11-14 11:26:03 -08:00
regset.c
relay.c kernel/relay.c: drop unneeded initialization 2020-10-16 11:11:22 -07:00
resource.c kernel/resource: make walk_mem_res() find all busy IORESOURCE_MEM resources 2021-05-19 10:13:09 +02:00
rseq.c KVM: rseq: Update rseq when processing NOTIFY_RESUME on xfer to KVM guest 2021-10-06 15:55:49 +02:00
scftorture.c
scs.c
seccomp.c seccomp: Fix setting loaded filter count during TSYNC 2021-08-18 08:59:06 +02:00
signal.c signal: Remove the bogus sigkill_pending in ptrace_stop 2021-11-18 14:03:47 +01:00
smp.c smp: Fix smp_call_function_single_async prototype 2021-05-14 09:50:46 +02:00
smpboot.c sched/core: Initialize the idle task with preemption disabled 2021-07-14 16:55:50 +02:00
smpboot.h
softirq.c
stackleak.c
stacktrace.c
static_call.c static_call: Fix unused variable warn w/o MODULE 2021-09-08 08:49:00 +02:00
stop_machine.c stop_machine, rcu: Mark functions as notrace 2020-10-26 12:12:27 +01:00
sys_ni.c mm/madvise: introduce process_madvise() syscall: an external memory hinting API 2020-10-18 09:27:10 -07:00
sys.c prctl: allow to setup brk for et_dyn executables 2021-09-26 14:08:57 +02:00
sysctl-test.c
sysctl.c sysctl.c: fix underflow value setting risk in vm_table 2021-03-17 17:06:25 +01:00
task_work.c task_work: cleanup notification modes 2020-10-17 15:05:30 -06:00
taskstats.c taskstats: move specifying netlink policy back to ops 2020-10-02 19:11:12 -07:00
test_kprobes.c
torture.c
tracepoint.c tracepoint: Use rcu get state and cond sync for static call updates 2021-09-03 10:09:30 +02:00
tsacct.c
ucount.c Revert "Add a reference to ucounts for each cred" 2021-09-08 08:49:00 +02:00
uid16.c
uid16.h
umh.c usermodehelper: reset umask to default before executing user process 2020-10-06 10:31:52 -07:00
up.c smp: Fix smp_call_function_single_async prototype 2021-05-14 09:50:46 +02:00
user_namespace.c Revert "Add a reference to ucounts for each cred" 2021-09-08 08:49:00 +02:00
user-return-notifier.c
user.c
usermode_driver.c bpf: Fix umd memory leak in copy_process() 2021-03-30 14:32:03 +02:00
utsname_sysctl.c
utsname.c
watch_queue.c
watchdog_hld.c
watchdog.c watchdog: fix barriers when printing backtraces from all CPUs 2021-05-19 10:13:00 +02:00
workqueue_internal.h
workqueue.c workqueue: make sysfs of unbound kworker cpumask more clever 2021-11-18 14:03:58 +01:00