linux/kernel
Waiman Long 26d3bf38ae cgroup: Make rebind_subsystems() disable v2 controllers all at once
[ Upstream commit 7ee285395b ]

It was found that the following warning was displayed when remounting
controllers from cgroup v2 to v1:

[ 8042.997778] WARNING: CPU: 88 PID: 80682 at kernel/cgroup/cgroup.c:3130 cgroup_apply_control_disable+0x158/0x190
   :
[ 8043.091109] RIP: 0010:cgroup_apply_control_disable+0x158/0x190
[ 8043.096946] Code: ff f6 45 54 01 74 39 48 8d 7d 10 48 c7 c6 e0 46 5a a4 e8 7b 67 33 00 e9 41 ff ff ff 49 8b 84 24 e8 01 00 00 0f b7 40 08 eb 95 <0f> 0b e9 5f ff ff ff 48 83 c4 08 5b 5d 41 5c 41 5d 41 5e 41 5f c3
[ 8043.115692] RSP: 0018:ffffba8a47c23d28 EFLAGS: 00010202
[ 8043.120916] RAX: 0000000000000036 RBX: ffffffffa624ce40 RCX: 000000000000181a
[ 8043.128047] RDX: ffffffffa63c43e0 RSI: ffffffffa63c43e0 RDI: ffff9d7284ee1000
[ 8043.135180] RBP: ffff9d72874c5800 R08: ffffffffa624b090 R09: 0000000000000004
[ 8043.142314] R10: ffffffffa624b080 R11: 0000000000002000 R12: ffff9d7284ee1000
[ 8043.149447] R13: ffff9d7284ee1000 R14: ffffffffa624ce70 R15: ffffffffa6269e20
[ 8043.156576] FS:  00007f7747cff740(0000) GS:ffff9d7a5fc00000(0000) knlGS:0000000000000000
[ 8043.164663] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 8043.170409] CR2: 00007f7747e96680 CR3: 0000000887d60001 CR4: 00000000007706e0
[ 8043.177539] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 8043.184673] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 8043.191804] PKRU: 55555554
[ 8043.194517] Call Trace:
[ 8043.196970]  rebind_subsystems+0x18c/0x470
[ 8043.201070]  cgroup_setup_root+0x16c/0x2f0
[ 8043.205177]  cgroup1_root_to_use+0x204/0x2a0
[ 8043.209456]  cgroup1_get_tree+0x3e/0x120
[ 8043.213384]  vfs_get_tree+0x22/0xb0
[ 8043.216883]  do_new_mount+0x176/0x2d0
[ 8043.220550]  __x64_sys_mount+0x103/0x140
[ 8043.224474]  do_syscall_64+0x38/0x90
[ 8043.228063]  entry_SYSCALL_64_after_hwframe+0x44/0xae

It was caused by the fact that rebind_subsystem() disables
controllers to be rebound one by one. If more than one disabled
controllers are originally from the default hierarchy, it means that
cgroup_apply_control_disable() will be called multiple times for the
same default hierarchy. A controller may be killed by css_kill() in
the first round. In the second round, the killed controller may not be
completely dead yet leading to the warning.

To avoid this problem, we collect all the ssid's of controllers that
needed to be disabled from the default hierarchy and then disable them
in one go instead of one by one.

Fixes: 334c3679ec ("cgroup: reimplement rebind_subsystems() using cgroup_apply_control() and friends")
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-11-18 19:16:24 +01:00
..
bpf bpf: Fix potential race in tail call compatibility check 2021-10-26 12:37:28 -07:00
cgroup cgroup: Make rebind_subsystems() disable v2 controllers all at once 2021-11-18 19:16:24 +01:00
configs
debug kgdb patches for 5.15 2021-09-07 12:08:04 -07:00
dma dma-mapping fixes for Linux 5.15 2021-10-20 10:16:51 -10:00
entry entry: rseq: Call rseq_handle_notify_resume() in tracehook_notify_resume() 2021-09-22 10:24:01 -04:00
events perf/core: fix userpage->time_enabled of inactive events 2021-10-01 13:57:54 +02:00
gcov
irq irqchip fixes for 5.15, take #1 2021-09-24 14:11:04 +02:00
kcsan LKMM updates: 2021-09-02 13:00:15 -07:00
livepatch livepatch: Replace deprecated CPU-hotplug functions. 2021-08-19 12:00:24 +02:00
locking lockdep: Let lock_is_held_type() detect recursive read as read 2021-11-18 19:16:23 +01:00
power PM: hibernate: Get block device exclusively in swsusp_check() 2021-11-18 19:16:18 +01:00
printk memblock: introduce saner 'memblock_free_ptr()' interface 2021-09-14 13:23:22 -07:00
rcu rcu: Fix existing exp request check in sync_sched_exp_online_cleanup() 2021-11-18 19:16:23 +01:00
sched sched/scs: Reset the shadow stack when idle_task_exit 2021-10-19 17:46:11 +02:00
time posix-cpu-timers: Prevent spuriously armed 0-value itimer 2021-09-23 11:53:51 +02:00
trace ftrace: do CPU checking after preemption disabled 2021-11-18 19:16:20 +01:00
.gitignore
acct.c kernel/acct.c: use dedicated helper to access rlimit values 2021-09-08 11:50:26 -07:00
async.c
audit_fsnotify.c
audit_tree.c audit: move put_tree() to avoid trim_trees refcount underflow and UAF 2021-08-24 18:52:36 -04:00
audit_watch.c
audit.c
audit.h
auditfilter.c
auditsc.c audit: fix possible null-pointer dereference in audit_filter_rules 2021-10-18 18:27:47 -04:00
backtracetest.c
bounds.c
capability.c
cfi.c
compat.c arch: remove compat_alloc_user_space 2021-09-08 15:32:35 -07:00
configs.c
context_tracking.c
cpu_pm.c
cpu.c
crash_core.c
crash_dump.c
cred.c ucounts: Move get_ucounts from cred_alloc_blank to key_change_session_keyring 2021-10-20 10:34:20 -05:00
delayacct.c
dma.c
exec_domain.c
exit.c io_uring: remove files pointer in cancellation functions 2021-08-23 13:10:37 -06:00
extable.c
fail_function.c
fork.c mm/hugetlb: initialize hugetlb_usage in mm_init 2021-09-08 18:45:53 -07:00
freezer.c
futex.c futex: Remove unused variable 'vpid' in futex_proxy_trylock_atomic() 2021-09-03 23:00:22 +02:00
gen_kheaders.sh
groups.c
hung_task.c
iomem.c
irq_work.c
jump_label.c
kallsyms.c
kcmp.c
Kconfig.freezer
Kconfig.hz
Kconfig.locks
Kconfig.preempt
kcov.c
kexec_core.c Merge branch 'rework/printk_safe-removal' into for-linus 2021-08-30 16:36:10 +02:00
kexec_elf.c
kexec_file.c
kexec_internal.h
kexec.c kexec: avoid compat_alloc_user_space 2021-09-08 15:32:34 -07:00
kheaders.c
kmod.c
kprobes.c
ksysfs.c
kthread.c
latencytop.c
Makefile
module_signature.c
module_signing.c
module-internal.h
module.c module: fix clang CFI with MODULE_UNLOAD=n 2021-09-28 12:56:26 +02:00
notifier.c
nsproxy.c memcg: enable accounting for new namesapces and struct nsproxy 2021-09-03 09:58:12 -07:00
padata.c padata: Remove repeated verbose license text 2021-08-27 16:30:18 +08:00
panic.c Merge branch 'rework/printk_safe-removal' into for-linus 2021-08-30 16:36:10 +02:00
params.c
pid_namespace.c memcg: enable accounting for new namesapces and struct nsproxy 2021-09-03 09:58:12 -07:00
pid.c
profile.c profiling: fix shift-out-of-bounds bugs 2021-09-08 11:50:26 -07:00
ptrace.c
range.c
reboot.c
regset.c
relay.c
resource_kunit.c
resource.c
rseq.c KVM: rseq: Update rseq when processing NOTIFY_RESUME on xfer to KVM guest 2021-09-22 10:24:01 -04:00
scftorture.c
scs.c
seccomp.c Merge branch 'exit-cleanups-for-v5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace 2021-09-01 14:52:05 -07:00
signal.c signal: Add SA_IMMUTABLE to ensure forced siganls do not get changed 2021-11-18 19:16:01 +01:00
smp.c
smpboot.c
smpboot.h
softirq.c
stackleak.c
stacktrace.c
static_call.c
stop_machine.c
sys_ni.c compat: remove some compat entry points 2021-09-08 15:32:35 -07:00
sys.c Merge branch 'akpm' (patches from Andrew) 2021-09-08 12:55:35 -07:00
sysctl-test.c
sysctl.c Merge branch 'akpm' (patches from Andrew) 2021-09-03 10:08:28 -07:00
task_work.c
taskstats.c
test_kprobes.c
torture.c
tracepoint.c
tsacct.c
ucount.c ucounts: Fix signal ucount refcounting 2021-10-18 16:02:30 -05:00
uid16.c
uid16.h
umh.c
up.c
user_namespace.c memcg: enable accounting for new namesapces and struct nsproxy 2021-09-03 09:58:12 -07:00
user-return-notifier.c
user.c fs/epoll: use a per-cpu counter for user's watches count 2021-09-08 11:50:27 -07:00
usermode_driver.c
utsname_sysctl.c
utsname.c
watch_queue.c
watchdog_hld.c
watchdog.c
workqueue_internal.h
workqueue.c workqueue: make sysfs of unbound kworker cpumask more clever 2021-11-18 19:16:17 +01:00