linux/kernel
Cong Wang 0505cc4c90 cgroup: fix cgroup_sk_alloc() for sk_clone_lock()
[ Upstream commit ad0f75e5f5 ]

When we clone a socket in sk_clone_lock(), its sk_cgrp_data is
copied, so the cgroup refcnt must be taken too. And, unlike the
sk_alloc() path, sock_update_netprioidx() is not called here.
Therefore, it is safe and necessary to grab the cgroup refcnt
even when cgroup_sk_alloc is disabled.

sk_clone_lock() is in BH context anyway, the in_interrupt()
would terminate this function if called there. And for sk_alloc()
skcd->val is always zero. So it's safe to factor out the code
to make it more readable.

The global variable 'cgroup_sk_alloc_disabled' is used to determine
whether to take these reference counts. It is impossible to make
the reference counting correct unless we save this bit of information
in skcd->val. So, add a new bit there to record whether the socket
has already taken the reference counts. This obviously relies on
kmalloc() to align cgroup pointers to at least 4 bytes,
ARCH_KMALLOC_MINALIGN is certainly larger than that.

This bug seems to be introduced since the beginning, commit
d979a39d72 ("cgroup: duplicate cgroup reference when cloning sockets")
tried to fix it but not compeletely. It seems not easy to trigger until
the recent commit 090e28b229
("netprio_cgroup: Fix unlimited memory leak of v2 cgroups") was merged.

Fixes: bd1060a1d6 ("sock, cgroup: add sock->sk_cgroup")
Reported-by: Cameron Berkenpas <cam@neo-zeon.de>
Reported-by: Peter Geis <pgwipeout@gmail.com>
Reported-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
Reported-by: Daniël Sonck <dsonck92@gmail.com>
Reported-by: Zhang Qiang <qiang.zhang@windriver.com>
Tested-by: Cameron Berkenpas <cam@neo-zeon.de>
Tested-by: Peter Geis <pgwipeout@gmail.com>
Tested-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Zefan Li <lizefan@huawei.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Roman Gushchin <guro@fb.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-07-22 09:32:00 +02:00
..
bpf bpf: Check correct cred for CAP_SYSLOG in bpf_dump_raw_ok() 2020-07-16 08:17:27 +02:00
cgroup cgroup: fix cgroup_sk_alloc() for sk_clone_lock() 2020-07-22 09:32:00 +02:00
configs
debug kgdb: Avoid suspicious RCU usage warning 2020-07-09 09:37:10 +02:00
dma dma-debug: add a schedule point in debug_dma_dump_mappings() 2020-01-04 19:12:43 +01:00
events perf: Add cond_resched() to task_function_call() 2020-06-22 09:05:08 +02:00
gcov kernel/gcov/fs.c: gcov_seq_next() should increase position index 2020-04-29 16:31:12 +02:00
irq genirq/irqdomain: Check pointer in irq_domain_alloc_irqs_hierarchy() 2020-04-17 10:48:41 +02:00
livepatch livepatch: Nullify obj->mod in klp_module_coming()'s error path 2019-10-07 18:57:10 +02:00
locking locktorture: Print ratio of acquisitions, not failures 2020-04-23 10:30:23 +02:00
power PM: hibernate: Freeze kernel threads in software_resume() 2020-05-06 08:13:28 +02:00
printk printk: fix exclusive_console replaying 2020-02-11 04:33:51 -08:00
rcu rcu: Avoid data-race in rcu_gp_fqs_check_wake() 2020-02-11 04:33:55 -08:00
sched sched/core: Fix PI boosting between RT and DEADLINE tasks 2020-06-30 23:17:13 -04:00
time clocksource: Prevent double add_timer_on() for watchdog_timer 2020-02-11 04:34:18 -08:00
trace ring-buffer: Zero out time extend if it is nested and not absolute 2020-06-30 23:17:17 -04:00
.gitignore
acct.c acct_on(): don't mess with freeze protection 2019-05-31 06:46:05 -07:00
async.c
audit_fsnotify.c
audit_tree.c audit: Embed key into chunk 2019-12-13 08:51:11 +01:00
audit_watch.c audit_get_nd(): don't unlock parent too early 2019-12-13 08:51:02 +01:00
audit.c audit: fix a net reference leak in audit_list_rules_send() 2020-06-22 09:05:13 +02:00
audit.h audit: fix a net reference leak in audit_list_rules_send() 2020-06-22 09:05:13 +02:00
auditfilter.c audit: fix a net reference leak in audit_list_rules_send() 2020-06-22 09:05:13 +02:00
auditsc.c audit: print empty EXECVE args 2019-12-01 09:17:17 +01:00
backtracetest.c
bounds.c
capability.c LSM: generalize flag passing to security_capable 2020-01-23 08:21:29 +01:00
compat.c make 'user_access_begin()' do 'access_ok()' 2020-06-22 09:04:58 +02:00
configs.c
context_tracking.c
cpu_pm.c kernel/cpu_pm: Fix uninitted local in cpu_pm 2020-06-22 09:05:28 +02:00
cpu.c sched/core: Fix illegal RCU from offline CPUs 2020-06-22 09:05:14 +02:00
crash_core.c
crash_dump.c
cred.c memcg: account security cred as well to kmemcg 2020-01-09 10:19:00 +01:00
delayacct.c
dma.c
elfcore.c kernel/elfcore.c: include proper prototypes 2019-10-11 18:21:23 +02:00
exec_domain.c
exit.c exit: Move preemption fixup up, move blocking operations down 2020-06-22 09:05:14 +02:00
extable.c
fail_function.c
fork.c fork,memcg: alloc_thread_stack_node needs to set tsk->stack 2020-01-27 14:50:58 +01:00
freezer.c
futex.c futex: Unbreak futex hashing 2020-03-25 08:06:14 +01:00
groups.c
hung_task.c kernel: hung_task.c: disable on suspend 2019-04-20 09:16:02 +02:00
iomem.c
irq_work.c irq_work: Do not raise an IPI when queueing work on the local CPU 2019-05-31 06:46:19 -07:00
jump_label.c jump_label: move 'asm goto' support test to Kconfig 2019-06-04 08:02:34 +02:00
kallsyms.c kallsyms: Refactor kallsyms_show_value() to take cred 2020-07-16 08:17:26 +02:00
kcmp.c
Kconfig.freezer
Kconfig.hz
Kconfig.locks
Kconfig.preempt
kcov.c kernel/kcov.c: mark write_comp_data() as notrace 2019-02-12 19:47:20 +01:00
kexec_core.c kexec: Allocate decrypted control pages for kdump if SME is enabled 2019-11-24 08:20:29 +01:00
kexec_file.c
kexec_internal.h
kexec.c
kmod.c kmod: make request_module() return an error when autoloading is disabled 2020-04-17 10:48:52 +02:00
kprobes.c kprobes: Do not expose probe addresses to non-CAP_SYSLOG 2020-07-16 08:17:27 +02:00
ksysfs.c
kthread.c
latencytop.c
Makefile y2038: futex: Move compat implementation into futex.c 2019-12-01 09:17:38 +01:00
memremap.c mm/memory_hotplug: shrink zones when offlining memory 2020-01-29 16:43:27 +01:00
module_signing.c
module-internal.h
module.c module: Do not expose section addresses to non-CAP_SYSLOG 2020-07-16 08:17:26 +02:00
notifier.c x86/mm: split vmalloc_sync_all() 2020-03-25 08:06:13 +01:00
nsproxy.c
padata.c padata: purge get_cpu and reorder_via_wq from padata_do_serial 2020-05-27 17:37:36 +02:00
panic.c kernel/panic.c: do not append newline to the stack protector panic string 2019-12-01 09:17:10 +01:00
params.c
pid_namespace.c signal/pid_namespace: Fix reboot_pid_ns to use send_sig not force_sig 2019-07-26 09:14:01 +02:00
pid.c Fix failure path in alloc_pid() 2019-01-13 09:51:06 +01:00
profile.c
ptrace.c ptrace: reintroduce usage of subjective credentials in ptrace_has_cap() 2020-01-23 08:21:29 +01:00
range.c
reboot.c
relay.c kernel/relay.c: handle alloc_percpu returning NULL in relay_open 2020-06-07 13:17:54 +02:00
resource.c resource: fix locking in find_next_iomem_res() 2019-09-16 08:22:20 +02:00
rseq.c
seccomp.c LSM: generalize flag passing to security_capable 2020-01-23 08:21:29 +01:00
signal.c signal: Extend exec_id to 64bits 2020-04-17 10:48:47 +02:00
smp.c cpu/hotplug: Fix "SMT disabled by BIOS" detection for KVM 2019-02-12 19:47:25 +01:00
smpboot.c
smpboot.h
softirq.c
stacktrace.c
stop_machine.c
sys_ni.c
sys.c kernel/sys.c: prctl: fix false positive in validate_prctl_map() 2019-06-15 11:54:01 +02:00
sysctl_binary.c
sysctl.c kernel: sysctl: make drop_caches write-only 2020-01-04 19:13:17 +01:00
task_work.c
taskstats.c taskstats: fix data-race 2020-01-09 10:18:59 +01:00
test_kprobes.c
torture.c
tracepoint.c
tsacct.c
ucount.c
uid16.c
uid16.h
umh.c coredump: fix crash when umh is disabled 2020-05-14 07:57:21 +02:00
up.c
user_namespace.c
user-return-notifier.c
user.c
utsname_sysctl.c
utsname.c
watchdog_hld.c
watchdog.c watchdog/softlockup: Enforce that timestamp is valid on boot 2020-02-24 08:34:49 +01:00
workqueue_internal.h
workqueue.c workqueue: don't use wq_select_unbound_cpu() for bound works 2020-03-18 07:14:20 +01:00