Linux kernel source tree
Go to file
Song Liu a7f4da875c bpf/stackmap: Fix deadlock with rq_lock in bpf_get_stack()
[ Upstream commit eac9153f2b ]

bpf stackmap with build-id lookup (BPF_F_STACK_BUILD_ID) can trigger A-A
deadlock on rq_lock():

rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
[...]
Call Trace:
 try_to_wake_up+0x1ad/0x590
 wake_up_q+0x54/0x80
 rwsem_wake+0x8a/0xb0
 bpf_get_stack+0x13c/0x150
 bpf_prog_fbdaf42eded9fe46_on_event+0x5e3/0x1000
 bpf_overflow_handler+0x60/0x100
 __perf_event_overflow+0x4f/0xf0
 perf_swevent_overflow+0x99/0xc0
 ___perf_sw_event+0xe7/0x120
 __schedule+0x47d/0x620
 schedule+0x29/0x90
 futex_wait_queue_me+0xb9/0x110
 futex_wait+0x139/0x230
 do_futex+0x2ac/0xa50
 __x64_sys_futex+0x13c/0x180
 do_syscall_64+0x42/0x100
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

This can be reproduced by:
1. Start a multi-thread program that does parallel mmap() and malloc();
2. taskset the program to 2 CPUs;
3. Attach bpf program to trace_sched_switch and gather stackmap with
   build-id, e.g. with trace.py from bcc tools:
   trace.py -U -p <pid> -s <some-bin,some-lib> t:sched:sched_switch

A sample reproducer is attached at the end.

This could also trigger deadlock with other locks that are nested with
rq_lock.

Fix this by checking whether irqs are disabled. Since rq_lock and all
other nested locks are irq safe, it is safe to do up_read() when irqs are
not disable. If the irqs are disabled, postpone up_read() in irq_work.

Fixes: 615755a77b ("bpf: extend stackmap to save binary_build_id+offset instead of address")
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20191014171223.357174-1-songliubraving@fb.com

Reproducer:
============================ 8< ============================

char *filename;

void *worker(void *p)
{
        void *ptr;
        int fd;
        char *pptr;

        fd = open(filename, O_RDONLY);
        if (fd < 0)
                return NULL;
        while (1) {
                struct timespec ts = {0, 1000 + rand() % 2000};

                ptr = mmap(NULL, 4096 * 64, PROT_READ, MAP_PRIVATE, fd, 0);
                usleep(1);
                if (ptr == MAP_FAILED) {
                        printf("failed to mmap\n");
                        break;
                }
                munmap(ptr, 4096 * 64);
                usleep(1);
                pptr = malloc(1);
                usleep(1);
                pptr[0] = 1;
                usleep(1);
                free(pptr);
                usleep(1);
                nanosleep(&ts, NULL);
        }
        close(fd);
        return NULL;
}

int main(int argc, char *argv[])
{
        void *ptr;
        int i;
        pthread_t threads[THREAD_COUNT];

        if (argc < 2)
                return 0;

        filename = argv[1];

        for (i = 0; i < THREAD_COUNT; i++) {
                if (pthread_create(threads + i, NULL, worker, NULL)) {
                        fprintf(stderr, "Error creating thread\n");
                        return 0;
                }
        }

        for (i = 0; i < THREAD_COUNT; i++)
                pthread_join(threads[i], NULL);
        return 0;
}
============================ 8< ============================

Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-12-31 16:35:20 +01:00
arch x86/mm: Use the correct function type for native_set_fixmap() 2019-12-31 16:35:12 +01:00
block blk-mq: make sure that line break can be printed 2019-12-17 20:35:48 +01:00
certs export.h: remove VMLINUX_SYMBOL() and VMLINUX_SYMBOL_STR() 2018-08-22 23:21:44 +09:00
crypto crypto: user - fix memory leak in crypto_report 2019-12-13 08:52:48 +01:00
Documentation mm, thp, proc: report THP eligibility for each vma 2019-12-17 20:35:45 +01:00
drivers iio: adc: max1027: Reset the device at probe time 2019-12-31 16:35:17 +01:00
firmware kbuild: remove all dummy assignments to obj- 2017-11-18 11:46:06 +09:00
fs Btrfs: fix removal logic of the tree mod log that leads to use-after-free issues 2019-12-31 16:34:44 +01:00
include block: Fix writeback throttling W=1 compiler warnings 2019-12-31 16:34:59 +01:00
init fork: fix some -Wmissing-prototypes warnings 2019-12-05 09:21:04 +01:00
ipc ipc/mqueue.c: only perform resource calculation if user valid 2019-08-06 19:06:52 +02:00
kernel bpf/stackmap: Fix deadlock with rq_lock in bpf_get_stack() 2019-12-31 16:35:20 +01:00
lib idr: Fix idr_get_next_ul race with idr_remove 2019-12-17 20:36:02 +01:00
LICENSES LICENSES: Remove CC-BY-SA-4.0 license text 2018-10-18 11:28:50 +02:00
mm mm, thp, proc: report THP eligibility for each vma 2019-12-17 20:35:45 +01:00
net Bluetooth: hci_core: fix init for HCI_USER_CHANNEL 2019-12-31 16:35:19 +01:00
samples samples: pktgen: fix proc_cmd command result check logic 2019-12-31 16:34:59 +01:00
scripts modpost: skip ELF local symbols during section mismatch check 2019-12-13 08:52:20 +01:00
security apparmor: delete the dentry in aafs_remove() to avoid a leak 2019-12-05 09:21:19 +01:00
sound ALSA: hda/ca0132 - Fix work handling in delayed HP detection 2019-12-31 16:34:47 +01:00
tools perf report: Add warning when libunwind not compiled in 2019-12-31 16:35:15 +01:00
usr kbuild: clean compressed initramfs image 2019-10-07 18:57:16 +02:00
virt KVM: arm/arm64: vgic: Don't rely on the wrong pending table 2019-12-13 08:52:45 +01:00
.clang-format clang-format: Set IndentWrappedFunctionNames false 2018-08-01 18:38:51 +02:00
.cocciconfig
.get_maintainer.ignore
.gitattributes .gitattributes: set git diff driver for C source code files 2016-10-07 18:46:30 -07:00
.gitignore Kbuild updates for v4.17 (2nd) 2018-04-15 17:21:30 -07:00
.mailmap libnvdimm-for-4.19_misc 2018-08-25 18:13:10 -07:00
COPYING COPYING: use the new text with points to the license files 2018-03-23 12:41:45 -06:00
CREDITS 9p: remove Ron Minnich from MAINTAINERS 2018-08-17 16:20:26 -07:00
Kbuild Kbuild updates for v4.15 2017-11-17 17:45:29 -08:00
Kconfig kconfig: move the "Executable file formats" menu to fs/Kconfig.binfmt 2018-08-02 08:06:55 +09:00
MAINTAINERS USB: rio500: Remove Rio 500 kernel driver 2019-10-17 13:44:47 -07:00
Makefile Linux 4.19.91 2019-12-21 10:57:45 +01:00
README Docs: Added a pointer to the formatted docs to README 2018-03-21 09:02:53 -06:00

Linux kernel
============

There are several guides for kernel developers and users. These guides can
be rendered in a number of formats, like HTML and PDF. Please read
Documentation/admin-guide/README.rst first.

In order to build the documentation, use ``make htmldocs`` or
``make pdfdocs``.  The formatted documentation can also be read online at:

    https://www.kernel.org/doc/html/latest/

There are various text files in the Documentation/ subdirectory,
several of them using the Restructured Text markup notation.
See Documentation/00-INDEX for a list of what is contained in each file.

Please read the Documentation/process/changes.rst file, as it contains the
requirements for building and running the kernel, and information about
the problems which may result by upgrading your kernel.