Linux kernel source tree
Go to file
David Christensen 994087b56e net/tg3: resolve deadlock in tg3_reset_task() during EEH
[ Upstream commit 6c4ca03bd8 ]

During EEH error injection testing, a deadlock was encountered in the tg3
driver when tg3_io_error_detected() was attempting to cancel outstanding
reset tasks:

crash> foreach UN bt
...
PID: 159    TASK: c0000000067c6000  CPU: 8   COMMAND: "eehd"
...
 #5 [c00000000681f990] __cancel_work_timer at c00000000019fd18
 #6 [c00000000681fa30] tg3_io_error_detected at c00800000295f098 [tg3]
 #7 [c00000000681faf0] eeh_report_error at c00000000004e25c
...

PID: 290    TASK: c000000036e5f800  CPU: 6   COMMAND: "kworker/6:1"
...
 #4 [c00000003721fbc0] rtnl_lock at c000000000c940d8
 #5 [c00000003721fbe0] tg3_reset_task at c008000002969358 [tg3]
 #6 [c00000003721fc60] process_one_work at c00000000019e5c4
...

PID: 296    TASK: c000000037a65800  CPU: 21  COMMAND: "kworker/21:1"
...
 #4 [c000000037247bc0] rtnl_lock at c000000000c940d8
 #5 [c000000037247be0] tg3_reset_task at c008000002969358 [tg3]
 #6 [c000000037247c60] process_one_work at c00000000019e5c4
...

PID: 655    TASK: c000000036f49000  CPU: 16  COMMAND: "kworker/16:2"
...:1

 #4 [c0000000373ebbc0] rtnl_lock at c000000000c940d8
 #5 [c0000000373ebbe0] tg3_reset_task at c008000002969358 [tg3]
 #6 [c0000000373ebc60] process_one_work at c00000000019e5c4
...

Code inspection shows that both tg3_io_error_detected() and
tg3_reset_task() attempt to acquire the RTNL lock at the beginning of
their code blocks.  If tg3_reset_task() should happen to execute between
the times when tg3_io_error_deteced() acquires the RTNL lock and
tg3_reset_task_cancel() is called, a deadlock will occur.

Moving tg3_reset_task_cancel() call earlier within the code block, prior
to acquiring RTNL, prevents this from happening, but also exposes another
deadlock issue where tg3_reset_task() may execute AFTER
tg3_io_error_detected() has executed:

crash> foreach UN bt
PID: 159    TASK: c0000000067d2000  CPU: 9   COMMAND: "eehd"
...
 #4 [c000000006867a60] rtnl_lock at c000000000c940d8
 #5 [c000000006867a80] tg3_io_slot_reset at c0080000026c2ea8 [tg3]
 #6 [c000000006867b00] eeh_report_reset at c00000000004de88
...
PID: 363    TASK: c000000037564000  CPU: 6   COMMAND: "kworker/6:1"
...
 #3 [c000000036c1bb70] msleep at c000000000259e6c
 #4 [c000000036c1bba0] napi_disable at c000000000c6b848
 #5 [c000000036c1bbe0] tg3_reset_task at c0080000026d942c [tg3]
 #6 [c000000036c1bc60] process_one_work at c00000000019e5c4
...

This issue can be avoided by aborting tg3_reset_task() if EEH error
recovery is already in progress.

Fixes: db84bf43ef ("tg3: tg3_reset_task() needs to use rtnl_lock to synchronize")
Signed-off-by: David Christensen <drc@linux.vnet.ibm.com>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Link: https://lore.kernel.org/r/20230124185339.225806-1-drc@linux.vnet.ibm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-02-01 08:34:48 +01:00
arch riscv: Move call to init_cpu_topology() to later initialization stage 2023-02-01 08:34:48 +01:00
block block: mq-deadline: Rename deadline_is_seq_writes() 2023-01-24 07:24:44 +01:00
certs certs: make system keyring depend on built-in x509 parser 2022-09-24 04:31:18 +09:00
crypto crypto: tcrypt - Fix multibuffer skcipher speed test mem leak 2022-12-31 13:32:34 +01:00
Documentation regulator: dt-bindings: samsung,s2mps14: add lost samsung,ext-control-gpios 2023-02-01 08:34:39 +01:00
drivers net/tg3: resolve deadlock in tg3_reset_task() during EEH 2023-02-01 08:34:48 +01:00
fs ovl: fail on invalid uid/gid mapping at copy up 2023-02-01 08:34:38 +01:00
include platform/x86: apple-gmux: Add apple_gmux_detect() helper 2023-02-01 08:34:46 +01:00
init gcc: disable -Warray-bounds for gcc-11 too 2023-01-14 10:33:43 +01:00
io_uring io_uring: always prep_async for drain requests 2023-02-01 08:34:42 +01:00
ipc ipc: fix memory leak in init_mqueue_fs() 2022-12-31 13:32:01 +01:00
kernel tracing/osnoise: Use built-in RCU list checking 2023-02-01 08:34:46 +01:00
lib netlink: prevent potential spectre v1 gadgets 2023-02-01 08:34:43 +01:00
LICENSES LICENSES/LGPL-2.1: Add LGPL-2.1-or-later as valid identifiers 2021-12-16 14:33:10 +01:00
mm panic: Consolidate open-coded panic_on_warn checks 2023-01-24 07:24:41 +01:00
net net: mctp: mark socks as dead on unhash, prevent re-add 2023-02-01 08:34:48 +01:00
rust Kbuild: add Rust support 2022-09-28 09:02:20 +02:00
samples ftrace: Export ftrace_free_filter() to modules 2023-02-01 08:34:37 +01:00
scripts ftrace/scripts: Update the instructions for ftrace-bisect.sh 2023-02-01 08:34:37 +01:00
security tomoyo: fix broken dependency on *.conf.default 2023-02-01 08:34:06 +01:00
sound ASoC: fsl-asoc-card: Fix naming of AC'97 CODEC widgets 2023-02-01 08:34:31 +01:00
tools Revert "selftests/bpf: check null propagation only neither reg is PTR_TO_BTF_ID" 2023-02-01 08:34:34 +01:00
usr usr/gen_init_cpio.c: remove unnecessary -1 values from int file 2022-10-03 14:21:44 -07:00
virt kvm/vfio: Fix potential deadlock on vfio group_lock 2023-02-01 08:34:36 +01:00
.clang-format inet: ping: use hlist_nulls rcu iterator during lookup 2022-12-01 12:42:46 +01:00
.cocciconfig scripts: add Linux .cocciconfig for coccinelle 2016-07-22 12:13:39 +02:00
.get_maintainer.ignore get_maintainer: add Alan to .get_maintainer.ignore 2022-08-20 15:17:44 -07:00
.gitattributes .gitattributes: use 'dts' diff driver for dts files 2019-12-04 19:44:11 -08:00
.gitignore Kbuild: add Rust support 2022-09-28 09:02:20 +02:00
.mailmap 9 hotfixes. 6 for MM, 3 for other areas. Four of these patches address 2022-12-10 17:10:52 -08:00
.rustfmt.toml rust: add .rustfmt.toml 2022-09-28 09:02:20 +02:00
COPYING COPYING: state that all contributions really are covered by this file 2020-02-10 13:32:20 -08:00
CREDITS MAINTAINERS: Remove Michal Marek from Kbuild maintainers 2022-11-16 14:53:00 +09:00
Kbuild Kbuild updates for v6.1 2022-10-10 12:00:45 -07:00
Kconfig kbuild: ensure full rebuild when the compiler is updated 2020-05-12 13:28:33 +09:00
MAINTAINERS panic: Expose "warn_count" to sysfs 2023-01-24 07:24:41 +01:00
Makefile kbuild: fix 'make modules' error when CONFIG_DEBUG_INFO_BTF_MODULES=y 2023-02-01 08:34:08 +01:00
README Drop all 00-INDEX files from Documentation/ 2018-09-09 15:08:58 -06:00

Linux kernel
============

There are several guides for kernel developers and users. These guides can
be rendered in a number of formats, like HTML and PDF. Please read
Documentation/admin-guide/README.rst first.

In order to build the documentation, use ``make htmldocs`` or
``make pdfdocs``.  The formatted documentation can also be read online at:

    https://www.kernel.org/doc/html/latest/

There are various text files in the Documentation/ subdirectory,
several of them using the Restructured Text markup notation.

Please read the Documentation/process/changes.rst file, as it contains the
requirements for building and running the kernel, and information about
the problems which may result by upgrading your kernel.