linux/drivers
Tsuchiya Yuto 36b3ba412d mwifiex: pcie: skip cancel_work_sync() on reset failure path
[ Upstream commit 4add4d988f ]

If a reset is performed but the reset itself fails for some reason (e.g.,
on Surface devices, where the firmware reset requires additional quirks),
cancel_work_sync() hangs in mwifiex_cleanup_pcie().

    # firmware went into a bad state
    [...]
    [ 1608.281690] mwifiex_pcie 0000:03:00.0: info: shutdown mwifiex...
    [ 1608.282724] mwifiex_pcie 0000:03:00.0: rx_pending=0, tx_pending=1,	cmd_pending=0
    [ 1608.292400] mwifiex_pcie 0000:03:00.0: PREP_CMD: card is removed
    [ 1608.292405] mwifiex_pcie 0000:03:00.0: PREP_CMD: card is removed
    # reset performed after firmware went into a bad state
    [ 1609.394320] mwifiex_pcie 0000:03:00.0: WLAN FW already running! Skip FW dnld
    [ 1609.394335] mwifiex_pcie 0000:03:00.0: WLAN FW is active
    # but even the reset failed
    [ 1619.499049] mwifiex_pcie 0000:03:00.0: mwifiex_cmd_timeout_func: Timeout cmd id = 0xfa, act = 0xe000
    [ 1619.499094] mwifiex_pcie 0000:03:00.0: num_data_h2c_failure = 0
    [ 1619.499103] mwifiex_pcie 0000:03:00.0: num_cmd_h2c_failure = 0
    [ 1619.499110] mwifiex_pcie 0000:03:00.0: is_cmd_timedout = 1
    [ 1619.499117] mwifiex_pcie 0000:03:00.0: num_tx_timeout = 0
    [ 1619.499124] mwifiex_pcie 0000:03:00.0: last_cmd_index = 0
    [ 1619.499133] mwifiex_pcie 0000:03:00.0: last_cmd_id: fa 00 07 01 07 01 07 01 07 01
    [ 1619.499140] mwifiex_pcie 0000:03:00.0: last_cmd_act: 00 e0 00 00 00 00 00 00 00 00
    [ 1619.499147] mwifiex_pcie 0000:03:00.0: last_cmd_resp_index = 3
    [ 1619.499155] mwifiex_pcie 0000:03:00.0: last_cmd_resp_id: 07 81 07 81 07 81 07 81 07 81
    [ 1619.499162] mwifiex_pcie 0000:03:00.0: last_event_index = 2
    [ 1619.499169] mwifiex_pcie 0000:03:00.0: last_event: 58 00 58 00 58 00 58 00 58 00
    [ 1619.499177] mwifiex_pcie 0000:03:00.0: data_sent=0 cmd_sent=1
    [ 1619.499185] mwifiex_pcie 0000:03:00.0: ps_mode=0 ps_state=0
    [ 1619.499215] mwifiex_pcie 0000:03:00.0: info: _mwifiex_fw_dpc: unregister device
    # mwifiex_pcie_work hang happening
    [ 1823.233923] INFO: task kworker/3:1:44 blocked for more than 122 seconds.
    [ 1823.233932]       Tainted: G        WC OE     5.10.0-rc1-1-mainline #1
    [ 1823.233935] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    [ 1823.233940] task:kworker/3:1     state:D stack:    0 pid:   44 ppid:     2 flags:0x00004000
    [ 1823.233960] Workqueue: events mwifiex_pcie_work [mwifiex_pcie]
    [ 1823.233965] Call Trace:
    [ 1823.233981]  __schedule+0x292/0x820
    [ 1823.233990]  schedule+0x45/0xe0
    [ 1823.233995]  schedule_timeout+0x11c/0x160
    [ 1823.234003]  wait_for_completion+0x9e/0x100
    [ 1823.234012]  __flush_work.isra.0+0x156/0x210
    [ 1823.234018]  ? flush_workqueue_prep_pwqs+0x130/0x130
    [ 1823.234026]  __cancel_work_timer+0x11e/0x1a0
    [ 1823.234035]  mwifiex_cleanup_pcie+0x28/0xd0 [mwifiex_pcie]
    [ 1823.234049]  mwifiex_free_adapter+0x24/0xe0 [mwifiex]
    [ 1823.234060]  _mwifiex_fw_dpc+0x294/0x560 [mwifiex]
    [ 1823.234074]  mwifiex_reinit_sw+0x15d/0x300 [mwifiex]
    [ 1823.234080]  mwifiex_pcie_reset_done+0x50/0x80 [mwifiex_pcie]
    [ 1823.234087]  pci_try_reset_function+0x5c/0x90
    [ 1823.234094]  process_one_work+0x1d6/0x3a0
    [ 1823.234100]  worker_thread+0x4d/0x3d0
    [ 1823.234107]  ? rescuer_thread+0x410/0x410
    [ 1823.234112]  kthread+0x142/0x160
    [ 1823.234117]  ? __kthread_bind_mask+0x60/0x60
    [ 1823.234124]  ret_from_fork+0x22/0x30
    [...]

This is a deadlock caused by calling cancel_work_sync() in
mwifiex_cleanup_pcie():

- Device resets are done via mwifiex_pcie_card_reset()
- which schedules card->work to call mwifiex_pcie_card_reset_work()
- which calls pci_try_reset_function().
- This leads to mwifiex_pcie_reset_done() being called on the same
  workqueue, which in turn calls
- mwifiex_reinit_sw() and that calls
- _mwifiex_fw_dpc().

The problem is that _mwifiex_fw_dpc() calls mwifiex_free_adapter() if
firmware initialization fails, which ends up calling
mwifiex_cleanup_pcie().

Note that all those calls are still running on the workqueue. So when
mwifiex_cleanup_pcie() now calls cancel_work_sync(), it's really waiting
on itself to complete, causing a deadlock.

This commit fixes the deadlock by skipping cancel_work_sync() on a reset
failure path.
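
The self-wait and the guard described above can be sketched in a small
userspace model. This is only an illustration under stated assumptions:
every model_* name, the reset_ongoing flag, and the result enum are
hypothetical stand-ins and are not the driver's actual code or kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; these are not the kernel's types or APIs. */
struct model_work {
	bool running;		/* true while the work function is executing */
};

struct model_card {
	struct model_work work;
	bool reset_ongoing;	/* assumed "reset in progress" flag */
};

enum cleanup_result {
	CLEANUP_CANCELLED,	/* waited for the work item and returned */
	CLEANUP_SKIPPED,	/* did not wait because a reset is in progress */
	CLEANUP_WOULD_DEADLOCK,	/* would wait for the caller's own work item */
};

/*
 * Stand-in for cancel_work_sync(). The model is single-threaded, so
 * "running" is only true when we are called from inside the work item,
 * and a synchronous cancel issued from there could never complete.
 */
static bool model_cancel_would_self_wait(const struct model_work *work)
{
	return work->running;
}

/* Stand-in for the cleanup step, with and without the guard. */
static enum cleanup_result model_cleanup(struct model_card *card, bool guarded)
{
	assert(card != NULL);

	if (guarded && card->reset_ongoing)
		return CLEANUP_SKIPPED;
	if (model_cancel_would_self_wait(&card->work))
		return CLEANUP_WOULD_DEADLOCK;
	return CLEANUP_CANCELLED;
}

/* Reset work whose re-initialization fails and falls into cleanup. */
static enum cleanup_result model_failed_reset_work(struct model_card *card,
						   bool guarded)
{
	enum cleanup_result result;

	card->work.running = true;
	card->reset_ongoing = true;
	result = model_cleanup(card, guarded);	/* reset failure path */
	card->reset_ongoing = false;
	card->work.running = false;
	return result;
}

/* Ordinary removal: cleanup is called from outside the work item. */
static enum cleanup_result model_remove(struct model_card *card, bool guarded)
{
	return model_cleanup(card, guarded);
}
```

In this model, model_failed_reset_work() reports CLEANUP_WOULD_DEADLOCK
without the guard and CLEANUP_SKIPPED with it, while model_remove() still
performs the synchronous cancel in both cases.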

After this commit, the following output is expected when a reset fails:

    kernel: mwifiex_pcie 0000:03:00.0: info: _mwifiex_fw_dpc: unregister device
    kernel: mwifiex: Failed to bring up adapter: -5
    kernel: mwifiex_pcie 0000:03:00.0: reinit failed: -5

To reproduce this issue, try, for example, putting the root port of the
wifi device into D3 (replace "00:1d.3" with the address for your setup).

    # put into D3 (root port)
    sudo setpci -v -s 00:1d.3 CAP_PM+4.b=0b

Cc: Maximilian Luz <luzmaximilian@gmail.com>
Signed-off-by: Tsuchiya Yuto <kitakar@gmail.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/20201028142346.18355-1-kitakar@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-03-11 14:17:23 +01:00
accessibility speakup: fix uninitialized flush_lock 2020-12-30 11:53:44 +01:00
acpi ACPICA: Fix race in generic_serial_bus (I2C) and GPIO op_region parameter handling 2021-03-11 14:17:21 +01:00
amba amba: Fix resource leak for drivers without .remove 2021-03-04 11:38:02 +01:00
android binder: add flag to clear buffer on txn complete 2020-12-30 11:54:09 +01:00
ata ata: ahci_brcm: Add back regulators management 2021-03-04 11:37:45 +01:00
atm atm: idt77252: call pci_disable_device() on error path 2021-01-12 20:18:09 +01:00
auxdisplay auxdisplay: ht16k33: Fix refresh rate handling 2021-03-04 11:38:00 +01:00
base PM: runtime: Update device status before letting suppliers suspend 2021-03-09 11:11:12 +01:00
bcma
block rsxx: Return -EFAULT if copy_to_user() fails 2021-03-09 11:11:15 +01:00
bluetooth Bluetooth: btqca: Add valid le states quirk 2021-03-11 14:17:22 +01:00
bus bus: fsl-mc: fix error return code in fsl_mc_object_allocate() 2020-12-30 11:53:46 +01:00
cdrom
char tpm, tpm_tis: Decorate tpm_get_timeouts() with request_locality() 2021-03-09 11:11:10 +01:00
clk clk: aspeed: Fix APLL calculate formula from ast2600-A2 2021-03-04 11:38:06 +01:00
clocksource clocksource/drivers/mxs_timer: Add missing semicolon when DEBUG is defined 2021-03-04 11:37:57 +01:00
connector
counter counter:ti-eqep: remove floor 2021-01-27 11:55:12 +01:00
cpufreq cpufreq: intel_pstate: Get per-CPU max freq via MSR_HWP_CAPABILITIES if available 2021-03-04 11:38:42 +01:00
cpuidle
crypto crypto: sun4i-ss - initialize need_fallback 2021-03-04 11:38:32 +01:00
dax device-dax: Fix default return code of range_parse() 2021-03-04 11:38:15 +01:00
dca
devfreq
dio
dma dmaengine: idxd: set DMA channel to be private 2021-03-04 11:37:57 +01:00
dma-buf dmabuf: fix use-after-free of dmabuf's file->f_inode 2021-01-12 20:18:24 +01:00
edac EDAC/amd64: Do not load on family 0x15, model 0x13 2021-03-07 12:34:08 +01:00
eisa
extcon extcon: max77693: Fix modalias string 2020-12-30 11:53:49 +01:00
firewire
firmware firmware: arm_scmi: Fix call site of scmi_notification_exit 2021-03-04 11:37:23 +01:00
fpga fpga: Specify HAS_IOMEM dependency for FPGA_DFL 2020-12-01 18:46:24 +01:00
fsi fsi: Aspeed: Add mutex to protect HW access 2020-12-30 11:53:46 +01:00
gnss
gpio gpio: pcf857x: Fix missing first interrupt 2021-03-04 11:38:40 +01:00
gpu drm/amdgpu: fix parameter error of RREG32_PCIE() in amdgpu_regs_pcie 2021-03-09 11:11:13 +01:00
greybus
hid HID: wacom: Ignore attempts to overwrite the touch_max value from HID 2021-03-04 11:38:23 +01:00
hsi HSI: Fix PM usage counter unbalance in ssi_hw_init 2021-03-04 11:37:52 +01:00
hv Drivers: hv: vmbus: Avoid use-after-free in vmbus_onoffer_rescind() 2021-03-04 11:37:46 +01:00
hwmon hwmon: (dell-smm) Add XPS 15 L502X to fan control blacklist 2021-02-26 10:13:00 +01:00
hwspinlock
hwtracing coresight: etm4x: Handle accesses to TRCSTALLCTLR 2021-03-04 11:38:37 +01:00
i2c i2c: exynos5: Preserve high speed master code 2021-03-04 11:38:20 +01:00
i3c i3c master: fix missing destroy_workqueue() on error in i3c_master_register 2021-01-06 14:56:53 +01:00
ide ide/falconide: Fix module unload 2021-03-04 11:38:21 +01:00
idle intel_idle: Build fix 2020-12-03 10:00:23 +01:00
iio iio: adc: ti_am335x_adc: remove omitted iio_kfifo_free() 2021-01-27 11:55:12 +01:00
infiniband IB/mlx5: Add missing error code 2021-03-09 11:11:14 +01:00
input Input: elan_i2c - add new trackpoint report type 0x5F 2021-03-07 12:34:04 +01:00
interconnect interconnect: imx8mq: Use icc_sync_state 2021-01-27 11:55:29 +01:00
iommu iommu/amd: Fix sleeping in atomic in increase_address_space() 2021-03-11 14:17:22 +01:00
ipack
irqchip irqchip/loongson-pch-msi: Use bitmap_zalloc() to allocate bitmap 2021-03-04 11:38:42 +01:00
isdn misdn: dsp: select CONFIG_BITREVERSE 2021-01-19 18:27:26 +01:00
leds leds: trigger: fix potential deadlock with libata 2021-02-03 23:28:41 +01:00
lightnvm lightnvm: fix memory leak when submit fails 2021-01-27 11:55:22 +01:00
macintosh macintosh/adb-iop: Use big-endian autopoll mask 2021-03-04 11:37:42 +01:00
mailbox mailbox: sprd: correct definition of SPRD_OUTBOX_FIFO_FULL 2021-03-04 11:38:15 +01:00
mcb
md dm verity: fix FEC for RS roots unaligned to block size 2021-03-09 11:11:12 +01:00
media media: v4l: ioctl: Fix memory leak in video_usercopy 2021-03-07 12:34:16 +01:00
memory memory: ti-aemif: Drop child node when jumping out loop 2021-03-04 11:37:25 +01:00
memstick memstick: r592: Fix error return in r592_probe() 2020-12-30 11:53:34 +01:00
message
mfd mfd: gateworks-gsc: Fix interrupt type 2021-03-04 11:38:40 +01:00
misc mei: me: add adler lake point LP DID 2021-03-04 11:38:40 +01:00
mmc mmc: sdhci-pci-o2micro: Bug fix for SDR104 HW tuning failure 2021-03-04 11:38:39 +01:00
most
mtd mtd: spi-nor: hisi-sfc: Put child node np on error path 2021-03-04 11:38:37 +01:00
mux
net mwifiex: pcie: skip cancel_work_sync() on reset failure path 2021-03-11 14:17:23 +01:00
nfc nfc: s3fwrn5: Release the nfc firmware 2020-12-30 11:53:53 +01:00
ntb
nubus
nvdimm libnvdimm/dimm: Avoid race between probe and available_slots_show() 2021-02-10 09:29:17 +01:00
nvme nvme-pci: mark Kingston SKC2000 as not supporting the deepest power state 2021-03-11 14:17:21 +01:00
nvmem nvmem: qcom-spmi-sdam: Fix uninitialized pdev pointer 2021-03-04 11:38:39 +01:00
of of: unittest: Fix build on architectures without CONFIG_OF_ADDRESS 2021-03-09 11:11:15 +01:00
opp opp: Correct debug message in _opp_add_static_v2() 2021-03-04 11:37:27 +01:00
oprofile
parisc
parport
pci PCI: Add a REBAR size quirk for Sapphire RX 5600 XT Pulse 2021-03-07 12:34:11 +01:00
pcmcia
perf perf/arm-cmn: Move IRQs when migrating context 2021-03-04 11:37:44 +01:00
phy phy: lantiq: rcu-usb2: wait after clock enable 2021-03-04 11:38:24 +01:00
pinctrl pinctrl: qcom: Don't clear pending interrupts when enabling 2021-01-27 11:55:27 +01:00
platform platform/chrome: cros_ec_proto: Add LID and BATTERY to default mask 2021-03-04 11:37:58 +01:00
pnp
power power: supply: smb347-charger: Fix interrupt usage if interrupt is unavailable 2021-03-04 11:37:59 +01:00
powercap
pps
ps3 powerpc/ps3: use dma_mapping_error() 2020-12-30 11:53:53 +01:00
ptp phy: dp83640: select CONFIG_CRC32 2021-01-17 14:17:02 +01:00
pwm pwm: iqs620a: Fix overflow and optimize calculations 2021-03-04 11:38:17 +01:00
rapidio
ras
regulator regulator: bd718x7, bd71828, Fix dvs voltage levels 2021-03-04 11:38:07 +01:00
remoteproc remoteproc/mediatek: Fix kernel test robot warning 2021-03-07 12:34:15 +01:00
reset
rpmsg
rtc rtc: zynqmp: depend on HAS_IOMEM 2021-03-04 11:38:03 +01:00
s390 virtio/s390: implement virtio-ccw revision 2 correctly 2021-03-04 11:38:42 +01:00
sbus
scsi scsi: iscsi: Verify lengths on passthrough PDUs 2021-03-07 12:34:14 +01:00
sfi
sh
siox
slimbus slimbus: qcom: fix potential NULL dereference in qcom_slim_prg_slew() 2020-12-30 11:53:47 +01:00
soc soc: samsung: exynos-asv: handle reading revision register error 2021-03-04 11:38:32 +01:00
soundwire soundwire: intel: fix possible crash when no device is detected 2021-03-04 11:38:22 +01:00
spi spi: spi-synquacer: fix set_cs handling 2021-03-04 11:38:43 +01:00
spmi spmi: spmi-pmic-arb: Fix hw_irq overflow 2021-03-04 11:38:40 +01:00
ssb
staging staging: bcm2835-audio: Replace unsafe strcpy() with strscpy() 2021-03-07 12:34:10 +01:00
target cxgb4/chtls/cxgbit: Keeping the max ofld immediate data size same in cxgb4 and ulds 2021-03-04 11:37:34 +01:00
tc
tee optee: simplify i2c access 2021-03-04 11:37:28 +01:00
thermal thermal: cpufreq_cooling: freq_qos_update_request() returns < 0 on error 2021-03-04 11:38:41 +01:00
thunderbolt thunderbolt: Fix possible NULL pointer dereference in tb_acpi_add_link() 2021-02-10 09:29:15 +01:00
tty tty: teach the n_tty ICANON case about the new "cookie continuations" too 2021-03-07 12:34:16 +01:00
uio
usb USB: serial: mos7720: fix error code in mos7720_write() 2021-03-04 11:38:24 +01:00
vdpa vdpa/mlx5: fix param validation in mlx5_vdpa_get_config() 2021-03-04 11:37:17 +01:00
vfio vfio/type1: Use follow_pte() 2021-03-04 11:38:17 +01:00
vhost vhost_net: fix ubuf refcount incorrectly when sendmsg fails 2021-01-12 20:18:13 +01:00
video udlfb: Fix memory leak in dlfb_usb_probe 2021-03-07 12:34:04 +01:00
virt virt: vbox: Do not use wait_event_interruptible when called from kernel context 2021-03-04 11:37:18 +01:00
virtio virtio_ring: Fix two use after free bugs 2020-12-30 11:54:00 +01:00
visorbus
vlynq
vme
w1 w1: w1_therm: Fix conversion result for negative temperatures 2021-03-04 11:37:18 +01:00
watchdog watchdog: mei_wdt: request stop on unregister 2021-03-04 11:38:36 +01:00
xen xen-scsiback: don't "handle" error by BUG() 2021-02-23 15:53:24 +01:00
zorro
Kconfig
Makefile vdpa: mlx5: fix vdpa/vhost dependencies 2020-12-02 04:09:56 -05:00