linux/drivers
Jens Remus 3679713402 scsi: zfcp: fix infinite iteration on ERP ready list
commit fa89adba19 upstream.

zfcp_erp_adapter_reopen() schedules blocking of all of the adapter's
rports via zfcp_scsi_schedule_rports_block() and enqueues a reopen
adapter ERP action via zfcp_erp_action_enqueue(). Both are separately
processed asynchronously and concurrently.

Blocking of rports is done in a kworker by zfcp_scsi_rport_work(). It
calls zfcp_scsi_rport_block(), which then traces a DBF REC "scpdely" via
zfcp_dbf_rec_trig().  zfcp_dbf_rec_trig() acquires the DBF REC spin lock
and then iterates with list_for_each() over the adapter's ERP ready list
without holding the ERP lock. This opens a race window in which the
current list entry can be moved to another list, causing list_for_each()
to iterate forever on the wrong list, as the erp_ready_head is never
encountered as terminal condition.

Meanwhile the ERP action can be processed in the ERP thread by
zfcp_erp_thread(). It calls zfcp_erp_strategy(), which acquires the ERP
lock and then calls zfcp_erp_action_to_running() to move the ERP action
from the ready to the running list.  zfcp_erp_action_to_running() can
move the ERP action using list_move() just during the aforementioned
race window. It then traces a REC RUN "erator1" via zfcp_dbf_rec_run().
zfcp_dbf_rec_run() tries to acquire the DBF REC spin lock. If this is
held by the infinitely looping kworker, it effectively spins forever.

Example Sequence Diagram:

Process                ERP Thread             rport_work
-------------------    -------------------    -------------------
zfcp_erp_adapter_reopen()
zfcp_erp_adapter_block()
zfcp_scsi_schedule_rports_block()
lock ERP                                      zfcp_scsi_rport_work()
zfcp_erp_action_enqueue(ZFCP_ERP_ACTION_REOPEN_ADAPTER)
list_add_tail() on ready                      !(rport_task==RPORT_ADD)
wake_up() ERP thread                          zfcp_scsi_rport_block()
zfcp_dbf_rec_trig()    zfcp_erp_strategy()    zfcp_dbf_rec_trig()
unlock ERP                                    lock DBF REC
zfcp_erp_wait()        lock ERP
|                      zfcp_erp_action_to_running()
|                                             list_for_each() ready
|                      list_move()              current entry
|                        ready to running
|                      zfcp_dbf_rec_run()       endless loop over running
|                      zfcp_dbf_rec_run_lvl()
|                      lock DBF REC spins forever

Any adapter recovery can trigger this, such as setting the device offline
or reboot.

V4.9 commit 4eeaa4f3f1 ("zfcp: close window with unblocked rport
during rport gone") introduced additional tracing of (un)blocking of
rports. It missed that the adapter->erp_lock must be held when calling
zfcp_dbf_rec_trig().

This fix uses the approach formerly introduced by commit aa0fec6239
("[SCSI] zfcp: Fix sparse warning by providing new entry in dbf") that got
later removed by commit ae0904f60f ("[SCSI] zfcp: Redesign of the debug
tracing for recovery actions.").

Introduce zfcp_dbf_rec_trig_lock(), a wrapper for zfcp_dbf_rec_trig() that
acquires and releases the adapter->erp_lock for read.

Reported-by: Sebastian Ott <sebott@linux.ibm.com>
Signed-off-by: Jens Remus <jremus@linux.ibm.com>
Fixes: 4eeaa4f3f1 ("zfcp: close window with unblocked rport during rport gone")
Cc: <stable@vger.kernel.org> # 2.6.32+
Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com>
Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-26 08:49:00 +02:00
..
accessibility
acpi ACPI / video: Add quirk to force acpi-video backlight on Samsung 670Z5E 2018-04-24 09:32:06 +02:00
amba ARM: amba: Don't read past the end of sysfs "driver_override" buffer 2018-05-02 07:53:42 -07:00
android binder: add missing binder_unlock() 2018-02-28 10:17:23 +01:00
ata libata: Apply NOLPM quirk for SanDisk SD7UB3Q*G1001 SSDs 2018-05-16 10:06:51 +02:00
atm atm: zatm: Fix potential Spectre v1 2018-05-16 10:06:52 +02:00
auxdisplay
base regmap: Fix reversed bounds check in regmap_raw_write() 2018-04-24 09:32:06 +02:00
bcma
block block/loop: fix deadlock after loop_set_status 2018-04-24 09:32:03 +02:00
bluetooth Revert "Bluetooth: btusb: Fix quirk for Atheros 1525/QCA6174" 2018-05-16 10:06:52 +02:00
bus bus: brcmstb_gisb: correct support for 64-bit address output 2018-04-13 19:50:05 +02:00
cdrom cdrom: information leak in cdrom_ioctl_media_changed() 2018-04-29 07:50:07 +02:00
char virtio_console: free buffers after reset 2018-05-02 07:53:40 -07:00
clk clk: bcm2835: De-assert/assert PLL reset signal when appropriate 2018-04-24 09:32:08 +02:00
clocksource clockevents/drivers/cs5535: Improve resilience to spurious interrupts 2017-10-27 10:23:17 +02:00
connector
cpufreq cpufreq: intel_pstate: Enable HWP by default 2018-05-26 08:48:54 +02:00
cpuidle cpuidle: coupled: remove unused define cpuidle_coupled_lock 2018-05-26 08:48:53 +02:00
crypto crypto: s5p-sss - Fix kernel Oops in AES-ECB mode 2018-02-25 11:03:55 +01:00
dca
devfreq PM / devfreq: Propagate error from devfreq_add_device() 2018-02-22 15:44:58 +01:00
dio
dma dmaengine: at_xdmac: fix rare residue corruption 2018-04-24 09:32:08 +02:00
dma-buf
edac EDAC, mv64x60: Fix an error handling path 2018-04-13 19:50:23 +02:00
eisa
extcon extcon: palmas: Check the parent instance to prevent the NULL 2017-11-21 09:21:18 +01:00
firewire
firmware efi/esrt: Cleanup bad memory map log messages 2017-12-20 10:04:56 +01:00
fmc
fpga
gpio gpio: label descriptors using the device name 2018-04-13 19:50:14 +02:00
gpu drm/vmwgfx: Fix a buffer object leak 2018-05-16 10:06:48 +02:00
hid HID: hidraw: Fix crash on HIDIOCGFEATURE with a destroyed device 2018-04-24 09:32:10 +02:00
hsi HSI: ssi_protocol: double free in ssip_pn_xmit() 2018-03-24 10:58:42 +01:00
hv Drivers: hv: vmbus: fix build warning 2018-02-25 11:03:46 +01:00
hwmon hwmon: (ina2xx) Fix access to uninitialized mutex 2018-04-24 09:32:04 +02:00
hwspinlock
hwtracing coresight: Fix disabling of CoreSight TPIU 2018-03-24 10:58:48 +01:00
i2c i2c: i2c-scmi: add a MS HID 2018-03-24 10:58:41 +01:00
ide
idle idle: i7300: add PCI dependency 2018-02-25 11:03:51 +01:00
iio iio: magnetometer: st_magn_spi: fix spi_device_id table 2018-04-13 19:50:21 +02:00
infiniband IB/mlx5: Use unlimited rate when static rate is not supported 2018-05-16 10:06:48 +02:00
input Input: atmel_mxt_ts - add touchpad button mapping for Samsung Chromebook Pro 2018-05-16 10:06:48 +02:00
iommu iommu/vt-d: Fix a potential memory leak 2018-04-24 09:32:08 +02:00
ipack
irqchip irqchip/gic-v3-its: Ensure nr_ites >= nr_lpis 2018-03-22 09:23:31 +01:00
isdn mISDN: Fix a sleep-in-atomic bug 2018-04-13 19:50:16 +02:00
leds leds: pca955x: Correct I2C Functionality 2018-04-13 19:50:09 +02:00
lguest
lightnvm
macintosh
mailbox mailbox: handle empty message in tx_tick 2017-08-06 19:19:41 -07:00
mcb
md bcache: segregate flash only volume write streams 2018-04-13 19:50:22 +02:00
media media: v4l2-compat-ioctl32: don't oops on overlay 2018-04-24 09:32:03 +02:00
memory ARM: OMAP2+: gpmc-onenand: propagate error on initialization failure 2017-12-16 10:33:51 +01:00
memstick
message scsi: mptsas: Disable WRITE SAME 2018-04-29 07:50:06 +02:00
mfd mfd: palmas: Reset the POWERHOLD mux during power off 2018-03-24 10:58:44 +01:00
misc drivers/misc/vmw_vmci/vmci_queue_pair.c: fix a couple integer overflow tests 2018-04-13 19:50:02 +02:00
mmc mmc: jz4740: Fix race condition in IRQ mask update 2018-04-24 09:32:08 +02:00
mtd gpmi-nand: Handle ECC Errors in erased pages 2018-05-16 10:06:47 +02:00
net bonding: do not allow rlb updates to invalid mac 2018-05-26 08:48:48 +02:00
nfc NFC: nfcmrvl: double free on error path 2018-03-22 09:23:23 +01:00
ntb ntb_transport: fix bug calculating num_qps_mw 2017-08-30 10:19:29 +02:00
nubus
nvdimm libnvdimm, namespace: make 'resource' attribute only readable by root 2017-11-30 08:37:23 +00:00
nvme nvme: Fix managing degraded controllers 2018-02-16 20:09:47 +01:00
nvmem nvmem: imx-ocotp: Fix wrong register size 2017-08-06 19:19:46 -07:00
of of: fix of_device_get_modalias returned length when truncating buffers 2018-03-22 09:23:21 +01:00
oprofile
parisc parisc: Hide Diva-built-in serial aux and graphics card 2018-01-02 20:33:20 +01:00
parport parport_pc: Add support for WCH CH382L PCI-E single parallel port card. 2018-04-08 11:52:00 +02:00
pci ACPI / hotplug / PCI: Check presence of slot itself in get_slot_status() 2018-04-24 09:32:06 +02:00
pcmcia
perf drivers/perf: arm_pmu: handle no platform_device 2018-03-22 09:23:26 +01:00
phy phy: work around 'phys' references to usb-nop-xceiv devices 2018-01-23 19:50:16 +01:00
pinctrl pinctrl: Really force states during suspend/resume 2018-03-24 10:58:48 +01:00
platform platform/chrome: Use proper protocol transfer function 2018-03-24 10:58:47 +01:00
pnp
power power: supply: pda_power: move from timer to delayed_work 2018-03-24 10:58:45 +01:00
powercap PowerCap: Fix an error code in powercap_register_zone() 2018-04-13 19:50:05 +02:00
pps
ps3
ptp time: Change posix clocks ops interfaces to use timespec64 2018-03-24 10:58:40 +01:00
pwm pwm: tegra: Increase precision in PWM rate calculation 2018-03-22 09:23:27 +01:00
rapidio
ras
regulator regulator: anatop: set default voltage selector for pcie 2018-03-24 10:58:40 +01:00
remoteproc
reset
rpmsg
rtc rtc: interface: Validate alarm-time before handling rollover 2018-04-13 19:50:15 +02:00
s390 scsi: zfcp: fix infinite iteration on ERP ready list 2018-05-26 08:49:00 +02:00
sbus
scsi scsi: sg: allocate with __GFP_ZERO in sg_build_indirect() 2018-05-26 08:49:00 +02:00
sfi
sh
sn
soc
spi spi: pxa2xx: Allow 64-bit DMA 2018-05-26 08:48:52 +02:00
spmi spmi: Include OF based modalias in device uevent 2017-07-27 15:06:10 -07:00
ssb ssb: mark ssb_bus_register as __maybe_unused 2018-02-25 11:03:44 +01:00
staging staging: ion : Donnot wakeup kswapd in ion system alloc 2018-04-29 07:50:01 +02:00
target tcm_fileio: Prevent information leak for short reads 2018-03-24 10:58:45 +01:00
tc
thermal thermal: imx: Fix race condition in imx_thermal_probe() 2018-04-24 09:32:08 +02:00
thunderbolt thunderbolt: Resume control channel after hibernation image is created 2018-04-24 09:32:07 +02:00
tty serial: mctrl_gpio: Add missing module license 2018-05-02 07:53:43 -07:00
uio
usb usbip: usbip_host: fix bad unlock balance during stub_probe() 2018-05-26 08:48:52 +02:00
uwb uwb: ensure that endpoint is interrupt 2017-10-12 11:27:35 +02:00
vfio vfio/pci: Virtualize Maximum Read Request Size 2018-04-24 09:32:09 +02:00
vhost vhost: correctly remove wait queue during poll failure 2018-04-13 19:50:25 +02:00
video vfb: fix video mode and line_length being set when loaded 2018-04-13 19:50:13 +02:00
virt
virtio virtio_balloon: prevent uninitialized variable use 2018-02-25 11:03:42 +01:00
vlynq
vme
w1
watchdog watchdog: f71808e_wdt: Fix WD_EN register read 2018-04-24 09:32:08 +02:00
xen xen/gntdev: Fix partial gntdev_mmap() cleanup 2018-03-03 10:19:45 +01:00
zorro
Kconfig
Makefile usb: build drivers/usb/common/ when USB_SUPPORT is set 2018-02-25 11:03:38 +01:00