linux/drivers/scsi
Brian King 764907293e scsi: ibmvfc: Set default timeout to avoid crash during migration
While testing live partition mobility, we observed occasional crashes of
the Linux partition. During live migration, in configurations with large
amounts of memory, slow network links, and workloads that modify memory at
a high rate, the partition can end up being suspended for 30 seconds or
longer. This resulted in the following scenario:

CPU 0                          CPU 1
-------------------------------  ----------------------------------
scsi_queue_rq                    migration_store
 -> blk_mq_start_request          -> rtas_ibm_suspend_me
  -> blk_add_timer                 -> on_each_cpu(rtas_percpu_suspend_me
              _______________________________________V
             |
             V
    -> IPI from CPU 1
     -> rtas_percpu_suspend_me
                                     -> __rtas_suspend_last_cpu

-- Linux partition suspended for > 30 seconds --
                                      -> for_each_online_cpu(cpu)
                                           plpar_hcall_norets(H_PROD
 -> scsi_dispatch_cmd
                                      -> scsi_times_out
                                       -> scsi_abort_command
                                        -> queue_delayed_work
  -> ibmvfc_queuecommand_lck
   -> ibmvfc_send_event
    -> ibmvfc_send_crq
     - returns H_CLOSED
   <- returns SCSI_MLQUEUE_HOST_BUSY
-> __blk_mq_requeue_request

                                      -> scmd_eh_abort_handler
                                       -> scsi_try_to_abort_cmd
                                         - returns SUCCESS
                                       -> scsi_queue_insert

Normally, the SCMD_STATE_COMPLETE bit would protect against a race between
command completion and the timeout, but that doesn't work here, since the
bit is never checked in the SCSI_MLQUEUE_HOST_BUSY path.

In this case we end up calling scsi_queue_insert on a request that has
already been queued, or possibly even freed, and we crash.

This patch simply increases the default I/O timeout to avoid this race
condition. The new value is also the default timeout recommended for
nearly all IBM SAN storage.
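
To make the timing argument concrete, here is a minimal user-space sketch in C. It is a toy model and not kernel code: the struct, the function name, and the 120-second value are hypothetical and chosen only for illustration (the text above does not state the new timeout). It assumes the request timer keeps counting wall-clock time while the partition is suspended, which is what the scenario above implies.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model of one request; all times are in seconds. */
struct request_model {
    unsigned int timeout_s;   /* request timeout armed at blk_add_timer time */
    unsigned int armed_at_s;  /* point in time at which the timer was armed */
};

/*
 * Returns true if the request timer is already due when the partition
 * resumes. In that case the timeout handler can run concurrently with
 * the dispatch path that was interrupted by the suspend, which is the
 * window described in the scenario above.
 */
static bool timer_expired_on_resume(const struct request_model *rq,
                                    unsigned int suspend_start_s,
                                    unsigned int suspend_len_s)
{
    assert(rq->timeout_s > 0);
    /* The suspend must begin after the timer was armed. */
    assert(suspend_start_s >= rq->armed_at_s);

    unsigned int resume_at_s = suspend_start_s + suspend_len_s;

    return resume_at_s - rq->armed_at_s >= rq->timeout_s;
}
```

With a 30-second timeout and a 35-second suspend that begins right after the timer is armed, the function returns true: the timeout path and the dispatch path can overlap. With an illustrative 120-second timeout and the same suspend it returns false. A longer timeout narrows the window but does not add the missing state check.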

Link: https://lore.kernel.org/r/1610463998-19791-1-git-send-email-brking@linux.vnet.ibm.com
Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-01-14 22:02:59 -05:00
aacraid scsi: aacraid: Fix fall-through warnings for Clang 2020-12-02 12:59:46 -05:00
aic7xxx scsi: aic7xxx: Fix fall-through warnings for Clang 2020-12-02 12:59:46 -05:00
aic94xx scsi: aic94xx: Fix fall-through warnings for Clang 2020-12-02 12:59:46 -05:00
arcmsr scsi: arcmsr: Use generic power management 2020-11-25 23:14:30 -05:00
arm SCSI misc on 20201013 2020-10-14 15:15:35 -07:00
be2iscsi SCSI misc on 20201216 2020-12-16 13:34:31 -08:00
bfa scsi: bfa: Fix fall-through warnings for Clang 2020-12-02 12:59:46 -05:00
bnx2fc SCSI misc on 20201216 2020-12-16 13:34:31 -08:00
bnx2i scsi: bnx2i: Requires MMU 2020-12-02 12:59:04 -05:00
csiostor scsi: csiostor: Fix fall-through warnings for Clang 2020-12-02 12:59:47 -05:00
cxgbi scsi: cxgb4i: Fix TLS dependency 2020-12-09 12:14:41 -05:00
cxlflash ocxl: Update the Process Element Entry 2020-12-04 01:01:30 +11:00
device_handler SCSI misc on 20201216 2020-12-16 13:34:31 -08:00
dpt
esas2r scsi: esas2r: Use generic power management 2020-11-25 23:14:31 -05:00
fcoe SCSI misc on 20201216 2020-12-16 13:34:31 -08:00
fnic scsi: fnic: Fix memleak in vnic_dev_init_devcmd2 2021-01-12 23:32:53 -05:00
hisi_sas Merge branch '5.11/scsi-postmerge' into 5.11/scsi-fixes 2021-01-04 13:27:39 -05:00
ibmvscsi scsi: ibmvfc: Set default timeout to avoid crash during migration 2021-01-14 22:02:59 -05:00
ibmvscsi_tgt
isci scsi: isci: Don't use PCI helper functions 2020-11-10 23:08:36 -05:00
libfc scsi: libfc: Avoid invoking response handler twice if ep is already completed 2021-01-12 23:07:32 -05:00
libsas SCSI misc on 20201013 2020-10-14 15:15:35 -07:00
lpfc scsi: lpfc: Fix fall-through warnings for Clang 2020-12-02 12:59:47 -05:00
megaraid scsi: megaraid_sas: Fix MEGASAS_IOC_FIRMWARE regression 2021-01-07 22:26:00 -05:00
mpt3sas scsi: mpt3sas: Fix spelling mistake in Kconfig "compatiblity" -> "compatibility" 2021-01-05 23:25:07 -05:00
mvsas
pcmcia scsi: Remove unneeded break statements 2020-10-26 18:23:24 -04:00
pm8001 scsi: pm80xx: Fix error return in pm8001_pci_probe() 2020-12-07 17:35:10 -05:00
qedf scsi: libfc: Move scsi/fc_encode.h to libfc 2020-10-29 21:49:25 -04:00
qedi scsi: qedi: Correct max length of CHAP secret 2021-01-05 23:22:50 -05:00
qla2xxx SCSI misc on 20201216 2020-12-16 13:34:31 -08:00
qla4xxx scsi: qla4xxx: Remove redundant assignment to variable rval 2020-12-09 11:34:17 -05:00
smartpqi scsi: smartpqi: Update version to 1.2.16-012 2020-11-16 23:03:10 -05:00
snic scsi: snic: Simplify the return expression of svnic_cq_alloc() 2020-10-07 23:50:03 -04:00
sym53c8xx_2 scsi: Remove unneeded break statements 2020-10-26 18:23:24 -04:00
ufs scsi: ufs: Fix tm request when non-fatal error happens 2021-01-07 22:50:48 -05:00
.gitignore
3w-9xxx.c scsi: 3w-9xxx: Use generic power management 2020-11-25 23:23:21 -05:00
3w-9xxx.h
3w-sas.c scsi: 3w-sas: Use generic power management 2020-11-25 23:23:21 -05:00
3w-sas.h
3w-xxxx.c
3w-xxxx.h
53c700_d.h_shipped
53c700.c SCSI misc on 20201023 2020-10-23 16:19:02 -07:00
53c700.h 53c700: improve non-coherent DMA handling 2020-09-25 06:20:43 +02:00
53c700.scr
a100u2w.c
a100u2w.h
a2091.c
a2091.h
a3000.c
a3000.h
a4000t.c
advansys.c scsi: advansys: Relocate or remove unused variables 2020-11-10 22:27:47 -05:00
aha152x.c
aha152x.h
aha1542.c
aha1542.h
aha1740.c scsi: aha1740: Fix fall-through warnings for Clang 2020-12-02 12:59:46 -05:00
aha1740.h
am53c974.c
atari_scsi.c scsi: atari_scsi: Fix race condition between .queuecommand and EH 2020-11-23 22:12:09 -05:00
atp870u.c
atp870u.h
BusLogic.c
BusLogic.h
bvme6000_scsi.c
ch.c
constants.c
dc395x.c scsi: dc395x: Mark 's_stat2' as __maybe_unused 2020-11-10 22:27:47 -05:00
dc395x.h
dmx3191d.c
dpt_i2o.c
dpti.h
esp_scsi.c
esp_scsi.h
fdomain_isa.c
fdomain_pci.c
fdomain.c
fdomain.h
FlashPoint.c
g_NCR5380.c scsi: NCR5380: Remove context check 2020-12-07 20:24:09 -05:00
gdth_ioctl.h
gdth_proc.c
gdth_proc.h
gdth.c scsi: gdth: Make option_setup() static 2020-10-07 21:48:28 -04:00
gdth.h
gvp11.c
gvp11.h
hosts.c scsi: Add host and host template flag 'host_tagset' 2020-10-06 08:33:44 -06:00
hpsa_cmd.h
hpsa.c SCSI misc on 20201216 2020-12-16 13:34:31 -08:00
hpsa.h
hptiop.c scsi: Remove unneeded break statements 2020-10-26 18:23:24 -04:00
hptiop.h
imm.c
imm.h
initio.c scsi: initio: Use module_pci_driver() to simplify the code 2020-10-07 21:48:28 -04:00
initio.h
ipr.c scsi: Remove unneeded break statements 2020-10-26 18:23:24 -04:00
ipr.h
ips.c
ips.h
iscsi_boot_sysfs.c
iscsi_tcp.c scsi: doc: Fix some kernel-doc markups 2020-10-26 21:54:16 -04:00
iscsi_tcp.h
jazz_esp.c scsi: jazz_esp: Use module_platform_driver to simplify the code 2020-10-02 21:52:52 -04:00
Kconfig
lasi700.c
libiscsi_tcp.c scsi: libiscsi: use sendpage_ok() in iscsi_tcp_segment_map() 2020-10-02 15:27:08 -07:00
libiscsi.c SCSI misc on 20201216 2020-12-16 13:34:31 -08:00
mac_esp.c scsi: mac_esp: Use module_platform_driver to simplify the code 2020-10-02 21:52:53 -04:00
mac_scsi.c scsi: NCR5380: Remove context check 2020-12-07 20:24:09 -05:00
mac53c94.c
mac53c94.h
Makefile
megaraid.c SCSI misc on 20201013 2020-10-14 15:15:35 -07:00
megaraid.h
mesh.c
mesh.h
mvme16x_scsi.c
mvme147.c
mvme147.h
mvumi.c scsi: mvumi: Update function description 2020-11-25 23:23:22 -05:00
mvumi.h
myrb.c scsi: myrb: Remove WARN_ON(in_interrupt()) 2020-12-01 00:03:53 -05:00
myrb.h
myrs.c scsi: myrs: Remove WARN_ON(in_interrupt()) 2020-12-01 00:03:53 -05:00
myrs.h
ncr53c8xx.c
ncr53c8xx.h
NCR5380.c scsi: NCR5380: Remove context check 2020-12-07 20:24:09 -05:00
NCR5380.h scsi: NCR5380: Remove context check 2020-12-07 20:24:09 -05:00
nsp32_debug.c
nsp32_io.h
nsp32.c scsi: nsp32: Remove unneeded semicolon 2020-09-15 17:34:18 -04:00
nsp32.h
pmcraid.c scsi: pmcraid: Use generic power management 2020-11-25 23:23:22 -05:00
pmcraid.h
ppa.c
ppa.h
ps3rom.c powerpc/ps3: make system bus's remove and shutdown callbacks return void 2020-12-04 01:01:22 +11:00
qla1280.c
qla1280.h
qlogicfas.c
qlogicfas408.c
qlogicfas408.h
qlogicpti.c SCSI misc on 20201013 2020-10-14 15:15:35 -07:00
qlogicpti.h
raid_class.c
script_asm.pl
scsi_common.c
scsi_debug.c scsi: scsi_debug: Fix memleak in scsi_debug_init() 2021-01-05 23:28:11 -05:00
scsi_debugfs.c
scsi_debugfs.h
scsi_devinfo.c scsi: doc: Fix some kernel-doc markups 2020-10-26 21:54:16 -04:00
scsi_dh.c
scsi_error.c SCSI misc on 20201013 2020-10-14 15:15:35 -07:00
scsi_ioctl.c
scsi_lib_dma.c
scsi_lib.c SCSI fixes on 20210101 2021-01-01 12:58:07 -08:00
scsi_logging.c
scsi_logging.h
scsi_netlink.c
scsi_pm.c
scsi_priv.h scsi: core: Add limitless cmd retry support 2020-10-02 18:53:06 -04:00
scsi_proc.c
scsi_sas_internal.h
scsi_scan.c scsi: core: Don't start concurrent async scan on same host 2020-10-26 16:05:34 -04:00
scsi_sysctl.c
scsi_sysfs.c scsi: core: Fix -Wformat for scsi_host 2020-11-16 22:33:59 -05:00
scsi_trace.c
scsi_transport_api.h
scsi_transport_fc.c scsi: doc: Fix some kernel-doc markups 2020-10-26 21:54:16 -04:00
scsi_transport_iscsi.c scsi: iscsi: Fix inappropriate use of put_device() 2020-12-07 17:45:19 -05:00
scsi_transport_sas.c
scsi_transport_spi.c scsi: scsi_transport_spi: Set RQF_PM for domain validation commands 2020-12-09 11:41:42 -05:00
scsi_transport_srp.c scsi: scsi_transport_srp: Don't block target in failfast state 2021-01-12 22:56:49 -05:00
scsi.c
scsi.h
scsicam.c block: remove ->bd_contains 2020-12-01 14:53:39 -07:00
sd_dif.c
sd_zbc.c scsi: sd: sd_zbc: Fix ZBC disk initialization 2020-09-15 20:08:15 -04:00
sd.c scsi: sd: Remove obsolete variable in sd_remove() 2021-01-05 23:46:33 -05:00
sd.h SCSI misc on 20201013 2020-10-14 15:15:35 -07:00
sense_codes.h scsi: core: Update additional sense codes list 2020-09-15 20:28:06 -04:00
ses.c
sg.c iov_iter: transparently handle compat iovecs in import_iovec 2020-10-03 00:02:13 -04:00
sgiwd93.c sgiwd93: convert to dma_alloc_noncoherent 2020-09-25 06:20:44 +02:00
sim710.c
sni_53c710.c scsi: sni_53c710: Use module_platform_driver to simplify the code 2020-10-02 21:52:54 -04:00
sr_ioctl.c sr: Switch the sector size back to 2048 if sr_read_sector() changed it. 2020-12-12 11:12:25 -07:00
sr_vendor.c
sr.c sr: Remove in_interrupt() usage in sr_init_command(). 2020-12-12 11:12:25 -07:00
sr.h
st_options.h
st.c scsi: Remove unneeded break statements 2020-10-26 18:23:24 -04:00
st.h
stex.c scsi: stex: Fix fall-through warnings for Clang 2020-12-02 12:59:47 -05:00
storvsc_drv.c hyperv-next for 5.11 2020-12-16 11:49:46 -08:00
sun_esp.c scsi: sun_esp: Use module_platform_driver to simplify the code 2020-10-02 21:52:55 -04:00
sun3_scsi_vme.c
sun3_scsi.c
sun3x_esp.c scsi: sun3x_esp: Use module_platform_driver to simplify the code 2020-10-02 21:52:55 -04:00
virtio_scsi.c SCSI misc on 20201013 2020-10-14 15:15:35 -07:00
vmw_pvscsi.c
vmw_pvscsi.h
wd33c93.c
wd33c93.h
wd719x.c
wd719x.h
xen-scsifront.c
zalon.c
zorro_esp.c
zorro7xx.c