linux/drivers
Wen Xiong 29aacd438c scsi: ipr: Fix softlockup when rescanning devices in petitboot
[ Upstream commit 394b61711f ]

When trying to rescan disks in petitboot shell, we hit the following
softlockup stacktrace:

Kernel panic - not syncing: System is deadlocked on memory
[  241.223394] CPU: 32 PID: 693 Comm: sh Not tainted 5.4.16-openpower1 #1
[  241.223406] Call Trace:
[  241.223415] [c0000003f07c3180] [c000000000493fc4] dump_stack+0xa4/0xd8 (unreliable)
[  241.223432] [c0000003f07c31c0] [c00000000007d4ac] panic+0x148/0x3cc
[  241.223446] [c0000003f07c3260] [c000000000114b10] out_of_memory+0x468/0x4c4
[  241.223461] [c0000003f07c3300] [c0000000001472b0] __alloc_pages_slowpath+0x594/0x6d8
[  241.223476] [c0000003f07c3420] [c00000000014757c] __alloc_pages_nodemask+0x188/0x1a4
[  241.223492] [c0000003f07c34a0] [c000000000153e10] alloc_pages_current+0xcc/0xd8
[  241.223508] [c0000003f07c34e0] [c0000000001577ac] alloc_slab_page+0x30/0x98
[  241.223524] [c0000003f07c3520] [c0000000001597fc] new_slab+0x138/0x40c
[  241.223538] [c0000003f07c35f0] [c00000000015b204] ___slab_alloc+0x1e4/0x404
[  241.223552] [c0000003f07c36c0] [c00000000015b450] __slab_alloc+0x2c/0x48
[  241.223566] [c0000003f07c36f0] [c00000000015b754] kmem_cache_alloc_node+0x9c/0x1b4
[  241.223582] [c0000003f07c3760] [c000000000218c48] blk_alloc_queue_node+0x34/0x270
[  241.223599] [c0000003f07c37b0] [c000000000226574] blk_mq_init_queue+0x2c/0x78
[  241.223615] [c0000003f07c37e0] [c0000000002ff710] scsi_mq_alloc_queue+0x28/0x70
[  241.223631] [c0000003f07c3810] [c0000000003005b8] scsi_alloc_sdev+0x184/0x264
[  241.223647] [c0000003f07c38a0] [c000000000300ba0] scsi_probe_and_add_lun+0x288/0xa3c
[  241.223663] [c0000003f07c3a00] [c000000000301768] __scsi_scan_target+0xcc/0x478
[  241.223679] [c0000003f07c3b20] [c000000000301c64] scsi_scan_channel.part.9+0x74/0x7c
[  241.223696] [c0000003f07c3b70] [c000000000301df4] scsi_scan_host_selected+0xe0/0x158
[  241.223712] [c0000003f07c3bd0] [c000000000303f04] store_scan+0x104/0x114
[  241.223727] [c0000003f07c3cb0] [c0000000002d5ac4] dev_attr_store+0x30/0x4c
[  241.223741] [c0000003f07c3cd0] [c0000000001dbc34] sysfs_kf_write+0x64/0x78
[  241.223756] [c0000003f07c3cf0] [c0000000001da858] kernfs_fop_write+0x170/0x1b8
[  241.223773] [c0000003f07c3d40] [c0000000001621fc] __vfs_write+0x34/0x60
[  241.223787] [c0000003f07c3d60] [c000000000163c2c] vfs_write+0xa8/0xcc
[  241.223802] [c0000003f07c3db0] [c000000000163df4] ksys_write+0x70/0xbc
[  241.223816] [c0000003f07c3e20] [c00000000000b40c] system_call+0x5c/0x68

As a part of the scan process Linux will allocate and configure a
scsi_device for each target to be scanned. If the device is not present,
then the scsi_device is torn down. As a part of scsi_device teardown a
workqueue item will be scheduled and the lockups we see are because there
are 250k workqueue items to be processed.  Accoding to the specification of
SIS-64 sas controller, max_channel should be decreased on SIS-64 adapters
to 4.

The patch fixes softlockup issue.

Thanks for Oliver Halloran's help with debugging and explanation!

Link: https://lore.kernel.org/r/1583510248-23672-1-git-send-email-wenxiong@linux.vnet.ibm.com
Signed-off-by: Wen Xiong <wenxiong@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-04-02 15:28:15 +02:00
..
accessibility
acpi x86/mm: split vmalloc_sync_all() 2020-03-25 08:06:13 +01:00
amba
android binder: Handle start==NULL in binder_update_page_range() 2019-12-13 08:52:52 +01:00
ata ata: ahci: Add shutdown to freeze hardware resources of ahci 2020-02-28 16:39:00 +01:00
atm fore200e: Fix incorrect checks of NULL pointer dereference 2020-02-24 08:34:42 +01:00
auxdisplay
base driver core: Fix creation of device links with PM-runtime flags 2020-03-20 11:55:58 +01:00
bcma bcma: fix incorrect update of BCMA_CORE_PCI_MDIO_DATA 2020-01-27 14:51:09 +01:00
block virtio-blk: fix hw_queue stopped on arbitrary error 2020-03-18 07:14:19 +01:00
bluetooth Bluetooth: btusb: fix PM leak in error case of setup 2020-01-09 10:19:04 +01:00
bus bus: ti-sysc: Fix sysc_unprepare() when no clocks have been allocated 2020-01-27 14:50:36 +01:00
cdrom cdrom: respect device capabilities during opening action 2020-01-04 19:13:12 +01:00
char ipmi:ssif: Handle a possible NULL pointer reference 2020-03-05 16:42:12 +01:00
clk clk: uniphier: Add SCSSI clock gate for each channel 2020-02-24 08:34:45 +01:00
clocksource clocksource/drivers/bcm2835_timer: Fix memory leak of timer 2020-02-24 08:34:37 +01:00
connector
cpufreq cpufreq: brcmstb-avs-cpufreq: Fix types for voltage/frequency 2020-01-27 14:50:53 +01:00
cpuidle cpuidle: Do not unset the driver if it is there already 2019-12-17 20:35:00 +01:00
crypto crypto: chtls - Fixed memory leak 2020-02-24 08:34:44 +01:00
dax
dca
devfreq Revert "PM / devfreq: Modify the device name as devfreq(X) for sysfs" 2020-03-05 16:42:18 +01:00
dio
dma dmaengine: coh901318: Fix a double lock bug in dma_tc_handle() 2020-03-11 14:15:12 +01:00
dma-buf dma-buf: Fix memory leak in sync_file_merge() 2019-12-21 10:57:38 +01:00
edac EDAC/amd64: Set grain per DIMM 2020-03-11 14:14:45 +01:00
eisa
extcon extcon: sm5502: Reset registers during initialization 2019-12-31 16:35:11 +01:00
firewire net: add annotations on hh->hh_len lockless accesses 2020-01-09 10:19:09 +01:00
firmware efi: Fix debugobjects warning on 'efi_rts_work' 2020-03-20 11:56:00 +01:00
fmc
fpga fpga: altera-ps-spi: Fix getting of optional confd gpio 2019-09-21 07:16:53 +02:00
fsi fsi: sbefifo: Don't fail operations when in SBE IPL state 2020-01-27 14:51:00 +01:00
gnss
gpio gpio: gpio-grgpio: fix possible sleep-in-atomic-context bugs in grgpio_irq_map/unmap() 2020-02-24 08:34:36 +01:00
gpu Revert "drm/dp_mst: Skip validating ports during destruction, just ref" 2020-04-02 15:28:10 +02:00
hid HID: google: add moonball USB id 2020-03-20 11:55:59 +01:00
hsi
hv hv_balloon: Balloon up according to request page number 2020-02-11 04:34:01 -08:00
hwmon hwmon: (adt7462) Fix an error return in ADT7462_REG_VOLT() 2020-03-11 14:15:12 +01:00
hwspinlock
hwtracing intel_th: pci: Add Elkhart Lake CPU support 2020-03-25 08:06:11 +01:00
i2c i2c: acpi: put device when verifying client fails 2020-03-18 07:14:25 +01:00
ide ide: serverworks: potential overflow in svwks_set_pio_mode() 2020-02-24 08:34:49 +01:00
idle
iio iio: light: vcnl4000: update sampling periods for vcnl4200 2020-03-25 08:06:13 +01:00
infiniband IB/hfi1, qib: Ensure RCU is locked when accessing list 2020-03-11 14:15:11 +01:00
input Input: edt-ft5x06 - work around first register access error 2020-02-24 08:34:46 +01:00
iommu iommu/vt-d: Ignore devices with out-of-spec domain number 2020-03-18 07:14:24 +01:00
ipack
irqchip irqchip/gic-v3-its: Fix misuse of GENMASK macro 2020-03-05 16:42:12 +01:00
isdn staging: gigaset: add endpoint-type sanity check 2019-12-17 20:34:33 +01:00
leds leds: pca963x: Fix open-drain initialization 2020-02-24 08:34:35 +01:00
lightnvm lightnvm: pblk: fix lock order in pblk_rb_tear_down_check 2020-01-27 14:50:45 +01:00
macintosh macintosh: windfarm: fix MODINFO regression 2020-03-18 07:14:21 +01:00
mailbox mailbox: qcom-apcs: fix max_register value 2020-01-27 14:51:14 +01:00
mcb
md dm integrity: use dm_bio_record and dm_bio_restore 2020-03-25 08:06:07 +01:00
media media: v4l2-mem2mem.c: fix broken links 2020-03-11 14:15:03 +01:00
memory memory: tegra: Don't invoke Tegra30+ specific memory timing setup on Tegra20 2020-01-27 14:50:13 +01:00
memstick memstick: jmb38x_ms: Fix an error handling path in 'jmb38x_ms_probe()' 2019-10-29 09:20:07 +01:00
message scsi: mptfusion: Fix double fetch bug in ioctl 2020-01-23 08:21:28 +01:00
mfd mfd: rn5t618: Mark ADC control register volatile 2020-02-11 04:34:14 -08:00
misc mmc: rtsx_pci: Fix support for speed-modes that relies on tuning 2020-03-25 08:06:11 +01:00
mmc mmc: sdhci-tegra: Fix busy detection by enabling MMC_CAP_NEED_RSP_BUSY 2020-04-02 15:28:10 +02:00
mtd mtd: sharpslpart: Fix unsigned comparison to zero 2020-02-14 16:33:27 -05:00
mux
net fsl/fman: detect FMan erratum A050385 2020-04-02 15:28:15 +02:00
nfc NFC: fdp: Fix a signedness bug in fdp_nci_send_patch() 2020-04-02 15:28:12 +02:00
ntb ntb_hw_switchtec: potential shift wrapping bug in switchtec_ntb_init_sndev() 2020-01-27 14:50:55 +01:00
nubus
nvdimm libnvdimm/btt: fix variable 'rc' set but not used 2020-01-04 19:13:00 +01:00
nvme nvme: Fix uninitialized-variable warning 2020-03-11 14:14:55 +01:00
nvmem nvmem: imx-ocotp: Change TIMING calculation to u-boot algorithm 2020-01-27 14:50:58 +01:00
of drivers/of/of_mdio.c:fix of_mdiobus_register() 2020-04-02 15:28:14 +02:00
opp OPP: Fix missing debugfs supply directory for OPPs 2020-01-27 14:50:04 +01:00
oprofile
parisc parisc: Disable HP HSC-PCI Cards to prevent kernel crash 2019-10-05 13:10:04 +02:00
parport parport: load lowlevel driver if ports not found 2019-12-31 16:36:01 +01:00
pci PCI: Increase D3 delay for AMD Ryzen5/7 XHCI controllers 2020-02-24 08:34:41 +01:00
pcmcia
perf drivers/perf: arm_pmu_acpi: Fix incorrect checking of gicc pointer 2020-03-25 08:06:07 +01:00
phy phy: mapphone-mdm6600: Fix write timeouts with shorter GPIO toggle interval 2020-03-11 14:15:10 +01:00
pinctrl pinctrl: core: Remove extra kref_get which blocks hogs being freed 2020-03-18 07:14:23 +01:00
platform platform/x86: intel_mid_powerbtn: Take a copy of ddata 2020-02-14 16:33:25 -05:00
pnp
power power: supply: ltc2941-battery-gauge: fix use-after-free 2020-02-11 04:34:02 -08:00
powercap
pps
ps3
ptp ptp: free ptp device pin descriptors properly 2020-01-23 08:21:35 +01:00
pwm pwm: omap-dmtimer: put_device() after of_find_device_by_node() 2020-03-05 16:42:22 +01:00
rapidio drivers/rapidio/rio_cm.c: fix potential oops in riocm_ch_listen() 2020-01-27 14:50:31 +01:00
ras
regulator regulator: rk808: Lower log level on optional GPIOs being not available 2020-02-24 08:34:40 +01:00
remoteproc remoteproc: Initialize rproc_class before use 2020-02-24 08:34:50 +01:00
reset reset: uniphier: Add SCSSI reset control for each channel 2020-02-24 08:34:44 +01:00
rpmsg rpmsg: glink: Free pending deferred work on remove 2019-12-21 10:57:30 +01:00
rtc rtc: max8907: add missing select REGMAP_IRQ 2020-03-25 08:06:12 +01:00
s390 s390/qeth: handle error when backing RX buffer 2020-04-02 15:28:15 +02:00
sbus
scsi scsi: ipr: Fix softlockup when rescanning devices in petitboot 2020-04-02 15:28:15 +02:00
sfi
sh
siox
slimbus slimbus: ngd: Fix build error on x86 2019-12-13 08:51:54 +01:00
sn
soc soc/tegra: fuse: Fix build with Tegra194 configuration 2020-03-05 16:42:13 +01:00
soundwire soundwire: intel: fix PDI/stream mapping for Bulk 2019-12-31 16:35:55 +01:00
spi spi/zynqmp: remove entry that causes a cs glitch 2020-03-25 08:06:06 +01:00
spmi
ssb
staging staging: greybus: loopback_test: fix potential path truncations 2020-03-25 08:06:15 +01:00
target scsi: Revert "target: iscsi: Wait for all commands to finish before freeing a session" 2020-02-28 16:38:58 +01:00
tc
tee tee: optee: Fix compilation issue with nommu 2020-02-05 14:43:50 +00:00
thermal thermal: brcmstb_thermal: Do not use DT coefficients 2020-03-05 16:42:22 +01:00
thunderbolt thunderbolt: Prevent crash if non-active NVMem file is read 2020-02-28 16:38:44 +01:00
tty vt: selection, push sel_lock up 2020-03-11 14:15:03 +01:00
uio uio: fix a sleep-in-atomic-context bug in uio_dmem_genirq_irqcontrol() 2020-02-24 08:34:37 +01:00
usb USB: cdc-acm: fix rounding error in TIOCSSERIAL 2020-03-25 08:06:13 +01:00
uwb
vfio vfio/mdev: Fix aborting mdev child device removal if one fails 2020-01-27 14:50:46 +01:00
vhost vhost: Check docket sk_family instead of call getname 2020-03-05 16:42:18 +01:00
video vgacon: Fix a UAF in vgacon_invert_region 2020-03-11 14:15:00 +01:00
virt virt: vbox: fix memory leak in hgcm_call_preprocess_linaddr 2019-11-06 13:06:04 +01:00
virtio virtio_balloon: prevent pfn array overflow 2020-02-24 08:34:54 +01:00
visorbus visorbus: fix uninitialized variable access 2020-02-24 08:34:47 +01:00
vlynq
vme vme: bridges: reduce stack usage 2020-02-24 08:34:47 +01:00
w1 w1: IAD Register is yet readable trough iad sys file. Fix snprintf (%u for unsigned, count for max size). 2019-12-01 09:16:22 +01:00
watchdog watchdog: da9062: do not ping the hw during stop() 2020-03-11 14:14:53 +01:00
xen xenbus: req->err should be updated before req->state 2020-03-25 08:06:08 +01:00
zorro
Kconfig
Makefile