linux/drivers
Douglas Anderson 7396cb30cb dm bufio: avoid sleeping while holding the dm_bufio lock
commit 9ea61cac0b upstream.

We've seen in-field reports showing _lots_ (18 in one case, 41 in
another) of tasks all sitting there blocked on:

  mutex_lock+0x4c/0x68
  dm_bufio_shrink_count+0x38/0x78
  shrink_slab.part.54.constprop.65+0x100/0x464
  shrink_zone+0xa8/0x198

In the two cases analyzed, we see one task that looks like this:

  Workqueue: kverityd verity_prefetch_io

  __switch_to+0x9c/0xa8
  __schedule+0x440/0x6d8
  schedule+0x94/0xb4
  schedule_timeout+0x204/0x27c
  schedule_timeout_uninterruptible+0x44/0x50
  wait_iff_congested+0x9c/0x1f0
  shrink_inactive_list+0x3a0/0x4cc
  shrink_lruvec+0x418/0x5cc
  shrink_zone+0x88/0x198
  try_to_free_pages+0x51c/0x588
  __alloc_pages_nodemask+0x648/0xa88
  __get_free_pages+0x34/0x7c
  alloc_buffer+0xa4/0x144
  __bufio_new+0x84/0x278
  dm_bufio_prefetch+0x9c/0x154
  verity_prefetch_io+0xe8/0x10c
  process_one_work+0x240/0x424
  worker_thread+0x2fc/0x424
  kthread+0x10c/0x114

...and that looks to be the one holding the mutex.

The problem has been reproduced on fairly easily:
0. Be running Chrome OS w/ verity enabled on the root filesystem
1. Pick test patch: http://crosreview.com/412360
2. Install launchBalloons.sh and balloon.arm from
     http://crbug.com/468342
   ...that's just a memory stress test app.
3. On a 4GB rk3399 machine, run
     nice ./launchBalloons.sh 4 900 100000
   ...that tries to eat 4 * 900 MB of memory and keep accessing.
4. Login to the Chrome web browser and restore many tabs

With that, I've seen printouts like:
  DOUG: long bufio 90758 ms
...and stack trace always show's we're in dm_bufio_prefetch().

The problem is that we try to allocate memory with GFP_NOIO while
we're holding the dm_bufio lock.  Instead we should be using
GFP_NOWAIT.  Using GFP_NOIO can cause us to sleep while holding the
lock and that causes the above problems.

The current behavior explained by David Rientjes:

  It will still try reclaim initially because __GFP_WAIT (or
  __GFP_KSWAPD_RECLAIM) is set by GFP_NOIO.  This is the cause of
  contention on dm_bufio_lock() that the thread holds.  You want to
  pass GFP_NOWAIT instead of GFP_NOIO to alloc_buffer() when holding a
  mutex that can be contended by a concurrent slab shrinker (if
  count_objects didn't use a trylock, this pattern would trivially
  deadlock).

This change significantly increases responsiveness of the system while
in this state.  It makes a real difference because it unblocks kswapd.
In the bug report analyzed, kswapd was hung:

   kswapd0         D ffffffc000204fd8     0    72      2 0x00000000
   Call trace:
   [<ffffffc000204fd8>] __switch_to+0x9c/0xa8
   [<ffffffc00090b794>] __schedule+0x440/0x6d8
   [<ffffffc00090bac0>] schedule+0x94/0xb4
   [<ffffffc00090be44>] schedule_preempt_disabled+0x28/0x44
   [<ffffffc00090d900>] __mutex_lock_slowpath+0x120/0x1ac
   [<ffffffc00090d9d8>] mutex_lock+0x4c/0x68
   [<ffffffc000708e7c>] dm_bufio_shrink_count+0x38/0x78
   [<ffffffc00030b268>] shrink_slab.part.54.constprop.65+0x100/0x464
   [<ffffffc00030dbd8>] shrink_zone+0xa8/0x198
   [<ffffffc00030e578>] balance_pgdat+0x328/0x508
   [<ffffffc00030eb7c>] kswapd+0x424/0x51c
   [<ffffffc00023f06c>] kthread+0x10c/0x114
   [<ffffffc000203dd0>] ret_from_fork+0x10/0x40

By unblocking kswapd memory pressure should be reduced.

Suggested-by: David Rientjes <rientjes@google.com>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-07-11 16:03:51 +02:00
..
accessibility
acpi ACPICA: acpi: acpica: fix acpi operand cache leak in nseval.c 2018-05-30 07:49:11 +02:00
amba ARM: amba: Don't read past the end of sysfs "driver_override" buffer 2018-05-02 07:53:42 -07:00
android binder: add missing binder_unlock() 2018-02-28 10:17:23 +01:00
ata libata: Drop SanDisk SD7UB3Q*G1001 NOLPM quirk 2018-07-03 11:21:26 +02:00
atm atm: zatm: fix memcmp casting 2018-07-03 11:21:24 +02:00
auxdisplay
base driver core: Don't ignore class_dir_create_and_add() failure. 2018-07-03 11:21:25 +02:00
bcma
block drbd: fix access after free 2018-07-11 16:03:48 +02:00
bluetooth Bluetooth: hci_qca: Avoid missing rampatch failure with userspace fw loader 2018-07-03 11:21:28 +02:00
bus bus: brcmstb_gisb: correct support for 64-bit address output 2018-04-13 19:50:05 +02:00
cdrom cdrom: do not call check_disk_change() inside cdrom_open() 2018-05-30 07:49:13 +02:00
char ipmi:bt: Set the timeout before doing a capabilities check 2018-07-03 11:21:28 +02:00
clk clk: samsung: exynos3250: Fix PLL rates 2018-05-30 07:49:16 +02:00
clocksource clocksource/drivers/fsl_ftm_timer: Fix error return checking 2018-05-30 07:49:01 +02:00
connector
cpufreq cpufreq: Fix new policy initialization during limits updates via sysfs 2018-07-03 11:21:26 +02:00
cpuidle cpuidle: powernv: Fix promotion from snooze if next state disabled 2018-07-03 11:21:29 +02:00
crypto crypto: vmx - Remove overly verbose printk from AES init routines 2018-06-16 09:54:27 +02:00
dca
devfreq PM / devfreq: Propagate error from devfreq_add_device() 2018-02-22 15:44:58 +01:00
dio
dma dmaengine: usb-dmac: fix endless loop in usb_dmac_chan_terminate_all() 2018-06-06 16:46:22 +02:00
dma-buf
edac EDAC, mv64x60: Fix an error handling path 2018-04-13 19:50:23 +02:00
eisa
extcon extcon: palmas: Check the parent instance to prevent the NULL 2017-11-21 09:21:18 +01:00
firewire firewire-ohci: work around oversized DMA reads on JMicron controllers 2018-05-30 07:48:52 +02:00
firmware firmware: dmi_scan: Fix handling of empty DMI strings 2018-05-30 07:48:56 +02:00
fmc
fpga
gpio gpio: No NULL owner 2018-06-16 09:54:26 +02:00
gpu drm: set FMODE_UNSIGNED_OFFSET for drm files 2018-06-13 16:15:27 +02:00
hid HID: debug: check length before copy_to_user() 2018-07-11 16:03:50 +02:00
hsi HSI: ssi_protocol: double free in ssip_pn_xmit() 2018-03-24 10:58:42 +01:00
hv Drivers: hv: vmbus: fix build warning 2018-02-25 11:03:46 +01:00
hwmon hwmon: (pmbus/adm1275) Accept negative page register values 2018-05-30 07:49:13 +02:00
hwspinlock
hwtracing hwtracing: stm: fix build error on some arches 2018-06-06 16:46:23 +02:00
i2c i2c: rcar: fix resume by always initializing registers before transfer 2018-07-11 16:03:47 +02:00
ide cdrom: do not call check_disk_change() inside cdrom_open() 2018-05-30 07:49:13 +02:00
idle idle: i7300: add PCI dependency 2018-02-25 11:03:51 +01:00
iio iio:buffer: make length types match kfifo types 2018-07-03 11:21:30 +02:00
infiniband RDMA/mlx4: Discard unknown SQP work requests 2018-07-03 11:21:29 +02:00
input Input: elantech - fix V4 report decoding for module with middle key 2018-07-03 11:21:34 +02:00
iommu x86/cpufeature: Remove unused and seldomly used cpu_has_xx macros 2018-06-16 09:54:24 +02:00
ipack
irqchip irqchip/gic-v3: Change pr_debug message to pr_devel 2018-05-30 07:48:57 +02:00
isdn isdn: eicon: fix a missing-check bug 2018-06-13 16:15:28 +02:00
leds leds: pca955x: Correct I2C Functionality 2018-04-13 19:50:09 +02:00
lguest
lightnvm
macintosh
mailbox mailbox: handle empty message in tx_tick 2017-08-06 19:19:41 -07:00
mcb
md dm bufio: avoid sleeping while holding the dm_bufio lock 2018-07-11 16:03:51 +02:00
media media: cx25840: Use subdev host data for PLL override 2018-07-11 16:03:51 +02:00
memory ARM: OMAP2+: gpmc-onenand: propagate error on initialization failure 2017-12-16 10:33:51 +01:00
memstick
message scsi: mptfusion: Add bounds check in mptctl_hp_targetinfo() 2018-05-30 07:48:58 +02:00
mfd mfd: intel-lpss: Program REMAP register in PIO mode 2018-07-03 11:21:32 +02:00
misc vmw_balloon: fixing double free when batching mode is off 2018-06-16 09:54:26 +02:00
mmc mmc: sdhci-iproc: fix 32bit writes for TRANSFER_MODE register 2018-05-30 07:48:51 +02:00
mtd ubi: fastmap: Correctly handle interrupted erasures in EBA 2018-07-11 16:03:47 +02:00
net ath10k: fix rfc1042 header retrieval in QCA4019 with eth decap mode 2018-07-11 16:03:47 +02:00
nfc NFC: nfcmrvl: double free on error path 2018-03-22 09:23:23 +01:00
ntb ntb_transport: Fix bug with max_mw_size parameter 2018-05-30 07:48:55 +02:00
nubus
nvdimm linvdimm, pmem: Preserve read-only setting for pmem devices 2018-07-03 11:21:31 +02:00
nvme nvme-pci: initialize queue memory before interrupts 2018-07-11 16:03:47 +02:00
nvmem nvmem: imx-ocotp: Fix wrong register size 2017-08-06 19:19:46 -07:00
of of: unittest: for strings, account for trailing \0 in property length field 2018-07-03 11:21:29 +02:00
oprofile
parisc parisc/pci: Switch LBA PCI bus from Hard Fail to Soft Fail mode 2018-05-30 07:49:10 +02:00
parport parport_pc: Add support for WCH CH382L PCI-E single parallel port card. 2018-04-08 11:52:00 +02:00
pci PCI: pciehp: Clear Presence Detect and Data Link Layer Status Changed on resume 2018-07-03 11:21:30 +02:00
pcmcia
perf drivers/perf: arm_pmu: handle no platform_device 2018-03-22 09:23:26 +01:00
phy phy: work around 'phys' references to usb-nop-xceiv devices 2018-01-23 19:50:16 +01:00
pinctrl pinctrl: Really force states during suspend/resume 2018-03-24 10:58:48 +01:00
platform platform/chrome: Use proper protocol transfer function 2018-03-24 10:58:47 +01:00
pnp
power power: supply: pda_power: move from timer to delayed_work 2018-03-24 10:58:45 +01:00
powercap PowerCap: Fix an error code in powercap_register_zone() 2018-04-13 19:50:05 +02:00
pps
ps3
ptp time: Change posix clocks ops interfaces to use timespec64 2018-03-24 10:58:40 +01:00
pwm pwm: tegra: Increase precision in PWM rate calculation 2018-03-22 09:23:27 +01:00
rapidio
ras
regulator regulator: of: Add a missing 'of_node_put()' in an error handling path of 'of_regulator_match()' 2018-05-30 07:49:17 +02:00
remoteproc
reset
rpmsg
rtc rtc: tx4939: avoid unintended sign extension on a 24 bit shift 2018-05-30 07:49:14 +02:00
s390 scsi: zfcp: fix missing REC trigger trace on enqueue without ERP thread 2018-07-03 11:21:31 +02:00
sbus
scsi scsi: sg: mitigate read/write abuse 2018-07-11 16:03:48 +02:00
sfi
sh
sn
soc
spi spi: Fix scatterlist elements size in spi_map_buf 2018-07-03 11:21:35 +02:00
spmi spmi: Include OF based modalias in device uevent 2017-07-27 15:06:10 -07:00
ssb ssb: mark ssb_bus_register as __maybe_unused 2018-02-25 11:03:44 +01:00
staging staging: android: ion: Return an ERR_PTR in ion_map_kernel 2018-07-11 16:03:47 +02:00
target tcm_fileio: Prevent information leak for short reads 2018-03-24 10:58:45 +01:00
tc
thermal thermal: imx: Fix race condition in imx_thermal_probe() 2018-04-24 09:32:08 +02:00
thunderbolt thunderbolt: Resume control channel after hibernation image is created 2018-04-24 09:32:07 +02:00
tty n_tty: Access echo_* variables carefully. 2018-07-11 16:03:47 +02:00
uio
usb USB: serial: cp210x: add Silicon Labs IDs for Windows Update 2018-07-11 16:03:46 +02:00
uwb uwb: ensure that endpoint is interrupt 2017-10-12 11:27:35 +02:00
vfio vfio/pci: Virtualize Maximum Read Request Size 2018-04-24 09:32:09 +02:00
vhost vhost: correctly remove wait queue during poll failure 2018-04-13 19:50:25 +02:00
video video: uvesafb: Fix integer overflow in allocation 2018-07-03 11:21:34 +02:00
virt
virtio virtio_balloon: prevent uninitialized variable use 2018-02-25 11:03:42 +01:00
vlynq
vme
w1 1wire: family module autoload fails because of upper/lower case mismatch. 2018-07-03 11:21:27 +02:00
watchdog watchdog: f71808e_wdt: Fix magic close handling 2018-05-30 07:49:03 +02:00
xen xen: Remove unnecessary BUG_ON from __unbind_from_irq() 2018-07-03 11:21:34 +02:00
zorro zorro: Set up z->dev.dma_mask for the DMA API 2018-05-30 07:49:11 +02:00
Kconfig
Makefile usb: build drivers/usb/common/ when USB_SUPPORT is set 2018-02-25 11:03:38 +01:00