linux/drivers/md
Coly Li 59afd4f287 bcache: avoid journal no-space deadlock by reserving 1 journal bucket
commit 32feee36c3 upstream.

The journal no-space deadlock has been reported from time to time. Such
a deadlock can happen in the following situation.

When all journal buckets are completely filled with active jsets under
heavy write I/O load, the cache set registration (after a reboot) will
load all active jsets and insert them into the btree again (which is
called journal replay). If a journaled bkey is inserted into a btree
node and results in a btree node split, a new journal request might be
triggered. For example, if the btree grows one more level after the
node split, the root node record in the cache device super block will be
updated by bch_journal_meta() from bch_btree_set_root(). But there is no
space in the journal buckets, so the journal replay has to wait for a
new journal bucket to be reclaimed, which requires at least one journal
bucket to have been replayed. This is one example of how the journal
no-space deadlock happens.

The solution to avoid the deadlock is to reserve 1 journal bucket at
run time, and only permit the reserved journal bucket to be used during
the cache set registration procedure for things like journal replay.
Then the journal space will never be completely filled, and there is no
chance for the journal no-space deadlock to happen anymore.

This patch adds a new member "bool do_reserve" to struct journal. It is
initialized to 0 (false) when struct journal is allocated, and set to
1 (true) by bch_journal_space_reserve() once all initialization is done
in run_cache_set(). At run time, when journal_reclaim() tries to
allocate a new journal bucket, free_journal_buckets() is called to check
whether there are enough free journal buckets to use. If there is only
1 free journal bucket and journal->do_reserve is 1 (true), the last
bucket is reserved and free_journal_buckets() returns 0 to indicate
that no journal bucket is free. journal_reclaim() then gives up and
tries again next time to see whether there is a free journal bucket to
allocate. In this way, there is always 1 journal bucket reserved at run
time.

During the cache set registration, journal->do_reserve is 0 (false), so
the reserved journal bucket can be used to avoid the no-space deadlock.
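
The reservation rule described above can be sketched in isolation as
follows. This is a minimal user-space illustration, not the patch
itself: struct journal_sketch, its nr_free field, and the function names
usable_journal_buckets(), journal_space_reserve() and
journal_take_bucket() are all hypothetical stand-ins, and the real
free_journal_buckets() derives the free count from the journal device
state rather than from a plain counter.

```c
#include <stdbool.h>

/*
 * Hypothetical, simplified stand-in for the journal state.
 * This is NOT the kernel's struct journal; only do_reserve is
 * a name taken from the commit message.
 */
struct journal_sketch {
	unsigned int nr_free;	/* journal buckets currently free */
	bool do_reserve;	/* false during registration, true at run time */
};

/*
 * Number of free buckets the caller may actually allocate from.
 * While do_reserve is set, one free bucket is held back, so a
 * single remaining free bucket is reported as "none available".
 */
static unsigned int usable_journal_buckets(const struct journal_sketch *j)
{
	unsigned int reserved = j->do_reserve ? 1 : 0;

	if (j->nr_free > reserved)
		return j->nr_free - reserved;

	return 0;
}

/* Turn the reservation on once initialization has finished. */
static void journal_space_reserve(struct journal_sketch *j)
{
	j->do_reserve = true;
}

/*
 * Try to consume one bucket. Returns false when the caller has to
 * give up and retry later, as journal_reclaim() is described to do.
 */
static bool journal_take_bucket(struct journal_sketch *j)
{
	if (usable_journal_buckets(j) == 0)
		return false;

	j->nr_free--;
	return true;
}
```

With do_reserve still false, as during registration, a journal with one
free bucket can hand that bucket out; after journal_space_reserve() the
same state reports zero usable buckets, so the last bucket stays
available for the next journal replay.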

Reported-by: Nikhil Kshirsagar <nkshirsagar@gmail.com>
Signed-off-by: Coly Li <colyli@suse.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20220524102336.10684-5-colyli@suse.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-06-09 10:21:28 +02:00
bcache bcache: avoid journal no-space deadlock by reserving 1 journal bucket 2022-06-09 10:21:28 +02:00
persistent-data dm space map common: add bounds check to sm_ll_lookup_bitmap() 2022-01-27 10:54:20 +01:00
dm-bio-prison-v1.c
dm-bio-prison-v1.h
dm-bio-prison-v2.c
dm-bio-prison-v2.h
dm-bio-record.h
dm-bufio.c dm bufio: subtract the number of initial sectors in dm_bufio_get_device_size 2021-03-09 11:11:12 +01:00
dm-builtin.c
dm-cache-background-tracker.c
dm-cache-background-tracker.h
dm-cache-block-types.h
dm-cache-metadata.c dm cache metadata: Avoid returning cmd->bm wild pointer on error 2020-09-02 13:38:24 -04:00
dm-cache-metadata.h
dm-cache-policy-internal.h
dm-cache-policy-smq.c
dm-cache-policy.c
dm-cache-policy.h
dm-cache-target.c Revert "dm cache: fix arm link errors with inline" 2020-12-01 15:43:36 -05:00
dm-clone-metadata.c
dm-clone-metadata.h
dm-clone-target.c writeback: remove bdi->congested_fn 2020-07-08 17:20:46 -06:00
dm-core.h dm: fix deadlock when swapping to encrypted device 2021-03-04 11:38:44 +01:00
dm-crypt.c dm crypt: make printing of the key constant-time 2022-06-06 08:42:44 +02:00
dm-delay.c block: rename generic_make_request to submit_bio_noacct 2020-07-01 07:27:24 -06:00
dm-dust.c dm dust: add interface to list all badblocks 2020-07-20 11:17:41 -04:00
dm-ebs-target.c dm ebs: Fix incorrect checking for REQ_OP_FLUSH 2020-08-04 16:01:40 -04:00
dm-era-target.c dm era: only resize metadata in preresume 2021-03-04 11:38:46 +01:00
dm-exception-store.c
dm-exception-store.h
dm-flakey.c
dm-historical-service-time.c dm mpath: only use ktime_get_ns() in historical selector 2022-04-20 09:23:18 +02:00
dm-init.c dm init: Set file local variable static 2020-08-04 15:51:28 -04:00
dm-integrity.c dm integrity: fix error code in dm_integrity_ctr() 2022-06-06 08:42:43 +02:00
dm-io.c treewide: Remove uninitialized_var() usage 2020-07-16 12:35:15 -07:00
dm-ioctl.c dm ioctl: prevent potential spectre v1 gadget 2022-04-13 21:00:57 +02:00
dm-kcopyd.c
dm-linear.c dm: add support for REQ_NOWAIT and enable it for linear target 2020-09-25 08:20:03 -06:00
dm-log-userspace-base.c
dm-log-userspace-transfer.c
dm-log-userspace-transfer.h
dm-log-writes.c
dm-log.c
dm-mpath.c dm: use dm_table_get_device_name() where appropriate in targets 2020-09-29 16:33:08 -04:00
dm-mpath.h
dm-path-selector.c
dm-path-selector.h
dm-queue-length.c
dm-raid.c dm raid: fix inconclusive reshape layout on fast raid4/5/6 table reload sequences 2021-05-11 14:47:36 +02:00
dm-raid1.c block: rename generic_make_request to submit_bio_noacct 2020-07-01 07:27:24 -06:00
dm-region-hash.c
dm-round-robin.c
dm-rq.c dm: requeue IO if mapping table not yet available 2022-04-13 21:00:57 +02:00
dm-rq.h
dm-service-time.c
dm-snap-persistent.c dm snap persistent: simplify area_io() 2020-09-29 16:33:12 -04:00
dm-snap-transient.c
dm-snap.c dm snapshot: properly fix a crash when an origin has no snapshots 2021-06-03 09:00:30 +02:00
dm-stats.c dm stats: add cond_resched when looping over entries 2022-06-06 08:42:44 +02:00
dm-stats.h
dm-stripe.c
dm-switch.c
dm-sysfs.c
dm-table.c dm table: Fix zoned model check and zone sectors check 2021-03-30 14:32:06 +02:00
dm-target.c
dm-thin-metadata.c dm thin metadata: Remove unused local variable when create thin and snap 2020-09-29 16:33:11 -04:00
dm-thin-metadata.h
dm-thin.c writeback: remove bdi->congested_fn 2020-07-08 17:20:46 -06:00
dm-uevent.c
dm-uevent.h
dm-unstripe.c
dm-verity-fec.c dm verity fec: fix misaligned RS roots IO 2021-04-21 13:00:54 +02:00
dm-verity-fec.h dm verity fec: fix misaligned RS roots IO 2021-04-21 13:00:54 +02:00
dm-verity-target.c dm verity: set DM_TARGET_IMMUTABLE feature flag 2022-06-06 08:42:44 +02:00
dm-verity-verify-sig.c dm verity: fix require_signatures module_param permissions 2021-06-16 12:01:37 +02:00
dm-verity-verify-sig.h dm verity: Fix compilation warning 2020-08-04 15:48:13 -04:00
dm-verity.h dm verity: add "panic_on_corruption" error handling mode 2020-07-13 11:47:33 -04:00
dm-writecache.c dm writecache: write at least 4k when committing 2021-07-19 09:45:02 +02:00
dm-zero.c
dm-zoned-metadata.c dm zoned: check zone capacity 2021-07-19 09:45:01 +02:00
dm-zoned-reclaim.c dm zoned: Fix zone reclaim trigger 2020-07-08 12:21:53 -04:00
dm-zoned-target.c dm table: Fix zoned model check and zone sectors check 2021-03-30 14:32:06 +02:00
dm-zoned.h dm zoned: select reclaim zone based on device index 2020-06-05 14:59:53 -04:00
dm.c dm: interlock pending dm_io and dm_wait_for_bios_completion 2022-05-12 12:25:45 +02:00
dm.h dm table: fix DAX iterate_devices based device capability checks 2021-03-04 11:38:44 +01:00
Kconfig dm integrity: select CRYPTO_SKCIPHER 2021-01-27 11:54:57 +01:00
Makefile md: move the early init autodetect code to drivers/md/ 2020-07-16 15:34:47 +02:00
md-autodetect.c treewide: Use fallthrough pseudo-keyword 2020-08-23 17:36:59 -05:00
md-bitmap.c md/bitmap: don't set sb values if can't pass sanity check 2022-06-09 10:20:52 +02:00
md-bitmap.h
md-cluster.c md/cluster: fix deadlock when node is doing resync job 2020-12-30 11:54:25 +01:00
md-cluster.h
md-faulty.c block: rename generic_make_request to submit_bio_noacct 2020-07-01 07:27:24 -06:00
md-linear.c block: add a new revalidate_disk_size helper 2020-09-02 08:00:07 -06:00
md-linear.h
md-multipath.c writeback: remove bdi->congested_fn 2020-07-08 17:20:46 -06:00
md-multipath.h
md.c md: fix an incorrect NULL check in md_reload_sb 2022-06-09 10:21:25 +02:00
md.h md: revert io stats accounting 2022-01-16 09:14:21 +01:00
raid1-10.c
raid1.c md/raid10: properly indicate failure when ending a failed write request 2021-08-12 13:22:17 +02:00
raid1.h
raid5-cache.c raid5-cache: hold spinlock instead of mutex in r5c_journal_mode_show 2020-08-02 23:03:52 -07:00
raid5-log.h
raid5-ppl.c md/raid456: convert macro STRIPE_* to RAID5_STRIPE_* 2020-07-21 17:18:12 -07:00
raid5.c raid5: introduce MD_BROKEN 2022-06-06 08:42:44 +02:00
raid5.h md/raid5: let multiple devices of stripe_head share page 2020-09-24 16:44:44 -07:00
raid10.c md/raid10: properly indicate failure when ending a failed write request 2021-08-12 13:22:17 +02:00
raid10.h Revert "md/raid10: improve discard request for far layout" 2020-12-09 20:46:00 -08:00
raid0.c Revert "md: add md_submit_discard_bio() for submitting discard bio" 2020-12-09 20:46:01 -08:00
raid0.h