bcachefs: Runtime self healing for keys for deleted snapshots

If snapshot deletion incorrectly misses some keys and leaves keys for
deleted snapshots behind, that causes a bit of a problem for data move -
we can't move an extent for a nonexistent snapshot, because the extent
might have to be fragmented, and maintaining correct visibility in child
snapshots doesn't work if the extent doesn't have a snapshot.

Previously we'd just skip these keys, but it turns out that causes
copygc to spin.

So we need runtime self healing, i.e. calling check_key_has_snapshot()
from the data move path.

Snapshot deletion v2 included sentinel values for deleted snapshot
nodes, so this is quite safe.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet 2025-05-27 22:20:27 -04:00
parent f02d153274
commit 0224d17d76

@@ -821,13 +821,24 @@ int bch2_data_update_init(struct btree_trans *trans,
 	struct bch_fs *c = trans->c;
 	int ret = 0;
-	/*
-	 * fs is corrupt we have a key for a snapshot node that doesn't exist,
-	 * and we have to check for this because we go rw before repairing the
-	 * snapshots table - just skip it, we can move it later.
-	 */
-	if (unlikely(k.k->p.snapshot && !bch2_snapshot_exists(c, k.k->p.snapshot)))
-		return -BCH_ERR_data_update_done_no_snapshot;
+	if (k.k->p.snapshot) {
+		/*
+		 * We'll go ERO if we see a key for a missing snapshot, and if
+		 * we're still in recovery we want to give that a chance to
+		 * repair:
+		 */
+		if (unlikely(test_bit(BCH_FS_in_recovery, &c->flags) &&
+			     bch2_snapshot_id_state(c, k.k->p.snapshot) == SNAPSHOT_ID_empty))
+			return -BCH_ERR_data_update_done_no_snapshot;
+		ret = bch2_check_key_has_snapshot(trans, iter, k);
+		if (ret < 0)
+			return ret;
+		if (ret) /* key was deleted */
+			return bch2_trans_commit(trans, NULL, NULL, BCH_TRANS_COMMIT_no_enospc) ?:
+				-BCH_ERR_data_update_done_no_snapshot;
+		ret = 0;
+	}
 	bch2_bkey_buf_init(&m->k);
 	bch2_bkey_buf_reassemble(&m->k, c, k);
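
The behaviour change can be illustrated with a small standalone model. This is only a sketch and not bcachefs code: every `toy_` type and function below is hypothetical, the "live" and "deleted" snapshot states are assumed for illustration (only the "empty" state is named in the patch), and copygc is reduced to "repeat passes over one bucket until it has no keys left". With the old skip-only check the key for the deleted snapshot stays in the bucket on every pass, so the loop never finishes. With the self-healing check the key is deleted on the first pass and the bucket drains.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: none of these names are real bcachefs APIs. */
enum toy_snapshot_state { TOY_SNAPSHOT_EMPTY, TOY_SNAPSHOT_LIVE, TOY_SNAPSHOT_DELETED };
enum { TOY_MOVED = 0, TOY_SKIPPED_NO_SNAPSHOT = 1 };

#define TOY_MAX_SNAPSHOTS 8

struct toy_fs {
	bool in_recovery;
	enum toy_snapshot_state snapshots[TOY_MAX_SNAPSHOTS];
};

struct toy_key {
	unsigned snapshot;	/* 0: key does not belong to a snapshot */
	bool present;		/* false once the key was moved or deleted */
};

static enum toy_snapshot_state toy_snapshot_state(const struct toy_fs *fs, unsigned id)
{
	return id < TOY_MAX_SNAPSHOTS ? fs->snapshots[id] : TOY_SNAPSHOT_EMPTY;
}

/* Assumed repair policy: delete any key whose snapshot is not live.
 * Returns true if the key was deleted. */
static bool toy_check_key_has_snapshot(const struct toy_fs *fs, struct toy_key *k)
{
	if (toy_snapshot_state(fs, k->snapshot) == TOY_SNAPSHOT_LIVE)
		return false;
	k->present = false;
	return true;
}

/* Models the start of a data update; self_heal selects old vs. new behaviour. */
static int toy_data_update_init(const struct toy_fs *fs, struct toy_key *k, bool self_heal)
{
	if (k->snapshot) {
		enum toy_snapshot_state s = toy_snapshot_state(fs, k->snapshot);

		if (!self_heal) {
			/* Old behaviour: skip the key and leave it in place. */
			if (s != TOY_SNAPSHOT_LIVE)
				return TOY_SKIPPED_NO_SNAPSHOT;
		} else {
			/* During recovery, leave the key alone so repair can run. */
			if (fs->in_recovery && s == TOY_SNAPSHOT_EMPTY)
				return TOY_SKIPPED_NO_SNAPSHOT;
			/* Otherwise heal at runtime: a deleted key needs no move. */
			if (toy_check_key_has_snapshot(fs, k))
				return TOY_SKIPPED_NO_SNAPSHOT;
		}
	}
	k->present = false;	/* the key was moved out of the bucket */
	return TOY_MOVED;
}

/* Repeats passes until the bucket is empty; returns the number of passes,
 * which equals max_passes if the bucket never drains. */
static unsigned toy_copygc(const struct toy_fs *fs, struct toy_key *keys, size_t nr,
			   bool self_heal, unsigned max_passes)
{
	unsigned passes = 0;

	while (passes < max_passes) {
		size_t remaining = 0;

		for (size_t i = 0; i < nr; i++)
			if (keys[i].present)
				toy_data_update_init(fs, &keys[i], self_heal);
		passes++;

		for (size_t i = 0; i < nr; i++)
			remaining += keys[i].present;
		if (!remaining)
			break;
	}
	return passes;
}
```

Calling `toy_copygc()` on a bucket holding one key for a live snapshot and one key for a deleted snapshot uses up all of `max_passes` when `self_heal` is false, and finishes in a single pass when it is true.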