bcachefs fixes for 6.16-rc2

As usual, highlighting the ones users have been noticing:
 
 - Fix a small issue with has_case_insensitive not being propagated on
   snapshot creation; this led to fsck errors, which were harmless
   because we're not using this flag yet (it's for overlayfs +
   casefolding).
 
 - Log the error being corrected in the journal when we're doing fsck
   repair: this was one of the "lessons learned" from the i_nlink 0 ->
   subvolume deletion bug, where reconstructing what had happened by
   analyzing the journal was a bit more difficult than it needed to be.
 
 - Don't schedule btree node scan to run in the superblock: this fixes a
   regression from the 6.16 recovery passes rework, and led to it running
   unnecessarily.
 
   The real issue here is that we don't have online, "self healing" style
   topology repair yet: topology repair currently has to run before we go
   RW, which means that we may schedule it unnecessarily after a
   transient error. This will be fixed in the future.
 
 - We now track, in btree node flags, the reason a node was scheduled
   to be rewritten. We discovered a deadlock in recovery when many btree
   nodes need to be rewritten because they're degraded: fully fixing
   this will take some work but it's now easier to see what's going on.
 
   For the bug report where this came up, a device had been kicked RO due
   to transient errors: manually setting it back to RW was sufficient to
   allow recovery to succeed.
 
 - Mark a few more fsck errors as autofix: as a reminder to users, please
   do keep reporting cases where something needs to be repaired and is
   not repaired automatically (i.e. cases where -o fix_errors or fsck -y
   is required).
 
 - rcu_pending.c now works with PREEMPT_RT
 
 - 'bcachefs device add', then umount, then remount wasn't working - we
   now emit a uevent so that the new device's new superblock is correctly
   picked up
 
 - Assorted repair fixes: btree node scan will no longer incorrectly
   update sb->version_min,
 
 - Assorted syzbot fixes

Merge tag 'bcachefs-2025-06-12' of git://evilpiepirate.org/bcachefs

Pull bcachefs fixes from Kent Overstreet:
 "As usual, highlighting the ones users have been noticing:

   - Fix a small issue with has_case_insensitive not being propagated on
     snapshot creation; this led to fsck errors, which were harmless
     because we're not using this flag yet (it's for overlayfs +
     casefolding).

   - Log the error being corrected in the journal when we're doing fsck
     repair: this was one of the "lessons learned" from the i_nlink 0 ->
     subvolume deletion bug, where reconstructing what had happened by
     analyzing the journal was a bit more difficult than it needed to
     be.

   - Don't schedule btree node scan to run in the superblock: this fixes
     a regression from the 6.16 recovery passes rework, and led to it
     running unnecessarily.

     The real issue here is that we don't have online, "self healing"
     style topology repair yet: topology repair currently has to run
     before we go RW, which means that we may schedule it unnecessarily
     after a transient error. This will be fixed in the future.

   - We now track, in btree node flags, the reason a node was scheduled
     to be rewritten. We discovered a deadlock in recovery when many
     btree nodes need to be rewritten because they're degraded: fully
     fixing this will take some work but it's now easier to see what's
     going on.

     For the bug report where this came up, a device had been kicked RO
     due to transient errors: manually setting it back to RW was
     sufficient to allow recovery to succeed.

   - Mark a few more fsck errors as autofix: as a reminder to users,
     please do keep reporting cases where something needs to be repaired
     and is not repaired automatically (i.e. cases where -o fix_errors
     or fsck -y is required).

   - rcu_pending.c now works with PREEMPT_RT

   - 'bcachefs device add', then umount, then remount wasn't working -
     we now emit a uevent so that the new device's new superblock is
     correctly picked up

   - Assorted repair fixes: btree node scan will no longer incorrectly
     update sb->version_min,

   - Assorted syzbot fixes"

* tag 'bcachefs-2025-06-12' of git://evilpiepirate.org/bcachefs: (23 commits)
  bcachefs: Don't trace should_be_locked unless changing
  bcachefs: Ensure that snapshot creation propagates has_case_insensitive
  bcachefs: Print devices we're mounting on multi device filesystems
  bcachefs: Don't trust sb->nr_devices in members_to_text()
  bcachefs: Fix version checks in validate_bset()
  bcachefs: ioctl: avoid stack overflow warning
  bcachefs: Don't pass trans to fsck_err() in gc_accounting_done
  bcachefs: Fix leak in bch2_fs_recovery() error path
  bcachefs: Fix rcu_pending for PREEMPT_RT
  bcachefs: Fix downgrade_table_extra()
  bcachefs: Don't put rhashtable on stack
  bcachefs: Make sure opts.read_only gets propagated back to VFS
  bcachefs: Fix possible console lock involved deadlock
  bcachefs: mark more errors autofix
  bcachefs: Don't persistently run scan_for_btree_nodes
  bcachefs: Read error message now prints if self healing
  bcachefs: Only run 'increase_depth' for keys from btree node csan
  bcachefs: Mark need_discard_freespace_key_bad autofix
  bcachefs: Update /dev/disk/by-uuid on device add
  bcachefs: Add more flags to btree nodes for rewrite reason
  ...
Linus Torvalds 2025-06-13 09:49:07 -07:00
commit 36df6f734a
25 changed files with 319 additions and 116 deletions

View File

@ -296,7 +296,6 @@ do { \
#define bch2_fmt(_c, fmt) bch2_log_msg(_c, fmt "\n")
void bch2_print_str(struct bch_fs *, const char *, const char *);
void bch2_print_str_nonblocking(struct bch_fs *, const char *, const char *);
__printf(2, 3)
void bch2_print_opts(struct bch_opts *, const char *, ...);

View File

@ -397,7 +397,11 @@ static int bch2_btree_repair_topology_recurse(struct btree_trans *trans, struct
continue;
}
ret = btree_check_node_boundaries(trans, b, prev, cur, pulled_from_scan);
ret = lockrestart_do(trans,
btree_check_node_boundaries(trans, b, prev, cur, pulled_from_scan));
if (ret < 0)
goto err;
if (ret == DID_FILL_FROM_SCAN) {
new_pass = true;
ret = 0;
@ -438,7 +442,8 @@ static int bch2_btree_repair_topology_recurse(struct btree_trans *trans, struct
if (!ret && !IS_ERR_OR_NULL(prev)) {
BUG_ON(cur);
ret = btree_repair_node_end(trans, b, prev, pulled_from_scan);
ret = lockrestart_do(trans,
btree_repair_node_end(trans, b, prev, pulled_from_scan));
if (ret == DID_FILL_FROM_SCAN) {
new_pass = true;
ret = 0;
@ -519,6 +524,46 @@ static int bch2_btree_repair_topology_recurse(struct btree_trans *trans, struct
bch2_bkey_buf_exit(&prev_k, c);
bch2_bkey_buf_exit(&cur_k, c);
printbuf_exit(&buf);
bch_err_fn(c, ret);
return ret;
}
static int bch2_check_root(struct btree_trans *trans, enum btree_id i,
bool *reconstructed_root)
{
struct bch_fs *c = trans->c;
struct btree_root *r = bch2_btree_id_root(c, i);
struct printbuf buf = PRINTBUF;
int ret = 0;
bch2_btree_id_to_text(&buf, i);
if (r->error) {
bch_info(c, "btree root %s unreadable, must recover from scan", buf.buf);
r->alive = false;
r->error = 0;
if (!bch2_btree_has_scanned_nodes(c, i)) {
__fsck_err(trans,
FSCK_CAN_FIX|(!btree_id_important(i) ? FSCK_AUTOFIX : 0),
btree_root_unreadable_and_scan_found_nothing,
"no nodes found for btree %s, continue?", buf.buf);
bch2_btree_root_alloc_fake_trans(trans, i, 0);
} else {
bch2_btree_root_alloc_fake_trans(trans, i, 1);
bch2_shoot_down_journal_keys(c, i, 1, BTREE_MAX_DEPTH, POS_MIN, SPOS_MAX);
ret = bch2_get_scanned_nodes(c, i, 0, POS_MIN, SPOS_MAX);
if (ret)
goto err;
}
*reconstructed_root = true;
}
err:
fsck_err:
printbuf_exit(&buf);
bch_err_fn(c, ret);
return ret;
}
@ -526,42 +571,18 @@ int bch2_check_topology(struct bch_fs *c)
{
struct btree_trans *trans = bch2_trans_get(c);
struct bpos pulled_from_scan = POS_MIN;
struct printbuf buf = PRINTBUF;
int ret = 0;
bch2_trans_srcu_unlock(trans);
for (unsigned i = 0; i < btree_id_nr_alive(c) && !ret; i++) {
struct btree_root *r = bch2_btree_id_root(c, i);
bool reconstructed_root = false;
recover:
ret = lockrestart_do(trans, bch2_check_root(trans, i, &reconstructed_root));
if (ret)
break;
printbuf_reset(&buf);
bch2_btree_id_to_text(&buf, i);
if (r->error) {
reconstruct_root:
bch_info(c, "btree root %s unreadable, must recover from scan", buf.buf);
r->alive = false;
r->error = 0;
if (!bch2_btree_has_scanned_nodes(c, i)) {
__fsck_err(trans,
FSCK_CAN_FIX|(!btree_id_important(i) ? FSCK_AUTOFIX : 0),
btree_root_unreadable_and_scan_found_nothing,
"no nodes found for btree %s, continue?", buf.buf);
bch2_btree_root_alloc_fake_trans(trans, i, 0);
} else {
bch2_btree_root_alloc_fake_trans(trans, i, 1);
bch2_shoot_down_journal_keys(c, i, 1, BTREE_MAX_DEPTH, POS_MIN, SPOS_MAX);
ret = bch2_get_scanned_nodes(c, i, 0, POS_MIN, SPOS_MAX);
if (ret)
break;
}
reconstructed_root = true;
}
struct btree_root *r = bch2_btree_id_root(c, i);
struct btree *b = r->b;
btree_node_lock_nopath_nofail(trans, &b->c, SIX_LOCK_read);
@ -575,17 +596,21 @@ int bch2_check_topology(struct bch_fs *c)
r->b = NULL;
if (!reconstructed_root)
goto reconstruct_root;
if (!reconstructed_root) {
r->error = -EIO;
goto recover;
}
struct printbuf buf = PRINTBUF;
bch2_btree_id_to_text(&buf, i);
bch_err(c, "empty btree root %s", buf.buf);
printbuf_exit(&buf);
bch2_btree_root_alloc_fake_trans(trans, i, 0);
r->alive = false;
ret = 0;
}
}
fsck_err:
printbuf_exit(&buf);
bch2_trans_put(trans);
return ret;
}
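
The hunks above wrap btree_check_node_boundaries(), btree_repair_node_end() and the new bch2_check_root() in lockrestart_do(). As a rough userspace sketch of that retry idea, assuming the wrapper's job is to re-run an operation for as long as it reports a transaction restart: ERR_RESTART, retry_on_restart() and flaky_op() are hypothetical names for illustration, not bcachefs APIs.

```c
#include <assert.h>

/* Hypothetical error code standing in for a "transaction restart" error. */
#define ERR_RESTART	1000

typedef int (*op_fn)(void *arg);

/*
 * Re-run op until it returns something other than -ERR_RESTART.
 * Any other value (0 or a real error) is handed back to the caller.
 */
static int retry_on_restart(op_fn op, void *arg)
{
	int ret;

	do {
		ret = op(arg);
	} while (ret == -ERR_RESTART);

	return ret;
}

/* Demo operation: asks for a restart a fixed number of times, then succeeds. */
struct flaky {
	int restarts_left;
	int calls;
};

static int flaky_op(void *arg)
{
	struct flaky *f = arg;

	f->calls++;
	if (f->restarts_left > 0) {
		f->restarts_left--;
		return -ERR_RESTART;
	}
	return 0;
}
```

With restarts_left set to 2, retry_on_restart(flaky_op, &f) calls the operation three times and returns 0. Real errors still propagate, which is why the first hunk adds a check for ret < 0 after the wrapped call.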

View File

@ -741,16 +741,22 @@ static int validate_bset(struct bch_fs *c, struct bch_dev *ca,
BCH_VERSION_MAJOR(version),
BCH_VERSION_MINOR(version));
if (btree_err_on(version < c->sb.version_min,
if (c->recovery.curr_pass != BCH_RECOVERY_PASS_scan_for_btree_nodes &&
btree_err_on(version < c->sb.version_min,
-BCH_ERR_btree_node_read_err_fixable,
c, NULL, b, i, NULL,
btree_node_bset_older_than_sb_min,
"bset version %u older than superblock version_min %u",
version, c->sb.version_min)) {
mutex_lock(&c->sb_lock);
c->disk_sb.sb->version_min = cpu_to_le16(version);
bch2_write_super(c);
mutex_unlock(&c->sb_lock);
if (bch2_version_compatible(version)) {
mutex_lock(&c->sb_lock);
c->disk_sb.sb->version_min = cpu_to_le16(version);
bch2_write_super(c);
mutex_unlock(&c->sb_lock);
} else {
/* We have no idea what's going on: */
i->version = cpu_to_le16(c->sb.version);
}
}
if (btree_err_on(BCH_VERSION_MAJOR(version) >
@ -1045,6 +1051,7 @@ static int validate_bset_keys(struct bch_fs *c, struct btree *b,
le16_add_cpu(&i->u64s, -next_good_key);
memmove_u64s_down(k, (u64 *) k + next_good_key, (u64 *) vstruct_end(i) - (u64 *) k);
set_btree_node_need_rewrite(b);
set_btree_node_need_rewrite_error(b);
}
fsck_err:
printbuf_exit(&buf);
@ -1305,6 +1312,7 @@ int bch2_btree_node_read_done(struct bch_fs *c, struct bch_dev *ca,
(u64 *) vstruct_end(i) - (u64 *) k);
set_btree_bset_end(b, b->set);
set_btree_node_need_rewrite(b);
set_btree_node_need_rewrite_error(b);
continue;
}
if (ret)
@ -1329,12 +1337,16 @@ int bch2_btree_node_read_done(struct bch_fs *c, struct bch_dev *ca,
bkey_for_each_ptr(bch2_bkey_ptrs(bkey_i_to_s(&b->key)), ptr) {
struct bch_dev *ca2 = bch2_dev_rcu(c, ptr->dev);
if (!ca2 || ca2->mi.state != BCH_MEMBER_STATE_rw)
if (!ca2 || ca2->mi.state != BCH_MEMBER_STATE_rw) {
set_btree_node_need_rewrite(b);
set_btree_node_need_rewrite_degraded(b);
}
}
if (!ptr_written)
if (!ptr_written) {
set_btree_node_need_rewrite(b);
set_btree_node_need_rewrite_ptr_written_zero(b);
}
fsck_err:
mempool_free(iter, &c->fill_iter);
printbuf_exit(&buf);
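
The validate_bset() hunk above changes what happens when a bset carries a version older than the superblock's version_min. A simplified sketch of that decision, under the assumption that the fsck prompt always answers "fix": the enum and classify_old_bset() are hypothetical names, and the two booleans stand in for the curr_pass comparison and bch2_version_compatible().

```c
#include <assert.h>
#include <stdbool.h>

enum old_bset_action {
	OLD_BSET_IGNORE,		/* leave the superblock and the bset alone */
	OLD_BSET_LOWER_SB_MIN,		/* record the older version in the superblock */
	OLD_BSET_OVERWRITE_VERSION,	/* distrust the bset's version field */
};

static enum old_bset_action
classify_old_bset(unsigned bset_version, unsigned sb_version_min,
		  bool scanning_for_nodes, bool version_compatible)
{
	/* Nodes found by the scan pass may be stale; never touch the superblock for them. */
	if (scanning_for_nodes)
		return OLD_BSET_IGNORE;

	/* Nothing to do unless the bset really is older than the recorded minimum. */
	if (bset_version >= sb_version_min)
		return OLD_BSET_IGNORE;

	return version_compatible
		? OLD_BSET_LOWER_SB_MIN
		: OLD_BSET_OVERWRITE_VERSION;
}
```

The scan check comes first, mirroring the short-circuit in the patched condition, so an old node encountered while scanning never reaches the superblock update.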

View File

@ -213,7 +213,7 @@ static noinline __noreturn void break_cycle_fail(struct lock_graph *g)
prt_newline(&buf);
}
bch2_print_str_nonblocking(g->g->trans->c, KERN_ERR, buf.buf);
bch2_print_str(g->g->trans->c, KERN_ERR, buf.buf);
printbuf_exit(&buf);
BUG();
}

View File

@ -417,8 +417,10 @@ static inline void btree_path_set_should_be_locked(struct btree_trans *trans, st
EBUG_ON(!btree_node_locked(path, path->level));
EBUG_ON(path->uptodate);
path->should_be_locked = true;
trace_btree_path_should_be_locked(trans, path);
if (!path->should_be_locked) {
path->should_be_locked = true;
trace_btree_path_should_be_locked(trans, path);
}
}
static inline void __btree_path_set_level_up(struct btree_trans *trans,
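
The change above makes the setter emit its tracepoint only when the flag actually flips. A minimal sketch of that "trace on transition" pattern, with a hypothetical counter (trace_events) in place of the real tracepoint and a cut-down struct path:

```c
#include <assert.h>
#include <stdbool.h>

/* Cut-down stand-in for struct btree_path; only the flag matters here. */
struct path {
	bool should_be_locked;
};

/* Hypothetical counter used instead of a real tracepoint. */
static unsigned trace_events;

static void path_set_should_be_locked(struct path *p)
{
	/* Only record an event when the state changes, not on every call. */
	if (!p->should_be_locked) {
		p->should_be_locked = true;
		trace_events++;
	}
}
```

Calling the setter twice on the same path bumps the counter once, so repeated calls on an already-marked path no longer add noise to the trace.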

View File

@ -617,6 +617,9 @@ enum btree_write_type {
x(dying) \
x(fake) \
x(need_rewrite) \
x(need_rewrite_error) \
x(need_rewrite_degraded) \
x(need_rewrite_ptr_written_zero) \
x(never_write) \
x(pinned)
@ -641,6 +644,32 @@ static inline void clear_btree_node_ ## flag(struct btree *b) \
BTREE_FLAGS()
#undef x
#define BTREE_NODE_REWRITE_REASON() \
x(none) \
x(unknown) \
x(error) \
x(degraded) \
x(ptr_written_zero)
enum btree_node_rewrite_reason {
#define x(n) BTREE_NODE_REWRITE_##n,
BTREE_NODE_REWRITE_REASON()
#undef x
};
static inline enum btree_node_rewrite_reason btree_node_rewrite_reason(struct btree *b)
{
if (btree_node_need_rewrite_ptr_written_zero(b))
return BTREE_NODE_REWRITE_ptr_written_zero;
if (btree_node_need_rewrite_degraded(b))
return BTREE_NODE_REWRITE_degraded;
if (btree_node_need_rewrite_error(b))
return BTREE_NODE_REWRITE_error;
if (btree_node_need_rewrite(b))
return BTREE_NODE_REWRITE_unknown;
return BTREE_NODE_REWRITE_none;
}
static inline struct btree_write *btree_current_write(struct btree *b)
{
return b->writes + btree_node_write_idx(b);
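
The BTREE_NODE_REWRITE_REASON() list above is an x-macro: one list generates the enum, and the same list generates the string table used when printing a btree_update. The sketch below reproduces the pattern in plain C so it can be compiled outside the kernel. struct node_flags and the REWRITE_ names are hypothetical stand-ins; the precedence order is copied from btree_node_rewrite_reason().

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for the per-node flag bits. */
struct node_flags {
	bool need_rewrite;
	bool need_rewrite_error;
	bool need_rewrite_degraded;
	bool need_rewrite_ptr_written_zero;
};

/* One list drives both the enum and the string table, so they cannot drift apart. */
#define REWRITE_REASONS()	\
	x(none)			\
	x(unknown)		\
	x(error)		\
	x(degraded)		\
	x(ptr_written_zero)

enum rewrite_reason {
#define x(n)	REWRITE_##n,
	REWRITE_REASONS()
#undef x
	REWRITE_NR
};

static const char * const rewrite_reason_strs[] = {
#define x(n)	#n,
	REWRITE_REASONS()
#undef x
	NULL,
};

/* The most specific flag wins; need_rewrite with no reason flag maps to "unknown". */
static enum rewrite_reason rewrite_reason(const struct node_flags *f)
{
	if (f->need_rewrite_ptr_written_zero)
		return REWRITE_ptr_written_zero;
	if (f->need_rewrite_degraded)
		return REWRITE_degraded;
	if (f->need_rewrite_error)
		return REWRITE_error;
	if (f->need_rewrite)
		return REWRITE_unknown;
	return REWRITE_none;
}

static const char *rewrite_reason_str(const struct node_flags *f)
{
	return rewrite_reason_strs[rewrite_reason(f)];
}
```

A node with only need_rewrite set prints as "unknown", which is how callers that have not yet been taught to record a reason show up in the output.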

View File

@ -1138,6 +1138,13 @@ static void bch2_btree_update_done(struct btree_update *as, struct btree_trans *
start_time);
}
static const char * const btree_node_reawrite_reason_strs[] = {
#define x(n) #n,
BTREE_NODE_REWRITE_REASON()
#undef x
NULL,
};
static struct btree_update *
bch2_btree_update_start(struct btree_trans *trans, struct btree_path *path,
unsigned level_start, bool split,
@ -1232,6 +1239,15 @@ bch2_btree_update_start(struct btree_trans *trans, struct btree_path *path,
list_add_tail(&as->list, &c->btree_interior_update_list);
mutex_unlock(&c->btree_interior_update_lock);
struct btree *b = btree_path_node(path, path->level);
as->node_start = b->data->min_key;
as->node_end = b->data->max_key;
as->node_needed_rewrite = btree_node_rewrite_reason(b);
as->node_written = b->written;
as->node_sectors = btree_buf_bytes(b) >> 9;
as->node_remaining = __bch2_btree_u64s_remaining(b,
btree_bkey_last(b, bset_tree_last(b)));
/*
* We don't want to allocate if we're in an error state, that can cause
* deadlock on emergency shutdown due to open buckets getting stuck in
@ -2108,6 +2124,9 @@ int __bch2_foreground_maybe_merge(struct btree_trans *trans,
if (ret)
goto err;
as->node_start = prev->data->min_key;
as->node_end = next->data->max_key;
trace_and_count(c, btree_node_merge, trans, b);
n = bch2_btree_node_alloc(as, trans, b->c.level);
@ -2681,9 +2700,19 @@ static void bch2_btree_update_to_text(struct printbuf *out, struct btree_update
prt_str(out, " ");
bch2_btree_id_to_text(out, as->btree_id);
prt_printf(out, " l=%u-%u mode=%s nodes_written=%u cl.remaining=%u journal_seq=%llu\n",
prt_printf(out, " l=%u-%u ",
as->update_level_start,
as->update_level_end,
as->update_level_end);
bch2_bpos_to_text(out, as->node_start);
prt_char(out, ' ');
bch2_bpos_to_text(out, as->node_end);
prt_printf(out, "\nwritten %u/%u u64s_remaining %u need_rewrite %s",
as->node_written,
as->node_sectors,
as->node_remaining,
btree_node_reawrite_reason_strs[as->node_needed_rewrite]);
prt_printf(out, "\nmode=%s nodes_written=%u cl.remaining=%u journal_seq=%llu\n",
bch2_btree_update_modes[as->mode],
as->nodes_written,
closure_nr_remaining(&as->cl),

View File

@ -57,6 +57,13 @@ struct btree_update {
unsigned took_gc_lock:1;
enum btree_id btree_id;
struct bpos node_start;
struct bpos node_end;
enum btree_node_rewrite_reason node_needed_rewrite;
u16 node_written;
u16 node_sectors;
u16 node_remaining;
unsigned update_level_start;
unsigned update_level_end;

View File

@ -399,7 +399,7 @@ static long bch2_ioctl_data(struct bch_fs *c,
return ret;
}
static long bch2_ioctl_fs_usage(struct bch_fs *c,
static noinline_for_stack long bch2_ioctl_fs_usage(struct bch_fs *c,
struct bch_ioctl_fs_usage __user *user_arg)
{
struct bch_ioctl_fs_usage arg = {};
@ -469,7 +469,7 @@ static long bch2_ioctl_query_accounting(struct bch_fs *c,
}
/* obsolete, didn't allow for new data types: */
static long bch2_ioctl_dev_usage(struct bch_fs *c,
static noinline_for_stack long bch2_ioctl_dev_usage(struct bch_fs *c,
struct bch_ioctl_dev_usage __user *user_arg)
{
struct bch_ioctl_dev_usage arg;

View File

@ -618,7 +618,9 @@ int bch2_gc_accounting_done(struct bch_fs *c)
for (unsigned j = 0; j < nr; j++)
src_v[j] -= dst_v[j];
if (fsck_err(trans, accounting_mismatch, "%s", buf.buf)) {
bch2_trans_unlock_long(trans);
if (fsck_err(c, accounting_mismatch, "%s", buf.buf)) {
percpu_up_write(&c->mark_lock);
ret = commit_do(trans, NULL, NULL, 0,
bch2_disk_accounting_mod(trans, &acc_k, src_v, nr, false));

View File

@ -69,7 +69,7 @@ static bool bch2_fs_trans_inconsistent(struct bch_fs *c, struct btree_trans *tra
if (trans)
bch2_trans_updates_to_text(&buf, trans);
bool ret = __bch2_inconsistent_error(c, &buf);
bch2_print_str_nonblocking(c, KERN_ERR, buf.buf);
bch2_print_str(c, KERN_ERR, buf.buf);
printbuf_exit(&buf);
return ret;
@ -620,6 +620,9 @@ int __bch2_fsck_err(struct bch_fs *c,
if (s)
s->ret = ret;
if (trans)
ret = bch2_trans_log_str(trans, bch2_sb_error_strs[err]) ?: ret;
err_unlock:
mutex_unlock(&c->fsck_error_msgs_lock);
err:

View File

@ -2490,6 +2490,14 @@ static int bch2_fs_get_tree(struct fs_context *fc)
if (ret)
goto err_stop_fs;
/*
* We might be doing a RO mount because other options required it, or we
* have no alloc info and it's a small image with no room to regenerate
* it
*/
if (c->opts.read_only)
fc->sb_flags |= SB_RDONLY;
sb = sget(fc->fs_type, NULL, bch2_set_super, fc->sb_flags|SB_NOSEC, c);
ret = PTR_ERR_OR_ZERO(sb);
if (ret)

View File

@ -343,6 +343,10 @@ static struct bch_read_bio *promote_alloc(struct btree_trans *trans,
*bounce = true;
*read_full = promote_full;
if (have_io_error(failed))
orig->self_healing = true;
return promote;
nopromote:
trace_io_read_nopromote(c, ret);
@ -635,12 +639,15 @@ static void bch2_rbio_retry(struct work_struct *work)
prt_str(&buf, "(internal move) ");
prt_str(&buf, "data read error, ");
if (!ret)
if (!ret) {
prt_str(&buf, "successful retry");
else
if (rbio->self_healing)
prt_str(&buf, ", self healing");
} else
prt_str(&buf, bch2_err_str(ret));
prt_newline(&buf);
if (!bkey_deleted(&sk.k->k)) {
bch2_bkey_val_to_text(&buf, c, bkey_i_to_s_c(sk.k));
prt_newline(&buf);

View File

@ -44,6 +44,7 @@ struct bch_read_bio {
have_ioref:1,
narrow_crcs:1,
saw_error:1,
self_healing:1,
context:2;
};
u16 _state;

View File

@ -28,7 +28,7 @@
#include <linux/wait.h>
struct buckets_in_flight {
struct rhashtable table;
struct rhashtable *table;
struct move_bucket *first;
struct move_bucket *last;
size_t nr;
@ -98,7 +98,7 @@ static int bch2_bucket_is_movable(struct btree_trans *trans,
static void move_bucket_free(struct buckets_in_flight *list,
struct move_bucket *b)
{
int ret = rhashtable_remove_fast(&list->table, &b->hash,
int ret = rhashtable_remove_fast(list->table, &b->hash,
bch_move_bucket_params);
BUG_ON(ret);
kfree(b);
@ -133,7 +133,7 @@ static void move_buckets_wait(struct moving_context *ctxt,
static bool bucket_in_flight(struct buckets_in_flight *list,
struct move_bucket_key k)
{
return rhashtable_lookup_fast(&list->table, &k, bch_move_bucket_params);
return rhashtable_lookup_fast(list->table, &k, bch_move_bucket_params);
}
static int bch2_copygc_get_buckets(struct moving_context *ctxt,
@ -185,7 +185,7 @@ static int bch2_copygc_get_buckets(struct moving_context *ctxt,
goto err;
}
ret2 = rhashtable_lookup_insert_fast(&buckets_in_flight->table, &b_i->hash,
ret2 = rhashtable_lookup_insert_fast(buckets_in_flight->table, &b_i->hash,
bch_move_bucket_params);
BUG_ON(ret2);
@ -350,10 +350,13 @@ static int bch2_copygc_thread(void *arg)
struct buckets_in_flight buckets = {};
u64 last, wait;
int ret = rhashtable_init(&buckets.table, &bch_move_bucket_params);
buckets.table = kzalloc(sizeof(*buckets.table), GFP_KERNEL);
int ret = !buckets.table
? -ENOMEM
: rhashtable_init(buckets.table, &bch_move_bucket_params);
bch_err_msg(c, ret, "allocating copygc buckets in flight");
if (ret)
return ret;
goto err;
set_freezable();
@ -421,11 +424,12 @@ static int bch2_copygc_thread(void *arg)
}
move_buckets_wait(&ctxt, &buckets, true);
rhashtable_destroy(&buckets.table);
rhashtable_destroy(buckets.table);
bch2_moving_ctxt_exit(&ctxt);
bch2_move_stats_exit(&move_stats, c);
return 0;
err:
kfree(buckets.table);
return ret;
}
void bch2_copygc_stop(struct bch_fs *c)
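
The copygc change above moves the rhashtable off the thread's stack and behind a pointer, with a single exit path that frees it. A userspace sketch of the same shape, assuming nothing about rhashtable itself: struct big_table, big_table_init() and worker() are hypothetical, and calloc()/free() stand in for kzalloc()/kfree().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical large object; stands in for a struct too big for the stack. */
struct big_table {
	char buckets[4096];
};

struct in_flight {
	struct big_table *table;	/* pointer, so the owner stays small */
	size_t nr;
};

/* Hypothetical initialiser; always succeeds in this sketch. */
static int big_table_init(struct big_table *t)
{
	t->buckets[0] = 0;
	return 0;
}

static int worker(bool simulate_alloc_failure)
{
	struct in_flight list = { 0 };
	int ret;

	list.table = simulate_alloc_failure
		? NULL
		: calloc(1, sizeof(*list.table));
	ret = !list.table
		? -ENOMEM
		: big_table_init(list.table);
	if (ret)
		goto err;

	/* ... the real work loop would run here ... */
err:
	free(list.table);	/* free(NULL) is a no-op, so one exit path covers both cases */
	return ret;
}
```

Both the success path and the allocation-failure path fall through the same label, so there is only one place where the table has to be released.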

View File

@ -175,6 +175,16 @@ int bch2_create_trans(struct btree_trans *trans,
new_inode->bi_dir_offset = dir_offset;
}
if (S_ISDIR(mode)) {
ret = bch2_maybe_propagate_has_case_insensitive(trans,
(subvol_inum) {
new_inode->bi_subvol ?: dir.subvol,
new_inode->bi_inum },
new_inode);
if (ret)
goto err;
}
if (S_ISDIR(mode) &&
!new_inode->bi_subvol)
new_inode->bi_depth = dir_u->bi_depth + 1;

View File

@ -182,11 +182,6 @@ static inline void kfree_bulk(size_t nr, void ** p)
while (nr--)
kfree(*p);
}
#define local_irq_save(flags) \
do { \
flags = 0; \
} while (0)
#endif
static noinline void __process_finished_items(struct rcu_pending *pending,
@ -429,9 +424,15 @@ __rcu_pending_enqueue(struct rcu_pending *pending, struct rcu_head *head,
BUG_ON((ptr != NULL) != (pending->process == RCU_PENDING_KVFREE_FN));
local_irq_save(flags);
p = this_cpu_ptr(pending->p);
spin_lock(&p->lock);
/* We could technically be scheduled before taking the lock and end up
* using a different cpu's rcu_pending_pcpu: that's ok, it needs a lock
* anyways
*
* And we have to do it this way to avoid breaking PREEMPT_RT, which
* redefines how spinlocks work:
*/
p = raw_cpu_ptr(pending->p);
spin_lock_irqsave(&p->lock, flags);
rcu_gp_poll_state_t seq = __get_state_synchronize_rcu(pending->srcu);
restart:
if (may_sleep &&
@ -520,9 +521,8 @@ __rcu_pending_enqueue(struct rcu_pending *pending, struct rcu_head *head,
goto free_node;
}
local_irq_save(flags);
p = this_cpu_ptr(pending->p);
spin_lock(&p->lock);
p = raw_cpu_ptr(pending->p);
spin_lock_irqsave(&p->lock, flags);
goto restart;
}

View File

@ -99,9 +99,11 @@ int bch2_btree_lost_data(struct bch_fs *c,
goto out;
case BTREE_ID_snapshots:
ret = __bch2_run_explicit_recovery_pass(c, msg, BCH_RECOVERY_PASS_reconstruct_snapshots, 0) ?: ret;
ret = __bch2_run_explicit_recovery_pass(c, msg, BCH_RECOVERY_PASS_check_topology, 0) ?: ret;
ret = __bch2_run_explicit_recovery_pass(c, msg, BCH_RECOVERY_PASS_scan_for_btree_nodes, 0) ?: ret;
goto out;
default:
ret = __bch2_run_explicit_recovery_pass(c, msg, BCH_RECOVERY_PASS_check_topology, 0) ?: ret;
ret = __bch2_run_explicit_recovery_pass(c, msg, BCH_RECOVERY_PASS_scan_for_btree_nodes, 0) ?: ret;
goto out;
}
@ -271,13 +273,24 @@ static int bch2_journal_replay_key(struct btree_trans *trans,
goto out;
struct btree_path *path = btree_iter_path(trans, &iter);
if (unlikely(!btree_path_node(path, k->level))) {
if (unlikely(!btree_path_node(path, k->level) &&
!k->allocated)) {
struct bch_fs *c = trans->c;
if (!(c->recovery.passes_complete & (BIT_ULL(BCH_RECOVERY_PASS_scan_for_btree_nodes)|
BIT_ULL(BCH_RECOVERY_PASS_check_topology)))) {
bch_err(c, "have key in journal replay for btree depth that does not exist, confused");
ret = -EINVAL;
}
#if 0
bch2_trans_iter_exit(trans, &iter);
bch2_trans_node_iter_init(trans, &iter, k->btree_id, k->k->k.p,
BTREE_MAX_DEPTH, 0, iter_flags);
ret = bch2_btree_iter_traverse(trans, &iter) ?:
bch2_btree_increase_depth(trans, iter.path, 0) ?:
-BCH_ERR_transaction_restart_nested;
#endif
k->overwritten = true;
goto out;
}
@ -739,9 +752,11 @@ int bch2_fs_recovery(struct bch_fs *c)
? min(c->opts.recovery_pass_last, BCH_RECOVERY_PASS_snapshots_read)
: BCH_RECOVERY_PASS_snapshots_read;
c->opts.nochanges = true;
c->opts.read_only = true;
}
if (c->opts.nochanges)
c->opts.read_only = true;
mutex_lock(&c->sb_lock);
struct bch_sb_field_ext *ext = bch2_sb_field_get(c->disk_sb.sb, ext);
bool write_sb = false;
@ -1093,9 +1108,6 @@ int bch2_fs_recovery(struct bch_fs *c)
out:
bch2_flush_fsck_errs(c);
if (!IS_ERR(clean))
kfree(clean);
if (!ret &&
test_bit(BCH_FS_need_delete_dead_snapshots, &c->flags) &&
!c->opts.nochanges) {
@ -1104,6 +1116,9 @@ int bch2_fs_recovery(struct bch_fs *c)
}
bch_err_fn(c, ret);
final_out:
if (!IS_ERR(clean))
kfree(clean);
return ret;
err:
fsck_err:
@ -1117,7 +1132,7 @@ int bch2_fs_recovery(struct bch_fs *c)
bch2_print_str(c, KERN_ERR, buf.buf);
printbuf_exit(&buf);
}
return ret;
goto final_out;
}
int bch2_fs_initialize(struct bch_fs *c)

View File

@ -294,8 +294,13 @@ static bool recovery_pass_needs_set(struct bch_fs *c,
enum bch_run_recovery_pass_flags *flags)
{
struct bch_fs_recovery *r = &c->recovery;
bool in_recovery = test_bit(BCH_FS_in_recovery, &c->flags);
bool persistent = !in_recovery || !(*flags & RUN_RECOVERY_PASS_nopersistent);
/*
* Never run scan_for_btree_nodes persistently: check_topology will run
* it if required
*/
if (pass == BCH_RECOVERY_PASS_scan_for_btree_nodes)
*flags |= RUN_RECOVERY_PASS_nopersistent;
if ((*flags & RUN_RECOVERY_PASS_ratelimit) &&
!bch2_recovery_pass_want_ratelimit(c, pass))
@ -310,6 +315,8 @@ static bool recovery_pass_needs_set(struct bch_fs *c,
* Otherwise, we run run_explicit_recovery_pass when we find damage, so
* it should run again even if it's already run:
*/
bool in_recovery = test_bit(BCH_FS_in_recovery, &c->flags);
bool persistent = !in_recovery || !(*flags & RUN_RECOVERY_PASS_nopersistent);
if (persistent
? !(c->sb.recovery_passes_required & BIT_ULL(pass))
@ -334,6 +341,7 @@ int __bch2_run_explicit_recovery_pass(struct bch_fs *c,
struct bch_fs_recovery *r = &c->recovery;
int ret = 0;
lockdep_assert_held(&c->sb_lock);
bch2_printbuf_make_room(out, 1024);
@ -446,7 +454,7 @@ int bch2_require_recovery_pass(struct bch_fs *c,
int bch2_run_print_explicit_recovery_pass(struct bch_fs *c, enum bch_recovery_pass pass)
{
enum bch_run_recovery_pass_flags flags = RUN_RECOVERY_PASS_nopersistent;
enum bch_run_recovery_pass_flags flags = 0;
if (!recovery_pass_needs_set(c, pass, &flags))
return 0;

View File

@ -253,6 +253,7 @@ DOWNGRADE_TABLE()
static int downgrade_table_extra(struct bch_fs *c, darray_char *table)
{
unsigned dst_offset = table->nr;
struct bch_sb_field_downgrade_entry *dst = (void *) &darray_top(*table);
unsigned bytes = sizeof(*dst) + sizeof(dst->errors[0]) * le16_to_cpu(dst->nr_errors);
int ret = 0;
@ -268,6 +269,9 @@ static int downgrade_table_extra(struct bch_fs *c, darray_char *table)
if (ret)
return ret;
dst = (void *) &table->data[dst_offset];
dst->nr_errors = cpu_to_le16(nr_errors + 1);
/* open coded __set_bit_le64, as dst is packed and
* dst->recovery_passes is misaligned */
unsigned b = BCH_RECOVERY_PASS_STABLE_check_allocations;
@ -278,7 +282,6 @@ static int downgrade_table_extra(struct bch_fs *c, darray_char *table)
break;
}
dst->nr_errors = cpu_to_le16(nr_errors);
return ret;
}
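
The downgrade_table_extra() fix above records dst_offset before the array can grow and recomputes dst from table->data afterwards, since growing a dynamic array may move its storage. A small sketch of that hazard and the offset-based remedy follows. struct bytes, struct entry and both helpers are hypothetical simplifications, and the entry is assumed to be the last item in the array.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical minimal byte array; a stand-in for a growable char array. */
struct bytes {
	char	*data;
	size_t	nr, size;
};

/* Ensure room for `more` extra bytes; this may move b->data. Returns 0 or -1. */
static int bytes_make_room(struct bytes *b, size_t more)
{
	if (b->nr + more <= b->size)
		return 0;

	size_t new_size = (b->nr + more) * 2;
	char *n = realloc(b->data, new_size);
	if (!n)
		return -1;

	b->data = n;
	b->size = new_size;
	return 0;
}

/* Variable-length record stored inside the byte array. */
struct entry {
	unsigned short nr_errors;
	unsigned short errors[];
};

/* Append one error code to the entry that starts at byte offset entry_offset. */
static int entry_add_error(struct bytes *b, size_t entry_offset, unsigned short err)
{
	if (bytes_make_room(b, sizeof(err)))
		return -1;

	/* Derive the pointer only after the possible reallocation. */
	struct entry *e = (struct entry *)(b->data + entry_offset);

	e->errors[e->nr_errors++] = err;
	b->nr += sizeof(err);
	return 0;
}
```

Holding on to a struct entry pointer across bytes_make_room() would risk writing into freed memory once the buffer moves. Keeping only the offset avoids that.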

View File

@ -134,7 +134,7 @@ enum bch_fsck_flags {
x(bucket_gens_to_invalid_buckets, 121, FSCK_AUTOFIX) \
x(bucket_gens_nonzero_for_invalid_buckets, 122, FSCK_AUTOFIX) \
x(need_discard_freespace_key_to_invalid_dev_bucket, 123, 0) \
x(need_discard_freespace_key_bad, 124, 0) \
x(need_discard_freespace_key_bad, 124, FSCK_AUTOFIX) \
x(discarding_bucket_not_in_need_discard_btree, 291, 0) \
x(backpointer_bucket_offset_wrong, 125, 0) \
x(backpointer_level_bad, 294, 0) \
@ -165,7 +165,7 @@ enum bch_fsck_flags {
x(ptr_to_missing_replicas_entry, 149, FSCK_AUTOFIX) \
x(ptr_to_missing_stripe, 150, 0) \
x(ptr_to_incorrect_stripe, 151, 0) \
x(ptr_gen_newer_than_bucket_gen, 152, 0) \
x(ptr_gen_newer_than_bucket_gen, 152, FSCK_AUTOFIX) \
x(ptr_too_stale, 153, 0) \
x(stale_dirty_ptr, 154, FSCK_AUTOFIX) \
x(ptr_bucket_data_type_mismatch, 155, 0) \
@ -236,7 +236,7 @@ enum bch_fsck_flags {
x(inode_multiple_links_but_nlink_0, 207, FSCK_AUTOFIX) \
x(inode_wrong_backpointer, 208, FSCK_AUTOFIX) \
x(inode_wrong_nlink, 209, FSCK_AUTOFIX) \
x(inode_has_child_snapshots_wrong, 287, 0) \
x(inode_has_child_snapshots_wrong, 287, FSCK_AUTOFIX) \
x(inode_unreachable, 210, FSCK_AUTOFIX) \
x(inode_journal_seq_in_future, 299, FSCK_AUTOFIX) \
x(inode_i_sectors_underflow, 312, FSCK_AUTOFIX) \
@ -279,8 +279,8 @@ enum bch_fsck_flags {
x(root_dir_missing, 239, 0) \
x(root_inode_not_dir, 240, 0) \
x(dir_loop, 241, 0) \
x(hash_table_key_duplicate, 242, 0) \
x(hash_table_key_wrong_offset, 243, 0) \
x(hash_table_key_duplicate, 242, FSCK_AUTOFIX) \
x(hash_table_key_wrong_offset, 243, FSCK_AUTOFIX) \
x(unlinked_inode_not_on_deleted_list, 244, FSCK_AUTOFIX) \
x(reflink_p_front_pad_bad, 245, 0) \
x(journal_entry_dup_same_device, 246, 0) \

View File

@ -325,9 +325,17 @@ static void bch2_sb_members_v1_to_text(struct printbuf *out, struct bch_sb *sb,
{
struct bch_sb_field_members_v1 *mi = field_to_type(f, members_v1);
struct bch_sb_field_disk_groups *gi = bch2_sb_field_get(sb, disk_groups);
unsigned i;
for (i = 0; i < sb->nr_devices; i++)
if (vstruct_end(&mi->field) <= (void *) &mi->_members[0]) {
prt_printf(out, "field ends before start of entries");
return;
}
unsigned nr = (vstruct_end(&mi->field) - (void *) &mi->_members[0]) / sizeof(mi->_members[0]);
if (nr != sb->nr_devices)
prt_printf(out, "nr_devices mismatch: have %i entries, should be %u", nr, sb->nr_devices);
for (unsigned i = 0; i < min(sb->nr_devices, nr); i++)
member_to_text(out, members_v1_get(mi, i), gi, sb, i);
}
@ -341,9 +349,27 @@ static void bch2_sb_members_v2_to_text(struct printbuf *out, struct bch_sb *sb,
{
struct bch_sb_field_members_v2 *mi = field_to_type(f, members_v2);
struct bch_sb_field_disk_groups *gi = bch2_sb_field_get(sb, disk_groups);
unsigned i;
for (i = 0; i < sb->nr_devices; i++)
if (vstruct_end(&mi->field) <= (void *) &mi->_members[0]) {
prt_printf(out, "field ends before start of entries");
return;
}
if (!le16_to_cpu(mi->member_bytes)) {
prt_printf(out, "member_bytes 0");
return;
}
unsigned nr = (vstruct_end(&mi->field) - (void *) &mi->_members[0]) / le16_to_cpu(mi->member_bytes);
if (nr != sb->nr_devices)
prt_printf(out, "nr_devices mismatch: have %i entries, should be %u", nr, sb->nr_devices);
/*
* We call to_text() on superblock sections that haven't passed
* validate, so we can't trust sb->nr_devices.
*/
for (unsigned i = 0; i < min(sb->nr_devices, nr); i++)
member_to_text(out, members_v2_get(mi, i), gi, sb, i);
}
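
Both to_text() helpers above now derive the entry count from the size of the field itself and iterate over min(sb->nr_devices, nr), because they can be called on superblock sections that have not passed validation. A compact sketch of that bounds check, using a hypothetical field layout (struct members_field) rather than the real on-disk format:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fixed-size member record. */
struct member {
	uint64_t nbuckets;
};

/* Hypothetical variable-length field: a header followed by member records. */
struct members_field {
	uint32_t bytes;		/* total size of the field in bytes, header included */
	uint32_t pad;
	struct member members[];
};

/*
 * How many members may safely be printed: never more than the field can
 * actually hold, and never more than the (untrusted) claimed count.
 */
static unsigned members_printable(const struct members_field *f, unsigned claimed_nr)
{
	size_t hdr = offsetof(struct members_field, members);

	/* Field ends before the first entry starts: nothing is safe to read. */
	if (f->bytes <= hdr)
		return 0;

	size_t nr = (f->bytes - hdr) / sizeof(struct member);

	return claimed_nr < nr ? claimed_nr : (unsigned) nr;
}
```

A field sized for three members with a claimed count of five yields three, so the loop cannot run past the end of the buffer even when the header count is corrupt.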

View File

@ -104,7 +104,7 @@ const char * const bch2_dev_write_refs[] = {
#undef x
static void __bch2_print_str(struct bch_fs *c, const char *prefix,
const char *str, bool nonblocking)
const char *str)
{
#ifdef __KERNEL__
struct stdio_redirect *stdio = bch2_fs_stdio_redirect(c);
@ -114,17 +114,12 @@ static void __bch2_print_str(struct bch_fs *c, const char *prefix,
return;
}
#endif
bch2_print_string_as_lines(KERN_ERR, str, nonblocking);
bch2_print_string_as_lines(KERN_ERR, str);
}
void bch2_print_str(struct bch_fs *c, const char *prefix, const char *str)
{
__bch2_print_str(c, prefix, str, false);
}
void bch2_print_str_nonblocking(struct bch_fs *c, const char *prefix, const char *str)
{
__bch2_print_str(c, prefix, str, true);
__bch2_print_str(c, prefix, str);
}
__printf(2, 0)
@ -1072,12 +1067,13 @@ noinline_for_stack
static void print_mount_opts(struct bch_fs *c)
{
enum bch_opt_id i;
struct printbuf p = PRINTBUF;
bool first = true;
CLASS(printbuf, p)();
bch2_log_msg_start(c, &p);
prt_str(&p, "starting version ");
bch2_version_to_text(&p, c->sb.version);
bool first = true;
for (i = 0; i < bch2_opts_nr; i++) {
const struct bch_option *opt = &bch2_opt_table[i];
u64 v = bch2_opt_get_by_id(&c->opts, i);
@ -1094,17 +1090,24 @@ static void print_mount_opts(struct bch_fs *c)
}
if (c->sb.version_incompat_allowed != c->sb.version) {
prt_printf(&p, "\n allowing incompatible features above ");
prt_printf(&p, "\nallowing incompatible features above ");
bch2_version_to_text(&p, c->sb.version_incompat_allowed);
}
if (c->opts.verbose) {
prt_printf(&p, "\n features: ");
prt_printf(&p, "\nfeatures: ");
prt_bitflags(&p, bch2_sb_features, c->sb.features);
}
bch_info(c, "%s", p.buf);
printbuf_exit(&p);
if (c->sb.multi_device) {
prt_printf(&p, "\nwith devices");
for_each_online_member(c, ca, BCH_DEV_READ_REF_bch2_online_devs) {
prt_char(&p, ' ');
prt_str(&p, ca->name);
}
}
bch2_print_str(c, KERN_INFO, p.buf);
}
static bool bch2_fs_may_start(struct bch_fs *c)
@ -1995,6 +1998,22 @@ int bch2_dev_add(struct bch_fs *c, const char *path)
goto err_late;
}
/*
* We just changed the superblock UUID, invalidate cache and send a
* uevent to update /dev/disk/by-uuid
*/
invalidate_bdev(ca->disk_sb.bdev);
char uuid_str[37];
snprintf(uuid_str, sizeof(uuid_str), "UUID=%pUb", &c->sb.uuid);
char *envp[] = {
"CHANGE=uuid",
uuid_str,
NULL,
};
kobject_uevent_env(&ca->disk_sb.bdev->bd_device.kobj, KOBJ_CHANGE, envp);
up_write(&c->state_lock);
out:
printbuf_exit(&label);

View File

@ -262,8 +262,7 @@ static bool string_is_spaces(const char *str)
return true;
}
void bch2_print_string_as_lines(const char *prefix, const char *lines,
bool nonblocking)
void bch2_print_string_as_lines(const char *prefix, const char *lines)
{
bool locked = false;
const char *p;
@ -273,12 +272,7 @@ void bch2_print_string_as_lines(const char *prefix, const char *lines,
return;
}
if (!nonblocking) {
console_lock();
locked = true;
} else {
locked = console_trylock();
}
locked = console_trylock();
while (*lines) {
p = strchrnul(lines, '\n');

View File

@ -214,7 +214,7 @@ u64 bch2_read_flag_list(const char *, const char * const[]);
void bch2_prt_u64_base2_nbits(struct printbuf *, u64, unsigned);
void bch2_prt_u64_base2(struct printbuf *, u64);
void bch2_print_string_as_lines(const char *, const char *, bool);
void bch2_print_string_as_lines(const char *, const char *);
typedef DARRAY(unsigned long) bch_stacktrace;
int bch2_save_backtrace(bch_stacktrace *stack, struct task_struct *, unsigned, gfp_t);