linux/include
NeilBrown f466722ca6 md: Change handling of save_raid_disk and metadata update during recovery.
Since commit d70ed2e4fa
   MD: Allow restarting an interrupted incremental recovery.

we don't write out the metadata to devices while they are recovering.
This had a good reason, but has unfortunate consequences.  This patch
changes things to make them work better.

At issue is what happens if the array is shut down while a recovery is
happening, particularly a bitmap-guided recovery.
Ideally the recovery should pick up where it left off.
However the metadata cannot represent the state "A recovery is in
process which is guided by the bitmap".

Before the above mentioned commit, we wrote metadata to the device
which said "this is being recovered and it is up to <here>".  So after
a restart, a full recovery (not bitmap-guided) would happen from
where-ever it was up to.

After the commit the metadata wasn't updated so it still said "This
device is fully in sync with <this> event count".  That leads to a
bitmap-based recovery following the whole bitmap, which should be a
lot less work than a full recovery from some starting point.  So this
was an improvement.

However updates some metadata but not all leads to other problems.
In particular, the metadata written to the fully-up-to-date device
record that the array has all devices present (even though some are
recovering).  So on restart, mdadm wants to find all devices and
expects them to have current event counts.
Obviously it doesn't (some have old event counts) so (when assembling
with --incremental) it waits indefinitely for the rest of the expected
devices.

It really is wrong to not update all the metadata together.  Do that
is bound to cause confusion.
Instead, we should make it possible to record the truth in the
metadata.  i.e. we need to be able to record that a device is being
recovered based on the bitmap.
We already have a Feature flag to say that recovery is happening.  We
now add another one to say that it is a bitmap-based recovery.

With this we can remove the code that disables the write-out of
metadata on some devices.

So this patch:
 - moves the setting of 'saved_raid_disk' from add_new_disk to
   the validate_super methods.  This makes sure it is always set
   properly, both when adding a new device to an array, and when
   assembling an array from a collection of devices.
 - Adds a metadata flag MD_FEATURE_RECOVERY_BITMAP which is only
   used if MD_FEATURE_RECOVERY_OFFSET is set, and record that a
   bitmap-based recovery is allowed.
   This is only present in v1.x metadata. v0.90 doesn't support
   devices which are in the middle of recovery at all.
 - Only skips writing metadata to Faulty devices.

 - Also allows rdev state to be set to "-insync" via sysfs.
   This can be used for external-metadata arrays.  When the
   'role' is set the device is assumed to be in-sync.  If, after
   setting the role, we set the state to "-insync", the role is
   moved to saved_raid_disk which effectively says the device is
   partly in-sync with that slot and needs a bitmap recovery.

Cc: Andrei Warkentin <andreiw@vmware.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2014-01-14 16:44:21 +11:00
..
acpi ACPI / driver core: Store an ACPI device pointer in struct acpi_dev_node 2013-11-14 23:14:43 +01:00
asm-generic Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial 2013-11-15 16:47:22 -08:00
clocksource drivers: clocksource: add support for ARM architected timer event stream 2013-09-26 09:48:00 +01:00
crypto keys: change asymmetric keys to use common hash definitions 2013-10-25 17:15:18 -04:00
drm Merge branch 'ttm-fixes-3.13' of git://people.freedesktop.org/~thomash/linux into drm-fixes 2013-11-21 18:46:56 +10:00
dt-bindings For the 3.13 merge window we have a couple of new drivers for the AMS 2013-11-15 16:37:40 -08:00
keys KEYS: Separate the kernel signature checking keyring from module signing 2013-09-25 17:17:01 +01:00
kvm
linux PCI updates for v3.13: 2013-11-22 10:53:47 -08:00
math-emu
media Merge branch 'i2c/for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux 2013-11-18 15:50:07 -08:00
memory
misc
net genetlink: fix genl_set_err() group ID 2013-11-21 13:09:43 -05:00
pcmcia
ras
rdma Merge branches 'cma', 'cxgb4', 'flowsteer', 'ipoib', 'misc', 'mlx4', 'mlx5', 'nes', 'ocrdma', 'qib' and 'srp' into for-next 2013-11-17 08:22:19 -08:00
rxrpc
scsi Main batch of InfiniBand/RDMA changes for 3.13: 2013-11-18 15:36:04 -08:00
sound Merge remote-tracking branch 'asoc/topic/twl4030' into asoc-next 2013-11-08 10:43:40 +00:00
target target_core_alua: Store supported ALUA states 2013-11-20 11:26:37 -08:00
trace Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs 2013-11-22 08:38:55 -08:00
uapi md: Change handling of save_raid_disk and metadata update during recovery. 2014-01-14 16:44:21 +11:00
video fbdev changes for 3.13 2013-11-14 14:44:20 +09:00
xen Features: 2013-11-15 13:34:37 +09:00
Kbuild