linux/drivers/md
colyli@suse.de 5a1f03f1ee md linear: fix a race between linear_add() and linear_congested()
commit 03a9e24ef2 upstream.

Recently I receive a bug report that on Linux v3.0 based kerenl, hot add
disk to a md linear device causes kernel crash at linear_congested(). From
the crash image analysis, I find in linear_congested(), mddev->raid_disks
contains value N, but conf->disks[] only has N-1 pointers available. Then
a NULL pointer deference crashes the kernel.

There is a race between linear_add() and linear_congested(), RCU stuffs
used in these two functions cannot avoid the race. Since Linuv v4.0
RCU code is replaced by introducing mddev_suspend().  After checking the
upstream code, it seems linear_congested() is not called in
generic_make_request() code patch, so mddev_suspend() cannot provent it
from being called. The possible race still exists.

Here I explain how the race still exists in current code.  For a machine
has many CPUs, on one CPU, linear_add() is called to add a hard disk to a
md linear device; at the same time on other CPU, linear_congested() is
called to detect whether this md linear device is congested before issuing
an I/O request onto it.

Now I use a possible code execution time sequence to demo how the possible
race happens,

seq    linear_add()                linear_congested()
 0                                 conf=mddev->private
 1   oldconf=mddev->private
 2   mddev->raid_disks++
 3                              for (i=0; i<mddev->raid_disks;i++)
 4                                bdev_get_queue(conf->disks[i].rdev->bdev)
 5   mddev->private=newconf

In linear_add() mddev->raid_disks is increased in time seq 2, and on
another CPU in linear_congested() the for-loop iterates conf->disks[i] by
the increased mddev->raid_disks in time seq 3,4. But conf with one more
element (which is a pointer to struct dev_info type) to conf->disks[] is
not updated yet, accessing its structure member in time seq 4 will cause a
NULL pointer deference fault.

To fix this race, there are 2 parts of modification in the patch,
 1) Add 'int raid_disks' in struct linear_conf, as a copy of
    mddev->raid_disks. It is initialized in linear_conf(), always being
    consistent with pointers number of 'struct dev_info disks[]'. When
    iterating conf->disks[] in linear_congested(), use conf->raid_disks to
    replace mddev->raid_disks in the for-loop, then NULL pointer deference
    will not happen again.
 2) RCU stuffs are back again, and use kfree_rcu() in linear_add() to
    free oldconf memory. Because oldconf may be referenced as mddev->private
    in linear_congested(), kfree_rcu() makes sure that its memory will not
    be released until no one uses it any more.
Also some code comments are added in this patch, to make this modification
to be easier understandable.

This patch can be applied for kernels since v4.0 after commit:
3be260cc18 ("md/linear: remove rcu protections in favour of
suspend/resume"). But this bug is reported on Linux v3.0 based kernel, for
people who maintain kernels before Linux v4.0, they need to do some back
back port to this patch.

Changelog:
 - V3: add 'int raid_disks' in struct linear_conf, and use kfree_rcu() to
       replace rcu_call() in linear_add().
 - v2: add RCU stuffs by suggestion from Shaohua and Neil.
 - v1: initial effort.

Signed-off-by: Coly Li <colyli@suse.de>
Cc: Shaohua Li <shli@fb.com>
Cc: Neil Brown <neilb@suse.com>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-12 06:37:30 +01:00
..
bcache bcache: Make gc wakeup sane, remove set_task_state() 2017-02-23 17:43:10 +01:00
persistent-data dm space map metadata: fix 'struct sm_metadata' leak on failed create 2017-01-06 11:16:15 +01:00
bitmap.c md-cluster: Use a small window for resync 2015-10-12 01:32:05 -05:00
bitmap.h md-cluster: Use a small window for resync 2015-10-12 01:32:05 -05:00
dm-bio-prison.c block: add a bi_error field to struct bio 2015-07-29 08:55:15 -06:00
dm-bio-prison.h dm bio prison: add dm_cell_promote_or_release() 2015-05-29 14:19:06 -04:00
dm-bio-record.h
dm-bufio.c dm: convert ffs to __ffs 2015-10-31 19:06:01 -04:00
dm-bufio.h
dm-builtin.c
dm-cache-block-types.h
dm-cache-metadata.c dm cache metadata: fix cmd_read_lock() acquiring write lock 2016-05-04 14:48:41 -07:00
dm-cache-metadata.h dm cache: make sure every metadata function checks fail_io 2016-04-12 09:08:40 -07:00
dm-cache-policy-cleaner.c - Revert a dm-multipath change that caused a regression for unprivledged 2015-11-04 21:19:53 -08:00
dm-cache-policy-internal.h dm cache: age and write back cache entries even without active IO 2015-06-11 17:13:01 -04:00
dm-cache-policy-mq.c dm: convert ffs to __ffs 2015-10-31 19:06:01 -04:00
dm-cache-policy-smq.c dm: convert ffs to __ffs 2015-10-31 19:06:01 -04:00
dm-cache-policy.c
dm-cache-policy.h dm cache: age and write back cache entries even without active IO 2015-06-11 17:13:01 -04:00
dm-cache-target.c dm cache: fix corruption seen when using cache > 2TB 2017-03-12 06:37:26 +01:00
dm-crypt.c dm crypt: mark key as invalid until properly loaded 2017-01-06 11:16:15 +01:00
dm-delay.c dm delay: document that offsets are specified in sectors 2015-10-31 19:06:05 -04:00
dm-era-target.c dm persistent data: eliminate unnecessary return values 2015-10-31 19:06:02 -04:00
dm-exception-store.c - Revert a dm-multipath change that caused a regression for unprivledged 2015-11-04 21:19:53 -08:00
dm-exception-store.h dm snapshot: fix hung bios when copy error occurs 2016-03-03 15:07:14 -08:00
dm-flakey.c dm flakey: return -EINVAL on interval bounds error in flakey_ctr() 2017-01-06 11:16:15 +01:00
dm-io.c dm: drop NULL test before kmem_cache_destroy() and mempool_destroy() 2015-10-31 19:06:00 -04:00
dm-ioctl.c char: make misc_deregister a void function 2015-08-05 10:35:49 -07:00
dm-kcopyd.c mm, page_alloc: distinguish between being unable to sleep, unwilling to sleep and avoiding waking kswapd 2015-11-06 17:50:42 -08:00
dm-linear.c dm linear: remove redundant target name from error messages 2015-10-31 19:06:03 -04:00
dm-log-userspace-base.c dm: drop NULL test before kmem_cache_destroy() and mempool_destroy() 2015-10-31 19:06:00 -04:00
dm-log-userspace-transfer.c dm log userspace transfer: match wait_for_completion_timeout return type 2015-04-15 12:10:20 -04:00
dm-log-userspace-transfer.h
dm-log-writes.c dm log writes: fix bug with too large bios 2016-10-07 15:23:47 +02:00
dm-log.c
dm-mpath.c dm mpath: check if path's request_queue is dying in activate_path() 2016-10-28 03:01:28 -04:00
dm-mpath.h
dm-path-selector.c
dm-path-selector.h
dm-queue-length.c
dm-raid.c dm raid: fix round up of default region size 2015-10-02 12:02:31 -04:00
dm-raid1.c dm mirror: fix read error on recovery after default leg failure 2016-11-10 16:36:35 +01:00
dm-region-hash.c dm: convert ffs to __ffs 2015-10-31 19:06:01 -04:00
dm-round-robin.c
dm-service-time.c
dm-snap-persistent.c dm snapshot: fix hung bios when copy error occurs 2016-03-03 15:07:14 -08:00
dm-snap-transient.c dm snapshot: fix hung bios when copy error occurs 2016-03-03 15:07:14 -08:00
dm-snap.c dm snapshot: disallow the COW and origin devices from being identical 2016-04-12 09:08:39 -07:00
dm-stats.c dm stats: fix a leaked s->histogram_boundaries array 2017-03-12 06:37:26 +01:00
dm-stats.h dm stats: support precise timestamps 2015-06-17 12:40:40 -04:00
dm-stripe.c Merge tag 'dm-4.3-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm 2015-09-02 16:35:26 -07:00
dm-switch.c dm switch: simplify conditional in alloc_region_table() 2015-10-31 19:06:06 -04:00
dm-sysfs.c dm: add 'use_blk_mq' module param and expose in per-device ro sysfs attr 2015-04-15 12:10:17 -04:00
dm-table.c dm snapshot: disallow the COW and origin devices from being identical 2016-04-12 09:08:39 -07:00
dm-target.c
dm-thin-metadata.c dm thin metadata: don't issue prefetches if a transaction abort has failed 2016-04-12 09:08:40 -07:00
dm-thin-metadata.h dm thin metadata: add dm_thin_remove_range() 2015-06-11 17:13:04 -04:00
dm-thin.c dm thin: fix race condition when destroying thin pool workqueue 2016-03-03 15:07:10 -08:00
dm-uevent.c
dm-uevent.h
dm-verity.c dm: refactor ioctl handling 2015-10-31 19:05:59 -04:00
dm-zero.c block: add a bi_error field to struct bio 2015-07-29 08:55:15 -06:00
dm.c dm: free io_barrier after blk_cleanup_queue call 2016-11-10 16:36:34 +01:00
dm.h block: kill merge_bvec_fn() completely 2015-08-13 12:31:57 -06:00
faulty.c block: add a bi_error field to struct bio 2015-07-29 08:55:15 -06:00
Kconfig raid5-cache: add crc32c Kconfig dependency 2015-11-09 09:09:52 +11:00
linear.c md linear: fix a race between linear_add() and linear_congested() 2017-03-12 06:37:30 +01:00
linear.h md linear: fix a race between linear_add() and linear_congested() 2017-03-12 06:37:30 +01:00
Makefile raid5: add basic stripe log 2015-10-24 17:16:19 +11:00
md-cluster.c md-cluster: remove mddev arg from add_resync_info() 2015-10-24 17:16:18 +11:00
md-cluster.h md-cluster: Fix adding of new disk with new reload code 2015-10-12 03:35:30 -05:00
md.c md: MD_RECOVERY_NEEDED is set for mddev->recovery 2017-01-12 11:22:50 +01:00
md.h md/raid: only permit hot-add of compatible integrity profiles 2016-02-17 12:30:57 -08:00
multipath.c md: multipath: don't hardcopy bio in .make_request path 2016-04-12 09:08:57 -07:00
multipath.h
raid1.c raid1: include bio_end_io_list in nr_queued to prevent freeze_array hang 2016-04-12 09:08:56 -07:00
raid1.h md-cluster: Use a small window for resync 2015-10-12 01:32:05 -05:00
raid5-cache.c raid5-cache: start raid5 readonly if journal is missing 2015-11-01 13:48:29 +11:00
raid5.c md/raid5: limit request size according to implementation limits 2017-01-09 08:07:49 +01:00
raid5.h RAID5: revert e9e4c377e2 to fix a livelock 2016-04-12 09:08:57 -07:00
raid10.c raid10: include bio_end_io_list in nr_queued to prevent freeze_array hang 2016-04-12 09:08:57 -07:00
raid10.h md/raid10: ensure device failure recorded before write request returns. 2015-08-31 19:43:45 +02:00
raid0.c md/raid0: apply base queue limits *before* disk_stack_limits 2015-10-02 17:23:44 +10:00
raid0.h block: kill merge_bvec_fn() completely 2015-08-13 12:31:57 -06:00