commit c8a83a6b54 upstream.
NBD can update block device block size implicitely through
bd_set_size(). Make it explicitely set blocksize with set_blocksize() as
this behavior of bd_set_size() is going away.
CC: Josef Bacik <jbacik@fb.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5db470e229 upstream.
If we don't drop caches used in old offset or block_size, we can get old data
from new offset/block_size, which gives unexpected data to user.
For example, Martijn found a loopback bug in the below scenario.
1) LOOP_SET_FD loads first two pages on loop file
2) LOOP_SET_STATUS64 changes the offset on the loop file
3) mount is failed due to the cached pages having wrong superblock
Cc: Jens Axboe <axboe@kernel.dk>
Cc: linux-block@vger.kernel.org
Reported-by: Martijn Coenen <maco@google.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 628bd85947 upstream.
Commit 0a42e99b58 ("loop: Get rid of loop_index_mutex") forgot to
remove mutex_unlock(&loop_ctl_mutex) from loop_control_ioctl() when
replacing loop_index_mutex with loop_ctl_mutex.
Fixes: 0a42e99b58 ("loop: Get rid of loop_index_mutex")
Reported-by: syzbot <syzbot+c0138741c2290fc5e63f@syzkaller.appspotmail.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c28445fa06 upstream.
The nested acquisition of loop_ctl_mutex (->lo_ctl_mutex back then) has
been introduced by commit f028f3b2f9 "loop: fix circular locking in
loop_clr_fd()" to fix lockdep complains about bd_mutex being acquired
after lo_ctl_mutex during partition rereading. Now that these are
properly fixed, let's stop fooling lockdep.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1dded9acf6 upstream.
Code in loop_change_fd() drops reference to the old file (and also the
new file in a failure case) under loop_ctl_mutex. Similarly to a
situation in loop_set_fd() this can create a circular locking dependency
if this was the last reference holding the file open. Delay dropping of
the file reference until we have released loop_ctl_mutex.
Reported-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0da03cab87 upstream.
Calling blkdev_reread_part() under loop_ctl_mutex causes lockdep to
complain about circular lock dependency between bdev->bd_mutex and
lo->lo_ctl_mutex. The problem is that on loop device open or close
lo_open() and lo_release() get called with bdev->bd_mutex held and they
need to acquire loop_ctl_mutex. OTOH when loop_reread_partitions() is
called with loop_ctl_mutex held, it will call blkdev_reread_part() which
acquires bdev->bd_mutex. See syzbot report for details [1].
Move call to blkdev_reread_part() in __loop_clr_fd() from under
loop_ctl_mutex to finish fixing of the lockdep warning and the possible
deadlock.
[1] https://syzkaller.appspot.com/bug?id=bf154052f0eea4bc7712499e4569505907d1588
Reported-by: syzbot <syzbot+4684a000d5abdade83fac55b1e7d1f935ef1936e@syzkaller.appspotmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 85b0a54a82 upstream.
Calling loop_reread_partitions() under loop_ctl_mutex causes lockdep to
complain about circular lock dependency between bdev->bd_mutex and
lo->lo_ctl_mutex. The problem is that on loop device open or close
lo_open() and lo_release() get called with bdev->bd_mutex held and they
need to acquire loop_ctl_mutex. OTOH when loop_reread_partitions() is
called with loop_ctl_mutex held, it will call blkdev_reread_part() which
acquires bdev->bd_mutex. See syzbot report for details [1].
Move all calls of loop_rescan_partitions() out of loop_ctl_mutex to
avoid lockdep warning and fix deadlock possibility.
[1] https://syzkaller.appspot.com/bug?id=bf154052f0eea4bc7712499e4569505907d1588
Reported-by: syzbot <syzbot+4684a000d5abdade83fac55b1e7d1f935ef1936e@syzkaller.appspotmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d57f3374ba upstream.
The call of __blkdev_reread_part() from loop_reread_partition() happens
only when we need to invalidate partitions from loop_release(). Thus
move a detection for this into loop_clr_fd() and simplify
loop_reread_partition().
This makes loop_reread_partition() safe to use without loop_ctl_mutex
because we use only lo->lo_number and lo->lo_file_name in case of error
for reporting purposes (thus possibly reporting outdate information is
not a big deal) and we are safe from 'lo' going away under us by
elevated lo->lo_refcnt.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c371077000 upstream.
Push loop_ctl_mutex down to loop_change_fd(). We will need this to be
able to call loop_reread_partitions() without loop_ctl_mutex.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 757ecf40b7 upstream.
Push lo_ctl_mutex down to loop_set_fd(). We will need this to be able to
call loop_reread_partitions() without lo_ctl_mutex.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 550df5fdac upstream.
Push loop_ctl_mutex down to loop_set_status(). We will need this to be
able to call loop_reread_partitions() without loop_ctl_mutex.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4a5ce9ba58 upstream.
Push loop_ctl_mutex down to loop_get_status() to avoid the unusual
convention that the function gets called with loop_ctl_mutex held and
releases it.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7ccd0791d9 upstream.
loop_clr_fd() has a weird locking convention that is expects
loop_ctl_mutex held, releases it on success and keeps it on failure.
Untangle the mess by moving locking of loop_ctl_mutex into
loop_clr_fd().
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a2505b799a upstream.
Move setting of lo_state to Lo_rundown out into the callers. That will
allow us to unlock loop_ctl_mutex while the loop device is protected
from other changes by its special state.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a13165441d upstream.
Push acquisition of lo_ctl_mutex down into individual ioctl handling
branches. This is a preparatory step for pushing the lock down into
individual ioctl handling functions so that they can release the lock as
they need it. We also factor out some simple ioctl handlers that will
not need any special handling to reduce unnecessary code duplication.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0a42e99b58 upstream.
Now that loop_ctl_mutex is global, just get rid of loop_index_mutex as
there is no good reason to keep these two separate and it just
complicates the locking.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 967d1dc144 upstream.
__loop_release() has a single call site. Fold it there. This is
currently not a huge win but it will make following replacement of
loop_index_mutex more obvious.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 310ca162d7 upstream.
syzbot is reporting NULL pointer dereference [1] which is caused by
race condition between ioctl(loop_fd, LOOP_CLR_FD, 0) versus
ioctl(other_loop_fd, LOOP_SET_FD, loop_fd) due to traversing other
loop devices at loop_validate_file() without holding corresponding
lo->lo_ctl_mutex locks.
Since ioctl() request on loop devices is not frequent operation, we don't
need fine grained locking. Let's use global lock in order to allow safe
traversal at loop_validate_file().
Note that syzbot is also reporting circular locking dependency between
bdev->bd_mutex and lo->lo_ctl_mutex [2] which is caused by calling
blkdev_reread_part() with lock held. This patch does not address it.
[1] https://syzkaller.appspot.com/bug?id=f3cfe26e785d85f9ee259f385515291d21bd80a3
[2] https://syzkaller.appspot.com/bug?id=bf154052f0eea4bc7712499e4569505907d15889
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reported-by: syzbot <syzbot+bf89c128e05dd6c62523@syzkaller.appspotmail.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b1ab5fa309 upstream.
vfs_getattr() needs "struct path" rather than "struct file".
Let's use path_get()/path_put() rather than get_file()/fput().
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 85f5a4d666 upstream.
There is a window between when RBD_DEV_FLAG_REMOVING is set and when
the device is removed from rbd_dev_list. During this window, we set
"already" and return 0.
Returning 0 from write(2) can confuse userspace tools because
0 indicates that nothing was written. In particular, "rbd unmap"
will retry the write multiple times a second:
10:28:05.463299 write(4, "0", 1) = 0
10:28:05.463509 write(4, "0", 1) = 0
10:28:05.463720 write(4, "0", 1) = 0
10:28:05.463942 write(4, "0", 1) = 0
10:28:05.464155 write(4, "0", 1) = 0
Cc: stable@vger.kernel.org
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Tested-by: Dongsheng Yang <dongsheng.yang@easystack.cn>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit de7b75d82f ]
LKP recently reported a hang at bootup in the floppy code:
[ 245.678853] INFO: task mount:580 blocked for more than 120 seconds.
[ 245.679906] Tainted: G T 4.19.0-rc6-00172-ga9f38e1 #1
[ 245.680959] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 245.682181] mount D 6372 580 1 0x00000004
[ 245.683023] Call Trace:
[ 245.683425] __schedule+0x2df/0x570
[ 245.683975] schedule+0x2d/0x80
[ 245.684476] schedule_timeout+0x19d/0x330
[ 245.685090] ? wait_for_common+0xa5/0x170
[ 245.685735] wait_for_common+0xac/0x170
[ 245.686339] ? do_sched_yield+0x90/0x90
[ 245.686935] wait_for_completion+0x12/0x20
[ 245.687571] __floppy_read_block_0+0xfb/0x150
[ 245.688244] ? floppy_resume+0x40/0x40
[ 245.688844] floppy_revalidate+0x20f/0x240
[ 245.689486] check_disk_change+0x43/0x60
[ 245.690087] floppy_open+0x1ea/0x360
[ 245.690653] __blkdev_get+0xb4/0x4d0
[ 245.691212] ? blkdev_get+0x1db/0x370
[ 245.691777] blkdev_get+0x1f3/0x370
[ 245.692351] ? path_put+0x15/0x20
[ 245.692871] ? lookup_bdev+0x4b/0x90
[ 245.693539] blkdev_get_by_path+0x3d/0x80
[ 245.694165] mount_bdev+0x2a/0x190
[ 245.694695] squashfs_mount+0x10/0x20
[ 245.695271] ? squashfs_alloc_inode+0x30/0x30
[ 245.695960] mount_fs+0xf/0x90
[ 245.696451] vfs_kern_mount+0x43/0x130
[ 245.697036] do_mount+0x187/0xc40
[ 245.697563] ? memdup_user+0x28/0x50
[ 245.698124] ksys_mount+0x60/0xc0
[ 245.698639] sys_mount+0x19/0x20
[ 245.699167] do_int80_syscall_32+0x61/0x130
[ 245.699813] entry_INT80_32+0xc7/0xc7
showing that we never complete that read request. The reason is that
the completion setup is racy - it initializes the completion event
AFTER submitting the IO, which means that the IO could complete
before/during the init. If it does, we are passing garbage to
complete() and we may sleep forever waiting for the event to
occur.
Fixes: 7b7b68bba5 ("floppy: bail out in open() if drive is not responding to block0 read")
Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 153fcd5f6d ]
brd_free() may be called in failure path on one brd instance which
disk isn't added yet, so release handler of gendisk may free the
associated request_queue early and causes the following use-after-free[1].
This patch fixes this issue by associating gendisk with request_queue
just before adding disk.
[1] KASAN: use-after-free Read in del_timer_syncNon-volatile memory driver v1.3
Linux agpgart interface v0.103
[drm] Initialized vgem 1.0.0 20120112 for virtual device on minor 0
usbcore: registered new interface driver udl
==================================================================
BUG: KASAN: use-after-free in __lock_acquire+0x36d9/0x4c20
kernel/locking/lockdep.c:3218
Read of size 8 at addr ffff8801d1b6b540 by task swapper/0/1
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.19.0+ #88
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x244/0x39d lib/dump_stack.c:113
print_address_description.cold.7+0x9/0x1ff mm/kasan/report.c:256
kasan_report_error mm/kasan/report.c:354 [inline]
kasan_report.cold.8+0x242/0x309 mm/kasan/report.c:412
__asan_report_load8_noabort+0x14/0x20 mm/kasan/report.c:433
__lock_acquire+0x36d9/0x4c20 kernel/locking/lockdep.c:3218
lock_acquire+0x1ed/0x520 kernel/locking/lockdep.c:3844
del_timer_sync+0xb7/0x270 kernel/time/timer.c:1283
blk_cleanup_queue+0x413/0x710 block/blk-core.c:809
brd_free+0x5d/0x71 drivers/block/brd.c:422
brd_init+0x2eb/0x393 drivers/block/brd.c:518
do_one_initcall+0x145/0x957 init/main.c:890
do_initcall_level init/main.c:958 [inline]
do_initcalls init/main.c:966 [inline]
do_basic_setup init/main.c:984 [inline]
kernel_init_freeable+0x5c6/0x6b9 init/main.c:1148
kernel_init+0x11/0x1ae init/main.c:1068
ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:350
Reported-by: syzbot+3701447012fe951dabb2@syzkaller.appspotmail.com
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit fef912bf86 upstream.
commit 98af4d4df8 upstream.
I got a report from Howard Chen that he saw zram and sysfs race(ie,
zram block device file is created but sysfs for it isn't yet)
when he tried to create new zram devices via hotadd knob.
v4.20 kernel fixes it by [1, 2] but it's too large size to merge
into -stable so this patch fixes the problem by registering defualt
group by Greg KH's approach[3].
This patch should be applied to every stable tree [3.16+] currently
existing from kernel.org because the problem was introduced at 2.6.37
by [4].
[1] fef912bf86, block: genhd: add 'groups' argument to device_add_disk
[2] 98af4d4df8, zram: register default groups with device_add_disk()
[3] http://kroah.com/log/blog/2013/06/26/how-to-create-a-sysfs-file-correctly/
[4] 33863c21e6, Staging: zram: Replace ioctls with sysfs interface
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Hannes Reinecke <hare@suse.com>
Tested-by: Howard Chen <howardsoc@google.com>
Signed-off-by: Minchan Kim <minchan@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit f92898e7f3 upstream.
If a block device is hot-added when we are out of grants,
gnttab_grant_foreign_access fails with -ENOSPC (log message "28
granting access to ring page") in this code path:
talk_to_blkback ->
setup_blkring ->
xenbus_grant_ring ->
gnttab_grant_foreign_access
and the failing path in talk_to_blkback sets the driver_data to NULL:
destroy_blkring:
blkif_free(info, 0);
mutex_lock(&blkfront_mutex);
free_info(info);
mutex_unlock(&blkfront_mutex);
dev_set_drvdata(&dev->dev, NULL);
This results in a NULL pointer BUG when blkfront_remove and blkif_free
try to access the failing device's NULL struct blkfront_info.
Cc: stable@vger.kernel.org # 4.5 and later
Signed-off-by: Vasilis Liaskovitis <vliaskovitis@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 1448a2a536 ]
If we fail to allocate the request queue for a disk, we still need to
free that disk, not just the previous ones. Additionally, we need to
cleanup the previous request queues.
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 71327f547e ]
Move queue allocation next to disk allocation to fix a couple of issues:
- If add_disk() hasn't been called, we should clear disk->queue before
calling put_disk().
- If we fail to allocate a request queue, we still need to put all of
the disks, not just the ones that we allocated queues for.
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In the quest to remove all stack VLA usage from the kernel[1], this moves
the math for cookies calculation into macros and allocates a fixed size
array for the maximum number of cookies and adds a runtime sanity check.
(Note that the size was always fixed, but just hidden from the compiler.)
[1] https://lkml.kernel.org/r/CA+55aFzCG-zNmZwX4A2FQpadafLfEzK6CC=qPXydAacU1RqZWA@mail.gmail.com
Cc: Jens Axboe <axboe@kernel.dk>
Cc: linux-block@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAluv6bgQHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgprAFD/9YEm5/YGX8ypqepUISfr7MSspZCl/r0GVv
Oyu14jfTe18ji9mnFzA+g0y/KbNgdyOYstOp9l4V2Pxmt0Hq3KHjiqdbGbICUcDz
uPbwnXWW3Pem6l6JOmT86n14c5irkmB/XezlBEE2n7cVReCOwkycr8VNUkXax/Mu
Wuv1nAX2+uNBGSg0g4H2Y5Dk0fxmQcyKKkVfsz1xa9T2G5sB7gi8XU3+mqXT77hC
BN7aaB306g+gNwGuHp4V6r9eUmSilRHq53qYTKRD8Vtbe2VeVlsmnU8LjFGRuxcN
UZOuEO5MftIt32epi8hEQwWVxoZlaHv5qTjAHjiM77H7+kZGVK7Xv/ZrJHoRRQcI
vIrNKZX0wUtlsC/MmdCcYdqzxgyMJYNc7+Y13W2M/GgXamrOVcjaYWGaxDWfLlIN
jLkFrBK+9XRnvh5o0yKmoL/LXFJ4vXc3T9cvaYN/KTJUhYcfBEDuvfJTPKjbrWkc
iv6ORaLh9hbtUmIJO2yo0ZtLo9vxhegJK1NP6bICo0fJ9iiOrIQpxLiEegWA0ITb
85ot2Iepao5wqNnobSUBdlIKgIt1hECaNVCb3wUvIM7KYZilYP5nYsHg28x9+ZPq
CJZRzYmMBaH+yKo9FkO/+rxGPg6R9Gri5QZRTFAVUcW02I407A8+nlMwQMEBZxqb
80h8OO/ACw==
=8c0k
-----END PGP SIGNATURE-----
Merge tag 'for-linus-20180929' of git://git.kernel.dk/linux-block
Jens writes:
"Block fixes for 4.19-rc6
A set of fixes that should go into this release. This pull request
contains:
- A fix (hopefully) for the persistent grants for xen-blkfront. A
previous fix from this series wasn't complete, hence reverted, and
this one should hopefully be it. (Boris Ostrovsky)
- Fix for an elevator drain warning with SMR devices, which is
triggered when you switch schedulers (Damien)
- bcache deadlock fix (Guoju Fang)
- Fix for the block unplug tracepoint, which has had the
timer/explicit flag reverted since 4.11 (Ilya)
- Fix a regression in this series where the blk-mq timeout hook is
invoked with the RCU read lock held, hence preventing it from
blocking (Keith)
- NVMe pull from Christoph, with a single multipath fix (Susobhan Dey)"
* tag 'for-linus-20180929' of git://git.kernel.dk/linux-block:
xen/blkfront: correct purging of persistent grants
Revert "xen/blkfront: When purging persistent grants, keep them in the buffer"
blk-mq: I/O and timer unplugs are inverted in blktrace
bcache: add separate workqueue for journal_write to avoid deadlock
xen/blkfront: When purging persistent grants, keep them in the buffer
block: fix deadline elevator drain for zoned block devices
blk-mq: Allow blocking queue tag iter callbacks
nvme: properly propagate errors in nvme_mpath_init
Commit a46b53672b ("xen/blkfront: cleanup
stale persistent grants") introduced a regression as purged persistent
grants were not pu into the list of free grants again. Correct that.
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Commit a46b53672b ("xen/blkfront: cleanup stale persistent grants")
added support for purging persistent grants when they are not in use. As
part of the purge, the grants were removed from the grant buffer, This
eventually causes the buffer to become empty, with BUG_ON triggered in
get_free_grant(). This can be observed even on an idle system, within
20-30 minutes.
We should keep the grants in the buffer when purging, and only free the
grant ref.
Fixes: a46b53672b ("xen/blkfront: cleanup stale persistent grants")
Reviewed-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAlukFZsQHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgpi5MD/424bOq72KPsGNGU6io2c56ecIDItpZiDvq
srDPKN/pdGQ5WZuP2E21Zz+Ry/LP4niUvjZmY88UC9SXx9nxkESyCccxVFGmlGXN
Ttno1l47bu9nvAVPBdZ8brvsODCXk03RiJUDf9NrJsROMB3Xhbo1DtxyjwFwJ/Fr
aokAD3KvFitIbrHhTozxyvzpT55icVu+ykNQypnwdmWSyvebuCvrnQTxRJDNyrgk
wYUB9bhjGaCJzfBT4U7Pi5Is7t16cOSLmw0nv7uo/CUvk1xmSNajxRybCG1IEIFs
eg7Xyo2AFd/+BMwp7cFdOncjLbLzNJW0tJn3bsoJXPVE+MAy2EYYgk760fgqePMV
8afzG6l+obDCefCySj6SMVhXZi3YE640MHWeRBxGJVW5HmWq7Himlt2BUJBvcP72
y6syVoU5gkOTtUun60uF4Lloc0If6MPqvHA5mfDQx6hmQEi/H2XBE0YJxyQoxsZg
8FDq6KMBlmBlxdzfCtOMGNPPS9LIilvgFUpfsR3XzbY5nUhKB4ZZmMCWe9qm6hOl
1yoT6RS1tNV5PL1wQu1cj+d/RHaM7ok0F9FeKhofh1w3NG2x1qajTLNHLb/no7u2
eOJ8mmDxOvJOmU+9H7/Eu4jb2jhpaZPlrX//lxF3VzJcfGwU1wprjsbAjB9mxN1i
4k8CjG0h6w==
=z9pB
-----END PGP SIGNATURE-----
Merge tag 'for-linus-20180920' of git://git.kernel.dk/linux-block
Jens writes:
"Storage fixes for 4.19-rc5
- Fix for leaking kernel pointer in floppy ioctl (Andy Whitcroft)
- NVMe pull request from Christoph, and a single ANA log page fix
(Hannes)
- Regression fix for libata qd32 support, where we trigger an illegal
active command transition. This fixes a CD-ROM detection issue that
was reported, but could also trigger premature completion of the
internal tag (me)"
* tag 'for-linus-20180920' of git://git.kernel.dk/linux-block:
floppy: Do not copy a kernel pointer to user memory in FDGETPRM ioctl
libata: mask swap internal and hardware tag
nvme: count all ANA groups for ANA Log page
The final field of a floppy_struct is the field "name", which is a pointer
to a string in kernel memory. The kernel pointer should not be copied to
user memory. The FDGETPRM ioctl copies a floppy_struct to user memory,
including this "name" field. This pointer cannot be used by the user
and it will leak a kernel address to user-space, which will reveal the
location of kernel code and data and undermine KASLR protection.
Model this code after the compat ioctl which copies the returned data
to a previously cleared temporary structure on the stack (excluding the
name pointer) and copy out to userspace from there. As we already have
an inparam union with an appropriate member and that memory is already
cleared even for read only calls make use of that as a temporary store.
Based on an initial patch by Brian Belleville.
CVE-2018-7755
Signed-off-by: Andy Whitcroft <apw@canonical.com>
Broke up long line.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAlubFpAQHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgpoYnEADQYssR8J6leHkYgrAT4zhuhNLALknzHj49
F/s6U3npw1CkUOyNEJ52KScVDEv+bvKIjxXbpVOB1qXqtZWNQ9HkEwAEQWVg11T9
xfbvqDyk/qywfx/cKt+Fko8wcuCX4F4tDX8io9RWeosWg/W5oOPJ8vjh54j0kUXK
xzY947eNorYqjwDYUj56JsVkHObTA8PWg578tgnbYNQtXvJMhWLL4YlfqsbtTMfb
SuQ/AVRhsFPfCMX+0Qj5vu08UwrZ7VDkPKpqUpKfQPqB5VLpKbbCItMt32rCujj9
Ptmo81kEVxZaj9vqqst2wbUp3RVAN5SzcuXoj20GGP2HpWEwBAe8kxgsOVzEgOhx
dNPX2XjfsN0FGN0nPXbYSN0IPeyoxDSgImWRwrdZCaSTXm/96apnzPXNXt0kWh4L
3RP82Y655AtDEfZwVCkUFwqUIgyYm978vgGn/3zf682mBeQImon0AMAJk3QPNj7f
Re8acaPyix39D0+RYfeEHzO9UM1PQ2QF9IX9DwnSrHfoEWEnjPlaEZIbO5fGPKUV
ny+nGoadWqosOUz6/OJuo+MDzE7BlbN25KTN4xcLQlaKxblLJNoPhbgafR1AloTi
DkBXztwAYC9n49D+eiclHplJG0xcWmKyAj3FrINJsW1wkiH2LBjZVC4SX0NJh9Ei
RsFc6NdSDQ==
=1Jo5
-----END PGP SIGNATURE-----
Merge tag 'for-linus-20180913' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
"Three fixes that should go into this series. This contains:
- Increase number of policies supported by blk-cgroup.
With blk-iolatency, we now have four in kernel, but we had a hard
limit of three...
- Fix regression in null_blk, where the zoned supported broke
queue_mode=0 (bio based).
- NVMe pull request, with a single fix for an issue in the rdma code"
* tag 'for-linus-20180913' of git://git.kernel.dk/linux-block:
null_blk: fix zoned support for non-rq based operation
blk-cgroup: increase number of supported policies
nvmet-rdma: fix possible bogus dereference under heavy load
The supported added for zones in null_blk seem to assume that only rq
based operation is possible. But this depends on the queue_mode setting,
if this is set to 0, then cmd->bio is what we need to be operating on.
Right now any attempt to load null_blk with queue_mode=0 will
insta-crash, since cmd->rq is NULL and null_handle_cmd() assumes it to
always be set.
Make the zoned code deal with bio's instead, or pass in the
appropriate sector/nr_sectors instead.
Fixes: ca4b2a0119 ("null_blk: add zone support")
Tested-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
went into -rc1 and a use-after-free fix.
The rbd changes have been sitting in a branch for quite a while but
couldn't be included into the -rc1 pull request because of a pending
wire protocol backwards compatibility fixup that only got committed
early this week.
-----BEGIN PGP SIGNATURE-----
iQFHBAABCAAxFiEEydHwtzie9C7TfviiSn/eOAIR84sFAluSrJYTHGlkcnlvbW92
QGdtYWlsLmNvbQAKCRBKf944AhHzi/N8B/4sZzRCJMCejvU/yRq91NlaPDrxbVHh
nfICZ/8Fsy/fmvK8NWNyHcCIWx+nWrbCvCJMj0fxWMhk/1t75yC+TdyCJnyuhsQU
V/CPTs9BTdwrSUiTB83/n/ukGL6mpESk0CQ1er/l1EO6FnNOXvgzHDnCqUQZLdzU
1aRcx5JQWWo/QlCmzt2KWENhfQRMvLAtf04F5cUuR+JTrMjwWia6MAuRGuOhVQkW
XIlFNakBKab89Vod1pmA7BrG/+sHXCpVGX6sjAp9vQUWO3WWKBRnNtVwo9dPSHah
hBR8IzOkihw7HfTlINWVpiR69nTfM80PQHXJkFSp36E6Sfq8EShRpFIZ
=pga5
-----END PGP SIGNATURE-----
Merge tag 'ceph-for-4.19-rc3' of https://github.com/ceph/ceph-client
Pull ceph fixes from Ilya Dryomov:
"Two rbd patches to complete support for images within namespaces that
went into -rc1 and a use-after-free fix.
The rbd changes have been sitting in a branch for quite a while but
couldn't be included into the -rc1 pull request because of a pending
wire protocol backwards compatibility fixup that only got committed
early this week"
* tag 'ceph-for-4.19-rc3' of https://github.com/ceph/ceph-client:
rbd: support cloning across namespaces
rbd: factor out get_parent_info()
ceph: avoid a use-after-free in ceph_destroy_options()
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAluRkywQHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgpm8uEAC8vBFb5tzZ2dOeRbGQ6LaPTToBmRrLtOcP
kDRnfZIw0raNStOpn1dkGLz8IOSjwOGftx9Q4pJed25vynTEq5lYmmLVUlJQ6cJ7
oNpYiCdPxJvbKz5fChGG2nHHa1RLer1d728NZtkeZU/ChPmw56EO5ORghE7zPG7K
Z/0qHYsgwS427o8pUDsymmt6I62IJGrjzqJdC0pqBy6RePQWtlwkmtd7CIgFiffY
tDnk6RSwcihnIalMMLvFXeGf6cSaZvuH4oK1QNdfojAyS8kWeA6gHtjRS8UcuuUY
t6o+hU0vki8bghoNoI40RrLgAmV91BVv1/Voo79dQvDWAigyie51HwFFkqdWzJxJ
g4MCZYpys26w/VUGBFCku0hiRIAhZFO8Sun5zbVCJpyt8hTXF0RrG3CpwmCF7Lc0
m+h8tJanEMCesfYMztTD31L1BOFhJeOgBJr4a5QURy0LbIvC0V52IKiOQ0475E8E
H10rQaRw/7Am+mZugedMUGMgYD/eN33NQoRuTWZdck/58big2SU78zGpR/GqTmy3
w9v2I8ksBTivzEayBV0G4Z5Gxu7QYA7NMsO5RS/wuGfUX8D/1QtQU9Ejh5TESbek
R3WUyhXJJ2S+DWTUlmX7TgPxYxG3sXatQbSAgFJiucxyIRdpdqfeoXmOHvPrWZEq
O3VDm0D6pw==
=qhv7
-----END PGP SIGNATURE-----
Merge tag 'for-linus-20180906' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
"Small collection of fixes that should go into this release. This
contains:
- Small series that fixes a race between blkcg teardown and writeback
(Dennis Zhou)
- Fix disallowing invalid block size settings from the nbd ioctl (me)
- BFQ fix for a use-after-free on last release of a bfqg (Konstantin
Khlebnikov)
- Fix for the "don't warn for flush" fix (Mikulas)"
* tag 'for-linus-20180906' of git://git.kernel.dk/linux-block:
block: bfq: swap puts in bfqg_and_blkg_put
block: don't warn when doing fsync on read-only devices
nbd: don't allow invalid blocksize settings
blkcg: use tryget logic when associating a blkg with a bio
blkcg: delay blkg destruction until after writeback has finished
Revert "blk-throttle: fix race between blkcg_bio_issue_check() and cgroup_rmdir()"
If parent_get class method is not supported by the OSDs, fall back to
the legacy class method and assume that the parent is in the default
(i.e. "") namespace. The "use the child's image namespace" workaround
is no longer needed because creating images within namespaces will
require parent_get aware OSDs.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Jason Dillaman <dillaman@redhat.com>
In preparation for the new parent_get and parent_overlap_get class
methods, factor out the fetching and decoding of parent data.
As a side effect, we now decode all four fields in the "no parent"
case.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Jason Dillaman <dillaman@redhat.com>
syzbot reports a divide-by-zero off the NBD_SET_BLKSIZE ioctl.
We need proper validation of the input here. Not just if it's
zero, but also if the value is a power-of-2 and in a valid
range. Add that.
Cc: stable@vger.kernel.org
Reported-by: syzbot <syzbot+25dbecbec1e62c6b0dd4@syzkaller.appspotmail.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAluITzAQHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgpnlzD/0bDvKP73KLRJhqYQSKeRU98gcZmr6FglsH
U/XohBRTu0q/5KEVru4YC/44XLaUzK/WJ+mq/IGPqiCmH9YF2nqgD56WN+KL7hCe
4YhmpyuwoR0iyTpY1qnKJwkS7ymd1IQCWW83c3dQ3vWeturGNg0X2ueuSTl2N+8N
2g+6/M80fVycHCBT8ewvSihDMLwfPVdwMyg8xVzSCclO9MLGN714ag9NDM7aN9vf
QHu8vdRPtIwj/0ZQ8ttLTF/2k3t6CUHzvbN/9OWQ+8gFPF/ASop87Dg3P1DBmkj3
RFrlg0QzMzJyBeRtmUlT83Cka7KzOONscJyZPTxJwZrudtgP+xye6ArOP0oKePyn
9HGCcqsnIY05mifx9LXxWRdG1R2M7av47V5qAs9wJwP1bwijhpLXErs/6k8gJTCX
rr0/5AirAJbRG73P0wkU0aiaTZVIyIS5f9TLpNJZ6EAnRnaE9R1t4gAzDLl4t4Jg
iGKz8GKlzWapdU00kEs4Jq2wpA39HAn5cClsbbOaPFVoKQzZfzE9+hQpvx6xfCxP
K07ky2JhoelNwqfOQ7EiTuBSv+jeV2TheUhu27rC3IHwq+kFk2nzzM9xvk+mXO5I
B7v+yqBjhcJiO2799WwIlpkPkQR1vZJFmMe8HEw7QaX+B5jncdhyjP/BguntWgMT
LCQb4x5trQ==
=bnq/
-----END PGP SIGNATURE-----
Merge tag 'for-linus-20180830' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
"Small collection of fixes that should go into this series. This pull
contains:
- NVMe pull request with three small fixes (via Christoph)
- Kill useless NULL check before kmem_cache_destroy (Chengguang Xu)
- Xen block driver pull request with persistent grant flushing fixes
(Juergen Gross)
- Final wbt fixes, wrapping up the changes for this series. These
have been heavily tested (me)
- cdrom info leak fix (Scott Bauer)
- ATA dma quirk for SQ201 (Linus Walleij)
- Straight forward bsg refcount_t conversion (John Pittman)"
* tag 'for-linus-20180830' of git://git.kernel.dk/linux-block:
cdrom: Fix info leak/OOB read in cdrom_ioctl_drive_status
nvmet: free workqueue object if module init fails
nvme-fcloop: Fix dropped LS's to removed target port
nvme-pci: add a memory barrier to nvme_dbbuf_update_and_check_event
block: bsg: move atomic_t ref_count variable to refcount API
block: remove unnecessary condition check
ata: ftide010: Add a quirk for SQ201
blk-wbt: remove dead code
blk-wbt: improve waking of tasks
blk-wbt: abstract out end IO completion handler
xen/blkback: remove unused pers_gnts_lock from struct xen_blkif_ring
xen/blkback: move persistent grants flags to bool
xen/blkfront: reorder tests in xlblk_init()
xen/blkfront: cleanup stale persistent grants
xen/blkback: don't keep persistent grants too long
Pull Xen block driver fixes from Konrad:
"Fix for flushing out persistent pages at a deterministic rate"
* 'stable/for-jens-4.19' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
xen/blkback: remove unused pers_gnts_lock from struct xen_blkif_ring
xen/blkback: move persistent grants flags to bool
xen/blkfront: reorder tests in xlblk_init()
xen/blkfront: cleanup stale persistent grants
xen/blkback: don't keep persistent grants too long
pers_gnts_lock isn't being used anywhere. Remove it.
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
The struct persistent_gnt flags member is meant to be a bitfield of
different flags. There is only PERSISTENT_GNT_ACTIVE flag left, so
convert it to a bool named "active".
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
In case we don't want pv block devices we should not test parameters
for sanity and eventually print out error messages. So test precluding
conditions before checking parameters.
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Add a periodic cleanup function to remove old persistent grants which
are no longer in use on the backend side. This avoids starvation in
case there are lots of persistent grants for a device which no longer
is involved in I/O business.
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Persistent grants are allocated until a threshold per ring is being
reached. Those grants won't be freed until the ring is being destroyed
meaning there will be resources kept busy which might no longer be
used.
Instead of freeing only persistent grants until the threshold is
reached add a timestamp and remove all persistent grants not having
been in use for a minute.
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Pull IDA updates from Matthew Wilcox:
"A better IDA API:
id = ida_alloc(ida, GFP_xxx);
ida_free(ida, id);
rather than the cumbersome ida_simple_get(), ida_simple_remove().
The new IDA API is similar to ida_simple_get() but better named. The
internal restructuring of the IDA code removes the bitmap
preallocation nonsense.
I hope the net -200 lines of code is convincing"
* 'ida-4.19' of git://git.infradead.org/users/willy/linux-dax: (29 commits)
ida: Change ida_get_new_above to return the id
ida: Remove old API
test_ida: check_ida_destroy and check_ida_alloc
test_ida: Convert check_ida_conv to new API
test_ida: Move ida_check_max
test_ida: Move ida_check_leaf
idr-test: Convert ida_check_nomem to new API
ida: Start new test_ida module
target/iscsi: Allocate session IDs from an IDA
iscsi target: fix session creation failure handling
drm/vmwgfx: Convert to new IDA API
dmaengine: Convert to new IDA API
ppc: Convert vas ID allocation to new IDA API
media: Convert entity ID allocation to new IDA API
ppc: Convert mmu context allocation to new IDA API
Convert net_namespace to new IDA API
cb710: Convert to new IDA API
rsxx: Convert to new IDA API
osd: Convert to new IDA API
sd: Convert to new IDA API
...