Commit Graph

12672 Commits

Author SHA1 Message Date
Suganath Prabu S
6eddf5c993 scsi: mpt3sas: Unblock device after controller reset
commit 7ff723ad0f upstream.

While issuing any ATA passthrough command to firmware the driver will
block the device. But it will unblock the device only if the I/O
completes through the ISR path. If a controller reset occurs before
command completion the device will remain in blocked state.

Make sure we unblock the device following a controller reset if an ATA
passthrough command was queued.

[mkp: clarified patch description]

Fixes: ac6c2a93bd07 ("mpt3sas: Fix for SATA drive in blocked state, after diag reset")
Signed-off-by: Suganath Prabu S <suganath-prabu.subramani@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-12-02 09:09:02 +01:00
Andrey Grodzovsky
ffffc1ed47 scsi: mpt3sas: Fix secure erase premature termination
commit 18f6084a98 upstream.

This is a work around for a bug with LSI Fusion MPT SAS2 when perfoming
secure erase. Due to the very long time the operation takes, commands
issued during the erase will time out and will trigger execution of the
abort hook. Even though the abort hook is called for the specific
command which timed out, this leads to entire device halt
(scsi_state terminated) and premature termination of the secure erase.

Set device state to busy while ATA passthrough commands are in progress.

[mkp: hand applied to 4.9/scsi-fixes, tweaked patch description]

Signed-off-by: Andrey Grodzovsky <andrey2805@gmail.com>
Acked-by: Sreekanth Reddy <Sreekanth.Reddy@broadcom.com>
Cc: <linux-scsi@vger.kernel.org>
Cc: Sathya Prakash <sathya.prakash@broadcom.com>
Cc: Chaitra P B <chaitra.basappa@broadcom.com>
Cc: Suganath Prabu Subramani <suganath-prabu.subramani@broadcom.com>
Cc: Sreekanth Reddy <Sreekanth.Reddy@broadcom.com>
Cc: Hannes Reinecke <hare@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-12-02 09:09:01 +01:00
Sreekanth Reddy
d245874049 scsi: mpt3sas: Fix for block device of raid exists even after deleting raid disk
commit 6d3a56ed09 upstream.

While merging mpt3sas & mpt2sas code, we added the is_warpdrive check
condition on the wrong line

---------------------------------------------------------------------------
 scsih_target_alloc(struct scsi_target *starget)
                        sas_target_priv_data->handle = raid_device->handle;
                        sas_target_priv_data->sas_address = raid_device->wwid;
                        sas_target_priv_data->flags |= MPT_TARGET_FLAGS_VOLUME;
-                       raid_device->starget = starget;
+                       sas_target_priv_data->raid_device = raid_device;
+                       if (ioc->is_warpdrive)
+                               raid_device->starget = starget;
                }
                spin_unlock_irqrestore(&ioc->raid_device_lock, flags);
                return 0;
------------------------------------------------------------------------------

That check should be for the line sas_target_priv_data->raid_device =
raid_device;

Due to above hunk, we are not initializing raid_device's starget for
raid volumes, and so during raid disk deletion driver is not calling
scsi_remove_target() API as driver observes starget field of
raid_device's structure as NULL.

Signed-off-by: Sreekanth Reddy <Sreekanth.Reddy@broadcom.com>
Fixes: 7786ab6aff ("mpt3sas: Ported WarpDrive product SSS6200 support")
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-11-18 10:48:36 +01:00
Bill Kuzeja
6e897d034d scsi: qla2xxx: Fix scsi scan hang triggered if adapter fails during init
commit a5dd506e15 upstream.

A system can get hung task timeouts if a qlogic board fails during
initialization (if the board breaks again or fails the init). The hang
involves the scsi scan.

In a nutshell, since commit beb9e315e6 ("qla2xxx: Prevent removal and
board_disable race"):

...it is possible to have freed ha (base_vha->hw) early by a call to
qla2x00_remove_one when pdev->enable_cnt equals zero:

       if (!atomic_read(&pdev->enable_cnt)) {
               scsi_host_put(base_vha->host);
               kfree(ha);
               pci_set_drvdata(pdev, NULL);
               return;

Almost always, the scsi_host_put above frees the vha structure
(attached to the end of the Scsi_Host we're putting) since it's the last
put, and life is good.  However, if we are entering this routine because
the adapter has broken sometime during initialization AND a scsi scan is
already in progress (and has done its own scsi_host_get), vha will not
be freed. What's worse, the scsi scan will access the freed ha structure
through qla2xxx_scan_finished:

        if (time > vha->hw->loop_reset_delay * HZ)
                return 1;

The scsi scan keeps checking to see if a scan is complete by calling
qla2xxx_scan_finished. There is a timeout value that limits the length
of time a scan can take (hw->loop_reset_delay, usually set to 5
seconds), but this definition is in the data structure (hw) that can get
freed early.

This can yield unpredictable results, the worst of which is that the
scsi scan can hang indefinitely. This happens when the freed structure
gets reused and loop_reset_delay gets overwritten with garbage, which
the scan obliviously uses as its timeout value.

The fix for this is simple: at the top of qla2xxx_scan_finished, check
for the UNLOADING bit in the vha structure (_vha is not freed at this
point).  If UNLOADING is set, we exit the scan for this adapter
immediately. After this last reference to the ha structure, we'll exit
the scan for this adapter, and continue on.

This problem is hard to hit, but I have run into it doing negative
testing many times now (with a test specifically designed to bring it
out), so I can verify that this fix works. My testing has been against a
RHEL7 driver variant, but the bug and patch are equally relevant to to
the upstream driver.

Fixes: beb9e315e6 ("qla2xxx: Prevent removal and board_disable race")
Signed-off-by: Bill Kuzeja <william.kuzeja@stratus.com>
Acked-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-11-18 10:48:35 +01:00
Sumit Saxena
ae94da4c53 scsi: megaraid_sas: fix macro MEGASAS_IS_LOGICAL to avoid regression
commit 5e5ec1759d upstream.

This patch will fix regression caused by commit 1e793f6fc0 ("scsi:
megaraid_sas: Fix data integrity failure for JBOD (passthrough)
devices").

The problem was that the MEGASAS_IS_LOGICAL macro did not have braces
and as a result the driver ended up exposing a lot of non-existing SCSI
devices (all SCSI commands to channels 1,2,3 were returned as
SUCCESS-DID_OK by driver).

[mkp: clarified patch description]

Fixes: 1e793f6fc0
Reported-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com>
Signed-off-by: Sumit Saxena <sumit.saxena@broadcom.com>
Tested-by: Sumit Saxena <sumit.saxena@broadcom.com>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Tested-by: Jens Axboe <axboe@fb.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-11-15 07:46:40 +01:00
Ching Huang
c77a234622 scsi: arcmsr: Send SYNCHRONIZE_CACHE command to firmware
commit 2bf7dc8443 upstream.

The arcmsr driver failed to pass SYNCHRONIZE CACHE to controller
firmware. Depending on how drive caches are handled internally by
controller firmware this could potentially lead to data integrity
problems.

Ensure that cache flushes are passed to the controller.

[mkp: applied by hand and removed unused vars]

Signed-off-by: Ching Huang <ching2048@areca.com.tw>
Reported-by: Tomas Henzl <thenzl@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-11-10 16:36:35 +01:00
Ewan D. Milne
69ee0ed0c6 scsi: scsi_debug: Fix memory leak if LBP enabled and module is unloaded
commit 4d2b496f19 upstream.

map_storep was not being vfree()'d in the module_exit call.

Signed-off-by: Ewan D. Milne <emilne@redhat.com>
Reviewed-by: Laurence Oberman <loberman@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-11-10 16:36:35 +01:00
Kashyap Desai
9075faf140 scsi: megaraid_sas: Fix data integrity failure for JBOD (passthrough) devices
commit 1e793f6fc0 upstream.

Commit 02b01e010a ("megaraid_sas: return sync cache call with
success") modified the driver to successfully complete SYNCHRONIZE_CACHE
commands without passing them to the controller. Disk drive caches are
only explicitly managed by controller firmware when operating in RAID
mode. So this commit effectively disabled writeback cache flushing for
any drives used in JBOD mode, leading to data integrity failures.

[mkp: clarified patch description]

Fixes: 02b01e010a
Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com>
Signed-off-by: Sumit Saxena <sumit.saxena@broadcom.com>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Ewan D. Milne <emilne@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-11-10 16:36:35 +01:00
Johannes Thumshirn
2577121578 mpt3sas: Don't spam logs if logging level is 0
commit 0d667f72b2 upstream.

In _scsih_io_done() we test if the ioc->logging_level does _not_ have
the MPT_DEBUG_REPLY bit set and if it hasn't we print the debug
messages. This unfortunately is the wrong way around.

Note, the actual bug is older than af0094115 but this commit removed the
CONFIG_SCSI_MPT3SAS_LOGGING Kconfig option which hid the bug.

Fixes: af0094115 'mpt2sas, mpt3sas: Remove SCSI_MPTXSAS_LOGGING entry from Kconfig'
Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Acked-by: Chaitra P B <chaitra.basappa@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-10-31 04:14:01 -06:00
Don Brace
652a174a7d hpsa: correct skipping masked peripherals
commit 64ce60cab2 upstream.

The SA controller spins down RAID drive spares.

A REGNEWD event causes an inquiry to be sent to all physical
drives. This causes the SA controller to spin up the spare.

The controller suspends all I/O to a logical volume until
the spare is spun up. The spin-up can take over 50 seconds.

This can result in one or both of the following:
 - SML sends down aborts and resets to the logical volume
   and can cause the logical volume to be off-lined.
 - a negative impact on the logical volume's I/O performance
   each time a REGNEWD is triggered.

Reviewed-by: Scott Teel <scott.teel@microsemi.com>
Reviewed-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Juerg Haefliger <juerg.haefliger@hpe.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-10-28 03:01:33 -04:00
Martin K. Petersen
9814eb7549 sd: Fix rw_max for devices that report an optimal xfer size
commit 6b7e9cde49 upstream.

For historic reasons, io_opt is in bytes and max_sectors in block layer
sectors. This interface inconsistency is error prone and should be
fixed. But for 4.4--4.7 let's make the unit difference explicit via a
wrapper function.

Fixes: d0eb20a863 ("sd: Optimal I/O size is in bytes, not sectors")
Reported-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Tested-by: Andrew Patterson <andrew.patterson@hpe.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Juerg Haefliger <juerg.haefliger@hpe.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-10-28 03:01:33 -04:00
Ming Lei
bffff9301e scsi: Fix use-after-free
commit bcd8f2e948 upstream.

This patch fixes one use-after-free report[1] by KASAN.

In __scsi_scan_target(), when a type 31 device is probed,
SCSI_SCAN_TARGET_PRESENT is returned and the target will be scanned
again.

Inside the following scsi_report_lun_scan(), one new scsi_device
instance is allocated, and scsi_probe_and_add_lun() is called again to
probe the target and still see type 31 device, finally
__scsi_remove_device() is called to remove & free the device at the end
of scsi_probe_and_add_lun(), so cause use-after-free in
scsi_report_lun_scan().

And the following SCSI log can be observed:

	scsi 0:0:2:0: scsi scan: INQUIRY pass 1 length 36
	scsi 0:0:2:0: scsi scan: INQUIRY successful with code 0x0
	scsi 0:0:2:0: scsi scan: peripheral device type of 31, no device added
	scsi 0:0:2:0: scsi scan: Sending REPORT LUNS to (try 0)
	scsi 0:0:2:0: scsi scan: REPORT LUNS successful (try 0) result 0x0
	scsi 0:0:2:0: scsi scan: REPORT LUN scan
	scsi 0:0:2:0: scsi scan: INQUIRY pass 1 length 36
	scsi 0:0:2:0: scsi scan: INQUIRY successful with code 0x0
	scsi 0:0:2:0: scsi scan: peripheral device type of 31, no device added
	BUG: KASAN: use-after-free in __scsi_scan_target+0xbf8/0xe40 at addr ffff88007b44a104

This patch fixes the issue by moving the putting reference at
the end of scsi_report_lun_scan().

[1] KASAN report
==================================================================
[    3.274597] PM: Adding info for serio:serio1
[    3.275127] BUG: KASAN: use-after-free in __scsi_scan_target+0xd87/0xdf0 at addr ffff880254d8c304
[    3.275653] Read of size 4 by task kworker/u10:0/27
[    3.275903] CPU: 3 PID: 27 Comm: kworker/u10:0 Not tainted 4.8.0 #2121
[    3.276258] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
[    3.276797] Workqueue: events_unbound async_run_entry_fn
[    3.277083]  ffff880254d8c380 ffff880259a37870 ffffffff94bbc6c1 ffff880078402d80
[    3.277532]  ffff880254d8bb80 ffff880259a37898 ffffffff9459fec1 ffff880259a37930
[    3.277989]  ffff880254d8bb80 ffff880078402d80 ffff880259a37920 ffffffff945a0165
[    3.278436] Call Trace:
[    3.278528]  [<ffffffff94bbc6c1>] dump_stack+0x65/0x84
[    3.278797]  [<ffffffff9459fec1>] kasan_object_err+0x21/0x70
[    3.279063] device: 'psaux': device_add
[    3.279616]  [<ffffffff945a0165>] kasan_report_error+0x205/0x500
[    3.279651] PM: Adding info for No Bus:psaux
[    3.280202]  [<ffffffff944ecd22>] ? kfree_const+0x22/0x30
[    3.280486]  [<ffffffff94bc2dc9>] ? kobject_release+0x119/0x370
[    3.280805]  [<ffffffff945a0543>] __asan_report_load4_noabort+0x43/0x50
[    3.281170]  [<ffffffff9507e1f7>] ? __scsi_scan_target+0xd87/0xdf0
[    3.281506]  [<ffffffff9507e1f7>] __scsi_scan_target+0xd87/0xdf0
[    3.281848]  [<ffffffff9507d470>] ? scsi_add_device+0x30/0x30
[    3.282156]  [<ffffffff94f7f660>] ? pm_runtime_autosuspend_expiration+0x60/0x60
[    3.282570]  [<ffffffff956ddb07>] ? _raw_spin_lock+0x17/0x40
[    3.282880]  [<ffffffff9507e505>] scsi_scan_channel+0x105/0x160
[    3.283200]  [<ffffffff9507e8a2>] scsi_scan_host_selected+0x212/0x2f0
[    3.283563]  [<ffffffff9507eb3c>] do_scsi_scan_host+0x1bc/0x250
[    3.283882]  [<ffffffff9507efc1>] do_scan_async+0x41/0x450
[    3.284173]  [<ffffffff941c1fee>] async_run_entry_fn+0xfe/0x610
[    3.284492]  [<ffffffff941a8954>] ? pwq_dec_nr_in_flight+0x124/0x2a0
[    3.284876]  [<ffffffff941d1770>] ? preempt_count_add+0x130/0x160
[    3.285207]  [<ffffffff941a9a84>] process_one_work+0x544/0x12d0
[    3.285526]  [<ffffffff941aa8e9>] worker_thread+0xd9/0x12f0
[    3.285844]  [<ffffffff941aa810>] ? process_one_work+0x12d0/0x12d0
[    3.286182]  [<ffffffff941bb365>] kthread+0x1c5/0x260
[    3.286443]  [<ffffffff940855cd>] ? __switch_to+0x88d/0x1430
[    3.286745]  [<ffffffff941bb1a0>] ? kthread_worker_fn+0x5a0/0x5a0
[    3.287085]  [<ffffffff956dde9f>] ret_from_fork+0x1f/0x40
[    3.287368]  [<ffffffff941bb1a0>] ? kthread_worker_fn+0x5a0/0x5a0
[    3.287697] Object at ffff880254d8bb80, in cache kmalloc-2048 size: 2048
[    3.288064] Allocated:
[    3.288147] PID = 27
[    3.288218]  [<ffffffff940b27ab>] save_stack_trace+0x2b/0x50
[    3.288531]  [<ffffffff9459f246>] save_stack+0x46/0xd0
[    3.288806]  [<ffffffff9459f4bd>] kasan_kmalloc+0xad/0xe0
[    3.289098]  [<ffffffff9459c07e>] __kmalloc+0x13e/0x250
[    3.289378]  [<ffffffff95078e5a>] scsi_alloc_sdev+0xea/0xcf0
[    3.289701]  [<ffffffff9507de76>] __scsi_scan_target+0xa06/0xdf0
[    3.290034]  [<ffffffff9507e505>] scsi_scan_channel+0x105/0x160
[    3.290362]  [<ffffffff9507e8a2>] scsi_scan_host_selected+0x212/0x2f0
[    3.290724]  [<ffffffff9507eb3c>] do_scsi_scan_host+0x1bc/0x250
[    3.291055]  [<ffffffff9507efc1>] do_scan_async+0x41/0x450
[    3.291354]  [<ffffffff941c1fee>] async_run_entry_fn+0xfe/0x610
[    3.291695]  [<ffffffff941a9a84>] process_one_work+0x544/0x12d0
[    3.292022]  [<ffffffff941aa8e9>] worker_thread+0xd9/0x12f0
[    3.292325]  [<ffffffff941bb365>] kthread+0x1c5/0x260
[    3.292594]  [<ffffffff956dde9f>] ret_from_fork+0x1f/0x40
[    3.292886] Freed:
[    3.292945] PID = 27
[    3.293016]  [<ffffffff940b27ab>] save_stack_trace+0x2b/0x50
[    3.293327]  [<ffffffff9459f246>] save_stack+0x46/0xd0
[    3.293600]  [<ffffffff9459fa61>] kasan_slab_free+0x71/0xb0
[    3.293916]  [<ffffffff9459bac2>] kfree+0xa2/0x1f0
[    3.294168]  [<ffffffff9508158a>] scsi_device_dev_release_usercontext+0x50a/0x730
[    3.294598]  [<ffffffff941ace9a>] execute_in_process_context+0xda/0x130
[    3.294974]  [<ffffffff9508107c>] scsi_device_dev_release+0x1c/0x20
[    3.295322]  [<ffffffff94f566f6>] device_release+0x76/0x1e0
[    3.295626]  [<ffffffff94bc2db7>] kobject_release+0x107/0x370
[    3.295942]  [<ffffffff94bc29ce>] kobject_put+0x4e/0xa0
[    3.296222]  [<ffffffff94f56e17>] put_device+0x17/0x20
[    3.296497]  [<ffffffff9505201c>] scsi_device_put+0x7c/0xa0
[    3.296801]  [<ffffffff9507e1bc>] __scsi_scan_target+0xd4c/0xdf0
[    3.297132]  [<ffffffff9507e505>] scsi_scan_channel+0x105/0x160
[    3.297458]  [<ffffffff9507e8a2>] scsi_scan_host_selected+0x212/0x2f0
[    3.297829]  [<ffffffff9507eb3c>] do_scsi_scan_host+0x1bc/0x250
[    3.298156]  [<ffffffff9507efc1>] do_scan_async+0x41/0x450
[    3.298453]  [<ffffffff941c1fee>] async_run_entry_fn+0xfe/0x610
[    3.298777]  [<ffffffff941a9a84>] process_one_work+0x544/0x12d0
[    3.299105]  [<ffffffff941aa8e9>] worker_thread+0xd9/0x12f0
[    3.299408]  [<ffffffff941bb365>] kthread+0x1c5/0x260
[    3.299676]  [<ffffffff956dde9f>] ret_from_fork+0x1f/0x40
[    3.299967] Memory state around the buggy address:
[    3.300209]  ffff880254d8c200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[    3.300608]  ffff880254d8c280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[    3.300986] >ffff880254d8c300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[    3.301408]                    ^
[    3.301550]  ffff880254d8c380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[    3.301987]  ffff880254d8c400: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[    3.302396]
==================================================================

Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lei <tom.leiming@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-10-28 03:01:31 -04:00
Brian King
2ed1b50a40 scsi: ibmvfc: Fix I/O hang when port is not mapped
commit 07d0e9a847 upstream.

If a VFC port gets unmapped in the VIOS, it may not respond with a CRQ
init complete following H_REG_CRQ. If this occurs, we can end up having
called scsi_block_requests and not a resulting unblock until the init
complete happens, which may never occur, and we end up hanging I/O
requests.  This patch ensures the host action stay set to
IBMVFC_HOST_ACTION_TGT_DEL so we move all rports into devloss state and
unblock unless we receive an init complete.

Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Acked-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-10-22 12:26:56 +02:00
Borislav Petkov
161cbfec10 scsi: arcmsr: Simplify user_len checking
commit 4bd173c307 upstream.

Do the user_len check first and then the ver_addr allocation so that we
can save us the kfree() on the error path when user_len is >
ARCMSR_API_DATA_BUFLEN.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Marco Grassi <marco.gra@gmail.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Tomas Henzl <thenzl@redhat.com>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-10-22 12:26:56 +02:00
Dan Carpenter
2404092282 scsi: arcmsr: Buffer overflow in arcmsr_iop_message_xfer()
commit 7bc2b55a5c upstream.

We need to put an upper bound on "user_len" so the memcpy() doesn't
overflow.

Reported-by: Marco Grassi <marco.gra@gmail.com>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-10-22 12:26:55 +02:00
Dan Carpenter
07dc725268 fnic: pci_dma_mapping_error() doesn't return an error code
commit dd7328e4c5 upstream.

pci_dma_mapping_error() returns true on error and false on success.

Fixes: fd6ddfa4c1 ('fnic: check pci_map_single() return value')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-10-07 15:23:45 +02:00
Maurizio Lombardi
56e5ad1e1d megaraid: fix null pointer check in megasas_detach_one().
commit 546e559c79 upstream.

The pd_seq_sync pointer can't be NULL, we have to check its entries
instead.

Signed-off-by: Maurizio Lombardi <mlombard@redhat.com>
Acked-by: Sumit Saxena <sumit.saxena@broadcom.com>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-09-30 10:18:36 +02:00
Tyrel Datwyler
6f0caecda5 scsi: fix upper bounds check of sense key in scsi_sense_key_string()
commit a87eeb900d upstream.

Commit 655ee63cf3 ("scsi constants: command, sense key + additional
sense string") added a "Completed" sense string with key 0xF to
snstext[], but failed to updated the upper bounds check of the sense key
in scsi_sense_key_string().

Fixes: 655ee63cf3 ("[SCSI] scsi constants: command, sense key + additional sense strings")
Signed-off-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com>
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-09-15 08:27:54 +02:00
Manoj N. Kumar
d4009e4b6e cxlflash: Move to exponential back-off when cmd_room is not available
[ Upstream commit ea76543127 ]

While profiling the cxlflash_queuecommand() path under a heavy load it
was found that number of retries to find cmd_room was fairly high.

There are two problems with the current back-off:
a) It starts with a udelay of 0
b) It backs-off linearly

Tried several approaches (a higher multiple 10*n, 100*n, as well as n^2,
2^n) and found that the exponential back-off(2^n) approach had the least
overall cost. Cost as being defined as overall time spent waiting.

The fix is to change the linear back-off to an exponential back-off.
This solution also takes care of the problem with the initial
delay (starts with 1 usec).

Signed-off-by: Manoj N. Kumar <manoj@linux.vnet.ibm.com>
Acked-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Uma Krishnan <ukrishn@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-09-15 08:27:50 +02:00
Matthew R. Ochs
f88503578d cxlflash: Fix to avoid virtual LUN failover failure
[ Upstream commit d5e26bb1d8 ]

Applications which use virtual LUN's that are backed by a physical LUN
over both adapter ports may experience an I/O failure in the event of a
link loss (e.g. cable pull).

Virtual LUNs may be accessed through one or both ports of the adapter.
This access is encoded in the translation entries that comprise the
virtual LUN and used by the AFU for load-balancing I/O and handling
failover scenarios. In a link loss scenario, even though the AFU is able
to maintain connectivity to the LUN, it is up to the application to
retry the failed I/O. When applications are unaware of the virtual LUN's
underlying topology, they are unable to make a sound decision of when to
retry an I/O and therefore are forced to make their reaction to a failed
I/O absolute. The result is either a failure to retry I/O or increased
latency for scenarios where a retry is pointless.

To remedy this scenario, provide feedback back to the application on
virtual LUN creation as to which ports the LUN may be accessed. LUN's
spanning both ports are candidates for a retry in a presence of an I/O
failure.

Signed-off-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com>
Acked-by: Manoj Kumar <manoj@linux.vnet.ibm.com>
Reviewed-by: Uma Krishnan <ukrishn@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-09-15 08:27:49 +02:00
Manoj Kumar
610f1a8d37 cxlflash: Fix to escalate LINK_RESET also on port 1
[ Upstream commit a9be294ecb ]

The original fix to escalate a 'login timed out' error to a LINK_RESET
was only made for one of the two ports on the card. This fix resolves
the same issue for the second port (port 1).

Signed-off-by: Manoj N. Kumar <manoj@linux.vnet.ibm.com>
Acked-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com>
Reviewed-by: Uma Krishnan <ukrishn@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-09-15 08:27:49 +02:00
James Smart
9f4a5a1c0c lpfc: Fix DMA faults observed upon plugging loopback connector
[ Upstream commit ae09c76510 ]

Driver didn't program the REG_VFI mailbox correctly, giving the adapter
bad addresses.

Signed-off-by: Dick Kennedy <dick.kennedy@avagotech.com>
Signed-off-by: James Smart <james.smart@avagotech.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-09-15 08:27:48 +02:00
Manoj N. Kumar
d774bfcec0 cxlflash: Fix to resolve dead-lock during EEH recovery
[ Upstream commit 635f6b0893 ]

When a cxlflash adapter goes into EEH recovery and multiple processes
(each having established its own context) are active, the EEH recovery
can hang if the processes attempt to recover in parallel. The symptom
logged after a couple of minutes is:

INFO: task eehd:48 blocked for more than 120 seconds.
Not tainted 4.5.0-491-26f710d+ #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
eehd            0    48      2
Call Trace:
__switch_to+0x2f0/0x410
__schedule+0x300/0x980
schedule+0x48/0xc0
rwsem_down_write_failed+0x294/0x410
down_write+0x88/0xb0
cxlflash_pci_error_detected+0x100/0x1c0 [cxlflash]
cxl_vphb_error_detected+0x88/0x110 [cxl]
cxl_pci_error_detected+0xb0/0x1d0 [cxl]
eeh_report_error+0xbc/0x130
eeh_pe_dev_traverse+0x94/0x160
eeh_handle_normal_event+0x17c/0x450
eeh_handle_event+0x184/0x370
eeh_event_handler+0x1c8/0x1d0
kthread+0x110/0x130
ret_from_kernel_thread+0x5c/0xa4
INFO: task blockio:33215 blocked for more than 120 seconds.

Not tainted 4.5.0-491-26f710d+ #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
blockio         0 33215  33213
Call Trace:
0x1 (unreliable)
__switch_to+0x2f0/0x410
__schedule+0x300/0x980
schedule+0x48/0xc0
rwsem_down_read_failed+0x124/0x1d0
down_read+0x68/0x80
cxlflash_ioctl+0x70/0x6f0 [cxlflash]
scsi_ioctl+0x3b0/0x4c0
sg_ioctl+0x960/0x1010
do_vfs_ioctl+0xd8/0x8c0
SyS_ioctl+0xd4/0xf0
system_call+0x38/0xb4
INFO: task eehd:48 blocked for more than 120 seconds.

The hang is because of a 3 way dead-lock:

Process A holds the recovery mutex, and waits for eehd to complete.
Process B holds the semaphore and waits for the recovery mutex.
eehd waits for semaphore.

The fix is to have Process B above release the semaphore before
attempting to acquire the recovery mutex. This will allow
eehd to proceed to completion.

Signed-off-by: Manoj N. Kumar <manoj@linux.vnet.ibm.com>
Reviewed-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-09-15 08:27:47 +02:00
Manoj N. Kumar
321aeea695 cxlflash: Fix to avoid unnecessary scan with internal LUNs
[ Upstream commit 603ecce95f ]

When switching to the internal LUN defined on the
IBM CXL flash adapter, there is an unnecessary
scan occurring on the second port. This scan leads
to the following extra lines in the log:

Dec 17 10:09:00 tul83p1 kernel: [ 3708.561134] cxlflash 0008:00:00.0: cxlflash_queuecommand: (scp=c0000000fc1f0f00) 11/1/0/0 cdb=(A0000000-00000000-10000000-00000000)
Dec 17 10:09:00 tul83p1 kernel: [ 3708.561147] process_cmd_err: cmd failed afu_rc=32 scsi_rc=0 fc_rc=0 afu_extra=0xE, scsi_extra=0x0, fc_extra=0x0

By definition, both of the internal LUNs are on the first port/channel.

When the lun_mode is switched to internal LUN the
same value for host->max_channel is retained. This
causes an unnecessary scan over the second port/channel.

This fix alters the host->max_channel to 0 (1 port), if internal
LUNs are configured and switches it back to 1 (2 ports) while
going back to external LUNs.

Signed-off-by: Manoj N. Kumar <manoj@linux.vnet.ibm.com>
Acked-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com>
Reviewed-by: Uma Krishnan <ukrishn@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-09-15 08:27:46 +02:00
Ching Huang
def391c897 arcmsr: fixes not release allocated resource
[ Upstream commit 98f90debc2 ]

Releasing allocated resource if get configuration data failed.

Signed-off-by: Ching Huang <ching2048@areca.com.tw>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Hannes Reinicke <hare@suse.de>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-09-15 08:27:45 +02:00
Ching Huang
c77d6c3c88 arcmsr: fixed getting wrong configuration data
[ Upstream commit 251e2d25bf ]

Fixed getting wrong configuration data of adapter type B and type D.

Signed-off-by: Ching Huang <ching2048@areca.com.tw>
Reviewed-by: Hannes Reinicke <hare@suse.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-09-15 08:27:45 +02:00
Swapnil Nagle
07235140a4 qla2xxx: Use ATIO type to send correct tmr response
[ Upstream commit d7236ac368 ]

The function value inside se_cmd can change if the TMR is cancelled.
Use original ATIO Type to correctly determine CTIO response.

Signed-off-by: Swapnil Nagle <swapnil.nagle@purestroage.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@qlogic.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-09-15 08:27:44 +02:00
Suganath prabu Subramani
116a7584cb mpt3sas: Fix for Asynchronous completion of timedout IO and task abort of timedout IO.
[ Upstream commit 03d1fb3a65 ]

Track msix of each IO and use the same msix for issuing abort to timed
out IO. With this driver will process IO's reply first followed by TM.

Signed-off-by: Suganath prabu Subramani <suganath-prabu.subramani@avagotech.com>
Signed-off-by: Chaitra P B <chaitra.basappa@avagotech.com>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-09-15 08:27:43 +02:00
Tomas Henzl
8b208a5d0a mpt3sas: A correction in unmap_resources
[ Upstream commit 5f985d88ba ]

It might happen that we try to free an already freed pointer.

Reported-by: Maurizio Lombardi <mlombard@redhat.com>
Signed-off-by: Tomas Henzl <thenzl@redhat.com>
Acked-by: Chaitra P B <chaitra.basappa@avagotech.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-09-15 08:27:43 +02:00
Tomas Henzl
e77b14a6f3 megaraid_sas: Add an i/o barrier
[ Upstream commit b99dbe56d5 ]

A barrier should be added to ensure proper ordering of memory mapped
writes.

Signed-off-by: Tomas Henzl <thenzl@redhat.com>
Reviewed-by: Kashyap Desai <kashyap.desai@broadcom.com>
Acked-by: Kashyap Desai <kashyap.desai@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-09-15 08:27:43 +02:00
Sumit Saxena
948cb1ca1f megaraid_sas: Fix SMAP issue
[ Upstream commit ea1c928bb6 ]

Inside compat IOCTL hook of driver, driver was using wrong address of
ioc->frame.raw which leads sense_ioc_ptr to be calculated wrongly and
failing IOCTL.

Signed-off-by: Sumit Saxena <sumit.saxena@avagotech.com>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-09-15 08:27:43 +02:00
Sumit Saxena
f7aeba6bd2 megaraid_sas: Do not allow PCI access during OCR
[ Upstream commit 11c71cb4ab ]

This patch will do synhronization between OCR function and AEN function
using "reset_mutex" lock.  reset_mutex will be acquired only in the
first half of the AEN function which issues a DCMD. Second half of the
function which calls SCSI API (scsi_add_device/scsi_remove_device)
should be out of reset_mutex to avoid deadlock between scsi_eh thread
and driver.

During chip reset (inside OCR function), there should not be any PCI
access and AEN function (which is called in delayed context) may be
firing DCMDs (doing PCI writes) when chip reset is happening in parallel
which will cause FW fault. This patch will solve the problem by making
AEN thread and OCR thread mutually exclusive.

Signed-off-by: Sumit Saxena <sumit.saxena@avagotech.com>
Signed-off-by: Kashyap Desai <kashyap.desai@avagotech.com>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-09-15 08:27:42 +02:00
James Smart
2dfdfa3439 lpfc: Fix external loopback failure.
[ Upstream commit 4360ca9c24 ]

Fix external loopback failure.

Rx sequence reassembly was incorrect.

Signed-off-by: Dick Kennedy <dick.kennedy@avagotech.com>
Signed-off-by: James Smart <james.smart@avagotech.com>
Reviewed-by: Hannes Reinicke <hare@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-09-15 08:27:42 +02:00
James Smart
10103c5228 lpfc: Fix mbox reuse in PLOGI completion
[ Upstream commit 01c73bbcd7 ]

Fix mbox reuse in PLOGI completion. Moved allocations so that buffer
properly init'd.

Signed-off-by: Dick Kennedy <dick.kennedy@avagotech.com>
Signed-off-by: James Smart <james.smart@avagotech.com>
Reviewed-by: Hannes Reinicke <hare@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-09-15 08:27:42 +02:00
James Smart
13c61f0468 lpfc: Fix RDP Speed reporting.
[ Upstream commit 81e7517723 ]

Fix RDP Speed reporting.

Signed-off-by: Dick Kennedy <dick.kennedy@avagotech.com>
Signed-off-by: James Smart <james.smart@avagotech.com>
Reviewed-by: Hannes Reinicke <hare@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-09-15 08:27:42 +02:00
James Smart
6b9bd78515 lpfc: Fix crash in fcp command completion path.
[ Upstream commit c90261dcd8 ]

Fix crash in fcp command completion path.

Missed null check.

Signed-off-by: Dick Kennedy <dick.kennedy@avagotech.com>
Signed-off-by: James Smart <james.smart@avagotech.com>
Reviewed-by: Hannes Reinicke <hare@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-09-15 08:27:42 +02:00
James Smart
7cf5c223cc lpfc: Fix driver crash when module parameter lpfc_fcp_io_channel set to 16
[ Upstream commit 6690e0d4fc ]

Fix driver crash when module parameter lpfc_fcp_io_channel set to 16

Signed-off-by: Dick Kennedy <dick.kennedy@avagotech.com>
Signed-off-by: James Smart <james.smart@avagotech.com>
Reviewed-by: Hannes Reinicke <hare@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-09-15 08:27:42 +02:00
James Smart
0217efc5b8 lpfc: Fix RegLogin failed error seen on Lancer FC during port bounce
[ Upstream commit 4b7789b71c ]

Fix RegLogin failed error seen on Lancer FC during port bounce

Fix the statemachine and ref counting.

Signed-off-by: Dick Kennedy <dick.kennedy@avagotech.com>
Signed-off-by: James Smart <james.smart@avagotech.com>
Reviewed-by: Hannes Reinicke <hare@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-09-15 08:27:41 +02:00
James Smart
1a1455583d lpfc: Fix the FLOGI discovery logic to comply with T11 standards
[ Upstream commit d6de08cc46 ]

Fix the FLOGI discovery logic to comply with T11 standards

We weren't properly setting fabric parameters, such as R_A_TOV and E_D_TOV,
when we registered the vfi object in default configs and pt2pt configs.
Revise to now pass service params with the values to the firmware and
ensure they are reset on link bounce. Required reworking the call sequence
in the discovery threads.

Signed-off-by: Dick Kennedy <dick.kennedy@avagotech.com>
Signed-off-by: James Smart <james.smart@avagotech.com>
Reviewed-by: Hannes Reinicke <hare@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-09-15 08:27:41 +02:00
James Smart
3f499a4264 lpfc: Fix FCF Infinite loop in lpfc_sli4_fcf_rr_next_index_get.
[ Upstream commit f5cb5304eb ]

Fix FCF Infinite loop in lpfc_sli4_fcf_rr_next_index_get.

Signed-off-by: Dick Kennedy <dick.kennedy@avagotech.com>
Signed-off-by: James Smart <james.smart@avagotech.com>
Reviewed-by: Hannes Reinicke <hare@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-09-15 08:27:41 +02:00
Manoj Kumar
8c67424e2a cxlflash: Enable device id for future IBM CXL adapter
[ Upstream commit a2746fb16e ]

This drop enables a future card with a device id of 0x0600 to be
recognized by the cxlflash driver.

As per the design, the Accelerator Function Unit (AFU) for this new IBM
CXL Flash Adapter retains the same host interface as the previous
generation. For the early prototypes of the new card, the driver with
this change behaves exactly as the driver prior to this behaved with the
earlier generation card. Therefore, no card specific programming has
been added. These card specific changes can be staged in later if
needed.

Signed-off-by: Manoj N. Kumar <manoj@linux.vnet.ibm.com>
Acked-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com>
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-09-15 08:27:41 +02:00
Manoj Kumar
20ab6948f1 cxlflash: Resolve oops in wait_port_offline
[ Upstream commit b45cdbaf9f ]

If an async error interrupt is generated, and the error requires the FC
link to be reset, it cannot be performed in the interrupt context. So a
work element is scheduled to complete the link reset in a process
context. If either an EEH event or an escalation occurs in between when
the interrupt is generated and the scheduled work is started, the MMIO
space may no longer be available. This will cause an oops in the worker
thread.

[  606.806583] NIP kthread_data+0x28/0x40
[  606.806633] LR wq_worker_sleeping+0x30/0x100
[  606.806694] Call Trace:
[  606.806721] 0x50 (unreliable)
[  606.806796] wq_worker_sleeping+0x30/0x100
[  606.806884] __schedule+0x69c/0x8a0
[  606.806959] schedule+0x44/0xc0
[  606.807034] do_exit+0x770/0xb90
[  606.807109] die+0x300/0x460
[  606.807185] bad_page_fault+0xd8/0x150
[  606.807259] handle_page_fault+0x2c/0x30
[  606.807338] wait_port_offline.constprop.12+0x60/0x130 [cxlflash]

To prevent the problem space area from being unmapped, when there is
pending work, a mapcount (using the kref mechanism) is held.  The
mapcount is released only when the work is completed.  The last
reference release is tied to the unmapping service.

Signed-off-by: Manoj N. Kumar <manoj@linux.vnet.ibm.com>
Acked-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com>
Reviewed-by: Uma Krishnan <ukrishn@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-09-15 08:27:41 +02:00
Manoj Kumar
284555fc7f cxlflash: Fix to resolve cmd leak after host reset
[ Upstream commit ee91e332a6 ]

After a few iterations of resetting the card, either during EEH
recovery, or a host_reset the following is seen in the logs.  cxlflash
0008:00: cxlflash_queuecommand: could not get a free command

At every reset of the card, the commands that are outstanding are being
leaked.  No effort is being made to reap these commands.  A few more
resets later, the above error message floods the logs and the card is
rendered totally unusable as no free commands are available.

Iterated through the 'cmd' queue and printed out the 'free' counter and
found that on each reset certain commands were in-use and stayed in-use
through subsequent resets.

To resolve this issue, when the card is reset, reap all the commands
that are active/outstanding.

Signed-off-by: Manoj N. Kumar <manoj@linux.vnet.ibm.com>
Acked-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com>
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-09-15 08:27:41 +02:00
Dan Carpenter
edf9459008 cxlflash: a couple off by one bugs
[ Upstream commit e37390bee6 ]

The "> MAX_CONTEXT" should be ">= MAX_CONTEXT".  Otherwise we go one
step beyond the end of the cfg->ctx_tbl[] array.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Manoj Kumar <manoj@linux.vnet.ibm.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Acked-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-09-15 08:27:39 +02:00
Yinghai Lu
bbaf719376 megaraid_sas: Fix probing cards without io port
commit e7f851684e upstream.

Found one megaraid_sas HBA probe fails,

[  187.235190] scsi host2: Avago SAS based MegaRAID driver
[  191.112365] megaraid_sas 0000:89:00.0: BAR 0: can't reserve [io  0x0000-0x00ff]
[  191.120548] megaraid_sas 0000:89:00.0: IO memory region busy!

and the card has resource like,
[  125.097714] pci 0000:89:00.0: [1000:005d] type 00 class 0x010400
[  125.104446] pci 0000:89:00.0: reg 0x10: [io  0x0000-0x00ff]
[  125.110686] pci 0000:89:00.0: reg 0x14: [mem 0xce400000-0xce40ffff 64bit]
[  125.118286] pci 0000:89:00.0: reg 0x1c: [mem 0xce300000-0xce3fffff 64bit]
[  125.125891] pci 0000:89:00.0: reg 0x30: [mem 0xce200000-0xce2fffff pref]

that does not io port resource allocated from BIOS, and kernel can not
assign one as io port shortage.

The driver is only looking for MEM, and should not fail.

It turns out megasas_init_fw() etc are using bar index as mask.  index 1
is used as mask 1, so that pci_request_selected_regions() is trying to
request BAR0 instead of BAR1.

Fix all related reference.

Fixes: b6d5d8808b ("megaraid_sas: Use lowest memory bar for SR-IOV VF support")
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Acked-by: Kashyap Desai <kashyap.desai@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-09-07 08:32:43 +02:00
Greg Edwards
7386f927cf mpt3sas: Fix resume on WarpDrive flash cards
commit ce7c6c9e1d upstream.

mpt3sas crashes on resume after suspend with WarpDrive flash cards.  The
reply_post_host_index array is not set back up after the resume, and we
deference a stale pointer in _base_interrupt().

[   47.309711] BUG: unable to handle kernel paging request at ffffc90001f8006c
[   47.318289] IP: [<ffffffffc00863ef>] _base_interrupt+0x49f/0xa30 [mpt3sas]
[   47.326749] PGD 41ccaa067 PUD 41ccab067 PMD 3466c067 PTE 0
[   47.333848] Oops: 0002 [#1] SMP
...
[   47.452708] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.7.0 #6
[   47.460506] Hardware name: Dell Inc. OptiPlex 990/06D7TR, BIOS A18 09/24/2013
[   47.469629] task: ffffffff81c0d500 ti: ffffffff81c00000 task.ti: ffffffff81c00000
[   47.479112] RIP: 0010:[<ffffffffc00863ef>]  [<ffffffffc00863ef>] _base_interrupt+0x49f/0xa30 [mpt3sas]
[   47.490466] RSP: 0018:ffff88041d203e30  EFLAGS: 00010002
[   47.497801] RAX: 0000000000000001 RBX: ffff880033f4c000 RCX: 0000000000000001
[   47.506973] RDX: ffffc90001f8006c RSI: 0000000000000082 RDI: 0000000000000082
[   47.516141] RBP: ffff88041d203eb0 R08: ffff8804118e2820 R09: 0000000000000001
[   47.525300] R10: 0000000000000001 R11: 00000000100c0000 R12: 0000000000000000
[   47.534457] R13: ffff880412c487e0 R14: ffff88041a8987d8 R15: 0000000000000001
[   47.543632] FS:  0000000000000000(0000) GS:ffff88041d200000(0000) knlGS:0000000000000000
[   47.553796] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   47.561632] CR2: ffffc90001f8006c CR3: 0000000001c06000 CR4: 00000000000406f0
[   47.570883] Stack:
[   47.575015]  000000001d211228 ffff88041d2100c0 ffff8800c47d8130 0000000000000100
[   47.584625]  ffff8804100c0000 100c000000000000 ffff88041a8992a0 ffff88041a8987f8
[   47.594230]  ffff88041d203e00 ffffffff81111e55 000000000000038c ffff880414ad4280
[   47.603862] Call Trace:
[   47.608474]  <IRQ>
[   47.610413]  [<ffffffff81111e55>] ? call_timer_fn+0x35/0x120
[   47.620539]  [<ffffffff81100a1f>] handle_irq_event_percpu+0x7f/0x1c0
[   47.629061]  [<ffffffff81100b8c>] handle_irq_event+0x2c/0x50
[   47.636859]  [<ffffffff81103fff>] handle_edge_irq+0x6f/0x130
[   47.644654]  [<ffffffff8102fbf3>] handle_irq+0x73/0x120
[   47.652011]  [<ffffffff810c6ada>] ? atomic_notifier_call_chain+0x1a/0x20
[   47.660854]  [<ffffffff817e374b>] do_IRQ+0x4b/0xd0
[   47.667777]  [<ffffffff817e160c>] common_interrupt+0x8c/0x8c
[   47.675635]  <EOI>

Move the reply_post_host_index array setup into
mpt3sas_base_map_resources(), which is also in the resume path.

Signed-off-by: Greg Edwards <gedwards@fireweed.org>
Acked-by: Chaitra P B <chaitra.basappa@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-09-07 08:32:43 +02:00
Dave Carroll
e4878ef66e aacraid: Check size values after double-fetch from user
commit fa00c437ee upstream.

In aacraid's ioctl_send_fib() we do two fetches from userspace, one the
get the fib header's size and one for the fib itself. Later we use the
size field from the second fetch to further process the fib. If for some
reason the size from the second fetch is different than from the first
fix, we may encounter an out-of- bounds access in aac_fib_send(). We
also check the sender size to insure it is not out of bounds. This was
reported in https://bugzilla.kernel.org/show_bug.cgi?id=116751 and was
assigned CVE-2016-6480.

Reported-by: Pengfei Wang <wpengfeinudt@gmail.com>
Fixes: 7c00ffa31 '[SCSI] 2.6 aacraid: Variable FIB size (updated patch)'
Signed-off-by: Dave Carroll <david.carroll@microsemi.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-09-07 08:32:43 +02:00
Mauricio Faria de Oliveira
74d55e5d96 lpfc: fix oops in lpfc_sli4_scmd_to_wqidx_distr() from lpfc_send_taskmgmt()
commit 05a05872c8 upstream.

The lpfc_sli4_scmd_to_wqidx_distr() function expects the scsi_cmnd
'lpfc_cmd->pCmd' not to be null, and point to the midlayer command.

That's not true in the .eh_(device|target|bus)_reset_handler path,
because lpfc_send_taskmgmt() sends commands not from the midlayer, so
does not set 'lpfc_cmd->pCmd'.

That is true in the .queuecommand path because lpfc_queuecommand()
stores the scsi_cmnd from midlayer in lpfc_cmd->pCmd; and lpfc_cmd is
stored by lpfc_scsi_prep_cmnd() in piocbq->context1 -- which is passed
to lpfc_sli4_scmd_to_wqidx_distr() as lpfc_cmd parameter.

This problem can be hit on SCSI EH, and immediately with sg_reset.
These 2 test-cases demonstrate the problem/fix with next-20160601.

Test-case 1) sg_reset

    # strace sg_reset --device /dev/sdm
    <...>
    open("/dev/sdm", O_RDWR|O_NONBLOCK)     = 3
    ioctl(3, SG_SCSI_RESET, 0x3fffde6d0994 <unfinished ...>
    +++ killed by SIGSEGV +++
    Segmentation fault

    # dmesg
    Unable to handle kernel paging request for data at address 0x00000000
    Faulting instruction address: 0xd00000001c88442c
    Oops: Kernel access of bad area, sig: 11 [#1]
    <...>
    CPU: 104 PID: 16333 Comm: sg_reset Tainted: G        W       4.7.0-rc1-next-20160601-00004-g95b89dc #6
    <...>
    NIP [d00000001c88442c] lpfc_sli4_scmd_to_wqidx_distr+0xc/0xd0 [lpfc]
    LR [d00000001c826fe8] lpfc_sli_calc_ring.part.27+0x98/0xd0 [lpfc]
    Call Trace:
    [c000003c9ec876f0] [c000003c9ec87770] 0xc000003c9ec87770 (unreliable)
    [c000003c9ec87720] [d00000001c82e004] lpfc_sli_issue_iocb+0xd4/0x260 [lpfc]
    [c000003c9ec87780] [d00000001c831a3c] lpfc_sli_issue_iocb_wait+0x15c/0x5b0 [lpfc]
    [c000003c9ec87880] [d00000001c87f27c] lpfc_send_taskmgmt+0x24c/0x650 [lpfc]
    [c000003c9ec87950] [d00000001c87fd7c] lpfc_device_reset_handler+0x10c/0x200 [lpfc]
    [c000003c9ec87a10] [c000000000610694] scsi_try_bus_device_reset+0x44/0xc0
    [c000003c9ec87a40] [c0000000006113e8] scsi_ioctl_reset+0x198/0x2c0
    [c000003c9ec87bf0] [c00000000060fe5c] scsi_ioctl+0x13c/0x4b0
    [c000003c9ec87c80] [c0000000006629b0] sd_ioctl+0xf0/0x120
    [c000003c9ec87cd0] [c00000000046e4f8] blkdev_ioctl+0x248/0xb70
    [c000003c9ec87d30] [c0000000002a1f60] block_ioctl+0x70/0x90
    [c000003c9ec87d50] [c00000000026d334] do_vfs_ioctl+0xc4/0x890
    [c000003c9ec87de0] [c00000000026db60] SyS_ioctl+0x60/0xc0
    [c000003c9ec87e30] [c000000000009120] system_call+0x38/0x108
    Instruction dump:
    <...>

    With fix:

    # strace sg_reset --device /dev/sdm
    <...>
    open("/dev/sdm", O_RDWR|O_NONBLOCK)     = 3
    ioctl(3, SG_SCSI_RESET, 0x3fffe103c554) = 0
    close(3)                                = 0
    exit_group(0)                           = ?
    +++ exited with 0 +++

    # dmesg
    [  424.658649] lpfc 0006:01:00.4: 4:(0):0713 SCSI layer issued Device Reset (1, 0) return x2002

Test-case 2) SCSI EH

    Using this debug patch to wire an SCSI EH trigger, for lpfc_scsi_cmd_iocb_cmpl():
    -       cmd->scsi_done(cmd);
    +       if ((phba->pport ? phba->pport->cfg_log_verbose : phba->cfg_log_verbose) == 0x32100000)
    +               printk(KERN_ALERT "lpfc: skip scsi_done()\n");
    +       else
    +               cmd->scsi_done(cmd);

    # echo 0x32100000 > /sys/class/scsi_host/host11/lpfc_log_verbose

    # dd if=/dev/sdm of=/dev/null iflag=direct &
    <...>

    After a while:

    # dmesg
    lpfc 0006:01:00.4: 4:(0):3053 lpfc_log_verbose changed from 0 (x0) to 839909376 (x32100000)
    lpfc: skip scsi_done()
    <...>
    Unable to handle kernel paging request for data at address 0x00000000
    Faulting instruction address: 0xd0000000199e448c
    Oops: Kernel access of bad area, sig: 11 [#1]
    <...>
    CPU: 96 PID: 28556 Comm: scsi_eh_11 Tainted: G        W       4.7.0-rc1-next-20160601-00004-g95b89dc #6
    <...>
    NIP [d0000000199e448c] lpfc_sli4_scmd_to_wqidx_distr+0xc/0xd0 [lpfc]
    LR [d000000019986fe8] lpfc_sli_calc_ring.part.27+0x98/0xd0 [lpfc]
    Call Trace:
    [c000000ff0d0b890] [c000000ff0d0b900] 0xc000000ff0d0b900 (unreliable)
    [c000000ff0d0b8c0] [d00000001998e004] lpfc_sli_issue_iocb+0xd4/0x260 [lpfc]
    [c000000ff0d0b920] [d000000019991a3c] lpfc_sli_issue_iocb_wait+0x15c/0x5b0 [lpfc]
    [c000000ff0d0ba20] [d0000000199df27c] lpfc_send_taskmgmt+0x24c/0x650 [lpfc]
    [c000000ff0d0baf0] [d0000000199dfd7c] lpfc_device_reset_handler+0x10c/0x200 [lpfc]
    [c000000ff0d0bbb0] [c000000000610694] scsi_try_bus_device_reset+0x44/0xc0
    [c000000ff0d0bbe0] [c0000000006126cc] scsi_eh_ready_devs+0x49c/0x9c0
    [c000000ff0d0bcb0] [c000000000614160] scsi_error_handler+0x580/0x680
    [c000000ff0d0bd80] [c0000000000ae848] kthread+0x108/0x130
    [c000000ff0d0be30] [c0000000000094a8] ret_from_kernel_thread+0x5c/0xb4
    Instruction dump:
    <...>

    With fix:

    # dmesg
    lpfc 0006:01:00.4: 4:(0):3053 lpfc_log_verbose changed from 0 (x0) to 839909376 (x32100000)
    lpfc: skip scsi_done()
    <...>
    lpfc 0006:01:00.4: 4:(0):0713 SCSI layer issued Device Reset (0, 0) return x2002
    <...>
    lpfc 0006:01:00.4: 4:(0):0723 SCSI layer issued Target Reset (1, 0) return x2002
    <...>
    lpfc 0006:01:00.4: 4:(0):0714 SCSI layer issued Bus Reset Data: x2002
    <...>
    lpfc 0006:01:00.4: 4:(0):3172 SCSI layer issued Host Reset Data:
    <...>

Fixes: 8b0dff1416 ("lpfc: Add support for using block multi-queue")
Signed-off-by: Mauricio Faria de Oliveira <mauricfo@linux.vnet.ibm.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Acked-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-08-20 18:09:27 +02:00
Hannes Reinecke
5a6f9d06d8 scsi: ignore errors from scsi_dh_add_device()
commit 221255aee6 upstream.

device handler initialisation might fail due to a number of
reasons. But as device_handlers are optional this shouldn't
cause us to disable the device entirely.
So just ignore errors from scsi_dh_add_device().

Reviewed-by: Johannes Thumshirn <jthumshirn@suse.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Cc: Laura Abbott <labbott@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-08-16 09:30:48 +02:00
Brian King
8727178338 ipr: Clear interrupt on croc/crocodile when running with LSI
commit 54e430bbd4 upstream.

If we fall back to using LSI on the Croc or Crocodile chip we need to
clear the interrupt so we don't hang the system.

Tested-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-08-10 11:49:29 +02:00