5591 Commits

Author SHA1 Message Date
Brian King
0c5210f73e ibmvfc: Reduce error recovery timeout
commit daa142d177 upstream.

If a command times out resulting in EH getting invoked, we wait for the
aborted commands to come back after sending the abort. Shorten
the amount of time we wait for these responses, to ensure we don't
get stuck in EH for several minutes.

Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-13 13:20:24 -07:00
Brian King
d006b64edf ibmvfc: Fix command completion handling
commit f5832fa2f8 upstream.

Commands which are completed by the VIOS are placed on a CRQ
in kernel memory for the ibmvfc driver to process. Each CRQ
entry is 16 bytes. The ibmvfc driver reads the first 8 bytes
to check if the entry is valid, then reads the next 8 bytes to get
the handle, which is a pointer to the completed command. This fixes
an issue seen on Power 7 where the processor reordered the
loads from memory, resulting in processing command completion
with a stale handle. This could result in command timeouts,
and also early completion of commands.

Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-13 13:20:24 -07:00
Hannes Reinecke
b92f44353f aic79xx: check for non-NULL scb in ahd_handle_nonpkt_busfree
commit 534ef056db upstream.

When removing several devices aic79xx will occasionally Oops
in ahd_handle_nonpkt_busfree during rescan. Looking at the
code I found that we're indeed not checking if the scb in
question is NULL. So check for it before accessing it.

Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-13 13:20:23 -07:00
Julia Lawall
de40885b3c SCSI: aacraid: Eliminate use after free
commit 8a52da632c upstream.

The debugging code using the freed structure is moved before the kfree.

A simplified version of the semantic match that finds this problem is as
follows: (http://coccinelle.lip6.fr/)

// <smpl>
@free@
expression E;
position p;
@@
kfree@p(E)

@@
expression free.E, subE<=free.E, E1;
position free.p;
@@

  kfree@p(E)
  ...
(
  subE = E1
|
* E
)
// </smpl>

Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2010-08-02 10:20:50 -07:00
Ben Hutchings
966399a8b8 qla2xxx: Disable MSI on qla24xx chips other than QLA2432.
commit 6377a7ae1a upstream.

On specific platforms, MSI is unreliable on some of the QLA24xx chips, resulting
in fatal I/O errors under load, as reported in <http://bugs.debian.org/572322>
and by some RHEL customers.

Signed-off-by: Giridhar Malavali <giridhar.malavali@qlogic.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Cc: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-07-05 11:11:21 -07:00
Tomas Henzl
9e79d5307f megaraid_sas: fix for 32bit apps
commit b3dc1a212e upstream.

It looks like this patch -

commit 7b2519afa1
Author: Yang, Bo <Bo.Yang@lsi.com>
Date:   Tue Oct 6 14:52:20 2009 -0600

    [SCSI] megaraid_sas: fix 64 bit sense pointer truncation

has caused a problem for 32bit programs with 64bit os -

http://bugzilla.kernel.org/show_bug.cgi?id=15001

fix by converting the user space 32bit pointer to a 64 bit one when
needed.

[jejb: fix up some 64 bit warnings]
Signed-off-by: Tomas Henzl <thenzl@redhat.com>
Cc: Bo Yang <Bo.Yang@lsi.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-05-26 14:29:20 -07:00
James Bottomley
7443d2d252 SCSI: Retry commands with UNIT_ATTENTION sense codes to fix ext3/ext4 I/O error
commit 77a4229719 upstream.

There's nastiness in the way we currently handle barriers (and
discards): They're effectively filesystem commands, but they get
processed as BLOCK_PC commands.  Unfortunately BLOCK_PC commands are
taken by SCSI to be SG_IO commands and the issuer expects to see and
handle any returned errors, however trivial.  This leads to a huge
problem, because the block layer doesn't expect this to happen and any
trivially retryable error on a barrier causes an immediate I/O error
to the filesystem.

The only real way to hack around this is to take the usual class of
offending errors (unit attentions) and make them all retryable in the
case of a REQ_HARDBARRIER.  A correct fix would involve a rework of
the entire block and SCSI submit system, and so is out of scope for a
quick fix.

Cc: Hannes Reinecke <hare@suse.de>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-05-12 14:57:17 -07:00
Hannes Reinecke
f3dc6becaa Enable retries for SYNCRONIZE_CACHE commands to fix I/O error
commit c213e1407b upstream.

Some arrays are giving I/O errors with ext3 filesystems when
SYNCHRONIZE_CACHE gets a UNIT_ATTENTION.  What is happening is that
these commands have no retries, so the UNIT_ATTENTION causes the
barrier to fail.  We should enable retries here to clear any
transient error and allow the barrier to succeed.

Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-05-12 14:57:17 -07:00
Douglas Gilbert
cc9a23629f scsi_debug: virtual_gb ignores sector_size
commit 5447ed6c96 upstream.

In the scsi_debug driver, the virtual_gb option ignores the
sector_size, implicitly assuming that it is 512 bytes.  So if
'virtual_gb=1 sector_size=4096' the result is an 8 GB (virtual) disk.
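
The arithmetic behind that 8 GB figure can be sketched as follows. This is
only an illustration, assuming 1 GB means 2^30 bytes; the function names are
made up for the example and are not taken from scsi_debug.

```python
GIB = 1 << 30  # assumption: virtual_gb counts units of 2^30 bytes

def sectors_ignoring_sector_size(virtual_gb):
    # Behavior described above: the sector count assumes 512-byte sectors.
    return virtual_gb * GIB // 512

def sectors_using_sector_size(virtual_gb, sector_size):
    # Assumed corrected behavior: divide by the configured sector size.
    return virtual_gb * GIB // sector_size

def disk_bytes(sectors, sector_size):
    # Size the disk reports: sector count times the real sector size.
    return sectors * sector_size

old = disk_bytes(sectors_ignoring_sector_size(1), 4096)
new = disk_bytes(sectors_using_sector_size(1, 4096), 4096)
print(old // GIB, new // GIB)  # 8 1
```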

Signed-off-by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-05-12 14:57:16 -07:00
Mike Christie
ec5e50f079 SCSI: libiscsi: regression: fix header digest errors
commit 96b1f96dca upstream.

This fixes a regression introduced with this commit:

commit d3305f3407
Author: Mike Christie <michaelc@cs.wisc.edu>
Date:   Thu Aug 20 15:10:58 2009 -0500

    [SCSI] libiscsi: don't increment cmdsn if cmd is not sent

in 2.6.32.

When I moved the hdr->cmdsn after init_task, I added
a bug when header digests are used. The problem is
that the LLD may calculate the header digest in init_task,
so if we then set the cmdsn after the init_task call we
change what the digest calculated by the target will be.
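
A toy model of the ordering problem, not the libiscsi code: the header
layout and helper names are hypothetical, and zlib.crc32 stands in for the
CRC32C that iSCSI header digests actually use.

```python
import struct
import zlib

def build_header(cmdsn):
    # Hypothetical 8-byte header: an opcode-like word followed by CmdSN.
    return struct.pack(">II", 0x01, cmdsn)

def digest(header):
    # Stand-in checksum; real header digests use CRC32C.
    return zlib.crc32(header) & 0xFFFFFFFF

def send_cmdsn_after_init_task(cmdsn):
    header = build_header(0)        # init_task runs with CmdSN not yet set
    sent_digest = digest(header)    # the LLD may compute the digest here
    header = build_header(cmdsn)    # CmdSN written afterwards
    return header, sent_digest

def send_cmdsn_before_init_task(cmdsn):
    header = build_header(cmdsn)    # CmdSN set first
    return header, digest(header)   # digest covers the final header

def target_accepts(header, sent_digest):
    # The target recomputes the digest over the header it received.
    return digest(header) == sent_digest

print(target_accepts(*send_cmdsn_after_init_task(7)))   # False
print(target_accepts(*send_cmdsn_before_init_task(7)))  # True
```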

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-05-12 14:57:16 -07:00
Tejun Heo
a414b72407 SCSI: fix locking around blk_abort_request()
commit 70b25f890c upstream.

blk_abort_request() expects queue lock to be held by the caller.
Grab it before calling the function.

Lack of this synchronization led to infinite loop on corrupt
q->timeout_list.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-05-12 14:57:16 -07:00
Lalit Chandivade
b2729e2d21 qla2xxx: Properly handle UNDERRUN completion statuses.
commit 0f00a206cc upstream.

Correct issues where the lower scsi-status would be improperly
cleared.  Instead, allow the midlayer to process the status after
the proper residual-count checks are performed.  Finally,
validate firmware status flags prior to assigning values from the
FCP_RSP frame.

Signed-off-by: Lalit Chandivade <lalit.chandivade@qlogic.com>
Signed-off-by: Michael Hernandez <michael.hernandez@qlogic.com>
Signed-off-by: Ravi Anand <ravi.anand@qlogic.com>
Signed-off-by: Andrew Vasquez <andrew.vasquez@qlogic.com>
Signed-off-by: Giridhar Malavali <giridhar.malavali@qlogic.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-05-12 14:57:14 -07:00
Douglas Gilbert
6f12efee9e skip sense logging for some ATA PASS-THROUGH cdbs
commit e7efe5932b upstream.

Further to the lsml thread titled:
"does scsi_io_completion need to dump sense data for ata pass through (ck_cond = 1) ?"

This is a patch to skip logging when the sense data is
associated with a SENSE_KEY of "RECOVERED_ERROR" and the
additional sense code is "ATA PASS-THROUGH INFORMATION
AVAILABLE". This only occurs with the SAT ATA PASS-THROUGH
commands when CK_COND=1 (in the cdb). It indicates that
the sense data contains ATA registers.

Smartmontools uses such commands on ATA disks connected via
SAT. Periodic checks such as those done by smartd cause
nuisance entries into logs that are:
    - neither errors nor warnings
    - pointless unless the cdb that caused them is also logged
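
The check described above might look roughly like this. It is a sketch
rather than the scsi_io_completion code: the function name is hypothetical,
and the numeric values (sense key 1h for RECOVERED ERROR, ASC/ASCQ 00h/1Dh
for ATA PASS-THROUGH INFORMATION AVAILABLE) are quoted from the SCSI
standards as I understand them.

```python
RECOVERED_ERROR = 0x01     # sense key
ATA_PT_INFO_ASC = 0x00     # additional sense code
ATA_PT_INFO_ASCQ = 0x1D    # additional sense code qualifier

def should_log_sense(sense_key, asc, ascq):
    # Skip logging only for the "ATA registers are in the sense data" case;
    # every other sense combination is still logged.
    is_ata_pt_info = (
        sense_key == RECOVERED_ERROR
        and asc == ATA_PT_INFO_ASC
        and ascq == ATA_PT_INFO_ASCQ
    )
    return not is_ata_pt_info

print(should_log_sense(0x01, 0x00, 0x1D))  # False
print(should_log_sense(0x03, 0x11, 0x00))  # True
```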

Signed-off-by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-05-12 14:57:13 -07:00
Mike Christie
f435966fc5 SCSI: add scsi target reset support to scsi ioctl
commit 3f9daedfcb upstream.

The scsi ioctl code path was missing scsi target reset
support. This patch just adds it.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Cc: maximilian attems <max@stro.at>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-04-26 07:41:30 -07:00
Mike Christie
285f4f1123 fc class: fail fast bsg requests
commit 2bc1c59dbd upstream.

If the port state is blocked and the fast io fail tmo has
fired then this patch will fail bsg requests immediately.
This is needed if userspace is sending IOs to test the transport
like with fcping, so it will not have to wait for the dev loss tmo.
With this patch the bsg req fast io fail code behaves like the normal
and sg io/passthrough fast io fail.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Acked-By: James Smart <james.smart@emulex.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Cc: maximilian attems <max@stro.at>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-04-26 07:41:29 -07:00
Mike Christie
d3ce482f41 libiscsi: Fix recovery slowdown regression
commit 4ae0a6c15e upstream.

We could be failing/stopping a connection due to libiscsi starting
recovery/cleanup, but the xmit path or scsi eh thread path
could be dropping the connection at the same time.

As a result the session->state gets set to failed instead of in
recovery. We end up not blocking the session
and so the replacement timeout never gets started and we only end up
failing the IO when scsi_softirq_done sees that the
cmd has been running for (cmd->allowed + 1) * rq->timeout secs.
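
To give a feel for the size of that delay, here is the formula evaluated
with example numbers. The retry count and timeout are hypothetical values
picked for illustration, not defaults taken from the source.

```python
def worst_case_delay_secs(allowed, timeout_secs):
    # (cmd->allowed + 1) * rq->timeout, as quoted above.
    return (allowed + 1) * timeout_secs

# Hypothetical values: 5 allowed retries, 30 second request timeout.
print(worst_case_delay_secs(5, 30))  # 180
```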

We used to fail the IO right away so users are seeing a long
delay when using dm-multipath. This problem was added in
2.6.28.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-04-26 07:41:17 -07:00
Gal Rosen
81bcb49360 SCSI: scsi_transport_fc: Fix synchronization issue while deleting vport
commit 0d9dc7c8b9 upstream.

The issue occurs while deleting 60 virtual ports through the sysfs
interface /sys/class/fc_vports/vport-X/vport_delete. It happens when each
request is mistakenly sent twice for the same vport. This interface is
asynchronous: it enters the delete request into a work queue, which allows
more than one request for the same vport to be queued. The result is a
NULL pointer dereference. The first request has already deleted the vport,
while the second request got a pointer to the vport before the device was
destroyed. Re-creating the vport later causes a system freeze.

Solution: Check vport flags before entering the request to the work queue.

[jejb: fixed int<->long problem on spinlock flags variable]
Signed-off-by: Gal Rosen <galr@storwize.com>
Acked-by: James Smart <james.smart@emulex.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-04-01 15:58:28 -07:00
Srinivas
f9404808f4 mvsas: add support for Adaptec ASC-1045/1405 SAS/SATA HBA
commit 7ec4ad0125 upstream.

This is support for Adaptec ASC-1045/1405 SAS/SATA HBA on mvsas, which
is based on Marvell 88SE6440 chipset.

Signed-off-by: Srinivas <satyasrinivasp@hcl.in>
Cc: Andy Yan <ayan@marvell.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Cc: Thomas Voegtle <tv@lio96.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-04-01 15:58:15 -07:00
Julia Lawall
b4c8997c1f drivers/scsi/ses.c: eliminate double free
commit 9b3a6549b2 upstream.

The few lines below the kfree of hdr_buf may go to the label err_free
which will also free hdr_buf.  The most straightforward solution seems to
be to just move the kfree of hdr_buf after these gotos.

A simplified version of the semantic match that finds this problem is as
follows: (http://coccinelle.lip6.fr/)

// <smpl>
@r@
identifier E;
expression E1;
iterator I;
statement S;
@@

*kfree(E);
... when != E = E1
    when != I(E,...) S
    when != &E
*kfree(E);
// </smpl>

Signed-off-by: Julia Lawall <julia@diku.dk>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-04-01 15:58:02 -07:00
Ben Hutchings
d236c04008 SCSI: qla1280: Drop host_lock while requesting firmware
commit 2cec802980 upstream.

request_firmware() may sleep and it appears to be safe to release the
spinlock here.

Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-03-15 08:49:44 -07:00
Anirban Chakraborty
154d9b4d57 SCSI: qla2xxx: Obtain proper host structure during response-queue processing.
commit a67093d46e upstream.

Original code incorrectly assumed only status-type-0
IOCBs would be queued to the response-queue, and thus all
entries would safely reference a VHA from the IOCB
'handle.'

Signed-off-by: Giridhar Malavali <giridhar.malavali@qlogic.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-03-15 08:49:40 -07:00
Kashyap, Desai
7c0f2aedfa mpt2sas: Delete volume before HBA detach.
commit d7384b28af upstream.

The driver hangs when doing `rmmod mpt2sas` if there are any
IR volumes present. The hang is due to the scsi midlayer trying to access the
IR volumes after the driver releases controller resources.  Perhaps when
scsi_remove_host is called, the scsi mid layer is sending some request.
This doesn't occur for bare drives because the driver is already reporting
those drives deleted prior to calling mpt2sas_base_detach.
To solve this issue, we need to delete the volumes as well.

Signed-off-by: Kashyap Desai <kashyap.desai@lsi.com>
Reviewed-by: Eric Moore <eric.moore@lsi.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-03-15 08:49:36 -07:00
Guennadi Liakhovetski
487d83f91b ARM: 5944/1: scsi: fix timer setup in fas216.c
commit b857df1acc upstream.

mod_timer() takes an absolute time and not a delay as its argument.
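
The difference can be shown with a small model. This is a sketch with a
hypothetical Timer class and tick rate; only the "absolute expiry" contract
of mod_timer() comes from the text above.

```python
HZ = 100  # hypothetical tick rate for the example

class Timer:
    def __init__(self):
        self.expires = None

    def mod_timer(self, expires):
        # Like the kernel call, this takes an absolute tick count.
        self.expires = expires

    def ticks_until_fire(self, now):
        # Negative means the expiry already lies in the past.
        return self.expires - now

jiffies = 1_000_000   # current tick count in this model
delay = 5 * HZ        # intended delay: 5 seconds

wrong = Timer()
wrong.mod_timer(delay)            # passes a delay as if it were a time
right = Timer()
right.mod_timer(jiffies + delay)  # passes an absolute expiry

print(wrong.ticks_until_fire(jiffies))  # -999500
print(right.ticks_until_fire(jiffies))  # 500
```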

Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-03-15 08:49:32 -07:00
Boaz Harrosh
e15fca01ba scsi_lib: Fix bug in completion of bidi commands
commit 63c43b0ec1 upstream.

Because of the terrible structuring of scsi-bidi-commands
it breaks some of the life time rules of a scsi-command.
It is now not allowed to free up the block-request before
cleanup and partial deallocation of the scsi-command. (Which
is not so for non-bidi commands)

The right fix to this problem would be to make bidi command
a first-class citizen by allocating a scsi_sdb pointer at scsi command
just like cmd->prot_sdb. The bidi sdb should be allocated/deallocated
as part of the get/put_command (Again like the prot_sdb) and the
current decoupling of scsi_cmnd and blk-request should be kept.

For now make sure scsi_release_buffers() is called before the
call to blk_end_request_all() which might cause the suicide of
the block requests. At best the leak of bidi buffers, at worst
a crash, as there is a race between the existence of the bidi_request
and the free of the associated bidi_sdb.

The reason this was never hit before is because only OSD has the potential
of doing asynchronous bidi commands. (So does bsg but it is never used)
And OSD clients just happen to do all their bidi commands synchronously, up
until recently.

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-02-09 04:50:41 -08:00
Yi Zou
7c0798e289 fcoe: Fix getting san mac for VLAN interface
commit 5bab87e6d4 upstream.

Make sure we get the SAN MAC address from the real netdev if the input
netdev is a VLAN device.

Signed-off-by: Yi Zou <yi.zou@intel.com>
Signed-off-by: Robert Love <robert.w.love@intel.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-01-28 15:01:42 -08:00
Yi Zou
1ce03481ed fcoe: Fix checking san mac address
commit bf361707c8 upstream.

This was fixed before in 7a7f0c7 but it's introduced again recently.

Signed-off-by: Yi Zou <yi.zou@intel.com>
Signed-off-by: Robert Love <robert.w.love@intel.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-01-28 15:01:41 -08:00
Vasu Dev
e166cb138d fcoe, libfc: fix an libfc issue with queue ramp down in libfc
commit 14caf44c69 upstream.

The cmd_per_lun value is used by scsi-ml as fall back lowest
queue_depth value but in case of libfc cmd_per_lun is set to
same value as max queue_depth = 32.

So this patch reduces cmd_per_lun value to 3 and configures
each lun with default max queue_depth 32 in fc_slave_alloc.

Signed-off-by: Vasu Dev <vasu.dev@intel.com>
Acked-by: Robert Love <robert.w.love@intel.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-01-28 15:01:40 -08:00
Abhijeet Joglekar
2792e0ceb2 libfc: remote port gets stuck in restart state without really restarting
commit 5543c72e2b upstream.

We ran into a scenario where a remote port goes into RESTART state, but
never gets added to scsi transport. The running vmcore showed the following:
a) Port was in RESTART state
b) rdata->event was STOP
c) no work gets scheduled for the remote work to fc_rport_work

After this point, shut/no-shut of the remote port did not cause the port
to get re-discovered. The port would move between DELETE and RESTART states,
but the event would always be STOP, no work would get scheduled to
fc_rport_work and the port would not get added to scsi_transport.

The problem is that rdata->event is not set to NONE after a port is
restarted. After this point, no more work gets scheduled for the remote port
since new work is scheduled only if rdata->event is NONE. So, the event
and state keep changing, but fc_rport_work does not get scheduled to actually
handle the event.

Here's a transition of states that explains the above observation:

1) Port is first in READY State, event is NONE

2) RSCN on shut, port goes to DELETED, event is stop

3) Before fc_rport_work runs, RSCN on no-shut, port goes to RESTART, event is
still STOP

4) fc_rport_work gets scheduled, removes the port from transport, sees state
as RESTART, begins the PLOGI state machine, event remains as STOP (event NOT
changed to NONE, this is the bug)

5) Plogi state machine completes, port state goes to READY, event goes to
READY, but no work is scheduled since event was STOP (non-NONE) before.
Fc_rport_work is not scheduled, port remains in READY state, but is not added
to transport.

Things are broken at this point. Libfc rport is ready, but no transport rport
created.

6) now a shut causes port state to change to DELETE, event to change to STOP,
no work gets scheduled

7) no-shut causes port state to change to RESTART, event remains at STOP,
no work gets scheduled

(6) and (7) now get repeated every time we do shut/no-shut. No way to get out
of this state. Fcc reset does not help either.

Only way to get out is to load/unload module.

Fix is to set rdata->event to NONE while processing the STOP/LOGO/FAILED
events, inside the discovery and rport locks.
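
The transitions listed above can be replayed with a small simulation. It is
a simplified model, not the libfc code: the class, method names and the
reset_event_in_work switch are hypothetical, and work is assumed to be
queued only while the recorded event is NONE, as in steps 5 to 7.

```python
class RemotePort:
    def __init__(self, reset_event_in_work):
        self.state = "READY"
        self.event = "NONE"
        self.work_pending = False
        self.in_transport = True
        self.reset_event_in_work = reset_event_in_work  # True models the fix

    def _record_event(self, event):
        # Work is queued only if no event is currently recorded.
        if self.event == "NONE":
            self.work_pending = True
        self.event = event

    def shut(self):
        self.state = "DELETE"
        self._record_event("STOP")

    def no_shut(self):
        if self.state == "DELETE":
            self.state = "RESTART"

    def run_work(self):
        if not self.work_pending:
            return
        self.work_pending = False
        if self.event == "STOP":
            if self.reset_event_in_work:
                self.event = "NONE"
            self.in_transport = False
            if self.state == "RESTART":
                # PLOGI state machine completes.
                self.state = "READY"
                self._record_event("READY")
        elif self.event == "READY":
            self.event = "NONE"
            self.in_transport = True

def flap(port):
    port.shut()
    port.no_shut()    # arrives before the work item runs
    port.run_work()   # handles STOP
    port.run_work()   # handles READY, if it was queued
    return port.state, port.in_transport

print(flap(RemotePort(reset_event_in_work=False)))  # ('READY', False)
print(flap(RemotePort(reset_event_in_work=True)))   # ('READY', True)
```

Flapping the unfixed port a second time leaves it in RESTART with no work
scheduled, which corresponds to steps 6 and 7.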

Signed-off-by: Abhijeet Joglekar <abjoglek@cisco.com>
Signed-off-by: Robert Love <robert.w.love@intel.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-01-28 15:01:39 -08:00
Joe Eykholt
407590ad18 libfc: fix free of fc_rport_priv with timer pending
commit b4a9c7ede9 upstream.

Timer crashes were caused by freeing a struct fc_rport_priv
with a timer pending, causing the timer facility list to be
corrupted.  This was during FC uplink flap tests with a lot
of targets.

After discovery, we were doing a PLOGI on an rdata that was
in DELETE state but not yet removed from the lookup list.
This moved the rdata from DELETE state to PLOGI state.
If the PLOGI exchange allocation failed and needed to be
retried, the timer scheduling could race with the free
being done by fc_rport_work().

When fc_rport_login() is called on a rport in DELETE state,
move it to a new state RESTART.  In fc_rport_work, when
handling a LOGO, STOPPED or FAILED event, look for restart
state.  In the RESTART case, don't take the rdata off the
list and after the transport remote port is deleted and
exchanges are reset, re-login to the remote port.

Note that the new RESTART state also corrects a problem we
had when re-discovering a port that had moved to DELETE state.
In that case, a new rdata was created, but the old rdata
would do an exchange manager reset affecting the FC_ID
for both the new rdata and old rdata.  With the new state,
the new port isn't logged into until after any old exchanges
are reset.

Signed-off-by: Joe Eykholt <jeykholt@cisco.com>
Signed-off-by: Robert Love <robert.w.love@intel.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-01-28 15:01:38 -08:00
Chris Leech
a3d46ca93e libfc: fix memory corruption caused by double frees and bad error handling
commit 8f550f937e upstream.

I was running into several different panics under stress, which I traced down
to a few different possible slab corruption issues in error handling paths.
I have not yet looked into why these exchange sends fail, but with these
fixes my test system is much more stable under stress than before.

fc_elsct_send() could fail and either leave the passed in frame intact
(failure in fc_ct/els_fill) or the frame could have been freed if the
failure was in fc_exch_seq_send().  The caller had no way of knowing, and
there was a potential double free in the error handling in fc_fcp_rec().

Make fc_elsct_send() always free the frame before returning, and remove the
fc_frame_free() call in fc_fcp_rec().

While fc_exch_seq_send() did always consume the frame, there were double free
bugs in the error handling of fc_fcp_cmd_send() and fc_fcp_srr() as well.

Numerous calls to error handling routines (fc_disc_error(),
fc_lport_error(), fc_rport_error_retry() ) were passing in a frame pointer that
had already been freed in the case of an error.  I have changed the call
sites to pass in a NULL pointer, but there may be more appropriate error
codes to use.

Question:  Why do these error routines take a frame pointer anyway?  I
understand passing in a pointer encoded error to the response handlers, but
the error routines take no action on a valid pointer and should never be
called that way.

Signed-off-by: Chris Leech <christopher.leech@intel.com>
Signed-off-by: Robert Love <robert.w.love@intel.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-01-28 15:01:37 -08:00
Yi Zou
4c40dbe524 libfc: Fix frags in frame exceeding SKB_MAX_FRAGS in fc_fcp_send_data
commit d37322a43e upstream.

In case of sequence offload, in fc_fcp_send_data(), the skb_fill_page_info()
called may end up adding more frags to the skb_shinfo(fp_skb(fp))->frags[],
exceeding SKB_MAX_FRAGS, this eventually corrupts the memory. I am adding the
FR_FRAME_SG_LEN back, but as SKB_MAX_FRAGS -1, leaving 1 for our fcoe_eof_crc
page. And send will be broken into multiple large sends if the frame already
contains more frags than the skb can handle.
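
The splitting rule can be sketched like this. It only illustrates the
"limit minus one" idea: the numeric limit is a hypothetical value chosen
for the example, and split_into_sends is not a libfc function.

```python
SKB_MAX_FRAGS = 18                 # hypothetical limit for the example
FRAME_SG_LEN = SKB_MAX_FRAGS - 1   # one slot kept for the EOF/CRC page

def split_into_sends(pages):
    # Break the page list into chunks that leave room for the extra frag.
    return [pages[i:i + FRAME_SG_LEN]
            for i in range(0, len(pages), FRAME_SG_LEN)]

sends = split_into_sends(list(range(40)))
print([len(s) for s in sends])  # [17, 17, 6]
```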

Signed-off-by: Yi Zou <yi.zou@intel.com>
Signed-off-by: Robert Love <robert.w.love@intel.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-01-28 15:01:36 -08:00
Mike Christie
88cc93a114 fcoe: initialize return value in fcoe_destroy
commit 8eca355fa8 upstream.

When doing echo ethX > /sys..../destroy I am getting
errors when the tear down succeeds. It looks like the
reason for this is because the rc var is not getting set
when the destruction works. This just sets it to zero.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: Robert Love <robert.w.love@intel.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-01-28 15:01:34 -08:00
Joe Eykholt
7c8a0dc2b0 libfc: don't WARN_ON in lport_timeout for RESET state
commit 22655ac222 upstream.

It's possible and harmless to get FLOGI timeouts
while in RESET state.  Don't do a WARN_ON in that case.

Also, split out the other WARN_ONs in fc_lport_timeout, so
we can tell which one is hit by its line number.

Signed-off-by: Joe Eykholt <jeykholt@cisco.com>
Signed-off-by: Robert Love <robert.w.love@intel.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-01-28 15:01:33 -08:00
Joe Eykholt
83d236b727 libfc: lport: fix minor documentation errors
commit 1b69bc062c upstream.

Fix minor errors.
A debug message said an RLIR was received instead of ECHO.
"Expected" was misspelled in several places.
Fix a type cast from u32 to __be32.

Rob, Some of these may have been also taken care of in your
other doc cleanup patch.  Feel free to fold them in.

Signed-off-by: Joe Eykholt <jeykholt@cisco.com>
Signed-off-by: Robert Love <robert.w.love@intel.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-01-28 15:01:32 -08:00
Yi Zou
56320f6e40 libfc: Fix wrong scsi return status under FC_DATA_UNDRUN
commit 4347fa6687 upstream.

This bug is exposed when there is a link flap in LLD. Particularly, when it
happens right after a SCSI write command is sent out, no FCP_DATA is sent,
causing fsp->status_code to be set as FC_DATA_UNDRUN in fc_fcp_complete_locked
even if no SCSI status is received. Consequently, fc_io_compl treats this as DID_OK.
This results in SCSI returning successful to the initial I/O request even
though there is no DATA actually sent. Particularly, if you run an I/O tool w/ data
verification on, the read back for verification is going to fail.

This is fixed here by checking when FC_DATA_UNDRUN happens, SCSI status is
received w/ FC_SRB_RCV_STATUS set in fsp->state.

Signed-off-by: Yi Zou <yi.zou@intel.com>
Signed-off-by: Robert Love <robert.w.love@intel.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-01-28 15:01:31 -08:00
Yi Zou
d5d72dafd7 fcoe: remove redundant checking of netdev->netdev_ops
commit b04d023cf5 upstream.

Remove the redundant checking of netdev->netdev_ops as it will never be NULL.

Signed-off-by: Yi Zou <yi.zou@intel.com>
Signed-off-by: Robert Love <robert.w.love@intel.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-01-28 15:01:30 -08:00
Yi Zou
34556a1805 libfc: fix ddp in fc_fcp for 0 xid
commit 5e472d077f upstream.

xid 0 was used as an indication of invalid xid before but now xid 0
can be used as a valid exchange id. This patch fixes the ddp completion
in fcp layer, i.e., in fc_fcp.c:fc_fcp_ddp_done() function, to make sure it
does not use xid 0 for indication of an invalid xid, instead, it now
uses FC_XID_UNKNOWN for such indication.
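
As an illustration of the sentinel change, assuming FC_XID_UNKNOWN is
0xffff (the value is an assumption here, and both helper names are
hypothetical):

```python
FC_XID_UNKNOWN = 0xFFFF  # assumed sentinel value for "no exchange id"

def xid_valid_old(xid):
    # Old check: treats exchange id 0 as invalid.
    return xid != 0

def xid_valid_new(xid):
    # New check: only the sentinel marks an invalid exchange id.
    return xid != FC_XID_UNKNOWN

print(xid_valid_old(0), xid_valid_new(0))  # False True
```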

Signed-off-by: Yi Zou <yi.zou@intel.com>
Signed-off-by: Robert Love <robert.w.love@intel.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-01-28 15:01:26 -08:00
Joe Eykholt
1e418b2888 libfc: fix typo in retry check on received PRLI
commit 85b5893ca9 upstream.

A received Fibre Channel ELS PRLI request contains a bit that
indicates whether the remote port supports certain retry processing
sequences.  The test for this bit was somehow coded to use multiply
instead of AND!
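
Why the multiply goes unnoticed can be seen with plain integers. The flag
position below is hypothetical and chosen only for the example; it is not
the real PRLI bit.

```python
RETRY_BIT = 0x0100  # hypothetical flag position

def supports_retry_multiply(flags):
    # Buggy test: any non-zero flags value makes the product non-zero.
    return bool(flags * RETRY_BIT)

def supports_retry_and(flags):
    # Intended test: true only when the specific bit is set.
    return bool(flags & RETRY_BIT)

flags = 0x0020  # some other bit set, retry bit clear
print(supports_retry_multiply(flags), supports_retry_and(flags))  # True False
```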

This case would apply only for target mode operation, and it is
unlikely to be noticed as an initiator.

Signed-off-by: Joe Eykholt <jeykholt@cisco.com>
Signed-off-by: Robert Love <robert.w.love@intel.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-01-28 15:01:25 -08:00
Michael Reed
253f41bf6e lpfc: fix hang on SGI ia64 platform
commit 8e68597d08 upstream.

In testing 2.6.31 on one of our ia64 platforms I've encountered a hang
due to the driver using hardware ATEs which are a limited resource.
This is because the driver does not set the dma consistent mask to
64 bits.

Signed-off-by: Michael Reed <mdr@sgi.com>
Acked-by: James Smart <James.Smart@Emulex.Com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-01-28 15:01:24 -08:00
Michael Reed
4b2bc96c3a scsi_transport_fc: remove invalid BUG_ON
commit 8798a694da upstream.

I was doing some large lun count testing with 2.6.31 and hit
a BUG_ON() in fc_timeout_deleted_rport(), and it seems like it
should have been just a matter of time before someone did.

It seems invalid to set port_state under lock, then expect it to
remain set after releasing the lock.  Another thread called
fc_remote_port_add() when the lock was released, changing the
port_state.

This patch removes the BUG_ON and moves the test of the
port_state to inside the host_lock.  It's been running for
several weeks now with no ill effect.

Signed-off-by: Michael Reed <mdr@sgi.com>
Acked-by:  James Smart <james.smart@emulex.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-01-28 15:01:23 -08:00
Chandra Seetharaman
d502a76693 scsi_dh: create sysfs file, dh_state for all SCSI disk devices
commit 5917290ce9 upstream.

Create the sysfs file, dh_state even if the new SCSI device is not
in any of the device handler's internal lists.

Signed-Off-by: Chandra Seetharaman <sekharan@us.ibm.com>
Acked-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-01-28 15:01:22 -08:00
Takahiro Yasui
e7c8167e4e scsi_devinfo: update Hitachi entries (v2)
commit 627511e3e6 upstream.

Four models, OPEN-/DF400/DF500/DISK-SUBSYSTEM, can handle REPORT_LUN,
and the BLIST_REPORTLUN2 flag needs to be set. And DF600 doesn't require
any flags because it returns ANSI 03h (SPC).

Signed-off-by: Takahiro Yasui <tyasui@redhat.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Acked-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-01-28 15:01:21 -08:00
Mike Christie
1d42a1bd93 iscsi class: modify handling of replacement timeout
commit fdd46dcbe4 upstream.

This patch modifies the replacement/recovery_timeout so it works
more like the fc fast io fail tmo.

If userspace tries to set the replacement/recovery_timeout to less than
zero, we will turn off the forced recovery cleanup.

If userspace sets the value to 0 then we will force the recovery
cleanup immediately.
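
The three cases can be summarised with a small sketch. The enum and the
function name are hypothetical and are not taken from the iscsi class
code. The behaviour for positive values (cleanup after a delay of that
many seconds) is an assumption based on how a timeout normally works; it
is not stated in this message.

```c
/* Hypothetical classification of a userspace-supplied timeout value. */
enum recovery_action {
    RECOVERY_DISABLED,   /* value < 0: forced recovery cleanup is off */
    RECOVERY_IMMEDIATE,  /* value == 0: force cleanup right away */
    RECOVERY_DELAYED     /* value > 0: assumed to wait before cleanup */
};

static enum recovery_action classify_recovery_timeout(int timeout)
{
    if (timeout < 0)
        return RECOVERY_DISABLED;
    if (timeout == 0)
        return RECOVERY_IMMEDIATE;
    return RECOVERY_DELAYED;
}
```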

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-01-28 15:01:17 -08:00
Kashyap, Desai
002464c610 mpt2sas: New device SAS2208 support is added
commit db27136a89 upstream.

Added the device ID range { 0x80 - 0x87 } and modified mpi/mpi2_cnfg.h, which
contains MPI2_MFGPAGE_DEVID_SAS2208_X.

Signed-off-by: Kashyap Desai <kashyap.desai@lsi.com>
Signed-off-by: Eric Moore <Eric.moore@lsi.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Cc: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-01-28 15:00:36 -08:00
Bryn M. Reeves
94249e6037 megaraid_sas: remove sysfs poll_mode_io world writeable permissions
commit bb7d3f24c7 upstream.

/sys/bus/pci/drivers/megaraid_sas/poll_mode_io defaults to being
world-writable, which seems bad (letting any user affect kernel driver
behavior).

This turns off group and world write permissions, so that on typical
production systems only root can write to it.

Signed-off-by: Bryn M. Reeves <bmr@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-01-22 15:18:27 -08:00
Mike Christie
fdf2675111 SCSI: fc class: fix fc_transport_init error handling
commit 48de68a40a upstream.

If transport_class_register fails we should unregister any
registered classes, or we will leak memory or other
resources.
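
The usual shape of such a fix is a goto-based unwind, sketched below
with a mock class type. Everything here is hypothetical (the struct, the
helpers and the number of classes); it only shows the pattern of
undoing earlier registrations in reverse order when a later one fails.

```c
/* Mock of a registrable class; "fail" forces registration to fail. */
struct class_model {
    int registered;
    int fail;
};

static int class_model_register(struct class_model *c)
{
    if (c->fail)
        return -1;
    c->registered = 1;
    return 0;
}

static void class_model_unregister(struct class_model *c)
{
    c->registered = 0;
}

/* Register three classes, unwinding the earlier ones on any failure. */
static int transport_init_model(struct class_model *a,
                                struct class_model *b,
                                struct class_model *c)
{
    int err;

    err = class_model_register(a);
    if (err)
        return err;
    err = class_model_register(b);
    if (err)
        goto unreg_a;
    err = class_model_register(c);
    if (err)
        goto unreg_b;
    return 0;

unreg_b:
    class_model_unregister(b);
unreg_a:
    class_model_unregister(a);
    return err;
}
```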

I did a quick modprobe of scsi_transport_fc to test the
patch.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-01-06 15:03:14 -08:00
FUJITA Tomonori
1ab0714daa SCSI: st: fix mdata->page_order handling
commit c982c368bb upstream.

dio transfer always resets mdata->page_order to zero. It breaks
high-order pages previously allocated for non-dio transfer.

This patch adds reserved_page_order to the st_buffer structure to save
the page order for non-dio transfers.

http://bugzilla.kernel.org/show_bug.cgi?id=14563

When enlarge_buffer() grows the buffer from 0 to 524288 bytes, st uses
order-6 page allocations. So mdata->page_order is 6 and frp_seg is 2.

After that, if st uses dio, sgl_map_user_pages() sets
mdata->page_order to 0 for st_do_scsi(). After that, when we call
normalize_buffer(), it frees only frp_seg * PAGE_SIZE (2 * 4096)
though we should free frp_seg * PAGE_SIZE << 6 (2 * 4096 << 6). So we
see buffer_size is set to 516096 (524288 - 8192).
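
The arithmetic above can be checked with a short sketch. The helper and
the constant name are hypothetical, and a 4096-byte page is assumed as
in the figures quoted in this message.

```c
/* Assumed page size, matching the 4096 used in the message. */
#define PAGE_SIZE_MODEL 4096UL

/*
 * Bytes accounted as freed for frp_segs segments, each allocated with
 * the given page order (one segment is PAGE_SIZE << order bytes).
 */
static unsigned long st_freed_bytes(unsigned int frp_segs,
                                    unsigned int page_order)
{
    return (frp_segs * PAGE_SIZE_MODEL) << page_order;
}
```

With frp_segs = 2, order 6 accounts for the full 524288 bytes, while the
wrongly reset order 0 accounts for only 8192, leaving buffer_size at
524288 - 8192 = 516096.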

Reported-by: Joachim Breuer <linux-kernel@jmbreuer.net>
Tested-by: Joachim Breuer <linux-kernel@jmbreuer.net>
Acked-by: Kai Makisara <kai.makisara@kolumbus.fi>
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-01-06 15:03:13 -08:00
Michael Reed
9f63d27c1b SCSI: qla2xxx: dpc thread can execute before scsi host has been added
commit 1486400f7e upstream.

Fix crash in qla2x00_fdmi_register() due to the dpc
thread executing before the scsi host has been fully
added.

Unable to handle kernel NULL pointer dereference (address 00000000000001d0)
qla2xxx_7_dpc[4140]: Oops 8813272891392 [1]

Call Trace:
 [<a000000100016910>] show_stack+0x50/0xa0
                                sp=e00000b07c59f930 bsp=e00000b07c591400
 [<a000000100017180>] show_regs+0x820/0x860
                                sp=e00000b07c59fb00 bsp=e00000b07c5913a0
 [<a00000010003bd60>] die+0x1a0/0x2e0
                                sp=e00000b07c59fb00 bsp=e00000b07c591360
 [<a0000001000681a0>] ia64_do_page_fault+0x8c0/0x9e0
                                sp=e00000b07c59fb00 bsp=e00000b07c591310
 [<a00000010000c8e0>] ia64_native_leave_kernel+0x0/0x270
                                sp=e00000b07c59fb90 bsp=e00000b07c591310
 [<a000000207197350>] qla2x00_fdmi_register+0x850/0xbe0 [qla2xxx]
                                sp=e00000b07c59fd60 bsp=e00000b07c591290
 [<a000000207171570>] qla2x00_configure_loop+0x1930/0x34c0 [qla2xxx]
                                sp=e00000b07c59fd60 bsp=e00000b07c591128
 [<a0000002071732b0>] qla2x00_loop_resync+0x1b0/0x2e0 [qla2xxx]
                                sp=e00000b07c59fdf0 bsp=e00000b07c5910c0
 [<a000000207166d40>] qla2x00_do_dpc+0x9a0/0xce0 [qla2xxx]
                                sp=e00000b07c59fdf0 bsp=e00000b07c590fa0
 [<a0000001000d5bb0>] kthread+0x110/0x140
                                sp=e00000b07c59fe00 bsp=e00000b07c590f68
 [<a000000100014a30>] kernel_thread_helper+0xd0/0x100
                                sp=e00000b07c59fe30 bsp=e00000b07c590f40
 [<a00000010000a4c0>] start_kernel_thread+0x20/0x40
                                sp=e00000b07c59fe30 bsp=e00000b07c590f40

crash> dis a000000207197350
0xa000000207197350 <qla2x00_fdmi_register+2128>:        [MMI]       ld1 r45=[r14];;
crash> scsi_qla_host.host 0xe00000b058c73ff8
  host = 0xe00000b058c73be0,
crash> Scsi_Host.shost_data 0xe00000b058c73be0
  shost_data = 0x0,  <<<<<<<<<<<

The fc_transport fc_* workqueue threads have yet to be created.

crash> ps | grep _7
   3891      2   2  e00000b075c80000  IN   0.0       0      0  [scsi_eh_7]
   4140      2   3  e00000b07c590000  RU   0.0       0      0  [qla2xxx_7_dpc]

The thread adding the Scsi_Host is blocked due to other
activity in sysfs.

crash> bt 3762
PID: 3762   TASK: e00000b071e70000  CPU: 3   COMMAND: "modprobe"
 #0 [BSP:e00000b071e71548] schedule at a000000100727e00
 #1 [BSP:e00000b071e714c8] __mutex_lock_slowpath at a0000001007295a0
 #2 [BSP:e00000b071e714a8] mutex_lock at a000000100729830
 #3 [BSP:e00000b071e71478] sysfs_addrm_start at a0000001002584f0
 #4 [BSP:e00000b071e71440] create_dir at a000000100259350
 #5 [BSP:e00000b071e71410] sysfs_create_subdir at a000000100259510
 #6 [BSP:e00000b071e713b0] internal_create_group at a00000010025c880
 #7 [BSP:e00000b071e71388] sysfs_create_group at a00000010025cc50
 #8 [BSP:e00000b071e71368] dpm_sysfs_add at a000000100425050
 #9 [BSP:e00000b071e71310] device_add at a000000100417d90
#10 [BSP:e00000b071e712d8] scsi_add_host at a00000010045a380
#11 [BSP:e00000b071e71268] qla2x00_probe_one at a0000002071be950
#12 [BSP:e00000b071e71248] local_pci_probe at a00000010032e490
#13 [BSP:e00000b071e71218] pci_device_probe at a00000010032ecd0
#14 [BSP:e00000b071e711d8] driver_probe_device at a00000010041d480
#15 [BSP:e00000b071e711a8] __driver_attach at a00000010041d6e0
#16 [BSP:e00000b071e71170] bus_for_each_dev at a00000010041c240
#17 [BSP:e00000b071e71150] driver_attach at a00000010041d0a0
#18 [BSP:e00000b071e71108] bus_add_driver at a00000010041b080
#19 [BSP:e00000b071e710c0] driver_register at a00000010041dea0
#20 [BSP:e00000b071e71088] __pci_register_driver at a00000010032f610
#21 [BSP:e00000b071e71058] (unknown) at a000000207200270
#22 [BSP:e00000b071e71018] do_one_initcall at a00000010000a9c0
#23 [BSP:e00000b071e70f98] sys_init_module at a0000001000fef00
#24 [BSP:e00000b071e70f98] ia64_ret_from_syscall at a00000010000c740

So, it appears that qla2xxx dpc thread is moving forward before the
scsi host has been completely added.

This patch moves the setting of the init_done (and online) flag to
after the call to scsi_add_host() to hold off the dpc thread.

Found via large lun count testing using 2.6.31.

Signed-off-by: Michael Reed <mdr@sgi.com>
Acked-by: Giridhar Malavali <giridhar.malavali@qlogic.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-01-06 15:03:12 -08:00
Kleber Sacilotto de Souza
c1d17da3cf SCSI: ipr: fix EEH recovery
commit 99c965dd9e upstream.

After commits c82f63e411 (PCI: check saved
state before restore) and 4b77b0a2ba (PCI:
Clear saved_state after the state has been restored) PCI drivers are
prevented from restoring the device standard configuration registers
twice in a row. These changes introduced a regression on ipr EEH
recovery.

The ipr device driver saves the PCI state only during the device probe
and restores it on ipr_reset_restore_cfg_space() during IOA resets. This
behavior is causing the EEH recovery to fail after the second error
detected, since the registers are not being restored.

One possible solution would be saving the registers after restoring
them. The problem with this approach is that while recovering from an
EEH error if pci_save_state() results in an EEH error, the adapter/slot
will be reset, and end up back in ipr_reset_restore_cfg_space(), but it
won't have a valid saved state to restore, so pci_restore_state() will
fail.

The following patch introduces a workaround for this problem, hacking
around the PCI API by setting pdev->state_saved = true before we do the
restore. It fixes the EEH regression and prevents us from hitting another
EEH error during EEH recovery.
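
A minimal model of the behaviour described above, assuming the restore
routine does nothing unless the saved-state flag is set and clears the
flag once it has restored. The struct and both functions are mock
stand-ins, not the PCI core or ipr code.

```c
#include <stdbool.h>

/* Mock device: only the flag and a restore counter are modelled. */
struct pci_dev_model {
    bool state_saved;
    int restores;
};

/* Restores once per save: returns false if no saved state is marked. */
static bool restore_state_model(struct pci_dev_model *pdev)
{
    if (!pdev->state_saved)
        return false;
    pdev->restores++;
    pdev->state_saved = false;   /* cleared after a successful restore */
    return true;
}

/* The workaround: mark the probe-time state valid before restoring. */
static bool reset_restore_cfg_space_model(struct pci_dev_model *pdev)
{
    pdev->state_saved = true;
    return restore_state_model(pdev);
}
```

Without the workaround a second restore is skipped; with it, every reset
restores the state that was saved at probe time.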


[jejb: fix is a hack ... Jesse and Rafael will fix properly]
Signed-off-by: Kleber Sacilotto de Souza <klebers@linux.vnet.ibm.com>
Acked-by: Brian King <brking@linux.vnet.ibm.com>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-01-06 15:03:11 -08:00
Yang, Bo
d10a8f0520 SCSI: megaraid_sas: fix 64 bit sense pointer truncation
commit 7b2519afa1 upstream.

The current sense pointer is cast to a u32 pointer, which can truncate
on 64-bit systems.  Fix by using unsigned long instead.
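
The truncation can be demonstrated in userspace with the sketch below.
It is a model only: the helper names are hypothetical, uint64_t stands
in for unsigned long on a 64-bit kernel, and memcpy is used so the
example does not depend on alignment.

```c
#include <stdint.h>
#include <string.h>

/* Store an address through a 32-bit-wide write: upper bits are lost. */
static void store_sense_u32(unsigned char *slot, uint64_t addr)
{
    uint32_t narrow = (uint32_t)addr;

    memcpy(slot, &narrow, sizeof(narrow));
}

/* Store the address at full width. */
static void store_sense_full(unsigned char *slot, uint64_t addr)
{
    memcpy(slot, &addr, sizeof(addr));
}

/* Read back the 8-byte slot as a 64-bit value. */
static uint64_t load_sense(const unsigned char *slot)
{
    uint64_t value;

    memcpy(&value, slot, sizeof(value));
    return value;
}
```

For any address above 4 GiB, the value read back after the narrow store
no longer matches the original address, while the full-width store
round-trips correctly.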

Signed-off-by: Bo Yang <bo.yang@lsi.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-14 09:44:46 -08:00