linux

mirror of https://github.com/hardkernel/linux.git synced 2026-04-19 12:00:46 +09:00

Author	SHA1	Message	Date
David Jeffery	78c9672f78	SCSI: qla2xxx: Test and clear FCPORT_UPDATE_NEEDED atomically. commit `a394aac885` upstream. When the qla2xxx driver loses access to multiple, remote ports, there is a race condition which can occur which will keep the request stuck on a scsi request queue indefinitely. This bad state occurred do to a race condition with how the FCPORT_UPDATE_NEEDED bit is set in qla2x00_schedule_rport_del(), and how it is cleared in qla2x00_do_dpc(). The problem port has its drport pointer set, but it has never been processed by the driver to inform the fc transport that the port has been lost. qla2x00_schedule_rport_del() sets drport, and then sets the FCPORT_UPDATE_NEEDED bit. In qla2x00_do_dpc(), the port lists are walked and any drport pointer is handled and the fc transport informed of the port loss, then the FCPORT_UPDATE_NEEDED bit is cleared. This leaves a race where the dpc thread is processing one port removal, another port removal is marked with a call to qla2x00_schedule_rport_del(), and the dpc thread clears the bit for both removals, even though only the first removal was actually handled. Until another event occurs to set FCPORT_UPDATE_NEEDED, the later port removal is never finished and qla2xxx stays in a bad state which causes requests to become stuck on request queues. This patch updates the driver to test and clear FCPORT_UPDATE_NEEDED atomically. This ensures the port state changes are processed and not lost. Signed-off-by: David Jeffery <djeffery@redhat.com> Signed-off-by: Chad Dupuis <chad.dupuis@qlogic.com> Signed-off-by: Saurav Kashyap <saurav.kashyap@qlogic.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2013-01-17 08:43:54 -08:00
Xi Wang	b9ebaf5aad	SCSI: mvsas: fix undefined bit shift commit `beecadea1b` upstream. The macro bit(n) is defined as ((u32)1 << n), and thus it doesn't work with n >= 32, such as in mvs_94xx_assign_reg_set(): if (i >= 32) { mvi->sata_reg_set \|= bit(i); ... } The shift ((u32)1 << n) with n >= 32 also leads to undefined behavior. The result varies depending on the architecture. This patch changes bit(n) to do a 64-bit shift. It also simplifies mv_ffc64() using __ffs64(), since invoking ffz() with ~0 is undefined. Signed-off-by: Xi Wang <xi.wang@gmail.com> Acked-by: Xiangliang Yu <yuxiangl@marvell.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2013-01-17 08:43:54 -08:00
Maciej Patelczyk	14de2f67dd	SCSI: isci: copy fis 0x34 response into proper buffer commit `49bd665c54` upstream. SATA MICROCODE DOWNALOAD fails on isci driver. After receiving Register Device to Host (FIS 0x34) frame Initiator resets phy. In the frame handler routine response (FIS 0x34) was copied into wrong buffer and upper layer did not receive any answer which resulted in timeout and reset. This patch corrects this bug. Signed-off-by: Maciej Patelczyk <maciej.patelczyk@intel.com> Signed-off-by: Lukasz Dorau <lukasz.dorau@intel.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2012-12-03 12:59:13 -08:00
Martin Michlmayr	fe77d1bb93	drivers/scsi/atp870u.c: fix bad use of udelay commit `0f6d93aa9d` upstream. The ACARD driver calls udelay() with a value > 2000, which leads to to the following compilation error on ARM: ERROR: "__bad_udelay" [drivers/scsi/atp870u.ko] undefined! make[1]: *** [__modpost] Error 1 This is because udelay is defined on ARM, roughly speaking, as #define udelay(n) ((n) > 2000 ? __bad_udelay() : \ __const_udelay((n) * ((2199023U*HZ)>>11))) The argument to __const_udelay is the number of jiffies to wait divided by 4, but this does not work unless the multiplication does not overflow, and that is what the build error is designed to prevent. The intended behavior can be achieved by using mdelay to call udelay multiple times in a loop. [jrnieder@gmail.com: adding context] Signed-off-by: Martin Michlmayr <tbm@cyrius.com> Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2012-10-13 05:28:03 +09:00
Bart Van Assche	d71df5421f	SCSI: scsi_dh_alua: Enable STPG for unavailable ports commit `e47f8976d8` upstream. A quote from SPC-4: "While in the unavailable primary target port asymmetric access state, the device server shall support those of the following commands that it supports while in the active/optimized state: [ ... ] d) SET TARGET PORT GROUPS; [ ... ]". Hence enable sending STPG to a target port group that is in the unavailable state. Signed-off-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Mike Christie <michaelc@cs.wisc.edu> Acked-by: Hannes Reinecke <hare@suse.de> Signed-off-by: James Bottomley <JBottomley@Parallels.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2012-10-07 08:27:27 -07:00
Dan Williams	8fda07927a	SCSI: scsi_remove_target: fix softlockup regression on hot remove commit `bc3f02a795` upstream. John reports: BUG: soft lockup - CPU#2 stuck for 23s! [kworker/u:8:2202] [..] Call Trace: [<ffffffff8141782a>] scsi_remove_target+0xda/0x1f0 [<ffffffff81421de5>] sas_rphy_remove+0x55/0x60 [<ffffffff81421e01>] sas_rphy_delete+0x11/0x20 [<ffffffff81421e35>] sas_port_delete+0x25/0x160 [<ffffffff814549a3>] mptsas_del_end_device+0x183/0x270 ...introduced by commit `3b661a9` "[SCSI] fix hot unplug vs async scan race". Don't restart lookup of more stargets in the multi-target case, just arrange to traverse the list once, on the assumption that new targets are always added at the end. There is no guarantee that the target will change state in scsi_target_reap() so we can end up spinning if we restart. Acked-by: Jack Wang <jack_wang@usish.com> LKML-Reference: <CAEhu1-6wq1YsNiscGMwP4ud0Q+MrViRzv=kcWCQSBNc8c68N5Q@mail.gmail.com> Reported-by: John Drescher <drescherjm@gmail.com> Tested-by: John Drescher <drescherjm@gmail.com> Signed-off-by: Dan Williams <djbw@fb.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2012-10-07 08:27:27 -07:00
Dan Williams	42cc576bf2	isci: fix isci_pci_probe() generates warning on efi failure path commit `6d70a74ffd` upstream. The oem parameter image embedded in the efi variable is at an offset from the start of the variable. However, in the failure path we try to free the 'orom' pointer which is only valid when the paramaters are being read from the legacy option-rom space. Since failure to load the oem parameters is unlikely and we keep the memory around in the success case just defer all de-allocation to devm. Reported-by: Don Morris <don.morris@hp.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2012-10-07 08:27:27 -07:00
Stephen M. Cameron	c07ad5e868	SCSI: hpsa: Use LUN reset instead of target reset commit `21e89afd32` upstream. It turns out Smart Array logical drives do not support target reset and when the target reset fails, the logical drive will be taken off line. Symptoms look like this: hpsa 0000:03:00.0: Abort request on C1:B0:T0:L0 hpsa 0000:03:00.0: resetting device 1:0:0:0 hpsa 0000:03:00.0: cp ffff880037c56000 is reported invalid (probably means target device no longer present) hpsa 0000:03:00.0: resetting device failed. sd 1:0:0:0: Device offlined - not ready after error recovery sd 1:0:0:0: rejecting I/O to offline device EXT3-fs error (device sdb1): read_block_bitmap: LUN reset is supported though, and is what we should be using. Target reset is also disruptive in shared SAS situations, for example, an external MSA1210m which does support target reset attached to Smart Arrays in multiple hosts -- a target reset from one host is disruptive to other hosts as all LUNs on the target will be reset and will abort all outstanding i/os back to all the attached hosts. So we should use LUN reset, not target reset. Tested this with Smart Array logical drives and with tape drives. Not sure how this bug survived since 2009, except it must be very rare for a Smart Array to require more than 30s to complete a request. Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2012-10-07 08:27:26 -07:00
Benjamin Herrenschmidt	a3b1f83195	SCSI: ibmvscsi: Fix host config length field overflow commit `225c56960f` upstream. The length field in the host config packet is only 16-bit long, so passing it 0x10000 (64K which is our standard PAGE_SIZE) doesn't work and result in an empty config from the server. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: Robert Jennings <rcj@linux.vnet.ibm.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2012-10-07 08:27:26 -07:00
Stephen M. Cameron	9d1abc4c1f	SCSI: hpsa: fix handling of protocol error commit `256d0eaac8` upstream. If a command status of CMD_PROTOCOL_ERR is received, this information should be conveyed to the SCSI mid layer, not dropped on the floor. CMD_PROTOCOL_ERR may be received from the Smart Array for any commands destined for an external RAID controller such as a P2000, or commands destined for tape drives or CD/DVD-ROM drives, if for instance a cable is disconnected. This mostly affects multipath configurations, as disconnecting a cable on a non-multipath configuration is not going to do anything good regardless of whether CMD_PROTOCOL_ERR is handled correctly or not. Not handling CMD_PROTOCOL_ERR correctly in a multipath configaration involving external RAID controllers may cause data corruption, so this is quite a serious bug. This bug should not normally cause a problem for direct attached disk storage. Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2012-10-02 09:47:27 -07:00
Eddie Wai	880d7a7bbe	SCSI: bnx2i: Fixed NULL ptr deference for 1G bnx2 Linux iSCSI offload commit `d653220711` upstream. This patch fixes the following kernel panic invoked by uninitialized fields in the chip initialization for the 1G bnx2 iSCSI offload. One of the bits in the chip initialization is being used by the latest firmware to control overflow packets. When this control bit gets enabled erroneously, it would ultimately result in a bad packet placement which would cause the bnx2 driver to dereference a NULL ptr in the placement handler. This can happen under certain stress I/O environment under the Linux iSCSI offload operation. This change only affects Broadcom's 5709 chipset. Unable to handle kernel NULL pointer dereference at 0000000000000008 RIP: [<ffffffff881f0e7d>] :bnx2:bnx2_poll_work+0xd0d/0x13c5 Pid: 0, comm: swapper Tainted: G ---- 2.6.18-333.el5debug #2 RIP: 0010:[<ffffffff881f0e7d>] [<ffffffff881f0e7d>] :bnx2:bnx2_poll_work+0xd0d/0x13c5 RSP: 0018:ffff8101b575bd50 EFLAGS: 00010216 RAX: 0000000000000005 RBX: ffff81007c5fb180 RCX: 0000000000000000 RDX: 0000000000000ffc RSI: 00000000817e8000 RDI: 0000000000000220 RBP: ffff81015bbd7ec0 R08: ffff8100817e9000 R09: 0000000000000000 R10: ffff81007c5fb180 R11: 00000000000000c8 R12: 000000007a25a010 R13: 0000000000000000 R14: 0000000000000005 R15: ffff810159f80558 FS: 0000000000000000(0000) GS:ffff8101afebc240(0000) knlGS:0000000000000000 CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b CR2: 0000000000000008 CR3: 0000000000201000 CR4: 00000000000006a0 Process swapper (pid: 0, threadinfo ffff8101b5754000, task ffff8101afebd820) Stack: 000000000000000b ffff810159f80000 0000000000000040 ffff810159f80520 ffff810159f80500 00cf00cf8008e84b ffffc200100939e0 ffff810009035b20 0000502900000000 000000be00000001 ffff8100817e7810 00d08101b575bea8 Call Trace: <IRQ> [<ffffffff8008e0d0>] show_schedstat+0x1c2/0x25b [<ffffffff881f1886>] :bnx2:bnx2_poll+0xf6/0x231 [<ffffffff8000c9b9>] net_rx_action+0xac/0x1b1 [<ffffffff800125a0>] __do_softirq+0x89/0x133 [<ffffffff8005e30c>] call_softirq+0x1c/0x28 [<ffffffff8006d5de>] do_softirq+0x2c/0x7d [<ffffffff8006d46e>] do_IRQ+0xee/0xf7 [<ffffffff8005d625>] ret_from_intr+0x0/0xa <EOI> [<ffffffff801a5780>] acpi_processor_idle_simple+0x1c5/0x341 [<ffffffff801a573d>] acpi_processor_idle_simple+0x182/0x341 [<ffffffff801a55bb>] acpi_processor_idle_simple+0x0/0x341 [<ffffffff80049560>] cpu_idle+0x95/0xb8 [<ffffffff80078b1c>] start_secondary+0x479/0x488 Signed-off-by: Eddie Wai <eddie.wai@broadcom.com> Reviewed-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: James Bottomley <JBottomley@Parallels.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2012-10-02 09:47:26 -07:00
sreekanth.reddy@lsi.com	9772793ce1	SCSI: mpt2sas: Fix for issue - Unable to boot from the drive connected to HBA commit `10cce6d8b5` upstream. This patch checks whether HBA is SAS2008 B0 controller. if it is a SAS2008 B0 controller then it use IO-APIC interrupt instead of MSIX, as SAS2008 B0 controller doesn't support MSIX interrupts. [jejb: fix whitespace problems] Signed-off-by: Sreekanth Reddy <sreekanth.reddy@lsi.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2012-10-02 09:47:26 -07:00
James Bottomley	778105ad42	SCSI: Fix 'Device not ready' issue on mpt2sas commit `14216561e1` upstream. This is a particularly nasty SCSI ATA Translation Layer (SATL) problem. SAT-2 says (section 8.12.2) if the device is in the stopped state as the result of processing a START STOP UNIT command (see 9.11), then the SATL shall terminate the TEST UNIT READY command with CHECK CONDITION status with the sense key set to NOT READY and the additional sense code of LOGICAL UNIT NOT READY, INITIALIZING COMMAND REQUIRED; mpt2sas internal SATL seems to implement this. The result is very confusing standby behaviour (using hdparm -y). If you suspend a drive and then send another command, usually it wakes up. However, if the next command is a TEST UNIT READY, the SATL sees that the drive is suspended and proceeds to follow the SATL rules for this, returning NOT READY to all subsequent commands. This means that the ordering of TEST UNIT READY is crucial: if you send TUR and then a command, you get a NOT READY to both back. If you send a command and then a TUR, you get GOOD status because the preceeding command woke the drive. This bit us badly because commit `85ef06d1d2` Author: Tejun Heo <tj@kernel.org> Date: Fri Jul 1 16:17:47 2011 +0200 block: flush MEDIA_CHANGE from drivers on close(2) Changed our ordering on TEST UNIT READY commands meaning that SATA drives connected to an mpt2sas now suspend and refuse to wake (because the mpt2sas SATL sees the suspend before the drives get awoken by the next ATA command) resulting in lots of failed commands. The standard is completely nuts forcing this inconsistent behaviour, but we have to work around it. The fix for this is twofold: 1. Set the allow_restart flag so we wake the drive when we see it has been suspended 2. Return all TEST UNIT READY status directly to the mid layer without any further error handling which prevents us causing error handling which may offline the device just because of a media check TUR. Reported-by: Matthias Prager <linux@matthiasprager.de> Signed-off-by: James Bottomley <JBottomley@Parallels.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2012-09-14 10:00:50 -07:00
sreekanth.reddy@lsi.com	a87c6c9daa	SCSI: mpt2sas: Fix for Driver oops, when loading driver with max_queue_depth command line option to a very small value commit `338b131a32` upstream. If the specified max_queue_depth setting is less than the expected number of internal commands, then driver will calculate the queue depth size to a negitive number. This negitive number is actually a very large number because variable is unsigned 16bit integer. So, the driver will ask for a very large amount of memory for message frames and resulting into oops as memory allocation routines will not able to handle such a large request. So, in order to limit this kind of oops, The driver need to set the max_queue_depth to a scsi mid layer's can_queue value. Then the overall message frames required for IO is minimum of either (max_queue_depth plus internal commands) or the IOC global credits. Signed-off-by: Sreekanth Reddy <sreekanth.reddy@lsi.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2012-09-14 10:00:50 -07:00
Kashyap Desai	b787880f60	SCSI: megaraid_sas: Move poll_aen_lock initializer commit `bd8d6dd43a` upstream. The following patch moves the poll_aen_lock initializer from megasas_probe_one() to megasas_init(). This prevents a crash when a user loads the driver and tries to issue a poll() system call on the ioctl interface with no adapters present. Signed-off-by: Kashyap Desai <Kashyap.Desai@lsi.com> Signed-off-by: Adam Radford <aradford@gmail.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2012-09-14 10:00:50 -07:00
Bart Van Assche	8add44b313	SCSI: Avoid dangling pointer in scsi_requeue_command() commit `940f5d47e2` upstream. When we call scsi_unprep_request() the command associated with the request gets destroyed and therefore drops its reference on the device. If this was the only reference, the device may get released and we end up with a NULL pointer deref when we call blk_requeue_request. Reported-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Mike Christie <michaelc@cs.wisc.edu> Reviewed-by: Tejun Heo <tj@kernel.org> [jejb: enhance commend and add commit log for stable] Signed-off-by: James Bottomley <JBottomley@Parallels.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2012-08-09 08:27:34 -07:00
Dan Williams	8fff2f802f	SCSI: fix hot unplug vs async scan race commit `3b661a92e8` upstream. The following crash results from cases where the end_device has been removed before scsi_sysfs_add_sdev has had a chance to run. BUG: unable to handle kernel NULL pointer dereference at 0000000000000098 IP: [<ffffffff8115e100>] sysfs_create_dir+0x32/0xb6 ... Call Trace: [<ffffffff8125e4a8>] kobject_add_internal+0x120/0x1e3 [<ffffffff81075149>] ? trace_hardirqs_on+0xd/0xf [<ffffffff8125e641>] kobject_add_varg+0x41/0x50 [<ffffffff8125e70b>] kobject_add+0x64/0x66 [<ffffffff8131122b>] device_add+0x12d/0x63a [<ffffffff814b65ea>] ? _raw_spin_unlock_irqrestore+0x47/0x56 [<ffffffff8107de15>] ? module_refcount+0x89/0xa0 [<ffffffff8132f348>] scsi_sysfs_add_sdev+0x4e/0x28a [<ffffffff8132dcbb>] do_scan_async+0x9c/0x145 ...teach scsi_sysfs_add_devices() to check for deleted devices() before trying to add them, and teach scsi_remove_target() how to remove targets that have not been added via device_add(). Reported-by: Dariusz Majchrzak <dariusz.majchrzak@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2012-08-09 08:27:34 -07:00
Dan Williams	bd9afacc54	SCSI: fix eh wakeup (scsi_schedule_eh vs scsi_restart_operations) commit `57fc2e335f` upstream. Rapid ata hotplug on a libsas controller results in cases where libsas is waiting indefinitely on eh to perform an ata probe. A race exists between scsi_schedule_eh() and scsi_restart_operations() in the case when scsi_restart_operations() issues i/o to other devices in the sas domain. When this happens the host state transitions from SHOST_RECOVERY (set by scsi_schedule_eh) back to SHOST_RUNNING and ->host_busy is non-zero so we put the eh thread to sleep even though ->host_eh_scheduled is active. Before putting the error handler to sleep we need to check if the host_state needs to return to SHOST_RECOVERY for another trip through eh. Since i/o that is released by scsi_restart_operations has been blocked for at least one eh cycle, this implementation allows those i/o's to run before another eh cycle starts to discourage hung task timeouts. Reported-by: Tom Jackson <thomas.p.jackson@intel.com> Tested-by: Tom Jackson <thomas.p.jackson@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2012-08-09 08:27:34 -07:00
Dan Williams	3f67ec4b51	SCSI: libsas: fix sas_discover_devices return code handling commit `b17caa174a` upstream. commit `198439e4` [SCSI] libsas: do not set res = 0 in sas_ex_discover_dev() commit `19252de6` [SCSI] libsas: fix wide port hotplug issues The above commits seem to have confused the return value of sas_ex_discover_dev which is non-zero on failure and sas_ex_join_wide_port which just indicates short circuiting discovery on already established ports. The result is random discovery failures depending on configuration. Calls to sas_ex_join_wide_port are the source of the trouble as its return value is errantly assigned to 'res'. Convert it to bool and stop returning its result up the stack. Tested-by: Dan Melnic <dan.melnic@amd.com> Reported-by: Dan Melnic <dan.melnic@amd.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Jack Wang <jack_wang@usish.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2012-08-09 08:27:34 -07:00
Dan Williams	2da74cd8a6	SCSI: libsas: continue revalidation commit `26f2f199ff` upstream. Continue running revalidation until no more broadcast devices are discovered. Fixes cases where re-discovery completes too early in a domain with multiple expanders with pending re-discovery events. Servicing BCNs can get backed up behind error recovery. Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2012-08-09 08:27:34 -07:00
Jun'ichi Nomura	f4090d8272	SCSI: Fix dm-multipath starvation when scsi host is busy commit `b7e94a1686` upstream. block congestion control doesn't have any concept of fairness across multiple queues. This means that if SCSI reports the host as busy in the queue congestion control it can result in an unfair starvation situation in dm-mp if there are multiple multipath devices on the same host. For example: http://www.redhat.com/archives/dm-devel/2012-May/msg00123.html The fix for this is to report only the sdev busy state (and ignore the host busy state) in the block congestion control call back. The host is still congested, but the SCSI subsystem will sort out the congestion in a fair way because it knows the relation between the queues and the host. [jejb: fixed up trailing whitespace] Reported-by: Bernd Schubert <bernd.schubert@itwm.fraunhofer.de> Tested-by: Bernd Schubert <bernd.schubert@itwm.fraunhofer.de> Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2012-06-10 00:32:57 +09:00
James Bottomley	af9c3bad26	SCSI: fix scsi_wait_scan commit `1ff2f40305` upstream. Commit `c751085943` Author: Rafael J. Wysocki <rjw@sisk.pl> Date: Sun Apr 12 20:06:56 2009 +0200 PM/Hibernate: Wait for SCSI devices scan to complete during resume Broke the scsi_wait_scan module in 2.6.30. Apparently debian still uses it so fix it and backport to stable before removing it in 3.6. The breakage is caused because the function template in include/scsi/scsi_scan.h is defined to be a nop unless SCSI is built in. That means that in the modular case (which is every distro), the scsi_wait_scan module does a simple async_synchronize_full() instead of waiting for scans. Signed-off-by: James Bottomley <JBottomley@Parallels.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2012-06-10 00:32:57 +09:00
Dan Williams	923744e41e	isci: fix oem parameter validation on single controller skus commit `fc25f79af3` upstream. OEM parameters [1] are parsed from the platform option-rom / efi driver. By default the driver was validating the parameters for the dual-controller case, but in single-controller case only the first set of parameters may be valid. Limit the validation to the number of actual controllers detected otherwise the driver may fail to parse the valid parameters leading to driver-load or runtime failures. [1] the platform specific set of phy address, configuration,and analog tuning values Reported-by: Dave Jiang <dave.jiang@intel.com> Tested-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> [bwh: Backported to 3.2: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2012-06-01 15:13:00 +08:00
Stephen M. Cameron	d4f3ef6343	SCSI: hpsa: Fix problem with MSA2xxx devices commit `9bc3711cbb` upstream. Upgraded firmware on Smart Array P7xx (and some others) made them show up as SCSI revision 5 devices and this caused the driver to fail to map MSA2xxx logical drives to the correct bus/target/lun. A symptom of this would be that the target ID of the logical drives as presented by the external storage array is ignored, and all such logical drives are assigned to target zero, differentiated only by LUN. Some multipath software reportedly does not deal well with this behavior, failing to recognize different paths to the same device as such. Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com> Signed-off-by: Scott Teel <scott.teel@hp.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com> Cc: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2012-06-01 15:12:57 +08:00
nagalakshmi.nandigama@lsi.com	35d73fe5e3	SCSI: mpt2sas: Fix for panic happening because of improper memory allocation commit `e42fafc25f` upstream. The ioc->pfacts member in the IOC structure is getting set to zero following a call to _base_get_ioc_facts due to the memset in that routine. So if the ioc->pfacts was read after a host reset, there would be a NULL pointer dereference. The routine _base_get_ioc_facts is called from context of host reset. The problem in _base_get_ioc_facts is the size of Mpi2IOCFactsReply is 64, whereas the sizeof "struct mpt2sas_facts" is 60, so there is a four byte overflow resulting from the memset. Also, there is memset in _base_get_port_facts using the incorrect structure, it should be "struct mpt2sas_port_facts" instead of Mpi2PortFactsReply. Signed-off-by: Nagalakshmi Nandigama <nagalakshmi.nandigama@lsi.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2012-06-01 15:12:54 +08:00
Stephen M. Cameron	9f5ad2813c	SCSI: hpsa: Add IRQF_SHARED back in for the non-MSI(X) interrupt handler commit `45bcf018d1` upstream. IRQF_SHARED is required for older controllers that don't support MSI(X) and which may end up sharing an interrupt. All the controllers hpsa normally supports have MSI(X) capability, but older controllers may be encountered via the hpsa_allow_any=1 module parameter. Also remove deprecated IRQF_DISABLED. Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2012-05-21 09:40:05 -07:00
Dan Williams	cfea9a2534	SCSI: libsas: fix false positive 'device attached' conditions commit `7d1d865181` upstream. Normalize phy->attached_sas_addr to return a zero-address in the case when device-type == NO_DEVICE or the linkrate is invalid to handle expanders that put non-zero sas addresses in the discovery response: sas: ex 5001b4da000f903f phy02:U:0 attached: 0100000000000000 (no device) sas: ex 5001b4da000f903f phy01:U:0 attached: 0100000000000000 (no device) sas: ex 5001b4da000f903f phy03:U:0 attached: 0100000000000000 (no device) sas: ex 5001b4da000f903f phy00:U:0 attached: 0100000000000000 (no device) Reported-by: Andrzej Jakowski <andrzej.jakowski@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2012-05-07 08:56:37 -07:00
Thomas Jackson	0c5f01a4e1	SCSI: libsas: fix sas_find_bcast_phy() in the presence of 'vacant' phys commit `1699490db3` upstream. If an expander reports 'PHY VACANT' for a phy index prior to the one that generated a BCN libsas fails rediscovery. Since a vacant phy is defined as a valid phy index that will never have an attached device just continue the search. Signed-off-by: Thomas Jackson <thomas.p.jackson@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2012-05-07 08:56:37 -07:00
Boaz Harrosh	d4d6cc13fa	osd_uld: Bump MAX_OSD_DEVICES from 64 to 1,048,576 commit `41f8ad7636` upstream. It used to be that minors where 8 bit. But now they are actually 20 bit. So the fix is simplicity itself. I've tested with 300 devices and all user-mode utils work just fine. I have also mechanically added 10,000 to the ida (so devices are /dev/osd10000, /dev/osd10001 ...) and was able to mkfs an exofs filesystem and access osds from user-mode. All the open-osd user-mode code uses the same library to access devices through their symbolic names in /dev/osdX so I'd say it's pretty safe. (Well tested) This patch is very important because some of the systems that will be deploying the 3.2 pnfs-objects code are larger than 64 OSDs and will stop to work properly when reaching that number. Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2012-03-12 10:32:57 -07:00
Alan Stern	4b3b9e9efc	scsi_pm: Fix bug in the SCSI power management handler commit `fea6d607e1` upstream. This patch (as1520) fixes a bug in the SCSI layer's power management implementation. LUN scanning can be carried out asynchronously in do_scan_async(), and sd uses an asynchronous thread for the time-consuming parts of disk probing in sd_probe_async(). Currently nothing coordinates these async threads with system sleep transitions; they can and do attempt to continue scanning/probing SCSI devices even after the host adapter has been suspended. As one might expect, the outcome is not ideal. This is what the "prepare" stage of system suspend was created for. After the prepare callback has been called for a host, target, or device, drivers are not allowed to register any children underneath them. Currently the SCSI prepare callback is not implemented; this patch rectifies that omission. For SCSI hosts, the prepare routine calls scsi_complete_async_scans() to wait until async scanning is finished. It might be slightly more efficient to wait only until the host in question has been scanned, but there's currently no way to do that. Besides, during a sleep transition we will ultimately have to wait until all the host scanning has finished anyway. For SCSI devices, the prepare routine calls async_synchronize_full() to wait until sd probing is finished. The routine does nothing for SCSI targets, because asynchronous target scanning is done only as part of host scanning. Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: James Bottomley <JBottomley@Parallels.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2012-02-29 16:34:30 -08:00
Huajun Li	bbfe8a71f0	scsi_scan: Fix 'Poison overwritten' warning caused by using freed 'shost' commit `267a6ad4ae` upstream. In do_scan_async(), calling scsi_autopm_put_host(shost) may reference freed shost, and cause Posison overwitten warning. Yes, this case can happen, for example, an USB is disconnected just when do_scan_async() thread starts to run, then scsi_host_put() called in scsi_finish_async_scan() will lead to shost be freed(because the refcount of shost->shost_gendev decreases to 1 after USB disconnects), at this point, if references shost again, system will show following warning msg. To make scsi_autopm_put_host(shost) always reference a valid shost, put it just before scsi_host_put() in function scsi_finish_async_scan(). [ 299.281565] ============================================================================= [ 299.281634] BUG kmalloc-4096 (Tainted: G I ): Poison overwritten [ 299.281682] ----------------------------------------------------------------------------- [ 299.281684] [ 299.281752] INFO: 0xffff880056c305d0-0xffff880056c305d0. First byte 0x6a instead of 0x6b [ 299.281816] INFO: Allocated in scsi_host_alloc+0x4a/0x490 age=1688 cpu=1 pid=2004 [ 299.281870] __slab_alloc+0x617/0x6c1 [ 299.281901] __kmalloc+0x28c/0x2e0 [ 299.281931] scsi_host_alloc+0x4a/0x490 [ 299.281966] usb_stor_probe1+0x5b/0xc40 [usb_storage] [ 299.282010] storage_probe+0xa4/0xe0 [usb_storage] [ 299.282062] usb_probe_interface+0x172/0x330 [usbcore] [ 299.282105] driver_probe_device+0x257/0x3b0 [ 299.282138] __driver_attach+0x103/0x110 [ 299.282171] bus_for_each_dev+0x8e/0xe0 [ 299.282201] driver_attach+0x26/0x30 [ 299.282230] bus_add_driver+0x1c4/0x430 [ 299.282260] driver_register+0xb6/0x230 [ 299.282298] usb_register_driver+0xe5/0x270 [usbcore] [ 299.282337] 0xffffffffa04ab03d [ 299.282364] do_one_initcall+0x47/0x230 [ 299.282396] sys_init_module+0xa0f/0x1fe0 [ 299.282429] INFO: Freed in scsi_host_dev_release+0x18a/0x1d0 age=85 cpu=0 pid=2008 [ 299.282482] __slab_free+0x3c/0x2a1 [ 299.282510] kfree+0x296/0x310 [ 299.282536] scsi_host_dev_release+0x18a/0x1d0 [ 299.282574] device_release+0x74/0x100 [ 299.282606] kobject_release+0xc7/0x2a0 [ 299.282637] kobject_put+0x54/0xa0 [ 299.282668] put_device+0x27/0x40 [ 299.282694] scsi_host_put+0x1d/0x30 [ 299.282723] do_scan_async+0x1fc/0x2b0 [ 299.282753] kthread+0xdf/0xf0 [ 299.282782] kernel_thread_helper+0x4/0x10 [ 299.282817] INFO: Slab 0xffffea00015b0c00 objects=7 used=7 fp=0x (null) flags=0x100000000004080 [ 299.282882] INFO: Object 0xffff880056c30000 @offset=0 fp=0x (null) [ 299.282884] ... Signed-off-by: Huajun Li <huajun.li.lee@gmail.com> Acked-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: James Bottomley <JBottomley@Parallels.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2012-02-29 16:34:30 -08:00
Eric Dumazet	8a533666d1	net: fix NULL dereferences in check_peer_redir() [ Upstream commit `d3aaeb38c4`, along with dependent backports of commits: `69cce1d140` `9de79c127c` `218fa90f07` `580da35a31` `f7e57044ee` `e049f28883` ] Gergely Kalman reported crashes in check_peer_redir(). It appears commit `f39925dbde` (ipv4: Cache learned redirect information in inetpeer.) added a race, leading to possible NULL ptr dereference. Since we can now change dst neighbour, we should make sure a reader can safely use a neighbour. Add RCU protection to dst neighbour, and make sure check_peer_redir() can be called safely by different cpus in parallel. As neighbours are already freed after one RCU grace period, this patch should not add typical RCU penalty (cache cold effects) Many thanks to Gergely for providing a pretty report pointing to the bug. Reported-by: Gergely Kalman <synapse@hippy.csoma.elte.hu> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2012-02-13 11:06:13 -08:00
Stratos Psomadakis	7b3e8a2073	sym53c8xx: Fix NULL pointer dereference in slave_destroy commit `cced5041ed` upstream. sym53c8xx_slave_destroy unconditionally assumes that sym53c8xx_slave_alloc has succesesfully allocated a sym_lcb. This can lead to a NULL pointer dereference (exposed by commit `4e6c82b`). Signed-off-by: Stratos Psomadakis <psomas@gentoo.org> Signed-off-by: James Bottomley <JBottomley@Parallels.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2012-01-25 17:24:59 -08:00
Paolo Bonzini	8bd8442fec	block: fail SCSI passthrough ioctls on partition devices commit `0bfc96cb77` upstream. [ Changes with respect to 3.3: return -ENOTTY from scsi_verify_blk_ioctl and -ENOIOCTLCMD from sd_compat_ioctl. ] Linux allows executing the SG_IO ioctl on a partition or LVM volume, and will pass the command to the underlying block device. This is well-known, but it is also a large security problem when (via Unix permissions, ACLs, SELinux or a combination thereof) a program or user needs to be granted access only to part of the disk. This patch lets partitions forward a small set of harmless ioctls; others are logged with printk so that we can see which ioctls are actually sent. In my tests only CDROM_GET_CAPABILITY actually occurred. Of course it was being sent to a (partition on a) hard disk, so it would have failed with ENOTTY and the patch isn't changing anything in practice. Still, I'm treating it specially to avoid spamming the logs. In principle, this restriction should include programs running with CAP_SYS_RAWIO. If for example I let a program access /dev/sda2 and /dev/sdb, it still should not be able to read/write outside the boundaries of /dev/sda2 independent of the capabilities. However, for now programs with CAP_SYS_RAWIO will still be allowed to send the ioctls. Their actions will still be logged. This patch does not affect the non-libata IDE driver. That driver however already tests for bd != bd->bd_contains before issuing some ioctl; it could be restricted further to forbid these ioctls even for programs running with CAP_SYS_ADMIN/CAP_SYS_RAWIO. Cc: linux-scsi@vger.kernel.org Cc: Jens Axboe <axboe@kernel.dk> Cc: James Bottomley <JBottomley@parallels.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> [ Make it also print the command name when warning - Linus ] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2012-01-25 17:24:54 -08:00
Paolo Bonzini	3b8373b85c	block: add and use scsi_blk_cmd_ioctl commit `577ebb374c` upstream. Introduce a wrapper around scsi_cmd_ioctl that takes a block device. The function will then be enhanced to detect partition block devices and, in that case, subject the ioctls to whitelisting. Cc: linux-scsi@vger.kernel.org Cc: Jens Axboe <axboe@kernel.dk> Cc: James Bottomley <JBottomley@parallels.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2012-01-25 17:24:54 -08:00
nagalakshmi.nandigama@lsi.com	73669debb5	SCSI: mpt2sas : Fix for memory allocation error for large host credits commit `aff132d95f` upstream. The amount of memory required for tracking chain buffers is rather large, and when the host credit count is big, memory allocation failure occurs inside __get_free_pages. The fix is to limit the number of chains to 100,000. In addition, the number of host credits is limited to 30,000 IOs. However this limitation can be overridden this using the command line option max_queue_depth. The algorithm for calculating the reply_post_queue_depth is changed so that it is equal to (reply_free_queue_depth + 16), previously it was (reply_free_queue_depth * 2). Signed-off-by: Nagalakshmi Nandigama <nagalakshmi.nandigama@lsi.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2012-01-25 17:24:41 -08:00
nagalakshmi.nandigama@lsi.com	de3f88ba08	SCSI: mpt2sas: Release spinlock for the raid device list before blocking it commit `30c43282f3` upstream. Added code to release the spinlock that is used to protect the raid device list before calling a function that can block. The blocking was causing a reschedule, and subsequently it is tried to acquire the same lock, resulting in a panic (NMI Watchdog detecting a CPU lockup). Signed-off-by: Nagalakshmi Nandigama <nagalakshmi.nandigama@lsi.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2012-01-25 17:24:40 -08:00
kashyap.desai@lsi.com	18366c3fed	SCSI: mpt2sas: Added missing mpt2sas_base_detach call from scsih_remove context commit `9ae89b0296` upstream. mpt2sas_base_detach() call was removed from _scsih_remove() while doing some code shuffling. Mainly when we work on adding code for scsih_shutdown(). I have added back mpt2sas_base_detach() which will get callled from _scsih_remove(). Signed-off-by: Kashyap Desai <kashyap.desai@lsi.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>	2012-01-12 11:35:51 -08:00
Nagalakshmi Nandigama	fca54d03a8	mpt2sas: fix non-x86 crash on shutdown Upstrem commit: `911ae9434f` There's a bug in the MSIX backup and restore routines that cause a crash on non-x86 (direct access to PCI space not via read/write). These routines are unnecessary and were removed by the above commit, so also remove them from stable to fix the crash. Signed-off-by: Nagalakshmi Nandigama <nagalakshmi.nandigama@lsi.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2012-01-06 14:14:00 -08:00
Thomas Gleixner	d27020f6c0	SCSI: fcoe: Fix preempt count leak in fcoe_filter_frames() commit `7e1e7ead88` upstream. The error exit path leaks preempt count. Add the missing put_cpu(). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Yi Zou <yi.zou@intel.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2012-01-06 14:13:47 -08:00
Anton Blanchard	e57a4c768c	SCSI: mpt2sas: _scsih_smart_predicted_fault uses GFP_KERNEL in interrupt context commit `f6a290b419` upstream. _scsih_smart_predicted_fault is called in an interrupt and therefore must allocate memory using GFP_ATOMIC. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: James Bottomley <JBottomley@Parallels.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2012-01-06 14:13:47 -08:00
Hannes Reinecke	bfc7690318	SCSI: Silencing 'killing requests for dead queue' commit `745718132c` upstream. When we tear down a device we try to flush all outstanding commands in scsi_free_queue(). However the check in scsi_request_fn() is imperfect as it only signals that we _might start_ aborting commands, not that we've actually aborted some. So move the printk inside the scsi_kill_request function, this will also give us a hint about which commands are aborted. Signed-off-by: Hannes Reinecke <hare@suse.de> Signed-off-by: James Bottomley <JBottomley@Parallels.com> Cc: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2011-12-09 08:52:36 -08:00
Vasily Averin	6e5dcf64df	aacraid: controller hangs if kernel uses non-default ASPM policy commit `cf16123c9c` upstream. Aacraid controller can hang on some nodes if kernel uses non-default (powersave) ASPM policy. Controller hangs shortly after successful load and hardware detection. Scsi error handler detects this hang and tries to restart hardware but it does not help. Initially it was noticed on RHEL6-based openVZ kernel after backporting aacraid driver from mainline (RHEL6 kernel with original driver works well) http://bugzilla.openvz.org/show_bug.cgi?id=2043 This issue happens because default ASPM policy was changed in Red Hat kernels. Therefore guys from Red Hat have noticed this problem long time ago: on Fedora 12 https://bugzilla.redhat.com/show_bug.cgi?id=540478 on Fedora 14 https://bugzilla.redhat.com/show_bug.cgi?id=679385 In RHEL6 kernel this issue was fixed, ASPM was disabled in aacraid driver. In kernel changelog I've found that seems it was done by Matthew Garrett: - [scsi] aacraid: Disable ASPM by default (Matthew Garrett) [599735] However seems this patch was not submitted to mainline. I've reproduced this issue on vanilla 3.1.0 kernel booted with "pcie_aspm.policy=powersave" option, So I believe it makes sense to do it now. Signed-off-by: Vasily Averin <vvs@sw.ru> [mjg: Checking the Windows drivers indicates that they disable ASPM under all circumstances, so:] Acked-by: Matthew Garrett <mjg@redhat.com> Acked-by: Achim Leubner <Achim_Leubner@pmc-sierra.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2011-11-26 09:09:52 -08:00
Matthew Garrett	4987223080	hpsa: Disable ASPM commit `e5a44df85e` upstream. The Windows driver .inf disables ASPM on hpsa devices. Do the same because the selection of a non default ASPM policy can cause the device to hang. Signed-off-by: Matthew Garrett <mjg@redhat.com> Acked-by: Mike Miller <mike.miller@hp.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2011-11-26 09:09:52 -08:00
James Bottomley	9c6e9f7596	fix WARNING: at drivers/scsi/scsi_lib.c:1704 commit `4e6c82b361` upstream. On Mon, 2011-11-07 at 17:24 +1100, Stephen Rothwell wrote: > Hi all, > > Starting some time last week I am getting the following during boot on > our PPC970 blade: > > calling .ipr_init+0x0/0x68 @ 1 > ipr: IBM Power RAID SCSI Device Driver version: 2.5.2 (April 27, 2011) > ipr 0000:01:01.0: Found IOA with IRQ: 26 > ipr 0000:01:01.0: Starting IOA initialization sequence. > ipr 0000:01:01.0: Adapter firmware version: 06160039 > ipr 0000:01:01.0: IOA initialized. > scsi0 : IBM 572E Storage Adapter > ------------[ cut here ]------------ > WARNING: at drivers/scsi/scsi_lib.c:1704 > Modules linked in: > NIP: c00000000053b3d4 LR: c00000000053e5b0 CTR: c000000000541d70 > REGS: c0000000783c2f60 TRAP: 0700 Not tainted (3.1.0-autokern1) > MSR: 8000000000029032 <EE,ME,CE,IR,DR> CR: 24002024 XER: 20000002 > TASK = c0000000783b8000[1] 'swapper' THREAD: c0000000783c0000 CPU: 0 > GPR00: 0000000000000001 c0000000783c31e0 c000000000cf38b0 c00000000239a9d0 > GPR04: c000000000cbe8f8 0000000000000000 c0000000783c3040 0000000000000000 > GPR08: c000000075daf488 c000000078a3b7ff c000000000bcacc8 0000000000000000 > GPR12: 0000000044002028 c000000007ffb000 0000000002e40000 000000000099b800 > GPR16: 0000000000000000 c000000000bba5fc c000000000a61db8 0000000000000000 > GPR20: 0000000001b77200 0000000000000000 c000000078990000 0000000000000001 > GPR24: c000000002396828 0000000000000000 0000000000000000 c000000078a3b938 > GPR28: fffffffffffffffa c0000000008ad2c0 c000000000c7faa8 c00000000239a9d0 > NIP [c00000000053b3d4] .scsi_free_queue+0x24/0x90 > LR [c00000000053e5b0] .scsi_alloc_sdev+0x280/0x2e0 > Call Trace: > [c0000000783c31e0] [c000000000c7faa8] wireless_seq_fops+0x278d0/0x2eb88 (unreliable) > [c0000000783c3270] [c00000000053e5b0] .scsi_alloc_sdev+0x280/0x2e0 > [c0000000783c3330] [c00000000053eba0] .scsi_probe_and_add_lun+0x390/0xb40 > [c0000000783c34a0] [c00000000053f7ec] .__scsi_scan_target+0x16c/0x650 > [c0000000783c35f0] [c00000000053fd90] .scsi_scan_channel+0xc0/0x100 > [c0000000783c36a0] [c00000000053fefc] .scsi_scan_host_selected+0x12c/0x1c0 > [c0000000783c3750] [c00000000083dcb4] .ipr_probe+0x2c0/0x390 > [c0000000783c3830] [c0000000003f50b4] .local_pci_probe+0x34/0x50 > [c0000000783c38a0] [c0000000003f5f78] .pci_device_probe+0x148/0x150 > [c0000000783c3950] [c0000000004e1e8c] .driver_probe_device+0xdc/0x210 > [c0000000783c39f0] [c0000000004e20cc] .__driver_attach+0x10c/0x110 > [c0000000783c3a80] [c0000000004e1228] .bus_for_each_dev+0x98/0xf0 > [c0000000783c3b30] [c0000000004e1bf8] .driver_attach+0x28/0x40 > [c0000000783c3bb0] [c0000000004e07d8] .bus_add_driver+0x218/0x340 > [c0000000783c3c60] [c0000000004e2a2c] .driver_register+0x9c/0x1b0 > [c0000000783c3d00] [c0000000003f62d4] .__pci_register_driver+0x64/0x140 > [c0000000783c3da0] [c000000000b99f88] .ipr_init+0x4c/0x68 > [c0000000783c3e20] [c00000000000ad24] .do_one_initcall+0x1a4/0x1e0 > [c0000000783c3ee0] [c000000000b512d0] .kernel_init+0x14c/0x1fc > [c0000000783c3f90] [c000000000022468] .kernel_thread+0x54/0x70 > Instruction dump: > ebe1fff8 7c0803a6 4e800020 7c0802a6 fba1ffe8 fbe1fff8 7c7f1b78 f8010010 > f821ff71 e8030398 3120ffff 7c090110 <0b000000> e86303b0 482de065 60000000 > ---[ end trace 759bed76a85e8dec ]--- > scsi 0:0:1:0: Direct-Access IBM-ESXS MAY2036RC T106 PQ: 0 ANSI: 5 > ------------[ cut here ]------------ > > I get lots more of these. The obvious commit to point the finger at > is `3308511c93` ("[SCSI] Make scsi_free_queue() kill pending SCSI > commands") but the root cause may be something different. Caused by commit `f7c9c6bb14` Author: Anton Blanchard <anton@samba.org> Date: Thu Nov 3 08:56:22 2011 +1100 [SCSI] Fix block queue and elevator memory leak in scsi_alloc_sdev Doesn't completely do the teardown. The true fix is to do a proper teardown instead of hand rolling it Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Tested-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: James Bottomley <JBottomley@Parallels.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2011-11-26 09:09:52 -08:00
Mike Miller	0447f4d565	hpsa: add small delay when using PCI Power Management to reset for kump commit `c4853efec6` upstream. The P600 requires a small delay when changing states. Otherwise we may think the board did not reset and we bail. This for kdump only and is particular to the P600. Signed-off-by: Mike Miller <mike.miller@hp.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2011-11-11 09:37:08 -08:00
nagalakshmi.nandigama@lsi.com	ac310457b4	mpt2sas: Fix for system hang when discovery in progress commit `0167ac67ff` upstream. Fix for issue : While discovery is in progress, hot unplug and hot plug of enclosure connected to the controller card is causing system to hang. When a device is in the process of being detected at driver load time then if it is removed, the device that is no longer present will not be added to the list. So the code in _scsih_probe_sas() is rearranged as such so the devices that failed to be detected are not added to the list. Signed-off-by: Nagalakshmi Nandigama <nagalakshmi.nandigama@lsi.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2011-11-11 09:37:02 -08:00
Anton Blanchard	6fffe11224	Fix block queue and elevator memory leak in scsi_alloc_sdev commit `f7c9c6bb14` upstream. When looking at memory consumption issues I noticed quite a lot of memory in the kmalloc-2048 bucket: OBJS ACTIVE USE OBJ SIZE SLABS OBJ/SLAB CACHE SIZE NAME 6561 6471 98% 2.30K 243 27 15552K kmalloc-2048 Over 15MB. slub debug shows that cfq is responsible for almost all of it: # sort -nr /sys/kernel/slab/kmalloc-2048/alloc_calls 6402 .cfq_init_queue+0xec/0x460 age=43423/43564/43655 pid=1 cpus=4,11,13 In scsi_alloc_sdev we do scsi_alloc_queue but if slave_alloc fails we don't free it with scsi_free_queue. The patch below fixes the issue: OBJS ACTIVE USE OBJ SIZE SLABS OBJ/SLAB CACHE SIZE NAME 135 72 53% 2.30K 5 27 320K kmalloc-2048 # cat /sys/kernel/slab/kmalloc-2048/alloc_calls 3 .cfq_init_queue+0xec/0x460 age=3811/3876/3925 pid=1 cpus=4,11,13 Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: James Bottomley <JBottomley@Parallels.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2011-11-11 09:37:02 -08:00
Bart Van Assche	8fa9474347	Make scsi_free_queue() kill pending SCSI commands commit `3308511c93` upstream. Make sure that SCSI device removal via scsi_remove_host() does finish all pending SCSI commands. Currently that's not the case and hence removal of a SCSI host during I/O can cause a deadlock. See also "blkdev_issue_discard() hangs forever if underlying storage device is removed" (http://bugzilla.kernel.org/show_bug.cgi?id=40472). See also http://lkml.org/lkml/2011/8/27/6. Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: James Bottomley <JBottomley@Parallels.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2011-11-11 09:37:02 -08:00
Moger, Babu	51b702dc3d	scsi_dh: check queuedata pointer before proceeding further commit `a18a920c70` upstream. This patch validates sdev pointer in scsi_dh_activate before proceeding further. Without this check we might see the panic as below. I have seen this panic multiple times.. Call trace: #0 [ffff88007d647b50] machine_kexec at ffffffff81020902 #1 [ffff88007d647ba0] crash_kexec at ffffffff810875b0 #2 [ffff88007d647c70] oops_end at ffffffff8139c650 #3 [ffff88007d647c90] __bad_area_nosemaphore at ffffffff8102dd15 #4 [ffff88007d647d50] page_fault at ffffffff8139b8cf [exception RIP: scsi_dh_activate+0x82] RIP: ffffffffa0041922 RSP: ffff88007d647e00 RFLAGS: 00010046 RAX: 0000000000000000 RBX: 0000000000000000 RCX: 00000000000093c5 RDX: 00000000000093c5 RSI: ffffffffa02e6640 RDI: ffff88007cc88988 RBP: 000000000000000f R8: ffff88007d646000 R9: 0000000000000000 R10: ffff880082293790 R11: 00000000ffffffff R12: ffff88007cc88988 R13: 0000000000000000 R14: 0000000000000286 R15: ffff880037b845e0 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0000 #5 [ffff88007d647e38] run_workqueue at ffffffff81060268 #6 [ffff88007d647e78] worker_thread at ffffffff81060386 #7 [ffff88007d647ee8] kthread at ffffffff81064436 #8 [ffff88007d647f48] kernel_thread at ffffffff81003fba Signed-off-by: Babu Moger <babu.moger@netapp.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2011-11-11 09:37:02 -08:00

1 2 3 4 5 ...

7811 Commits