The current FC transport code, on I/O termination, calls
nvme_cleanup_cmd() followed by the transport DMA unmap routine,
which also calls nvme_cleanup_cmd(). This means two kfrees occur
on the same address, wreaking havoc. This resulted in odd data errors,
effectively corruption.
Fix by removing the extraneous calls. The call now occurs only in
teardown paths and as part of the DMA unmap routine.
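A minimal user-space sketch of the resulting call structure, not the driver code. All names ending in _sketch are hypothetical, and free() stands in for kfree(); the point is only that the completion path reaches the cleanup exactly once, through the unmap routine.

```c
#include <stdlib.h>

/* Hypothetical stand-in for a request carrying a separately allocated buffer. */
struct req_sketch {
	void *special;      /* buffer released by the cleanup helper */
	int cleanup_calls;  /* counts how often cleanup ran, for illustration */
};

/* Stand-in for nvme_cleanup_cmd(): releases the per-command allocation. */
static void cleanup_cmd_sketch(struct req_sketch *rq)
{
	rq->cleanup_calls++;
	free(rq->special);
	rq->special = NULL;
}

/* Stand-in for the transport DMA unmap routine, which owns the cleanup call. */
static void unmap_data_sketch(struct req_sketch *rq)
{
	/* DMA unmapping would happen here in a real transport. */
	cleanup_cmd_sketch(rq);
}

/* I/O termination path: no direct cleanup call, only the unmap routine. */
static void io_done_sketch(struct req_sketch *rq)
{
	unmap_data_sketch(rq);
}
```

Keeping the cleanup inside the unmap routine gives the allocation a single owner on the completion path.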
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Ewan D. Milne <emilne@redhat.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
NVMe 1.2.1 or later requires controllers to provide a subsystem NQN in the
Identify controller data structures. Use this NQN for the subsysnqn
sysfs attribute by storing it in the nvme_ctrl structure after verifying
it. For older controllers we generate a "fake" NQN per non-normative
text in the NVMe 1.3 spec.
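For illustration only, the selection between a controller-provided NQN and a generated one could look roughly like the sketch below. The struct, the helper name, and in particular the exact format string are assumptions made for this example; they are not quoted from the patch or from the spec.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical subset of the Identify Controller fields used below. */
struct id_ctrl_sketch {
	uint16_t vid;    /* PCI vendor ID */
	uint16_t ssvid;  /* PCI subsystem vendor ID */
	char sn[20];     /* serial number, space padded, not NUL terminated */
	char mn[40];     /* model number, space padded, not NUL terminated */
};

/*
 * Use the provided NQN when it is non-empty and NUL terminated within
 * provided_len bytes; otherwise build a substitute from identify data.
 * Returns 1 if the provided NQN was used, 0 if one was generated.
 */
static int init_subnqn_sketch(char *out, size_t outlen,
			      const char *provided, size_t provided_len,
			      const struct id_ctrl_sketch *id)
{
	if (provided && provided[0] != '\0' &&
	    memchr(provided, '\0', provided_len) != NULL) {
		snprintf(out, outlen, "%s", provided);
		return 1;
	}

	/* Assumed layout: fixed prefix, vendor IDs, serial, model. */
	snprintf(out, outlen, "nqn.2014.08.org.nvmexpress:%04x%04x%.20s%.40s",
		 id->vid, id->ssvid, id->sn, id->mn);
	return 0;
}
```

The precision in %.20s and %.40s bounds the read, so the unterminated identify fields are never read past their end.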
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
While an NVMe Namespace is somewhat similar to a SCSI Logical Unit (and not
a Logical Unit Number anyway), there are subtle differences. Remove the
misleading comment.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grmberg.me>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
A user reports that APST is enabled even when the NVMe device is quirked
or the option "default_ps_max_latency_us=0" is set.
The current logic does not set APST if the device is quirked, but the
NVMe device in question enables APST automatically.
Separate the "APST is supported" logic from the "enable APST" logic, so
we can use the latter to explicitly disable APST at initialization.
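A rough sketch of the split, under stated assumptions: the struct, the helper names, and the three inputs are hypothetical and only model the two separate questions ("is APST supported" and "should APST be enabled").

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical result of the two separate decisions. */
struct apst_decision {
	bool supported;  /* controller reports APST support */
	bool enable;     /* we actually want APST turned on */
};

static struct apst_decision apst_decide(bool ctrl_reports_apst,
					bool quirk_no_apst,
					uint64_t ps_max_latency_us)
{
	struct apst_decision d;

	d.supported = ctrl_reports_apst;
	/* Enable only if supported, not quirked, and a latency budget exists. */
	d.enable = d.supported && !quirk_no_apst && ps_max_latency_us > 0;
	return d;
}

/*
 * The feature is programmed whenever it is supported, so that
 * "supported but not wanted" results in an explicit disable instead
 * of leaving the device in its power-on default.
 */
static bool apst_must_program(struct apst_decision d)
{
	return d.supported;
}
```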
BugLink: https://bugs.launchpad.net/bugs/1699004
Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Reviewed-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Currently we have no way to define a stable host-id; we always use the one
that is randomly generated when we add the host or use the default host.
Provide a "hostid=%s" option for user-space to pass in a persistent host-id
which overrides the randomly generated one.
Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The SCSI-to-NVMe translations were added to assist storage applications
utilizing SG_IO in transitioning to NVMe. It was always recommended,
however, to use native NVMe for device management, as too much is lost
in translation. The maintenance burden of keeping this kludgey layer
around has also been neglected, to the point that many of the
translations are completely broken.
This patch removes SG_IO handling from NVMe to avoid any confusion
regarding maintenance support for this interface. The config option for
NVMe SCSI emulation has been disabled by default since 4.5. The driver
has supported native nvme user commands since the beginning, and native
tooling is publicly available for use or as reference for anyone writing
their own tools, so there's no excuse for hanging onto a broken crutch.
Signed-off-by: Keith Busch <keith.busch@intel.com>
Acked-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Guan Junxiong <guanjunxiong@huawei.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Given that the code is simple enough, it seems better
than passing a tag by reference for each call site. Also,
we can now get rid of __nvme_process_cq.
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Also, maintain a consumed counter to rely on for doorbell and
cqe_seen update instead of directly relying on the cq head and phase.
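A simplified model of a completion-queue loop that relies on a consumed counter, offered as a sketch only. The queue depth, struct layout, and function names are hypothetical; the doorbell field stands in for the MMIO register a real driver would write.

```c
#include <stdint.h>

#define CQ_DEPTH 4

struct cqe_sketch {
	uint16_t command_id;
	uint16_t status;  /* bit 0 is the phase tag */
};

struct cq_sketch {
	struct cqe_sketch cqes[CQ_DEPTH];
	uint16_t head;
	uint8_t phase;         /* phase value that marks a new entry */
	uint16_t doorbell;     /* stands in for the CQ head doorbell */
	unsigned int cqe_seen;
};

/* An entry is new when its phase tag matches the expected phase. */
static int cqe_pending(const struct cq_sketch *q)
{
	return (q->cqes[q->head].status & 1) == q->phase;
}

static int process_cq(struct cq_sketch *q)
{
	int consumed = 0;

	while (cqe_pending(q)) {
		/* A real driver would complete the matching request here. */
		if (++q->head == CQ_DEPTH) {
			q->head = 0;
			q->phase = !q->phase;  /* expected phase flips on wrap */
		}
		consumed++;
	}

	/* Doorbell and cqe_seen depend only on the counter. */
	if (consumed) {
		q->doorbell = q->head;
		q->cqe_seen = 1;
	}
	return consumed;
}
```

With the counter, the caller does not need to compare the head and phase before and after the loop to learn whether anything was processed.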
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Nice abstraction of the actual mechanics of how to do it.
Note the change: we now call it after assigning nvmeq->cq_head,
to avoid passing it.
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
dma_declare_coherent_memory() and friends are designed to account for
the difference between CPU and device addresses. However, when they are
used with reserved memory regions, there is an assumption that the CPU
and the device have the same view of the address space. This assumption
becomes invalid when reserved memory for coherent DMA allocations is
referenced by a device with a non-empty "dma-ranges" property.
Simply feeding in the device address as rmem->base + dev->dma_pfn_offset
would not work because a reserved memory region can be shared, so this
patch expresses the device address in terms of the CPU address and the
device's dma_pfn_offset when the memory reservation has been done via
the device tree; non device tree users continue to use the old scheme.
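A small sketch of the idea, with assumptions stated up front: the struct, its field names, the page shift, and the sign convention (subtracting the per-device offset from the CPU page frame number) are chosen for this illustration and are not quoted from the patch. What matters is that the device-side base is computed per device at lookup time instead of being stored once in a region that several devices may share.

```c
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12

/* Hypothetical description of a coherent memory region. */
struct coherent_mem_sketch {
	uint64_t device_base;        /* explicit device address (non-DT users) */
	uint64_t pfn_base;           /* CPU page frame number of the region */
	int use_dev_dma_pfn_offset;  /* set when reserved via the device tree */
};

/*
 * Device-side base address of the region as seen by one device.
 * dev_dma_pfn_offset is that device's offset, in pages.
 */
static uint64_t device_base_sketch(const struct coherent_mem_sketch *mem,
				   uint64_t dev_dma_pfn_offset)
{
	if (mem->use_dev_dma_pfn_offset)
		return (mem->pfn_base - dev_dma_pfn_offset) << SKETCH_PAGE_SHIFT;
	return mem->device_base;  /* old scheme: caller supplied the address */
}
```

Two devices with different offsets can then share one region and still each obtain the address that is valid on their side of the bus.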
Cc: Michal Nazarewicz <mina86@mina86.com>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Roger Quadros <rogerq@ti.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Tested-by: Benjamin Gaignard <benjamin.gaignard@linaro.org>
Tested-by: Andras Szemzo <sza@esh.hu>
Tested-by: Alexandre TORGUE <alexandre.torgue@st.com>
Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
dmam_alloc_noncoherent is a trivial wrapper around dmam_alloc_attrs
that hardcodes one particular flag. Make the devres code more
flexible by allowing the callers to pass arbitrary flags.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Tejun Heo <tj@kernel.org>
After commit 9e442aa6a753 ("x86: remove DMA_ERROR_CODE"), the inlining
decisions in the qat driver changed slightly, introducing a new false-positive
warning:
drivers/crypto/qat/qat_common/qat_algs.c: In function 'qat_alg_sgl_to_bufl.isra.6':
include/linux/dma-mapping.h:228:2: error: 'sz_out' may be used uninitialized in this function [-Werror=maybe-uninitialized]
drivers/crypto/qat/qat_common/qat_algs.c:676:9: note: 'sz_out' was declared here
The patch that introduced this is correct, so let's just avoid the
warning in this driver by rearranging the unwinding after an error
to make it more obvious to the compiler what is going on.
The problem here is the 'if (unlikely(dma_mapping_error(dev, blp)))'
check, in which the 'unlikely' causes gcc to forget what it knew about
the state of the variables. Cleaning up the dma state in the reverse
order it was created means we can simplify the logic so it doesn't have
to know about that state, and also makes it easier to understand.
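The unwinding pattern described above, shown as a generic user-space sketch and not as the qat code. The struct, the function names, and the fail_at parameter (used only to simulate a failure at a given step) are hypothetical; malloc() and free() stand in for the DMA mapping and unmapping calls.

```c
#include <stdlib.h>

struct bufs_sketch {
	void *list;
	void *in;
	void *out;
};

/* fail_at selects which step fails (1..3); 0 means every step succeeds. */
static int setup_bufs_sketch(struct bufs_sketch *b, int fail_at)
{
	b->list = b->in = b->out = NULL;

	b->list = (fail_at == 1) ? NULL : malloc(16);
	if (!b->list)
		goto err_none;

	b->in = (fail_at == 2) ? NULL : malloc(16);
	if (!b->in)
		goto err_list;

	b->out = (fail_at == 3) ? NULL : malloc(16);
	if (!b->out)
		goto err_in;

	return 0;

	/* Release in the reverse order of acquisition. */
err_in:
	free(b->in);
	b->in = NULL;
err_list:
	free(b->list);
	b->list = NULL;
err_none:
	return -1;
}

static void teardown_bufs_sketch(struct bufs_sketch *b)
{
	free(b->out);
	free(b->in);
	free(b->list);
	b->out = b->in = b->list = NULL;
}
```

Each label undoes exactly the steps that completed before the jump, so the error path needs no checks on variables that may not have been initialized yet.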
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
au1100fb is using managed dma allocations, so it doesn't need to
explicitly free the dma memory in the error path (and if it did
it would have to use the managed version).
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
And instead wire it up as a method for all the dma_map_ops instances.
Note that this also means the arch specific check will be applied fully
instead of partially in the AMD iommu driver.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Devices passed through to a VM guest can get updated IRQ affinity
information via irq_set_affinity() when not running in guest mode.
Currently, the AMD IOMMU driver in GA mode ignores the updated information
if the pass-through device is set up to use vAPIC, regardless of guest_mode.
This could cause invalid interrupt remapping.
Also, the guest_mode bit should be set and cleared only when
SVM updates the posted-interrupt interrupt remapping information.
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Cc: Joerg Roedel <jroedel@suse.de>
Fixes: d98de49a53 ('iommu/amd: Enable vAPIC interrupt remapping mode by default')
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Most dma_map_ops structures are never modified. Constify these
structures so that they can be write-protected.
Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Johan writes:
USB-serial updates for v4.13-rc1
Here are the USB-serial updates for 4.13, including support for
manipulating the modem-control signals of qcserial devices, propagation
of errnos after late probe errors from usb-serial core, and a couple of
clean ups.
All have been in linux-next with no reported issues.
Signed-off-by: Johan Hovold <johan@kernel.org>
This callback should never return NULL. Print a warning if
that happens so that we notice and can fix it.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
The generic device_group call-backs in iommu.c return NULL
in case of error. Since they are getting ERR_PTR values from
iommu_group_alloc(), just pass them up instead.
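To illustrate the convention, here is a user-space imitation of the kernel's error-pointer helpers. The sketch_ helpers and the two functions are hypothetical re-implementations for this example only; the kernel's own ERR_PTR(), PTR_ERR() and IS_ERR() are what real code would use.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_ERRNO 4095

/* Encode a negative errno value in a pointer. */
static inline void *sketch_err_ptr(long err)
{
	return (void *)(intptr_t)err;
}

/* Recover the errno value from an error pointer. */
static inline long sketch_ptr_err(const void *p)
{
	return (long)(intptr_t)p;
}

/* Error pointers occupy the top SKETCH_MAX_ERRNO addresses. */
static inline int sketch_is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-SKETCH_MAX_ERRNO;
}

struct group_sketch {
	int id;
};

/* Stand-in for an allocator that reports failure as an error pointer. */
static struct group_sketch *group_alloc_sketch(int fail)
{
	static struct group_sketch g = { .id = 1 };

	if (fail)
		return sketch_err_ptr(-ENOMEM);
	return &g;
}

/* The call-back passes the allocator's result up unchanged. */
static struct group_sketch *device_group_sketch(int fail)
{
	return group_alloc_sketch(fail);
}
```

Passing the value through preserves the error code for the caller, which a NULL return would discard.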
Reported-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
The iommu_group_get_for_dev() function also attaches the
device to its group, so this code doesn't need to be in the
iommu driver.
Further by using this function the driver can make use of
default domains in the future.
Reviewed-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
get_cpu() disables preemption and returns the current CPU number. The
CPU number is only used once, while retrieving the address of the local
CPU's deferred_flush pointer.
We can instead use raw_cpu_ptr() while we remain preemptible. The worst
thing that can happen is that flush_unmaps_timeout() is invoked multiple
times: once by taskA after seeing HIGH_WATER_MARK and then being preempted
to another CPU, and then by taskB which saw HIGH_WATER_MARK on the same CPU
as taskA. It is also likely that ->size went from HIGH_WATER_MARK to 0
right after it was read because another CPU invoked flush_unmaps_timeout()
for this CPU.
The access to flush_data is protected by a spinlock, so even if we get
migrated to another CPU or preempted, the data structure is protected.
While at it, I marked deferred_flush static since I can't find a
reference to it outside of this file.
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: iommu@lists.linux-foundation.org
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Commit 583248e662 ("iommu/iova: Disable preemption around use of
this_cpu_ptr()") disables preemption while accessing a per-CPU variable.
This does keep lockdep quiet. However, I don't see why it is
bad if we get migrated to another CPU after the access.
__iova_rcache_insert() and __iova_rcache_get() lock the variable
immediately after obtaining it, before accessing its members.
_If_ we get migrated away after retrieving the address of cpu_rcache
but before taking the lock, then the *other* task on the same CPU will
retrieve the same address of cpu_rcache and will spin on the lock.
alloc_iova_fast() disables preemption while invoking
free_cpu_cached_iovas() on each CPU. The function itself uses
per_cpu_ptr() which does not trigger a warning (like this_cpu_ptr()
does). It _could_ make sense to use get_online_cpus() instead, but then we
have a hotplug notifier for CPU down (and none for up) so we are good.
Cc: Joerg Roedel <joro@8bytes.org>
Cc: iommu@lists.linux-foundation.org
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
We were reading the no-implicit sync flag the wrong way around,
synchronizing too much for the explicit case, and not at all for the
implicit case. Oops.
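A tiny sketch of the corrected test. The flag name and value are hypothetical; only the polarity of the check is the point.

```c
/* Hypothetical flag bit: set when the caller manages synchronization itself. */
#define SKETCH_NO_IMPLICIT 0x1u

/*
 * Implicit synchronization is wanted only when the no-implicit flag
 * is NOT set. The inverted form, (flags & SKETCH_NO_IMPLICIT) != 0,
 * is the kind of mistake described above.
 */
static int needs_implicit_sync(unsigned int flags)
{
	return !(flags & SKETCH_NO_IMPLICIT);
}
```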
Signed-off-by: Daniel Stone <daniels@collabora.com>
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
The addition of the flags member to etnaviv_gem_submit structure didn't
take into account that the last member of this structure is a variable
length array.
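The C rule involved, shown with a hypothetical struct and not the etnaviv definition: a flexible array member has to be the last member, so any new field belongs before it.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

struct bo_sketch {
	uint32_t handle;
};

struct submit_sketch {
	unsigned int nr_bos;
	uint32_t flags;          /* new members go before the array */
	struct bo_sketch bos[];  /* flexible array member: must stay last */
};

/* Allocate the header plus room for nr_bos trailing array entries. */
static struct submit_sketch *submit_alloc_sketch(unsigned int nr_bos,
						 uint32_t flags)
{
	struct submit_sketch *s;

	s = calloc(1, sizeof(*s) + nr_bos * sizeof(s->bos[0]));
	if (!s)
		return NULL;
	s->nr_bos = nr_bos;
	s->flags = flags;
	return s;
}
```

A member placed after the array would overlap the first array entries, so writes to one would silently corrupt the other.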
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
In prior commits the selected clock frequency does not propagate
correctly to what is written to the TRF7970A_MODULATOR_SYS_CLK_CTRL
register.
Signed-off-by: Geoff Lansberry <geoff@kuvee.com>
Acked-by: Mark Greer <mgreer@animalcreek.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Just three minor fixups for stuff in -next.
* tag 'drm-intel-next-fixes-2017-06-27' of git://anongit.freedesktop.org/git/drm-intel:
drm/i915: Clear execbuf's vma backpointer upon release
drm/i915: Pass the right flags to i915_vma_move_to_active()
drm/i915/cnl: Fix RMW on ddi vswing sequence.
- a fix from Eric for synchronization with etnaviv exported dma-bufs
- thermal throttle support for newer GPU cores
- updated module clock gating to work around GPU errata
- a fix to restore userspace buffer cache performance
* 'etnaviv/next' of https://git.pengutronix.de/git/lst/linux:
drm/etnaviv: restore ETNA_PREP_NOSYNC behaviour
drm/etnaviv: implement cooling support for new GPU cores
drm/etnaviv: update MLCG disables with info from newer Vivante driver
drm/etnaviv: update common.xml.h
drm/etnaviv: Expose our reservation object when exporting a dmabuf.
Just a few minor fixes. Important one is the execbuf async fix (aka
ANDROID_native_sync). There was another patch for a display coherency
corner case on APL, but we've random-walked in that space too much,
and the cherry-pick looked really invasive.
* tag 'drm-intel-fixes-2017-06-27' of git://anongit.freedesktop.org/git/drm-intel:
drm/i915: Disable EXEC_OBJECT_ASYNC when doing relocations
drm/i915: Hold struct_mutex for per-file stats in debugfs/i915_gem_object
drm/i915: Retire the VMA's fence tracker before unbinding
Single vmwgfx fix
* 'vmwgfx-fixes-4.12' of git://people.freedesktop.org/~thomash/linux:
drm/vmwgfx: Free hash table allocated by cmdbuf managed res mgr
If a device is offline it can still be set to read-only via the bus id
through sysfs. Only the read-only feature flag for the ccw_device is
then set. If the device is online the corresponding block device needs
to be set to read-only as well (via set_disk_ro()).
The check whether there is a device to do so, however, happens after the
feature flag was set. This leads to an unnecessary "no such device"
error in the offline case.
This bug was introduced by commit 7571cb1c8e3cc ("s390/dasd: Make use of
dasd_set_feature() more often"). Fix this by simply returning count if
no device is available.
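The intended control flow, sketched with hypothetical types and names and not taken from the dasd code. The disk field models the block device that only exists while the device is online.

```c
#include <stddef.h>
#include <sys/types.h>  /* ssize_t */

struct disk_ro_sketch {
	int read_only;
};

struct ccw_dev_sketch {
	int ro_feature;               /* read-only feature flag */
	struct disk_ro_sketch *disk;  /* NULL while the device is offline */
};

/* Models a sysfs store handler: returns the number of bytes consumed. */
static ssize_t ro_store_sketch(struct ccw_dev_sketch *dev, int val,
			       size_t count)
{
	dev->ro_feature = val;  /* the feature flag is always recorded */

	if (!dev->disk)
		return count;   /* offline: nothing more to do, not an error */

	dev->disk->read_only = val;  /* online: stands in for set_disk_ro() */
	return count;
}
```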
Fixes: 7571cb1c8e3cc ("s390/dasd: Make use of dasd_set_feature() more often")
Reviewed-by: Stefan Haberland <sth@linux.vnet.ibm.com>
Signed-off-by: Jan Höppner <hoeppner@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
A memory barrier is not required after the task wakes up,
only if we clear the polling flag before waking. The case
where we have work to do is the important one, so optimise
for it.
Reviewed-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
local_irq_enable can cause interrupts to be taken, which could
take a significant amount of processing time. The idle process
should set its polling flag before this, so another process that
wakes it during this time will not have to send an IPI.
Expand the TIF_POLLING_NRFLAG coverage to be as large as possible.
Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Make sure to drop the reference to the dma device taken by
of_find_device_by_node() on probe errors and on driver unbind.
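A reference-counting sketch of the rule being applied. Every name here is hypothetical; find_dev_sketch() models a lookup that takes a reference, as of_find_device_by_node() does, and put_dev_sketch() models put_device().

```c
struct dev_ref_sketch {
	int refcount;
};

/* Lookup that returns the device with an extra reference held. */
static struct dev_ref_sketch *find_dev_sketch(struct dev_ref_sketch *d)
{
	d->refcount++;
	return d;
}

static void put_dev_sketch(struct dev_ref_sketch *d)
{
	d->refcount--;
}

/* fail simulates an error occurring after the lookup succeeded. */
static int probe_sketch(struct dev_ref_sketch *dma, int fail)
{
	struct dev_ref_sketch *d = find_dev_sketch(dma);

	if (fail) {
		put_dev_sketch(d);  /* drop the reference on probe errors */
		return -1;
	}
	return 0;                   /* reference is kept while bound */
}

static void unbind_sketch(struct dev_ref_sketch *dma)
{
	put_dev_sketch(dma);        /* drop the reference on driver unbind */
}
```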
Fixes: 334ae61477 ("sparc: Kill SBUS DVMA layer.")
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Currently the queue command returns DID_NO_CONNECT any time the rport is
not in RPORT_ST_READY state. Change it to return DID_NO_CONNECT only
when the rport is in RPORT_ST_DELETE state. When the rport is in one of
the init states, return DID_IMM_RETRY.
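The new mapping from rport state to result, as a sketch. The enums are hypothetical stand-ins with made-up values; SKETCH_RPORT_ST_INIT represents all of the init states, and the result names only mirror the host byte codes named above.

```c
/* Hypothetical, reduced set of rport states. */
enum rport_state_sketch {
	SKETCH_RPORT_ST_INIT,    /* stands for any of the init states */
	SKETCH_RPORT_ST_READY,
	SKETCH_RPORT_ST_DELETE
};

/* Hypothetical result codes mirroring the SCSI host byte names. */
enum result_sketch {
	SKETCH_OK,
	SKETCH_DID_NO_CONNECT,
	SKETCH_DID_IMM_RETRY
};

static enum result_sketch queuecommand_result(enum rport_state_sketch st)
{
	switch (st) {
	case SKETCH_RPORT_ST_READY:
		return SKETCH_OK;              /* proceed with the command */
	case SKETCH_RPORT_ST_DELETE:
		return SKETCH_DID_NO_CONNECT;  /* port is going away */
	default:
		return SKETCH_DID_IMM_RETRY;   /* still initializing: retry */
	}
}
```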
Signed-off-by: Sesidhar Baddela <sebaddel@cisco.com>
Signed-off-by: Satish Kharat <satishkh@cisco.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Set the speed based on the vnic device parameter read during
linkup. Also add support for displaying 25, 40 and 100G.
Signed-off-by: Satish Kharat <satishkh@cisco.com>
Signed-off-by: Sesidhar Baddela <sebaddel@cisco.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Added the timestamps for
1. current timestamp
2. last fnic stats read timestamp
3. last fnic stats reset timestamp
and the deltas since the last stats read and the last reset to the fnic stats.
The fnic stats are exposed through debugfs.
Signed-off-by: Sesidhar Baddela <sebaddel@cisco.com>
Signed-off-by: Satish Kharat <satishkh@cisco.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>