Commit Graph

62693 Commits

Author SHA1 Message Date
Matias Bjørling d7b6801673 lightnvm: combine 1.2 and 2.0 command flags
Add nvm_set_flags helper to enable core to appropriately
set the command flags for read/write/erase depending on which version
a drive supports.

The flags arguments can be distilled into the access hint,
scrambling, and program/erase suspend. Replace the access hint with
a "is_seq" parameter. The rest of the flags are dependent on the
command opcode, which is trivial to detect and set.

Signed-off-by: Matias Bjørling <mb@lightnvm.io>
Reviewed-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-10-09 08:25:05 -06:00
James Smart 783f4a4408 nvme: call nvme_complete_rq when nvmf_check_ready fails for mpath I/O
When an io is rejected by nvmf_check_ready() due to validation of the
controller state, the nvmf_fail_nonready_command() will normally return
BLK_STS_RESOURCE to requeue and retry.  However, if the controller is
dying or the I/O is marked for NVMe multipath, the I/O is failed so that
the controller can terminate or so that the io can be issued on a
different path.  Unfortunately, as this reject point is before the
transport has accepted the command, blk-mq ends up completing the I/O
and never calls nvme_complete_rq(), which is where multipath may preserve
or re-route the I/O. The end result is, the device user ends up seeing an
EIO error.

Example: single path connectivity, controller is under load, and a reset
is induced.  An I/O is received:

  a) while the reset state has been set but the queues have yet to be
     stopped; or
  b) after queues are started (at end of reset) but before the reconnect
     has completed.

The I/O finishes with an EIO status.

This patch makes the following changes:

  - Adds the HOST_PATH_ERROR pathing status from TP4028
  - Modifies the reject point such that it appears to queue successfully,
    but actually completes the io with the new pathing status and calls
    nvme_complete_rq().
  - nvme_complete_rq() recognizes the new status, avoids resetting the
    controller (likely was already done in order to get this new status),
    and calls the multipather to clear the current path that errored.
    This allows the next command (retry or new command) to select a new
    path if there is one.

Signed-off-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-10-01 14:16:13 -07:00
Jens Axboe c0aac682fa Merge tag 'v4.19-rc6' into for-4.20/block
Merge -rc6 in, for two reasons:

1) Resolve a trivial conflict in the blk-mq-tag.c documentation
2) A few important regression fixes went into upstream directly, so
   they aren't in the 4.20 branch.

Signed-off-by: Jens Axboe <axboe@kernel.dk>

* tag 'v4.19-rc6': (780 commits)
  Linux 4.19-rc6
  MAINTAINERS: fix reference to moved drivers/{misc => auxdisplay}/panel.c
  cpufreq: qcom-kryo: Fix section annotations
  perf/core: Add sanity check to deal with pinned event failure
  xen/blkfront: correct purging of persistent grants
  Revert "xen/blkfront: When purging persistent grants, keep them in the buffer"
  selftests/powerpc: Fix Makefiles for headers_install change
  blk-mq: I/O and timer unplugs are inverted in blktrace
  dax: Fix deadlock in dax_lock_mapping_entry()
  x86/boot: Fix kexec booting failure in the SEV bit detection code
  bcache: add separate workqueue for journal_write to avoid deadlock
  drm/amd/display: Fix Edid emulation for linux
  drm/amd/display: Fix Vega10 lightup on S3 resume
  drm/amdgpu: Fix vce work queue was not cancelled when suspend
  Revert "drm/panel: Add device_link from panel device to DRM device"
  xen/blkfront: When purging persistent grants, keep them in the buffer
  clocksource/drivers/timer-atmel-pit: Properly handle error cases
  block: fix deadline elevator drain for zoned block devices
  ACPI / hotplug / PCI: Don't scan for non-hotplug bridges if slot is not bridge
  drm/syncobj: Don't leak fences when WAIT_FOR_SUBMIT is set
  ...

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-10-01 08:58:57 -06:00
Greg Kroah-Hartman 2f19e7a7e6 Merge tag 'spi-fix-v4.19-rc5' of https://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi
Mark writes:
  "spi: Fixes for v4.19

   Quite a few fixes for the Renesas drivers in here, plus a fix for the
   Tegra driver and some documentation fixes for the recently added
   spi-mem code.  The Tegra fix is relatively large but fairly
   straightforward and mechanical, it runs on probe so it's been
   reasonably well covered in -next testing."

* tag 'spi-fix-v4.19-rc5' of https://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
  spi: spi-mem: Move the DMA-able constraint doc to the kerneldoc header
  spi: spi-mem: Add missing description for data.nbytes field
  spi: rspi: Fix interrupted DMA transfers
  spi: rspi: Fix invalid SPI use during system suspend
  spi: sh-msiof: Fix handling of write value for SISTR register
  spi: sh-msiof: Fix invalid SPI use during system suspend
  spi: gpio: Fix copy-and-paste error
  spi: tegra20-slink: explicitly enable/disable clock
2018-09-28 18:04:06 -07:00
Greg Kroah-Hartman 8f0566118e Merge tag 'regulator-v4.19-rc5' of https://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator
Mark writes:
  "regulator: Fixes for 4.19

   A collection of fairly minor bug fixes here, a couple of driver
   specific ones plus two core fixes.  There's one fix for the new
   suspend state code which fixes some confusion with constant values
   that are supposed to indicate noop operation and another fixing a
   race condition with the creation of sysfs files on new regulators."

* tag 'regulator-v4.19-rc5' of https://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
  regulator: fix crash caused by null driver data
  regulator: Fix 'do-nothing' value for regulators without suspend state
  regulator: da9063: fix DT probing with constraints
  regulator: bd71837: Disable voltage monitoring for LDO3/4
2018-09-28 18:02:25 -07:00
Hannes Reinecke fef912bf86 block: genhd: add 'groups' argument to device_add_disk
Update device_add_disk() to take an 'groups' argument so that
individual drivers can register a device with additional sysfs
attributes.
This avoids race condition the driver would otherwise have if these
groups were to be created with sysfs_add_groups().

Signed-off-by: Martin Wilck <martin.wilck@suse.com>
Signed-off-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-09-28 08:30:28 -06:00
Omar Sandoval ed88660a53 block: move call of scheduler's ->completed_request() hook
Commit 4bc6339a58 ("block: move blk_stat_add() to
__blk_mq_end_request()") consolidated some calls using ktime_get() so
we'd only need to call it once. Kyber's ->completed_request() hook also
calls ktime_get(), so let's move it to the same place, too.

Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-09-27 17:34:52 -06:00
Bart Van Assche 18c9a6bbe0 percpu-refcount: Introduce percpu_ref_resurrect()
This function will be used in a later patch to switch the struct
request_queue q_usage_counter from killed back to live. In contrast
to percpu_ref_reinit(), this new function does not require that the
refcount is zero.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Jianchao Wang <jianchao.w.wang@oracle.com>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-09-26 15:11:29 -06:00
Bart Van Assche cd84a62e00 block, scsi: Change the preempt-only flag into a counter
The RQF_PREEMPT flag is used for three purposes:
- In the SCSI core, for making sure that power management requests
  are executed even if a device is in the "quiesced" state.
- For domain validation by SCSI drivers that use the parallel port.
- In the IDE driver, for IDE preempt requests.
Rename "preempt-only" into "pm-only" because the primary purpose of
this mode is power management. Since the power management core may
but does not have to resume a runtime suspended device before
performing system-wide suspend and since a later patch will set
"pm-only" mode as long as a block device is runtime suspended, make
it possible to set "pm-only" mode from more than one context. Since
with this change scsi_device_quiesce() is no longer idempotent, make
that function return early if it is called for a quiesced queue.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Cc: Jianchao Wang <jianchao.w.wang@oracle.com>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Cc: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-09-26 15:11:28 -06:00
Bart Van Assche bca6b067b0 block: Move power management code into a new source file
Move the code for runtime power management from blk-core.c into the
new source file blk-pm.c. Move the corresponding declarations from
<linux/blkdev.h> into <linux/blk-pm.h>. For CONFIG_PM=n, leave out
the declarations of the functions that are not used in that mode.
This patch not only reduces the number of #ifdefs in the block layer
core code but also reduces the size of header file <linux/blkdev.h>
and hence should help to reduce the build time of the Linux kernel
if CONFIG_PM is not defined.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Jianchao Wang <jianchao.w.wang@oracle.com>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Cc: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-09-26 15:11:28 -06:00
Greg Kroah-Hartman a38523185b erge tag 'libnvdimm-fixes-4.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm
Dan writes:
  "libnvdimm/dax for 4.19-rc6

  * (2) fixes for the dax error handling updates that were merged for
  v4.19-rc1. My mails to Al have been bouncing recently, so I do not have
  his ack but the uaccess change is of the trivial / obviously correct
  variety. The address_space_operations fixes a regression.

  * A filesystem-dax fix to correct the zero page lookup to be compatible
   with non-x86 (mips and s390) architectures."

* tag 'libnvdimm-fixes-4.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
  device-dax: Add missing address_space_operations
  uaccess: Fix is_source param for check_copy_size() in copy_to_iter_mcsafe()
  filesystem-dax: Fix use of zero page
2018-09-25 21:37:41 +02:00
Greg Kroah-Hartman 2dd68cc7fd Merge gitolite.kernel.org:/pub/scm/linux/kernel/git/davem/net
Dave writes:
  "Networking fixes:

  1) Fix multiqueue handling of coalesce timer in stmmac, from Jose
     Abreu.

   2) Fix memory corruption in NFC, from Suren Baghdasaryan.

   3) Don't write reserved bits in ravb driver, from Kazuya Mizuguchi.

   4) SMC bug fixes from Karsten Graul, YueHaibing, and Ursula Braun.

   5) Fix TX done race in mvpp2, from Antoine Tenart.

   6) ipv6 metrics leak, from Wei Wang.

   7) Adjust firmware version requirements in mlxsw, from Petr Machata.

   8) Fix autonegotiation on resume in r8169, from Heiner Kallweit.

   9) Fixed missing entries when dumping /proc/net/if_inet6, from Jeff
      Barnhill.

   10) Fix double free in devlink, from Dan Carpenter.

   11) Fix ethtool regression from UFO feature removal, from Maciej
       Żenczykowski.

   12) Fix drivers that have a ndo_poll_controller() that captures the
       cpu entirely on loaded hosts by trying to drain all rx and tx
       queues, from Eric Dumazet.

   13) Fix memory corruption with jumbo frames in aquantia driver, from
       Friedemann Gerold."

* gitolite.kernel.org:/pub/scm/linux/kernel/git/davem/net: (79 commits)
  net: mvneta: fix the remaining Rx descriptor unmapping issues
  ip_tunnel: be careful when accessing the inner header
  mpls: allow routes on ip6gre devices
  net: aquantia: memory corruption on jumbo frames
  tun: remove ndo_poll_controller
  nfp: remove ndo_poll_controller
  bnxt: remove ndo_poll_controller
  bnx2x: remove ndo_poll_controller
  mlx5: remove ndo_poll_controller
  mlx4: remove ndo_poll_controller
  i40evf: remove ndo_poll_controller
  ice: remove ndo_poll_controller
  igb: remove ndo_poll_controller
  ixgb: remove ndo_poll_controller
  fm10k: remove ndo_poll_controller
  ixgbevf: remove ndo_poll_controller
  ixgbe: remove ndo_poll_controller
  bonding: use netpoll_poll_dev() helper
  netpoll: make ndo_poll_controller() optional
  rds: Fix build regression.
  ...
2018-09-25 11:19:49 +02:00
Christoph Hellwig 65969e5cb2 block: don't include bug.h from bio.h
No need to pull in the BUG() defintion.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-09-24 12:34:02 -06:00
Christoph Hellwig bceacbfa48 block: don't include io.h from bio.h
Now that we don't need an override for BIOVEC_PHYS_MERGEABLE there is
no need to drag this header in.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-09-24 12:34:00 -06:00
Christoph Hellwig 6e768461c2 block: remove bvec_to_phys
We only use it in biovec_phys_mergeable and a m68k paravirt driver,
so just opencode it there.  Also remove the pointless unsigned long cast
for the offset in the opencoded instances.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-09-24 12:33:59 -06:00
Christoph Hellwig 3dccdae54f block: merge BIOVEC_SEG_BOUNDARY into biovec_phys_mergeable
These two checks should always be performed together, so merge them into
a single helper.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-09-24 12:33:57 -06:00
Christoph Hellwig 6a9f5f240a block: simplify BIOVEC_PHYS_MERGEABLE
Turn the macro into an inline, move it to blk.h and simplify the
arch hooks a bit.

Also rename the function to biovec_phys_mergeable as there is no need
to shout.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-09-24 12:33:54 -06:00
Christoph Hellwig 27ca1d4ed0 block: move req_gap_back_merge to blk.h
No need to expose these helpers outside the block layer.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-09-24 12:33:53 -06:00
Christoph Hellwig e9907009cb block: move req_gap_{back,front}_merge to blk-merge.c
Keep it close to the actual users instead of exposing the function to all
drivers.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-09-24 12:33:51 -06:00
Christoph Hellwig 43b729bfe9 block: move integrity_req_gap_{back,front}_merge to blk.h
No need to expose these to drivers.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-09-24 12:33:50 -06:00
Eric Dumazet ac3d9dd034 netpoll: make ndo_poll_controller() optional
As diagnosed by Song Liu, ndo_poll_controller() can
be very dangerous on loaded hosts, since the cpu
calling ndo_poll_controller() might steal all NAPI
contexts (for all RX/TX queues of the NIC). This capture
can last for unlimited amount of time, since one
cpu is generally not able to drain all the queues under load.

It seems that all networking drivers that do use NAPI
for their TX completions, should not provide a ndo_poll_controller().

NAPI drivers have netpoll support already handled
in core networking stack, since netpoll_poll_dev()
uses poll_napi(dev) to iterate through registered
NAPI contexts for a device.

This patch allows netpoll_poll_dev() to process NAPI
contexts even for drivers not providing ndo_poll_controller(),
allowing for following patches in NAPI drivers.

Also we export netpoll_poll_dev() so that it can be called
by bonding/team drivers in following patches.

Reported-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Tested-by: Song Liu <songliubraving@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-23 21:55:24 -07:00
Greg Kroah-Hartman d02771fb16 Merge tag 'mfd-fixes-4.19' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd
Lee writes:
  "MFD fixes for v4.19
   - Fix Dialog DA9063 regulator constraints issue causing failure in
     probe
   - Fix OMAP Device Tree compatible strings to match DT"

* tag 'mfd-fixes-4.19' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd:
  mfd: omap-usb-host: Fix dts probe of children
  mfd: da9063: Fix DT probing with constraints
2018-09-23 17:19:27 +02:00
Greg Kroah-Hartman a83f87c1d2 Merge tag 'for-linus-20180922' of git://git.kernel.dk/linux-block
Jens writes:
  "Just a single fix in this pull request, fixing a regression in
  /proc/diskstats caused by the unification of timestamps."

* tag 'for-linus-20180922' of git://git.kernel.dk/linux-block:
  block: use nanosecond resolution for iostat
2018-09-23 08:33:28 +02:00
Dennis Zhou (Facebook) 101246ec02 blkcg: rename blkg_try_get to blkg_tryget
blkg reference counting now uses percpu_ref rather than atomic_t. Let's
make this consistent with css_tryget. This renames blkg_try_get to
blkg_tryget and now returns a bool rather than the blkg or NULL.

Signed-off-by: Dennis Zhou <dennisszhou@gmail.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-09-21 20:29:19 -06:00
Dennis Zhou (Facebook) b3b9f24f5f blkcg: change blkg reference counting to use percpu_ref
Now that every bio is associated with a blkg, this puts the use of
blkg_get, blkg_try_get, and blkg_put on the hot path. This switches over
the refcnt in blkg to use percpu_ref.

Signed-off-by: Dennis Zhou <dennisszhou@gmail.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-09-21 20:29:18 -06:00
Dennis Zhou (Facebook) e2b0989954 blkcg: cleanup and make blk_get_rl use blkg_lookup_create
blk_get_rl is responsible for identifying which request_list a request
should be allocated to. Try get logic was added earlier, but
semantically the logic was not changed.

This patch makes better use of the bio already having a reference to the
blkg in the hot path. The cold path uses a better fallback of
blkg_lookup_create rather than just blkg_lookup and then falling back to
the q->root_rl. If lookup_create fails with anything but -ENODEV, it
falls back to q->root_rl.

A clarifying comment is added to explain why q->root_rl is used rather
than the root blkg's rl.

Signed-off-by: Dennis Zhou <dennisszhou@gmail.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-09-21 20:29:16 -06:00
Dennis Zhou (Facebook) f0fcb3ec89 blkcg: remove additional reference to the css
The previous patch in this series removed carrying around a pointer to
the css in blkg. However, the blkg association logic still relied on
taking a reference on the css to ensure we wouldn't fail in getting a
reference for the blkg.

Here the implicit dependency on the css is removed. The association
continues to rely on the tryget logic walking up the blkg tree. This
streamlines the three ways that association can happen: normal, swap,
and writeback.

Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Dennis Zhou <dennisszhou@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-09-21 20:29:15 -06:00
Dennis Zhou (Facebook) c839e7a03f blkcg: remove bio->bi_css and instead use bio->bi_blkg
Prior patches ensured that all bios are now associated with some blkg.
This now makes bio->bi_css unnecessary as blkg maintains a reference to
the blkcg already.

This patch removes the field bi_css and transfers corresponding uses to
access via bi_blkg.

Signed-off-by: Dennis Zhou <dennisszhou@gmail.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-09-21 20:29:13 -06:00
Dennis Zhou (Facebook) bdc2491708 blkcg: associate writeback bios with a blkg
One of the goals of this series is to remove a separate reference to
the css of the bio. This can and should be accessed via bio_blkcg. In
this patch, the wbc_init_bio call is changed such that it must be called
after a queue has been associated with the bio.

Signed-off-by: Dennis Zhou <dennisszhou@gmail.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-09-21 20:29:11 -06:00
Dennis Zhou (Facebook) 74b7c02a9b blkcg: associate a blkg for pages being evicted by swap
A prior patch in this series added blkg association to bios issued by
cgroups. There are two other paths that we want to attribute work back
to the appropriate cgroup: swap and writeback. Here we modify the way
swap tags bios to include the blkg. Writeback will be tackle in the next
patch.

Signed-off-by: Dennis Zhou <dennisszhou@gmail.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-09-21 20:29:09 -06:00
Dennis Zhou (Facebook) 5bf9a1f3b4 blkcg: consolidate bio_issue_init to be a part of core
bio_issue_init among other things initializes the timestamp for an IO.
Rather than have this logic handled by policies, this consolidates it to
be on the init paths (normal, clone, bounce clone).

Signed-off-by: Dennis Zhou <dennisszhou@gmail.com>
Acked-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Liu Bo <bo.liu@linux.alibaba.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-09-21 20:29:08 -06:00
Dennis Zhou (Facebook) a7b39b4e96 blkcg: always associate a bio with a blkg
Previously, blkg's were only assigned as needed by blk-iolatency and
blk-throttle. bio->css was also always being associated while blkg was
being looked up and then thrown away in blkcg_bio_issue_check.

This patch begins the cleanup of bio->css and bio->bi_blkg by always
associating a blkg in blkcg_bio_issue_check. This tries to create the
blkg, but if it is not possible, falls back to using the root_blkg of
the request_queue. Therefore, a bio will always be associated with a
blkg. The duplicate association logic is removed from blk-throttle and
blk-iolatency.

Signed-off-by: Dennis Zhou <dennisszhou@gmail.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-09-21 20:29:06 -06:00
Dennis Zhou (Facebook) 07b05bcc32 blkcg: convert blkg_lookup_create to find closest blkg
There are several scenarios where blkg_lookup_create can fail. Examples
include the blkcg dying, request_queue is dying, or simply being OOM. At
the end of the day, most handle this by simply falling back to the
q->root_blkg and calling it a day.

This patch implements the notion of closest blkg. During
blkg_lookup_create, if it fails to create, return the closest blkg
found or the q->root_blkg. blkg_try_get_closest is introduced and used
during association so a bio is always attached to a blkg.

Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Dennis Zhou <dennisszhou@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-09-21 20:29:05 -06:00
Dennis Zhou (Facebook) 49f4c2dc2b blkcg: update blkg_lookup_create to do locking
To know when to create a blkg, the general pattern is to do a
blkg_lookup and if that fails, lock and then do a lookup again and if
that fails finally create. It doesn't make much sense for everyone who
wants to do creation to write this themselves.

This changes blkg_lookup_create to do locking and implement this
pattern. The old blkg_lookup_create is renamed to __blkg_lookup_create.
If a call site wants to do its own error handling or already owns the
queue lock, they can use __blkg_lookup_create. This will be used in
upcoming patches.

Signed-off-by: Dennis Zhou <dennisszhou@gmail.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Acked-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Liu Bo <bo.liu@linux.alibaba.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-09-21 20:29:03 -06:00
Dennis Zhou (Facebook) 27e6fa996c blkcg: fix ref count issue with bio_blkcg using task_css
The accessor function bio_blkcg either returns the blkcg associated with
the bio or finds one in the current context. This can cause an issue
when trying to associate a bio with a blkcg. Particularly, it's the
third case that is problematic:

	return css_to_blkcg(task_css(current, io_cgrp_id));

As the above may race against task migration and the cgroup exiting, it
is not always ok to take a reference on the blkcg returned from
bio_blkcg.

This patch adds association ahead of calling bio_blkcg rather than
after. This makes association a required and explicit step along the
code paths for calling bio_blkcg. blk_get_rl is modified as well to get
a reference to the blkcg it may use and blk_put_rl will always put the
reference back. Association is also moved above the bio_blkcg call to
ensure it will not return NULL in blk-iolatency.

BFQ and CFQ utilize this flaw, but due to the complexity, I do not want
to address this in this series. I've created a private version of the
function with notes not to use it describing the flaw. Hopefully soon,
that code can be cleaned up.

Signed-off-by: Dennis Zhou <dennisszhou@gmail.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-09-21 20:29:02 -06:00
Omar Sandoval b57e99b4b8 block: use nanosecond resolution for iostat
Klaus Kusche reported that the I/O busy time in /proc/diskstats was not
updating properly on 4.18. This is because we started using ktime to
track elapsed time, and we convert nanoseconds to jiffies when we update
the partition counter. However, this gets rounded down, so any I/Os that
take less than a jiffy are not accounted for. Previously in this case,
the value of jiffies would sometimes increment while we were doing I/O,
so at least some I/Os were accounted for.

Let's convert the stats to use nanoseconds internally. We still report
milliseconds as before, now more accurately than ever. The value is
still truncated to 32 bits for backwards compatibility.

Fixes: 522a777566 ("block: consolidate struct request timestamp fields")
Cc: stable@vger.kernel.org
Reported-by: Klaus Kusche <klaus.kusche@computerix.info>
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-09-21 20:26:59 -06:00
Greg Kroah-Hartman a27fb6d983 Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Paolo writes:
  "It's mostly small bugfixes and cleanups, mostly around x86 nested
   virtualization.  One important change, not related to nested
   virtualization, is that the ability for the guest kernel to trap
   CPUID instructions (in Linux that's the ARCH_SET_CPUID arch_prctl) is
   now masked by default.  This is because the feature is detected
   through an MSR; a very bad idea that Intel seems to like more and
   more.  Some applications choke if the other fields of that MSR are
   not initialized as on real hardware, hence we have to disable the
   whole MSR by default, as was the case before Linux 4.12."

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (23 commits)
  KVM: nVMX: Fix bad cleanup on error of get/set nested state IOCTLs
  kvm: selftests: Add platform_info_test
  KVM: x86: Control guest reads of MSR_PLATFORM_INFO
  KVM: x86: Turbo bits in MSR_PLATFORM_INFO
  nVMX x86: Check VPID value on vmentry of L2 guests
  nVMX x86: check posted-interrupt descriptor addresss on vmentry of L2
  KVM: nVMX: Wake blocked vCPU in guest-mode if pending interrupt in virtual APICv
  KVM: VMX: check nested state and CR4.VMXE against SMM
  kvm: x86: make kvm_{load|put}_guest_fpu() static
  x86/hyper-v: rename ipi_arg_{ex,non_ex} structures
  KVM: VMX: use preemption timer to force immediate VMExit
  KVM: VMX: modify preemption timer bit only when arming timer
  KVM: VMX: immediately mark preemption timer expired only for zero value
  KVM: SVM: Switch to bitmap_zalloc()
  KVM/MMU: Fix comment in walk_shadow_page_lockless_end()
  kvm: selftests: use -pthread instead of -lpthread
  KVM: x86: don't reset root in kvm_mmu_setup()
  kvm: mmu: Don't read PDPTEs when paging is not enabled
  x86/kvm/lapic: always disable MMIO interface in x2APIC mode
  KVM: s390: Make huge pages unavailable in ucontrol VMs
  ...
2018-09-21 16:21:42 +02:00
Boris Brezillon c949a8e8b4 spi: spi-mem: Move the DMA-able constraint doc to the kerneldoc header
We'd better have that documented in the kerneldoc header, so that it's
exposed to the doc generated by Sphinx.

Signed-off-by: Boris Brezillon <boris.brezillon@bootlin.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
2018-09-20 12:23:31 -07:00
Boris Brezillon 60489f0855 spi: spi-mem: Add missing description for data.nbytes field
Add a description for spi_mem_op.data.nbytes to the kerneldoc header.

Fixes: c36ff266dc ("spi: Extend the core to ease integration of SPI memory controllers")
Signed-off-by: Boris Brezillon <boris.brezillon@bootlin.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
2018-09-20 12:23:25 -07:00
Miguel Ojeda ae596de1a0 Compiler Attributes: naked can be shared
The naked attribute is supported by at least gcc >= 4.6 (for ARM,
which is the only current user), gcc >= 8 (for x86), clang >= 3.1
and icc >= 13. See https://godbolt.org/z/350Dyc

Therefore, move it out of compiler-gcc.h so that the definition
is shared by all compilers.

This also fixes Clang support for ARM32 --- 815f0ddb34
("include/linux/compiler*.h: make compiler-*.h mutually exclusive").

Fixes: 815f0ddb34 ("include/linux/compiler*.h: make compiler-*.h mutually exclusive")
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Eli Friedman <efriedma@codeaurora.org>
Cc: Christopher Li <sparse@chrisli.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: Joe Perches <joe@perches.com>
Cc: Dominique Martinet <asmadeus@codewreck.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: linux-sparse@vger.kernel.org
Suggested-by: Arnd Bergmann <arnd@arndb.de>
Tested-by: Stefan Agner <stefan@agner.ch>
Reviewed-by: Stefan Agner <stefan@agner.ch>
Reviewed-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-09-20 15:23:58 +02:00
Miguel Ojeda d124b44f09 Compiler Attributes: naked was fixed in gcc 4.6
Commit 9c695203a7 ("compiler-gcc.h: gcc-4.5 needs noclone
and noinline on __naked functions") added noinline and noclone
as a workaround for a gcc 4.5 bug, which was resolved in 4.6.0.

Since now the minimum gcc supported version is 4.6,
we can clean it up.

See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=44290
and https://godbolt.org/z/h6NMIL

Fixes: 815f0ddb34 ("include/linux/compiler*.h: make compiler-*.h mutually exclusive")
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Eli Friedman <efriedma@codeaurora.org>
Cc: Christopher Li <sparse@chrisli.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: Joe Perches <joe@perches.com>
Cc: Dominique Martinet <asmadeus@codewreck.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: linux-sparse@vger.kernel.org
Tested-by: Stefan Agner <stefan@agner.ch>
Reviewed-by: Stefan Agner <stefan@agner.ch>
Reviewed-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-09-20 15:23:58 +02:00
Greg Kroah-Hartman d82920849f Merge tag 'sound-4.19-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Takashi writes:
  "sound fixes for 4.19-rc5

   here comes a collection of various fixes, mostly for stable-tree
   or regression fixes.

   Two relatively high LOCs are about the (rather simple) conversion of
   uapi integer types in topology API, and a regression fix about HDMI
   hotplug notification on AMD HD-audio.  The rest are all small
   individual fixes like ASoC Intel Skylake race condition, minor
   uninitialized page leak in emu10k1 ioctl, Firewire audio error paths,
   and so on."

* tag 'sound-4.19-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (33 commits)
  ALSA: fireworks: fix memory leak of response buffer at error path
  ALSA: oxfw: fix memory leak of discovered stream formats at error path
  ALSA: oxfw: fix memory leak for model-dependent data at error path
  ALSA: bebob: fix memory leak for M-Audio FW1814 and ProjectMix I/O at error path
  ALSA: hda - Enable runtime PM only for discrete GPU
  ALSA: oxfw: fix memory leak of private data
  ALSA: firewire-tascam: fix memory leak of private data
  ALSA: firewire-digi00x: fix memory leak of private data
  sound: don't call skl_init_chip() to reset intel skl soc
  sound: enable interrupt after dma buffer initialization
  Revert "ASoC: Intel: Skylake: Acquire irq after RIRB allocation"
  ALSA: emu10k1: fix possible info leak to userspace on SNDRV_EMU10K1_IOCTL_INFO
  ASoC: cs4265: fix MMTLR Data switch control
  ASoC: AMD: Ensure reset bit is cleared before configuring
  ALSA: fireface: fix memory leak in ff400_switch_fetching_mode()
  ALSA: bebob: use address returned by kmalloc() instead of kernel stack for streaming DMA mapping
  ASoC: rsnd: don't fallback to PIO mode when -EPROBE_DEFER
  ASoC: rsnd: adg: care clock-frequency size
  ASoC: uniphier: change status to orphan
  ASoC: rsnd: fixup not to call clk_get/set under non-atomic
  ...
2018-09-20 09:50:49 +02:00
Sebastian Andrzej Siewior 822f312d47 kvm: x86: make kvm_{load|put}_guest_fpu() static
The functions
	kvm_load_guest_fpu()
	kvm_put_guest_fpu()

are only used locally, make them static. This requires also that both
functions are moved because they are used before their implementation.
Those functions were exported (via EXPORT_SYMBOL) before commit
e5bb40251a ("KVM: Drop kvm_{load,put}_guest_fpu() exports").

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-09-20 00:51:43 +02:00
Jose Abreu 8fce333170 net: stmmac: Rework coalesce timer and fix multi-queue races
This follows David Miller advice and tries to fix coalesce timer in
multi-queue scenarios.

We are now using per-queue coalesce values and per-queue TX timer.

Coalesce timer default values was changed to 1ms and the coalesce frames
to 25.

Tested in B2B setup between XGMAC2 and GMAC5.

Signed-off-by: Jose Abreu <joabreu@synopsys.com>
Fixes: 	ce736788e8 ("net: stmmac: adding multiple buffers for TX")
Cc: Florian Fainelli <f.fainelli@gmail.com>
Cc: Neil Armstrong <narmstrong@baylibre.com>
Cc: Jerome Brunet <jbrunet@baylibre.com>
Cc: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Joao Pinto <jpinto@synopsys.com>
Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Cc: Alexandre Torgue <alexandre.torgue@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-18 19:48:08 -07:00
Linus Torvalds 1abc088afd Merge tag 'usb-4.19-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Pull USB fixes from Greg KH:
 "Here are a number of small USB driver fixes for -rc4.

  The usual suspects of gadget, xhci, and dwc2/3 are in here, along with
  some reverts of reported problem changes, and a number of build
  documentation warning fixes. Full details are in the shortlog.

  All of these have been in linux-next with no reported issues"

* tag 'usb-4.19-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (28 commits)
  Revert "cdc-acm: implement put_char() and flush_chars()"
  usb: Change usb_of_get_companion_dev() place to usb/common
  usb: xhci: fix interrupt transfer error happened on MTK platforms
  usb: cdc-wdm: Fix a sleep-in-atomic-context bug in service_outstanding_interrupt()
  usb: misc: uss720: Fix two sleep-in-atomic-context bugs
  usb: host: u132-hcd: Fix a sleep-in-atomic-context bug in u132_get_frame()
  usb: Avoid use-after-free by flushing endpoints early in usb_set_interface()
  linux/mod_devicetable.h: fix kernel-doc missing notation for typec_device_id
  usb/typec: fix kernel-doc notation warning for typec_match_altmode
  usb: Don't die twice if PCI xhci host is not responding in resume
  usb: mtu3: fix error of xhci port id when enable U3 dual role
  usb: uas: add support for more quirk flags
  USB: Add quirk to support DJI CineSSD
  usb: typec: fix kernel-doc parameter warning
  usb/dwc3/gadget: fix kernel-doc parameter warning
  USB: yurex: Check for truncation in yurex_read()
  USB: yurex: Fix buffer over-read in yurex_write()
  usb: host: xhci-plat: Iterate over parent nodes for finding quirks
  xhci: Fix use after free for URB cancellation on a reallocated endpoint
  USB: add quirk for WORLDE Controller KS49 or Prodipe MIDI 49C USB controller
  ...
2018-09-14 05:59:48 -10:00
Linus Torvalds 48751b562b Merge tag 'ovl-fixes-4.19-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs
Pull overlayfs fixes from Miklos Szeredi:
 "This fixes a regression in the recent file stacking update, reported
  and fixed by Amir Goldstein. The fix is fairly trivial, but involves
  adding a fadvise() f_op and the associated churn in the vfs. As
  discussed on -fsdevel, there are other possible uses for this method,
  than allowing proper stacking for overlays.

  And there's one other fix for a syzkaller detected oops"

* tag 'ovl-fixes-4.19-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
  ovl: fix oopses in ovl_fill_super() failure paths
  ovl: add ovl_fadvise()
  vfs: implement readahead(2) using POSIX_FADV_WILLNEED
  vfs: add the fadvise() file operation
  Documentation/filesystems: update documentation of file_operations
  ovl: fix GPF in swapfile_activate of file from overlayfs over xfs
  ovl: respect FIEMAP_FLAG_SYNC flag
2018-09-13 19:21:40 -10:00
Linus Torvalds 4d8d9f540b Merge tag 'for-linus-20180913' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
 "Three fixes that should go into this series. This contains:

   - Increase number of policies supported by blk-cgroup.

     With blk-iolatency, we now have four in kernel, but we had a hard
     limit of three...

   - Fix regression in null_blk, where the zoned supported broke
     queue_mode=0 (bio based).

   - NVMe pull request, with a single fix for an issue in the rdma code"

* tag 'for-linus-20180913' of git://git.kernel.dk/linux-block:
  null_blk: fix zoned support for non-rq based operation
  blk-cgroup: increase number of supported policies
  nvmet-rdma: fix possible bogus dereference under heavy load
2018-09-13 19:16:11 -10:00
Linus Torvalds 7a9cdebdcc mm: get rid of vmacache_flush_all() entirely
Jann Horn points out that the vmacache_flush_all() function is not only
potentially expensive, it's buggy too.  It also happens to be entirely
unnecessary, because the sequence number overflow case can be avoided by
simply making the sequence number be 64-bit.  That doesn't even grow the
data structures in question, because the other adjacent fields are
already 64-bit.

So simplify the whole thing by just making the sequence number overflow
case go away entirely, which gets rid of all the complications and makes
the code faster too.  Win-win.

[ Oleg Nesterov points out that the VMACACHE_FULL_FLUSHES statistics
  also just goes away entirely with this ]

Reported-by: Jann Horn <jannh@google.com>
Suggested-by: Will Deacon <will.deacon@arm.com>
Acked-by: Davidlohr Bueso <dave@stgolabs.net>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-09-13 15:18:04 -10:00
Takashi Iwai 37a3a98ef6 ALSA: hda - Enable runtime PM only for discrete GPU
The recent change of vga_switcheroo allowed the runtime PM for
HD-audio on AMD GPUs, but this also resulted in a regression.  When
the HD-audio controller driver gets runtime-suspended, HD-audio link
is turned off, and the hotplug notification is ignored.  This leads to
the inconsistent audio state (the connection isn't notified and ELD is
ignored).

The best fix would be to implement the proper ELD notification via the
audio component, but it's still not ready.  As a quick workaround,
this patch adds the check of runtime_idle and allows the runtime
suspend only when the vga_switcheroo is bound with discrete GPU.
That is, a system with a single GPU and APU would be again without
runtime PM to keep the HD-audio link for the hotplug notification and
ELD read out.

Also, the codec->auto_runtime_pm flag is set only for the discrete GPU
at the time GPU gets bound via vga_switcheroo (i.e. only dGPU is
forcibly runtime-PM enabled), so that APU can still get the ELD
notification.

For identifying which GPU is bound, a new vga_switcheroo client
callback, gpu_bound, is implemented.  The vga_switcheroo simply calls
this when GPU is bound, and tells whether it's dGPU or APU.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=200945
Fixes: 07f4f97d7b ("vga_switcheroo: Use device link for HDA controller")
Reported-by: Jian-Hong Pan <jian-hong@endlessm.com>
Tested-by: Jian-Hong Pan <jian-hong@endlessm.com>
Acked-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
2018-09-13 17:58:30 +02:00
Linus Torvalds 54eda9df17 Merge tag 'pci-v4.19-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci
Pull PCI fixes from Bjorn Helgaas:

 - Add Tyrel Datwyler as maintainer for PPC64 RPA hotplug (Tyrel
   Datwyler)

 - Add Gustavo Pimentel as DesignWare PCI maintainer (Joao Pinto)

 - Fix a Switchtec Spectre v1 vulnerability (Gustavo A. R. Silva)

 - Revert an unnecessary Intel 300 ACS quirk (Mika Westerberg)

 - Fix pciehp hot-add/powerfault detection that left indicators in wrong
   state (Keith Busch)

 - Fix pci_reset_bus() logic error (Dennis Dalessandro)

 - Revert IB/hfi1 PCI reset change that caused a deadlock (Dennis
   Dalessandro)

 - Allow enabling PASID on Root Complex Integrated Endpoints (Felix
   Kuehling)

* tag 'pci-v4.19-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
  PCI: Fix enabling of PASID on RC integrated endpoints
  IB/hfi1,PCI: Allow bus reset while probing
  PCI: Fix faulty logic in pci_reset_bus()
  PCI: pciehp: Fix hot-add vs powerfault detection order
  switchtec: Fix Spectre v1 vulnerability
  Revert "PCI: Add ACS quirk for Intel 300 series"
  MAINTAINERS: Add Gustavo Pimentel as DesignWare PCI maintainer
  MAINTAINERS: Add entries for PPC64 RPA PCI hotplug drivers
2018-09-12 19:39:56 -10:00