A vendor hook may set the idle state index to a negative value to force
an early exit from idle entry. Add support for this.
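For illustration only, a minimal userspace sketch of the early-exit idea follows. The hook type, the function names and the return convention are hypothetical and are not taken from the actual patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical hook type: a vendor callback may rewrite *index. */
typedef void (*idle_enter_hook_t)(int *index);

/*
 * Returns the idle state that was entered, or a negative value when the
 * hook asked for an early exit and no state was entered.
 */
static int enter_idle(int index, int nr_states, idle_enter_hook_t hook)
{
	assert(index >= 0 && index < nr_states);

	if (hook)
		hook(&index);

	if (index < 0)
		return index;	/* early exit requested by the hook */

	/* A real driver would program the selected idle state here. */
	return index;
}

/* Example hooks: one vetoes idle entry, one leaves the index alone. */
static void veto_hook(int *index) { *index = -1; }
static void keep_hook(int *index) { (void)index; }
```

A caller would treat a negative return value as "idle entry was skipped" and go straight back to its idle loop.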
Bug: 192436062
Change-Id: I82b822296d06b122e3f154b2c8af2128136023d5
Signed-off-by: Maulik Shah <mkshah@codeaurora.org>
wake_up_all_idle_cpus() does not wake up paused CPUs because they are
removed from cpu_active_mask. Paused CPUs can still be in a deep CPU idle
state, so they must be woken up when the idle handler is uninstalled.
Fix this by introducing wake_up_all_online_idle_cpus(), which
unconditionally wakes up all online idle CPUs, and invoke it when
uninstalling the CPU idle handler.
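The difference between the two wake sets can be sketched with plain bitmasks. This is a toy model: the struct and the helper names are hypothetical stand-ins for the kernel's cpumasks, with one bit per CPU.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model: one bit per CPU. */
struct cpu_state {
	uint32_t online;	/* CPUs that are online */
	uint32_t active;	/* online CPUs that are not paused */
	uint32_t idle;		/* CPUs currently in an idle state */
};

/* Wake set when only active CPUs are considered: paused CPUs are missed. */
static uint32_t wake_set_active(const struct cpu_state *s)
{
	assert((s->active & ~s->online) == 0);	/* active is a subset of online */
	return s->active & s->idle;
}

/* Wake set when every online CPU is considered, paused or not. */
static uint32_t wake_set_online(const struct cpu_state *s)
{
	return s->online & s->idle;
}
```

With CPU 3 paused and idle, the first helper leaves it asleep while the second includes it.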
Bug: 192436062
Fixes: 683010f555 ("ANDROID: cpu/hotplug: add pause/resume_cpus interface")
Change-Id: I4afd4b7a17b87f9cc495e7009c9537888387f9ef
Signed-off-by: Maulik Shah <mkshah@codeaurora.org>
In change Iab3971cd0d78d669536b8eb0505c60caa3aafeee the
cfg80211 and mac80211 drivers were switched to modules, so we
need to add them as such to the hikey960_gki.fragment.
With this change, hikey960 boots and wifi comes up.
Bug: 189918667
Fixes: Iab3971cd0d78d669536b8eb0505c60caa3aafeee
Signed-off-by: John Stultz <john.stultz@linaro.org>
Change-Id: I8cd3dd3dc76852e270b7b4ba518323af92ff6dda
Remove CONFIG_CFG80211 and CONFIG_MAC80211 from gki_defconfig
to allow vendors to incorporate features that have landed upstream.
Also need to update symbol lists since the related 80211
symbols are no longer exported from the core kernel.
Bug: 189918667
Signed-off-by: Todd Kjos <tkjos@google.com>
Change-Id: Iab3971cd0d78d669536b8eb0505c60caa3aafeee
This reverts commit bba0d8a87e.
CFG80211 is changing to a module, so these configs go into device-specific
defconfig fragments.
Bug: 189918667
Change-Id: Ie4b70407369da3c865541e4857c3ba18fec24587
This reverts commit 9132fbe545.
Reason for revert: mmap_count is no longer used for reporting dma-bufs
and introduces subtle bugs related to changing the vm_ops
Bug: 192459295
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Change-Id: Id07802e5a3e18918c5c46e31b73be4a594f7dc26
This reverts commit fca37c251a.
Reason for revert: mmap_count is no longer used for reporting dma-bufs and introduces subtle bugs related to changing the vm_ops
Bug: 192459295
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Change-Id: I52fb55e1048a151fae7641c9646a231d59b3224d
Add ABI padding to some of the data structures to accommodate
enabling new eMMC features later.
Bug: 192337957
Change-Id: Ica3f96ea004fb89e4b46ef9734864c655cdcd277
Signed-off-by: Sahitya Tummala <quic_stummala@quicinc.com>
This change adds the rproc_coredump() and rproc_coredump_cleanup()
APIs to the qcom symbol list.
Bug: 188764827
Change-Id: I32a56f5d3caabc61ed94f6de0d7daa29becb490d
Signed-off-by: Siddharth Gupta <quic_sidgup@quicinc.com>
Add the symbols exported by the remoteproc vendor hook to the
qcom symbol list.
Change-Id: Iffd58aa5d367141de1c065488519b29fb802fd86
Signed-off-by: Siddharth Gupta <quic_sidgup@quicinc.com>
For vendor specific data in struct cfs_rq.
Bug: 188947181
Signed-off-by: Rick Yiu <rickyiu@google.com>
Change-Id: I7c322c6812829c19014426b5721cd1fb0c37a53f
We need the pid and tid of the caller in an async binder transaction,
so add the pid and tid information to the async binder transaction.
Bug: 190413570
Signed-off-by: zhang chuang <zhangchuang3@xiaomi.com>
Change-Id: If67c972aa53196d626ccfeb46b6b61e43ddc57ae
0day robot reported a 9.2% regression for will-it-scale mmap1 test
case[1], caused by commit 57efa1fe59 ("mm/gup: prevent gup_fast from
racing with COW during fork").
Further debugging shows that the regression arises because that commit
changes the offset of the hot field 'mmap_lock' inside 'struct mm_struct',
which changes its cache alignment.
From the perf data, the contention for 'mmap_lock' is very severe and
takes around 95% of CPU cycles. 'mmap_lock' is a rw_semaphore:
    struct rw_semaphore {
        atomic_long_t count; /* 8 bytes */
        atomic_long_t owner; /* 8 bytes */
        struct optimistic_spin_queue osq; /* spinner MCS lock */
        ...
Before commit 57efa1fe59 added 'write_protect_seq', the structure happened
to have a near-optimal cache alignment layout, as Linus explained:
"and before the addition of the 'write_protect_seq' field, the
mmap_sem was at offset 120 in 'struct mm_struct'.
Which meant that count and owner were in two different cachelines,
and then when you have contention and spend time in
rwsem_down_write_slowpath(), this is probably *exactly* the kind
of layout you want.
Because first the rwsem_write_trylock() will do a cmpxchg on the
first cacheline (for the optimistic fast-path), and then in the
case of contention, rwsem_down_write_slowpath() will just access
the second cacheline.
Which is probably just optimal for a load that spends a lot of
time contended - new waiters touch that first cacheline, and then
they queue themselves up on the second cacheline."
After the commit, the rw_semaphore is at offset 128, which means the
'count' and 'owner' fields are now in the same cacheline, which causes
more cache bouncing.
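The offset arithmetic can be checked with a small sketch. It assumes 64-byte cache lines and a cacheline-aligned 'mm_struct'; the helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define CACHELINE_BYTES 64u	/* assumption: 64-byte cache lines */

/* Index of the cache line that holds the byte at the given offset. */
static size_t cacheline_of(size_t offset)
{
	return offset / CACHELINE_BYTES;
}

/*
 * 'count' sits at the start of the rw_semaphore and 'owner' 8 bytes
 * later. Returns non-zero when both land in the same cache line.
 */
static int count_and_owner_share_line(size_t rwsem_offset)
{
	assert(rwsem_offset % 8 == 0);	/* 8-byte aligned member */
	return cacheline_of(rwsem_offset) == cacheline_of(rwsem_offset + 8);
}
```

At offset 120 'count' is the last 8 bytes of line 1 and 'owner' the first 8 bytes of line 2; at offset 128 both fall into line 2.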
Currently there are 3 "#ifdef CONFIG_XXX" before 'mmap_lock' which will
affect its offset:
CONFIG_MMU
CONFIG_MEMBARRIER
CONFIG_HAVE_ARCH_COMPAT_MMAP_BASES
The layout above is for a 64-bit system with 0day's default kernel config
(similar to RHEL-8.3's config), in which all 3 of these options are 'y'.
The layout can vary with different kernel configs.
Rearranging a structure's layout is usually a double-edged sword: it can
help one case but hurt others. For this case, one solution is to place
the newly added 'write_protect_seq', a 4-byte seqcount_t (when
CONFIG_DEBUG_LOCK_ALLOC=n), into an existing 4-byte hole in 'mm_struct'.
This does not change the alignment of other fields and fixes the
regression.
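The effect of filling a padding hole can be sketched with toy structures. These are not the real 'mm_struct' layout; the struct and field names are hypothetical, and the sketch assumes an ABI where int64_t is 8-byte aligned (as on x86-64 and arm64).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static_assert(_Alignof(int64_t) == 8, "sketch assumes 8-byte aligned int64_t");

/* Original layout: 'a' is followed by a 4-byte padding hole. */
struct before {
	int64_t x;
	int32_t a;	/* 4-byte hole after this member */
	int64_t hot;	/* the field whose offset matters */
};

/* New 4-byte member added ahead of the others: 'hot' moves. */
struct shifted {
	int32_t seq;
	int64_t x;
	int32_t a;
	int64_t hot;
};

/* New 4-byte member placed into the existing hole: 'hot' stays put. */
struct filled {
	int64_t x;
	int32_t a;
	int32_t seq;
	int64_t hot;
};

/* Non-zero when filling the hole left both the offset and the size alone. */
static int layout_preserved(void)
{
	return offsetof(struct filled, hot) == offsetof(struct before, hot) &&
	       sizeof(struct filled) == sizeof(struct before);
}
```

On a real kernel build, pahole is the usual tool for locating such holes.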
Link: https://lore.kernel.org/lkml/20210525031636.GB7744@xsang-OptiPlex-9020/ [1]
Reported-by: kernel test robot <oliver.sang@intel.com>
Signed-off-by: Feng Tang <feng.tang@intel.com>
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Cc: Peter Xu <peterx@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Bug: 161946584
(cherry picked from commit 2e3025434a)
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I9142789c5d57d167e5bb1f450d914bf2111894a2
As restricted hooks have been introduced, regular vendor hooks are no
longer necessary.
Bug: 187917024
Change-Id: Ia70e9dd1bd7373e19bdc82e90a2384201076bc0b
Signed-off-by: Shaleen Agrawal <shalagra@codeaurora.org>
Enable CONFIG_BLK_CGROUP_IOCOST to help control IO resources.
Bug: 188749221
Change-Id: I611b3ff5929d0a998fa6241967887803636b7588
Signed-off-by: Yang Yang <yang.yang@vivo.com>
Add ANDROID_OEM_DATA for OEM GKI implementations.
Bug: 188749221
Change-Id: I1feba2334aa34e3bc46eb9d0217118485405beb4
Signed-off-by: Yang Yang <yang.yang@vivo.com>
Add ANDROID_OEM_DATA for OEM GKI implementations.
Bug: 188749221
Change-Id: Ide8378a898de01a34d8ca3c34472844cd4ffa71c
Signed-off-by: Yang Yang <yang.yang@vivo.com>
Add ANDROID_OEM_DATA for OEM GKI implementations.
Bug: 188749221
Change-Id: I96b1c690fda172d0c490e944557a674a37620742
Signed-off-by: Yang Yang <yang.yang@vivo.com>
Add the symbols needed by the newly added CAN drivers.
Bug: 190375772
Signed-off-by: Todd Kjos <tkjos@google.com>
Change-Id: Ibaa1c0963e2e5efb0cf77e6661a683cb00f095d9
Use this commit to elaborate on the host control mode logic, explaining
the role that each variable plays.
While at it, make those parameters configurable.
Bug: 183467926
Bug: 170940265
Bug: 183454255
Link: https://lore.kernel.org/lkml/20210607061401.58884-13-avri.altman@wdc.com/
Signed-off-by: Avri Altman <avri.altman@wdc.com>
Change-Id: Ib05c6643c69504b8d9442b0024cfe1b0b687a4ce
In host control mode the host is the originator of map requests. To not
flood the device with map requests, use a simple throttling mechanism
that limits the number of inflight map requests.
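A throttle of this kind can be sketched with an atomic counter. The limit value and the function names below are hypothetical and do not come from the driver.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define MAX_INFLIGHT_MAP_REQ 2	/* hypothetical limit */

static atomic_int inflight_map_req;	/* zero-initialized */

/* Try to reserve a slot for a new map request. */
static bool map_req_get(void)
{
	if (atomic_fetch_add(&inflight_map_req, 1) >= MAX_INFLIGHT_MAP_REQ) {
		/* Over the limit: undo the increment and refuse. */
		atomic_fetch_sub(&inflight_map_req, 1);
		return false;
	}
	return true;
}

/* Release the slot when the map request completes. */
static void map_req_put(void)
{
	int prev = atomic_fetch_sub(&inflight_map_req, 1);

	assert(prev > 0);
	(void)prev;
}
```

A caller that gets false simply skips issuing the map request and retries on a later opportunity.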
Bug: 183467926
Bug: 170940265
Bug: 183454255
Link: https://lore.kernel.org/lkml/20210607061401.58884-10-avri.altman@wdc.com/
Signed-off-by: Avri Altman <avri.altman@wdc.com>
Change-Id: I75a5ced3be60569adcd75befa17d8a6340c147fd
In order not to hang on to "cold" regions, we shall inactivate a
region that has no READ access for a predefined amount of time -
READ_TO_MS. For that purpose we shall monitor the active regions list,
polling it every POLLING_INTERVAL_MS. On timeout expiry we shall add
the region to the "to-be-inactivated" list, unless it is clean and has
not exhausted its READ_TO_EXPIRIES - another parameter.
All this does not apply to pinned regions.
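The timeout rule can be sketched as follows. The parameter values, the struct layout and the function names are hypothetical, and re-arming both the deadline and the expiry budget on every read is an assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define READ_TO_MS        1000u	/* hypothetical value */
#define READ_TO_EXPIRIES  3u	/* hypothetical value */

struct region {
	bool pinned;
	bool dirty;
	uint64_t read_deadline_ms;	/* when the read timeout fires */
	unsigned int expiries_left;
};

/* Called on every READ to the region: re-arm the timer and the budget. */
static void region_touch(struct region *r, uint64_t now_ms)
{
	assert(r);
	r->read_deadline_ms = now_ms + READ_TO_MS;
	r->expiries_left = READ_TO_EXPIRIES;
}

/*
 * Called from the periodic poll. Returns true when the region should be
 * moved to the "to-be-inactivated" list.
 */
static bool region_poll(struct region *r, uint64_t now_ms)
{
	assert(r);
	if (r->pinned || now_ms < r->read_deadline_ms)
		return false;	/* pinned, or the timeout has not fired yet */

	if (r->expiries_left > 0)
		r->expiries_left--;

	if (r->dirty || r->expiries_left == 0)
		return true;	/* dirty, or the expiry budget is exhausted */

	/* Clean with budget left: give the region another timeout period. */
	r->read_deadline_ms = now_ms + READ_TO_MS;
	return false;
}
```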
Bug: 183467926
Bug: 170940265
Bug: 183454255
Link: https://lore.kernel.org/lkml/20210607061401.58884-9-avri.altman@wdc.com/
Signed-off-by: Avri Altman <avri.altman@wdc.com>
Change-Id: I2d2efbbc612ccec6ef7036cc1e1d31bd8bfd4174
The spec does not define the host's recommended response when the
device sends an HPB dev reset response (oper 0x2).
We will update all active HPB regions: mark them and perform the update
on the next read.
Bug: 183467926
Bug: 170940265
Bug: 183454255
Link: https://lore.kernel.org/lkml/20210607061401.58884-8-avri.altman@wdc.com/
Signed-off-by: Avri Altman <avri.altman@wdc.com>
Change-Id: Ibe87969a4130b4e77f5d163771648679bc5ac7e8
In host mode, the host is expected to send HPB-WRITE-BUFFER with
buffer-id = 0x1 when it inactivates a region.
Use the map-requests pool as there is no point in assigning a
designated cache for umap-requests.
Bug: 183467926
Bug: 170940265
Bug: 183454255
Link: https://lore.kernel.org/lkml/20210607061401.58884-7-avri.altman@wdc.com/
Signed-off-by: Avri Altman <avri.altman@wdc.com>
Change-Id: I1a6696b38d4abfb4d9fbe44e84016a6238825125
In host mode, eviction is considered an extreme measure.
Verify that the entering region has enough reads, and that the exiting
region has far fewer reads.
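The eviction check can be sketched as a pair of thresholds. The threshold names and values here are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define EVICTION_THLD_ENTER 256u	/* hypothetical: reads needed to enter */
#define EVICTION_THLD_EXIT  128u	/* hypothetical: victim must be below this */

static_assert(EVICTION_THLD_EXIT < EVICTION_THLD_ENTER,
	      "the victim must be clearly colder than the entering region");

/* Allow eviction only when the entering region is hot and the victim cold. */
static bool may_evict(unsigned int entering_reads, unsigned int exiting_reads)
{
	return entering_reads >= EVICTION_THLD_ENTER &&
	       exiting_reads < EVICTION_THLD_EXIT;
}
```

Keeping a gap between the two thresholds avoids evicting a region that is almost as hot as the one replacing it.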
Bug: 183467926
Bug: 170940265
Bug: 183454255
Link: https://lore.kernel.org/lkml/20210607061401.58884-6-avri.altman@wdc.com/
Signed-off-by: Avri Altman <avri.altman@wdc.com>
Change-Id: Ia08e3af69302c4f0474efa7c616832dde48df4e0
In host control mode, reads are the major source of activation trials.
Keep track of those read counters, for both active and inactive
regions.
We reset the read counter upon write - we are only interested in "clean"
reads.
Keep those counters normalized, as we are using those reads as a
comparative score, to make various decisions.
If during consecutive normalizations an active region has exhausted its
reads - inactivate it.
While at it, protect the {active,inactive}_count stats by adding them
into the applicable handler.
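A sketch of the counter handling, under the assumption that normalization halves every counter (the factor is a guess for illustration). The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct region {
	bool active;
	unsigned int reads;	/* "clean" reads since the last write */
};

/* A read bumps the score; a write resets it. */
static void region_read(struct region *r)  { r->reads++; }
static void region_write(struct region *r) { r->reads = 0; }

/*
 * Halve all counters so they stay comparable over time. An active region
 * whose counter reaches zero is inactivated. Returns how many regions
 * were inactivated by this pass.
 */
static size_t normalize(struct region *rgns, size_t n)
{
	size_t inactivated = 0;

	assert(rgns || n == 0);
	for (size_t i = 0; i < n; i++) {
		rgns[i].reads >>= 1;
		if (rgns[i].active && rgns[i].reads == 0) {
			rgns[i].active = false;
			inactivated++;
		}
	}
	return inactivated;
}
```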
Bug: 183467926
Bug: 170940265
Bug: 183454255
Link: https://lore.kernel.org/lkml/20210607061401.58884-5-avri.altman@wdc.com/
Signed-off-by: Avri Altman <avri.altman@wdc.com>
Change-Id: I0541c39e3dd7656ca1816cac3599ab73eb8697a8
Given a transfer length, set_dirty meticulously runs over all the
entries, across subregions and regions if needed. Currently its only use
is to mark dirty blocks, but soon HCM may profit from it as well, when
managing its read counters.
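The walk across subregion and region boundaries can be sketched as below. The geometry constants and the flat dirty array are hypothetical and far smaller than anything a real device reports; this is not the driver's code.

```c
#include <assert.h>

#define ENTRIES_PER_SRGN 4u	/* hypothetical geometry */
#define SRGNS_PER_RGN    2u
#define NR_RGNS          3u

static unsigned char dirty[NR_RGNS][SRGNS_PER_RGN][ENTRIES_PER_SRGN];

/* Mark 'cnt' consecutive entries dirty, starting at logical page 'lpn'. */
static void set_dirty(unsigned int lpn, unsigned int cnt)
{
	unsigned int rgn  = lpn / (ENTRIES_PER_SRGN * SRGNS_PER_RGN);
	unsigned int srgn = (lpn / ENTRIES_PER_SRGN) % SRGNS_PER_RGN;
	unsigned int off  = lpn % ENTRIES_PER_SRGN;

	while (cnt--) {
		assert(rgn < NR_RGNS);
		dirty[rgn][srgn][off] = 1;

		/* Step to the next entry, carrying into subregion and region. */
		if (++off == ENTRIES_PER_SRGN) {
			off = 0;
			if (++srgn == SRGNS_PER_RGN) {
				srgn = 0;
				rgn++;
			}
		}
	}
}
```

The same walk could update read counters instead of dirty bits by swapping the body of the loop.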
Bug: 183467926
Bug: 170940265
Bug: 183454255
Link: https://lore.kernel.org/lkml/20210607061401.58884-4-avri.altman@wdc.com/
Signed-off-by: Avri Altman <avri.altman@wdc.com>
Reviewed-by: Daejun Park <daejun7.park@samsung.com>
Change-Id: I916f4bf80490e31e5ef797d67647a41a07cefa02
In device control mode, the device may recommend the host to either
activate or inactivate a region, and the host should follow. Meaning
those are not actually recommendations, but more of instructions.
On the contrary, in host control mode, the recommendation protocol is
slightly changed:
a) The device may only recommend the host to update a subregion of an
already-active region. And,
b) The device may *not* recommend to inactivate a region.
Furthermore, in host control mode, the host may choose not to follow any
of the device's recommendations. However, in the case of a recommendation
to update an active and clean subregion, it is better to follow that
recommendation because otherwise the host has no other way to know that
some internal relocation took place.
Bug: 183467926
Bug: 170940265
Bug: 183454255
Link: https://lore.kernel.org/lkml/20210607061401.58884-3-avri.altman@wdc.com/
Signed-off-by: Avri Altman <avri.altman@wdc.com>
Change-Id: I02cb053ae4e7fdadd663f9190c95e5f5a79c0e4b
This patch adds support for HPB 2.0.
HPB 2.0 supports reads of varying sizes from 4KB to 512KB.
A read of up to 32KB is issued as a single HPB read.
A read in the range 36KB ~ 1MB is issued as a combination of a
write buffer command and an HPB read command to deliver more PPN entries.
The write buffer commands may not be issued immediately due to busy tags.
To use HPB read more aggressively, the driver can requeue the write buffer
command. The requeue threshold is implemented as a timeout and can be
modified with the requeue_timeout_ms entry in sysfs.
Bug: 183467926
Bug: 170940265
Bug: 183454255
Link: https://lore.kernel.org/linux-scsi/20210616070942epcms2p5b858c3ab5a1feca32162c8fd75ebed67@epcms2p5/
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Can Guo <cang@codeaurora.org>
Reviewed-by: Bean Huo <beanhuo@micron.com>
Reviewed-by: Stanley Chu <stanley.chu@mediatek.com>
Tested-by: Can Guo <cang@codeaurora.org>
Tested-by: Stanley Chu <stanley.chu@mediatek.com>
Signed-off-by: Daejun Park <daejun7.park@samsung.com>
Change-Id: I0a54f9ff2c84eed17f77da59331d2400b7edffdc
This patch manages the L2P map in the HPB module.
The HPB divides logical addresses into several regions. A region consists
of several sub-regions. The sub-region is the basic unit in which L2P
mapping is managed. The driver loads the L2P mapping data of each
sub-region. A loaded sub-region is said to be in the active state. The HPB
driver unloads L2P mapping data in units of regions. An unloaded region is
said to be in the inactive state.
Sub-region/region candidates to be loaded and unloaded are delivered from
the UFS device. The UFS device delivers the recommended sub-regions to
activate and regions to inactivate to the driver using sense data.
The HPB module performs L2P mapping management on the host through the
delivered information.
A pinned region is a pre-set region on the UFS device that is always in
the active state.
The data structures for map data requests and the L2P map use the mempool
API, minimizing allocation overhead while avoiding static allocation.
The minimum size of the memory pool used in the HPB is implemented
as a module parameter, so that it can be configured by the user.
To guarantee a minimum memory pool size of 4MB: ufshpb_host_map_kbytes=4096
The map_work manages active/inactive by 2 "to-do" lists.
Each hpb lun maintains 2 "to-do" lists:
hpb->lh_inact_rgn - regions to be inactivated, and
hpb->lh_act_srgn - subregions to be activated
Those lists are maintained on IO completion.
Bug: 183467926
Bug: 170940265
Bug: 183454255
Link: https://lore.kernel.org/linux-scsi/20210616070848epcms2p2819a1f0bf96cdcc357842fe8500af633@epcms2p2/
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Can Guo <cang@codeaurora.org>
Reviewed-by: Bean Huo <beanhuo@micron.com>
Reviewed-by: Stanley Chu <stanley.chu@mediatek.com>
Acked-by: Avri Altman <Avri.Altman@wdc.com>
Tested-by: Bean Huo <beanhuo@micron.com>
Tested-by: Can Guo <cang@codeaurora.org>
Tested-by: Stanley Chu <stanley.chu@mediatek.com>
Signed-off-by: Daejun Park <daejun7.park@samsung.com>
Change-Id: I1284f326332e2d6f2c1221e2d64160939614ad2d
This is a patch for the HPB initialization and adds HPB function calls to
UFS core driver.
NAND flash-based storage devices, including UFS, have mechanisms to
translate logical addresses of IO requests to the corresponding physical
addresses of the flash storage.
In UFS, Logical-address-to-Physical-address (L2P) map data, which is
required to identify the physical address for the requested IOs, can only
be partially stored in SRAM from NAND flash. Due to this partial loading,
accessing the flash address area where the L2P information for that address
is not loaded in the SRAM can result in serious performance degradation.
The basic concept of HPB is to cache L2P mapping entries in host system
memory so that both physical block address (PBA) and logical block address
(LBA) can be delivered in HPB read command.
The HPB READ command allows data to be read faster than a regular read
command in UFS since it provides the physical address (HPB Entry) of the
desired logical block in addition to its logical address. The UFS device
can access the physical block in NAND directly without searching for and
loading the L2P mapping table. This improves read performance because the
NAND read operation for loading the L2P mapping table is removed.
In HPB initialization, the host checks if the UFS device supports HPB
feature and retrieves related device capabilities. Then, some HPB
parameters are configured in the device.
We measured the total start-up time of popular applications and observed
the difference made by enabling HPB.
The popular applications are 12 game apps and 24 non-game apps. The target
applications were launched in order. One cycle consists of running all 36
applications in sequence. We repeated the cycle to observe the performance
improvement from L2P mapping cache hits in HPB.
The following is the experiment environment:
- kernel version: 4.4.0
- RAM: 8GB
- UFS 2.1 (64GB)
Result:
+-------+----------+----------+-------+
| cycle | baseline | with HPB | diff  |
+-------+----------+----------+-------+
| 1     | 272.4    | 264.9    | -7.5  |
| 2     | 250.4    | 248.2    | -2.2  |
| 3     | 226.2    | 215.6    | -10.6 |
| 4     | 230.6    | 214.8    | -15.8 |
| 5     | 232.0    | 218.1    | -13.9 |
| 6     | 231.9    | 212.6    | -19.3 |
+-------+----------+----------+-------+
We also measured HPB performance using iozone.
Here is my iozone script:
iozone -r 4k -+n -i2 -ecI -t 16 -l 16 -u 16
-s $IO_RANGE/16 -F mnt/tmp_1 mnt/tmp_2 mnt/tmp_3 mnt/tmp_4 mnt/tmp_5
mnt/tmp_6 mnt/tmp_7 mnt/tmp_8 mnt/tmp_9 mnt/tmp_10 mnt/tmp_11 mnt/tmp_12
mnt/tmp_13 mnt/tmp_14 mnt/tmp_15 mnt/tmp_16
Result:
+----------+--------+---------+
| IO range | HPB on | HPB off |
+----------+--------+---------+
| 1 GB     | 294.8  | 300.87  |
| 4 GB     | 293.51 | 179.35  |
| 8 GB     | 294.85 | 162.52  |
| 16 GB    | 293.45 | 156.26  |
| 32 GB    | 277.4  | 153.25  |
+----------+--------+---------+
Bug: 183467926
Bug: 170940265
Bug: 183454255
Link: https://lore.kernel.org/linux-scsi/20210616070812epcms2p4650ce5cd78056dce9162482e59bb74dd@epcms2p4/
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Can Guo <cang@codeaurora.org>
Reviewed-by: Bean Huo <beanhuo@micron.com>
Reviewed-by: Stanley Chu <stanley.chu@mediatek.com>
Acked-by: Avri Altman <Avri.Altman@wdc.com>
Tested-by: Bean Huo <beanhuo@micron.com>
Tested-by: Can Guo <cang@codeaurora.org>
Tested-by: Stanley Chu <stanley.chu@mediatek.com>
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Daejun Park <daejun7.park@samsung.com>
Change-Id: Ib198ff9844fc78c718d1c8e2a98fa13cc7b05f35
While one or more requests with a certain I/O priority are pending, do not
dispatch lower priority requests. Dispatch lower priority requests anyway
after the "aging" time has expired.
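The dispatch rule can be sketched as below. This is an illustration of strict priority with an aging override, not the scheduler's actual code; the types, names and time units are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum prio { PRIO_RT, PRIO_BE, PRIO_IDLE, PRIO_COUNT };

struct prio_queue {
	bool has_pending;
	uint64_t oldest_queued_at;	/* queue time of the oldest pending request */
};

/*
 * Returns the priority class to dispatch from, or -1 when nothing is
 * pending. A lower-priority class whose oldest request has waited at
 * least 'aging_expire' is served even if higher classes have work.
 */
static int pick_prio(const struct prio_queue q[PRIO_COUNT], uint64_t now,
		     uint64_t aging_expire)
{
	/* First pass: lower-priority requests that have aged out. */
	for (int p = PRIO_BE; p < PRIO_COUNT; p++) {
		if (!q[p].has_pending)
			continue;
		assert(now >= q[p].oldest_queued_at);
		if (now - q[p].oldest_queued_at >= aging_expire)
			return p;
	}

	/* Second pass: strict priority order. */
	for (int p = PRIO_RT; p < PRIO_COUNT; p++) {
		if (q[p].has_pending)
			return p;
	}
	return -1;
}
```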
This patch has been tested as follows:
modprobe scsi_debug ndelay=1000000 max_queue=16 &&
sd='' &&
while [ -z "$sd" ]; do
sd=/dev/$(basename /sys/bus/pseudo/drivers/scsi_debug/adapter*/host*/target*/*/block/*)
done &&
echo $((100*1000)) > /sys/block/$sd/queue/iosched/aging_expire &&
cd /sys/fs/cgroup/blkio/ &&
echo $$ >cgroup.procs &&
echo restrict-to-be >blkio.prio.class &&
mkdir -p hipri &&
cd hipri &&
echo none-to-rt >blkio.prio.class &&
{ max-iops -a1 -d32 -j1 -e mq-deadline $sd >& ~/low-pri.txt & } &&
echo $$ >cgroup.procs &&
max-iops -a1 -d32 -j1 -e mq-deadline $sd >& ~/hi-pri.txt
Result:
* 11000 IOPS for the high-priority job
* 40 IOPS for the low-priority job
If the aging expiry time is changed from 100s to 0, the results change
to 6712 and 6796 IOPS.
The max-iops script is a script that runs fio with the following arguments:
--bs=4K --gtod_reduce=1 --ioengine=libaio --ioscheduler=${arg_e} --runtime=60
--norandommap --rw=read --thread --buffered=0 --numjobs=${arg_j}
--iodepth=${arg_d} --iodepth_batch_submit=${arg_a}
--iodepth_batch_complete=$((arg_d / 2)) --name=${positional_argument_1}
--filename=${positional_argument_1}
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Cc: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Change-Id: I99a0674b018d096ec96bbfa3008eedcfda5013da
BUG: 187357408
(cherry picked from commit 40d5d42992b0de3ae7961735ea15eef5bd385ebf git://git.kernel.dk/linux-block/ for-5.14/block)
Signed-off-by: Bart Van Assche <bvanassche@google.com>
Maintain statistics per cgroup and export these to user space. These
statistics are essential for verifying whether the proper I/O priorities
have been assigned to requests. An example of the statistics data with
this patch applied:
$ cat /sys/fs/cgroup/io.stat
11:2 rbytes=0 wbytes=0 rios=3 wios=0 dbytes=0 dios=0 [NONE] dispatched=0 inserted=0 merged=171 [RT] dispatched=0 inserted=0 merged=0 [BE] dispatched=0 inserted=0 merged=0 [IDLE] dispatched=0 inserted=0 merged=0
8:32 rbytes=2142720 wbytes=0 rios=105 wios=0 dbytes=0 dios=0 [NONE] dispatched=0 inserted=0 merged=171 [RT] dispatched=0 inserted=0 merged=0 [BE] dispatched=0 inserted=0 merged=0 [IDLE] dispatched=0 inserted=0 merged=0
Cc: Damien Le Moal <damien.lemoal@wdc.com>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Cc: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
BUG: 187357408
Change-Id: I8d976c62ba2c0397cbb18076f3e61d5ab246cbcf
(cherry picked from commit f5dc926252cb31739809f7d27a8cbc9941b4d36d git://git.kernel.dk/linux-block/ for-5.14/block)
Signed-off-by: Bart Van Assche <bvanassche@google.com>
Track I/O statistics per I/O priority and export these statistics to
debugfs. These statistics help developers of the deadline scheduler.
Cc: Damien Le Moal <damien.lemoal@wdc.com>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Cc: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
BUG: 187357408
Change-Id: I8e91693dc1d015060737fa2fc15f5f2ebee2530c
(cherry picked from commit 9dc236caf2518c1e434be7a4f8fae60fb0be506a git://git.kernel.dk/linux-block/ for-5.14/block)
Signed-off-by: Bart Van Assche <bvanassche@google.com>
Maintain one dispatch list and one FIFO list per I/O priority class: RT, BE
and IDLE. Maintain statistics for each priority level. Split the debugfs
attributes per priority level as follows:
$ ls /sys/kernel/debug/block/.../sched/
async_depth dispatch2 read_next_rq write2_fifo_list
batching read0_fifo_list starved write_next_rq
dispatch0 read1_fifo_list write0_fifo_list
dispatch1 read2_fifo_list write1_fifo_list
Cc: Damien Le Moal <damien.lemoal@wdc.com>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Cc: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
BUG: 187357408
Change-Id: I60451cfdb416ad27601dc3ffb4eb307fa6ff783f
(cherry picked from commit 5b701a6e040ff8626ecf29ac06de9689efc00754 git://git.kernel.dk/linux-block/ for-5.14/block)
Signed-off-by: Bart Van Assche <bvanassche@google.com>
When dispatching the first request of a batch, the deadline_move_request()
call clears .next_rq[] for the opposite data direction. .next_rq[] is not
restored when changing data direction. Fix this by not clearing .next_rq[]
and by keeping track of the data direction of a batch in a variable instead.
This patch is a micro-optimization because:
- The number of deadline_next_request() calls for the read direction is
halved.
- The number of times that deadline_next_request() returns NULL is reduced.
Cc: Damien Le Moal <damien.lemoal@wdc.com>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Cc: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
BUG: 187357408
Change-Id: I582e99603a5443d75cf2b18a5daa2c93b5c66de3
(cherry picked from commit ea0fd2a525436ab5b9ada0f1953b0c0a29357311 git://git.kernel.dk/linux-block/ for-5.14/block)
Signed-off-by: Bart Van Assche <bvanassche@google.com>
For interactive workloads it is important that synchronous requests are
not delayed. Hence reserve 25% of scheduler tags for synchronous requests.
This patch still allows asynchronous requests to fill the hardware queues
since blk_mq_init_sched() makes sure that the number of scheduler requests
is double the hardware queue depth. From blk_mq_init_sched():
q->nr_requests = 2 * min_t(unsigned int, q->tag_set->queue_depth,
BLKDEV_MAX_RQ);
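The arithmetic behind that claim can be sketched as follows. The helper names are hypothetical, and computing the asynchronous limit as three quarters of the scheduler requests (with a floor of 1) is an assumption derived from the 25% reservation described above.

```c
#include <assert.h>

/* Scheduler requests: twice the smaller of queue depth and the cap. */
static unsigned int sched_nr_requests(unsigned int hw_queue_depth,
				      unsigned int max_rq)
{
	unsigned int d = hw_queue_depth < max_rq ? hw_queue_depth : max_rq;

	return 2 * d;
}

/* Tags available to asynchronous requests when 25% are reserved. */
static unsigned int async_depth(unsigned int nr_requests)
{
	unsigned int d = 3 * nr_requests / 4;

	assert(nr_requests > 0);
	return d ? d : 1;	/* never drop to zero */
}
```

For a hardware queue depth of 32 this gives 64 scheduler requests and an asynchronous limit of 48, which still exceeds the 32 hardware tags.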
Cc: Damien Le Moal <damien.lemoal@wdc.com>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Cc: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
BUG: 187357408
Change-Id: Ib9cd753a39c8e5f5c45908001d69334130ef2067
(cherry picked from commit c970bc8292aaaf6f2d333d612e657df3a99f417c git://git.kernel.dk/linux-block/ for-5.14/block)
Signed-off-by: Bart Van Assche <bvanassche@google.com>
Define separate macros for integers and jiffies to improve readability.
Use sysfs_emit() and kstrtoint() instead of sprintf() and simple_strtol().
The former are the recommended functions.
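As a userspace analogy only, the kind of strict parsing that motivates the switch can be sketched with strtol(). The function name is hypothetical and the behaviour only roughly mirrors kstrtoint(): for example, strtol() skips leading whitespace.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Parse a base-10 int, rejecting empty input, trailing garbage and
 * out-of-range values. Returns 0 on success or a negative errno value.
 */
static int parse_int_strict(const char *s, int *out)
{
	char *end;
	long v;

	assert(s && out);
	errno = 0;
	v = strtol(s, &end, 10);
	if (end == s)
		return -EINVAL;		/* no digits at all */
	if (*end == '\n')
		end++;			/* tolerate one trailing newline */
	if (*end != '\0')
		return -EINVAL;		/* trailing garbage */
	if (errno == ERANGE || v < INT_MIN || v > INT_MAX)
		return -ERANGE;
	*out = (int)v;
	return 0;
}
```

A lenient parser would silently accept input such as "42abc" as 42, which is the class of mistake strict parsing catches.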
Cc: Damien Le Moal <damien.lemoal@wdc.com>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Cc: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
BUG: 187357408
Change-Id: I4e0fd35124cd0319fcace0d1d5e3c113b60a213c
(cherry picked from commit d9baee13f8cf66a8fac9ec67fdb85ce419fcce3a git://git.kernel.dk/linux-block/ for-5.14/block)
Signed-off-by: Bart Van Assche <bvanassche@google.com>
Modern compilers complain if an out-of-range value is passed to a function
argument that has an enumeration type. Let the compiler detect out-of-range
data direction arguments instead of verifying the data_dir argument at
runtime.
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Cc: Damien Le Moal <damien.lemoal@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
BUG: 187357408
Change-Id: I4ad8c106a86d17f3010e12e172702e77eca61e80
(cherry picked from commit d9baee13f8cf66a8fac9ec67fdb85ce419fcce3a git://git.kernel.dk/linux-block/ for-5.14/block)
Signed-off-by: Bart Van Assche <bvanassche@google.com>