[rk emmc: read rate up to 78MB/s, write rate up to 26MB/s]
For completing any block request, MMC block driver is calling:
spin_lock_irq(queue)
__blk_end_request()
spin_unlock_irq(queue)
But if we analyze the sources of latency in kernel using ftrace,
__blk_end_request() function at times may take up to 6.5ms with
spinlock held and irq disabled.
__blk_end_request() calls couple of functions and ftrace output
shows that blk_update_bidi_request() function is almost taking 6ms.
There are 2 function to end the current request: ___blk_end_request()
and blk_end_request(). Both these functions do same thing except
that blk_end_request() function doesn't take up the spinlock
while calling the blk_update_bidi_request().
This patch replaces all __blk_end_request() calls with
blk_end_request() and __blk_end_request_all() calls with
blk_end_request_all().
Testing done: 20 process concurrent read/write on sd card
and eMMC. Ran this test for almost a day on multicore system
and no errors observed.
This change is not meant for improving MMC throughput; it's basically
about becoming fair to other threads/interrupts in the system. By
holding spin lock and interrupts disabled for longer duration, we
won't allow other threads/interrupts to run at all. Actually slight
performance degradation at file system level can be expected as we
are not holding the spin lock during blk_update_bidi_request() which
means our mmcqd thread may get preempted for other high priority
thread or any interrupt in the system.
These are performance numbers (100MB file write) with eMMC running
in DDR mode:
Without this patch:
Name of the Test Value Unit
LMDD Read Test 53.79 MBPS
LMDD Write Test 18.86 MBPS
IOZONE Read Test 51.65 MBPS
IOZONE Write Test 24.36 MBPS
With this patch:
Name of the Test Value Unit
LMDD Read Test 52.94 MBPS
LMDD Write Test 16.70 MBPS
IOZONE Read Test 52.08 MBPS
IOZONE Write Test 23.29 MBPS
Read numbers are fine. Write numbers are bit down (especially LMDD
write), may be because write requests normally have large transfer
size and which means there are chances that while mmcq is executing
blk_update_bidi_request(), it may get interrupted by interrupts or
other high priority thread.
Signed-off-by: Subhash Jadavani <subhashj@codeaurora.org>
Reviewed-by: Namjae Jeon <linkinjeon@gmail.com>
Signed-off-by: Chris Ball <cjb@laptop.org>
Conflicts:
drivers/mmc/card/block.c
Change mmc_blk_issue_rw_rq() to become asynchronous.
The execution flow looks like this:
* The mmc-queue calls issue_rw_rq(), which sends the request
to the host and returns back to the mmc-queue.
* The mmc-queue calls issue_rw_rq() again with a new request.
* This new request is prepared in issue_rw_rq(), then it waits for
the active request to complete before pushing it to the host.
* When the mmc-queue is empty it will call issue_rw_rq() with a NULL
req to finish off the active request without starting a new request.
Signed-off-by: Per Forlin <per.forlin@linaro.org>
Acked-by: Kyungmin Park <kyungmin.park@samsung.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Venkatraman S <svenkatr@ti.com>
Tested-by: Sourav Poddar <sourav.poddar@ti.com>
Tested-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Chris Ball <cjb@laptop.org>
Conflicts:
drivers/mmc/card/block.c
Break out code from mmc_blk_issue_rw_rq to create a block request prepare
function. This doesn't change any functionallity. This helps when handling
more than one active block request.
Signed-off-by: Per Forlin <per.forlin@linaro.org>
Acked-by: Kyungmin Park <kyungmin.park@samsung.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Venkatraman S <svenkatr@ti.com>
Tested-by: Sourav Poddar <sourav.poddar@ti.com>
Tested-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Chris Ball <cjb@laptop.org>
Conflicts:
drivers/mmc/card/block.c
The way the request data is organized in the mmc queue struct, it only
allows processing of one request at a time. This patch adds a new struct
to hold mmc queue request data such as sg list, request, blk request and
bounce buffers, and updates any functions depending on the mmc queue
struct. This prepares for using multiple active requests in one mmc queue.
Signed-off-by: Per Forlin <per.forlin@linaro.org>
Acked-by: Kyungmin Park <kyungmin.park@samsung.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Venkatraman S <svenkatr@ti.com>
Tested-by: Sourav Poddar <sourav.poddar@ti.com>
Tested-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Chris Ball <cjb@laptop.org>
Some host controllers will not operate without a hardware
timeout that is limited in value. However large discards
require large timeouts, so there needs to be a way to
specify the maximum discard size.
A host controller driver may now specify the maximum discard
timeout possible so that max_discard_sectors can be calculated.
However, for eMMC when the High Capacity Erase Group Size
is not in use, the timeout calculation depends on clock
rate which may change. For that case Preferred Erase Size
is used instead.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Chris Ball <cjb@laptop.org>
The synchronize_rcu() call resulting from making every serial driver
wake-up capable (commit b3b708fa) slows boot down on my Tegra2x system
(with CONFIG_PREEMPT disabled).
But this is avoidable since it is the device_set_wakeup_enable() and then
subsequence disable which causes the delay. We might as well just make
the device wakeup capable but not actually enable it for wakeup until
needed.
Effectively the current code does this:
device_set_wakeup_capable(dev, 1);
device_set_wakeup_enable(dev, 1);
device_set_wakeup_enable(dev, 0);
We can just drop the last two lines.
Before this change my boot log says:
[ 0.227062] Serial: 8250/16550 driver, 4 ports, IRQ sharing disabled
[ 0.702928] serial8250.0: ttyS0 at MMIO 0x70006040 (irq = 69) is a Tegra
after:
[ 0.227264] Serial: 8250/16550 driver, 4 ports, IRQ sharing disabled
[ 0.227983] serial8250.0: ttyS0 at MMIO 0x70006040 (irq = 69) is a Tegra
for saving of 450ms.
Suggested-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Simon Glass <sjg@chromium.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
mmc: core: add non-blocking mmc request function
Previously there has only been one function mmc_wait_for_req()
to start and wait for a request. This patch adds:
* mmc_start_req() - starts a request wihtout waiting
If there is on ongoing request wait for completion
of that request and start the new one and return.
Does not wait for the new command to complete.
This patch also adds new function members in struct mmc_host_ops
only called from core.c:
* pre_req - asks the host driver to prepare for the next job
* post_req - asks the host driver to clean up after a completed job
The intention is to use pre_req() and post_req() to do cache maintenance
while a request is active. pre_req() can be called while a request is
active to minimize latency to start next job. post_req() can be used after
the next job is started to clean up the request. This will minimize the
host driver request end latency. post_req() is typically used before
ending the block request and handing over the buffer to the block layer.
Add a host-private member in mmc_data to be used by pre_req to mark the
data. The host driver will then check this mark to see if the data is
prepared or not.
eliminate the build error using rk2928_tb_defconfig/rk2928_sdk_defconfig.
as the following:
UPD include/generated/compile.h
CC init/version.o
LD init/built-in.o
LD .tmp_vmlinux1
drivers/built-in.o: In function rk29_sdmmc_gpio_open'
make: *** [.tmp_vmlinux1] Error 1