mirror of
https://github.com/hardkernel/linux.git
synced 2026-06-08 03:40:35 +09:00
dde03585afa4eec313d14126c3a34ba39b023acf
575695 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
dde03585af |
FROMLIST: ANDROID: binder: Add BINDER_GET_NODE_INFO_FOR_REF ioctl.
This allows the context manager to retrieve information about nodes that it holds a reference to, such as the current number of references to those nodes. Such information can for example be used to determine whether the servicemanager is the only process holding a reference to a node. This information can then be passed on to the process holding the node, which can in turn decide whether it wants to shut down to reduce resource usage. Signed-off-by: Martijn Coenen <maco@android.com> Signed-off-by: Amit Pundir <amit.pundir@linaro.org> |
||
|
|
1bf5d59504 |
BACKPORT: arm64/vdso: Fix nsec handling for CLOCK_MONOTONIC_RAW
commit |
||
|
|
c5613bdc9e |
ANDROID: arm64: mm: fix 4.4.154 merge
android-4.4 contains an out-of-tree backport of
|
||
|
|
00d3adf9f2 |
BACKPORT: zsmalloc: introduce zs_huge_class_size()
Patch series "zsmalloc/zram: drop zram's max_zpage_size", v3.
ZRAM's max_zpage_size is a bad thing. It forces zsmalloc to store
normal objects as huge ones, which results in bigger zsmalloc memory
usage. Drop it and use actual zsmalloc huge-class value when decide if
the object is huge or not.
This patch (of 2):
Not every object can be share its zspage with other objects, e.g. when
the object is as big as zspage or nearly as big a zspage. For such
objects zsmalloc has a so called huge class - every object which belongs
to huge class consumes the entire zspage (which consists of a physical
page). On x86_64, PAGE_SHIFT 12 box, the first non-huge class size is
3264, so starting down from size 3264, objects can share page(-s) and
thus minimize memory wastage.
ZRAM, however, has its own statically defined watermark for huge
objects, namely "3 * PAGE_SIZE / 4 = 3072", and forcibly stores every
object larger than this watermark (3072) as a PAGE_SIZE object, in other
words, to a huge class, while zsmalloc can keep some of those objects in
non-huge classes. This results in increased memory consumption.
zsmalloc knows better if the object is huge or not. Introduce
zs_huge_class_size() function which tells if the given object can be
stored in one of non-huge classes or not. This will let us to drop
ZRAM's huge object watermark and fully rely on zsmalloc when we decide
if the object is huge.
[sergey.senozhatsky.work@gmail.com: add pool param to zs_huge_class_size()]
Link: http://lkml.kernel.org/r/20180314081833.1096-2-sergey.senozhatsky@gmail.com
Link: http://lkml.kernel.org/r/20180306070639.7389-2-sergey.senozhatsky@gmail.com
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit
|
||
|
|
49c3b184dc |
BACKPORT: zram: drop max_zpage_size and use zs_huge_class_size()
Remove ZRAM's enforced "huge object" value and use zsmalloc huge-class
watermark instead, which makes more sense.
TEST
- I used a 1G zram device, LZO compression back-end, original
data set size was 444MB. Looking at zsmalloc classes stats the
test ended up to be pretty fair.
BASE ZRAM/ZSMALLOC
=====================
zram mm_stat
498978816 191482495 199831552 0 199831552 15634 0
zsmalloc classes
class size almost_full almost_empty obj_allocated obj_used pages_used pages_per_zspage freeable
...
151 2448 0 0 1240 1240 744 3 0
168 2720 0 0 4200 4200 2800 2 0
190 3072 0 0 10100 10100 7575 3 0
202 3264 0 0 380 380 304 4 0
254 4096 0 0 10620 10620 10620 1 0
Total 7 46 106982 106187 48787 0
PATCHED ZRAM/ZSMALLOC
=====================
zram mm_stat
498978816 182579184 194248704 0 194248704 15628 0
zsmalloc classes
class size almost_full almost_empty obj_allocated obj_used pages_used pages_per_zspage freeable
...
151 2448 0 0 1240 1240 744 3 0
168 2720 0 0 4200 4200 2800 2 0
190 3072 0 0 10100 10100 7575 3 0
202 3264 0 0 7180 7180 5744 4 0
254 4096 0 0 3820 3820 3820 1 0
Total 8 45 106959 106193 47424 0
As we can see, we reduced the number of objects stored in class-4096,
because a huge number of objects which we previously forcibly stored in
class-4096 now stored in non-huge class-3264. This results in lower
memory consumption:
- zsmalloc now uses 47424 physical pages, which is less than 48787 pages
zsmalloc used before.
- objects that we store in class-3264 share zspages. That's why overall
the number of pages that both class-4096 and class-3264 consumed went
down from 10924 to 9564.
[sergey.senozhatsky.work@gmail.com: add pool param to zs_huge_class_size()]
Link: http://lkml.kernel.org/r/20180314081833.1096-3-sergey.senozhatsky@gmail.com
Link: http://lkml.kernel.org/r/20180306070639.7389-3-sergey.senozhatsky@gmail.com
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit
|
||
|
|
4c0933247e |
ANDROID: tracing: fix race condition reading saved tgids
Commit
|
||
|
|
8459a575c9 |
ANDROID: x86_64_cuttlefish_defconfig: Enable lz4 compression for zram
Signed-off-by: Peter Kalauskas <peskal@google.com> Bug: 112488418 Change-Id: Iab302cdf63691a3cc3124b5826206b9f6bd4adfb Signed-off-by: Peter Kalauskas <peskal@google.com> Signed-off-by: Amit Pundir <amit.pundir@linaro.org> |
||
|
|
4e04ef38ba |
UPSTREAM: drivers/block/zram/zram_drv.c: fix bug storing backing_dev
The call to strlcpy in backing_dev_store is incorrect. It should take
the size of the destination buffer instead of the size of the source
buffer. Additionally, ignore the newline character (\n) when reading
the new file_name buffer. This makes it possible to set the backing_dev
as follows:
echo /dev/sdX > /sys/block/zram0/backing_dev
The reason it worked before was the fact that strlcpy() copies 'len - 1'
bytes, which is strlen(buf) - 1 in our case, so it accidentally didn't
copy the trailing new line symbol. Which also means that "echo -n
/dev/sdX" most likely was broken.
Signed-off-by: Peter Kalauskas <peskal@google.com>
Link: http://lkml.kernel.org/r/20180813061623.GC64836@rodete-desktop-imager.corp.google.com
Acked-by: Minchan Kim <minchan@kernel.org>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: <stable@vger.kernel.org> [4.14+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit
|
||
|
|
d8ea3525da |
BACKPORT: zram: introduce zram memory tracking
zRam as swap is useful for small memory device. However, swap means
those pages on zram are mostly cold pages due to VM's LRU algorithm.
Especially, once init data for application are touched for launching,
they tend to be not accessed any more and finally swapped out. zRAM can
store such cold pages as compressed form but it's pointless to keep in
memory. Better idea is app developers free them directly rather than
remaining them on heap.
This patch tell us last access time of each block of zram via "cat
/sys/kernel/debug/zram/zram0/block_state".
The output is as follows,
300 75.033841 .wh
301 63.806904 s..
302 63.806919 ..h
First column is zram's block index and 3rh one represents symbol (s:
same page w: written page to backing store h: huge page) of the block
state. Second column represents usec time unit of the block was last
accessed. So above example means the 300th block is accessed at
75.033851 second and it was huge so it was written to the backing store.
Admin can leverage this information to catch cold|incompressible pages
of process with *pagemap* once part of heaps are swapped out.
I used the feature a few years ago to find memory hoggers in userspace
to notify them what memory they have wasted without touch for a long
time. With it, they could reduce unnecessary memory space. However, at
that time, I hacked up zram for the feature but now I need the feature
again so I decided it would be better to upstream rather than keeping it
alone. I hope I submit the userspace tool to use the feature soon.
[akpm@linux-foundation.org: fix i386 printk warning]
[minchan@kernel.org: use ktime_get_boottime() instead of sched_clock()]
Link: http://lkml.kernel.org/r/20180420063525.GA253739@rodete-desktop-imager.corp.google.com
[akpm@linux-foundation.org: documentation tweak]
[akpm@linux-foundation.org: fix i386 printk warning]
[minchan@kernel.org: fix compile warning]
Link: http://lkml.kernel.org/r/20180508104849.GA8209@rodete-desktop-imager.corp.google.com
[rdunlap@infradead.org: fix printk formats]
Link: http://lkml.kernel.org/r/3652ccb1-96ef-0b0b-05d1-f661d7733dcc@infradead.org
Link: http://lkml.kernel.org/r/20180416090946.63057-5-minchan@kernel.org
Signed-off-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit
|
||
|
|
95986f401e |
BACKPORT: zram: record accessed second
zRam as swap is useful for small memory device. However, swap means
those pages on zram are mostly cold pages due to VM's LRU algorithm.
Especially, once init data for application are touched for launching,
they tend to be not accessed any more and finally swapped out. zRAM can
store such cold pages as compressed form but it's pointless to keep in
memory. Better idea is app developers free them directly rather than
remaining them on heap.
This patch records last access time of each block of zram so that With
upcoming zram memory tracking, it could help userspace developers to
reduce memory footprint.
Link: http://lkml.kernel.org/r/20180416090946.63057-4-minchan@kernel.org
Signed-off-by: Minchan Kim <minchan@kernel.org>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit
|
||
|
|
5df312f29d |
BACKPORT: zram: mark incompressible page as ZRAM_HUGE
Mark incompressible pages so that we could investigate who is the owner
of the incompressible pages once the page is swapped out via using
upcoming zram memory tracker feature.
With it, we could prevent such pages to be swapped out by using mlock.
Otherwise we might remove them.
This patch exposes new stat for huge pages via mm_stat.
Link: http://lkml.kernel.org/r/20180416090946.63057-3-minchan@kernel.org
Signed-off-by: Minchan Kim <minchan@kernel.org>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit
|
||
|
|
129dca992b |
UPSTREAM: zram: correct flag name of ZRAM_ACCESS
Patch series "zram memory tracking", v5.
zRam as swap is useful for small memory device. However, swap means
those pages on zram are mostly cold pages due to VM's LRU algorithm.
Especially, once init data for application are touched for launching,
they tend to be not accessed any more and finally swapped out. zRAM can
store such cold pages as compressed form but it's pointless to keep in
memory. As well, it's pointless to store incompressible pages to zram
so better idea is app developers manages them directly like free or
mlock rather than remaining them on heap.
This patch provides a debugfs /sys/kernel/debug/zram/zram0/block_state
to represent each block's state so admin can investigate what memory is
cold|incompressible|same page with using pagemap once the pages are
swapped out.
The output is as follows:
300 75.033841 .wh
301 63.806904 s..
302 63.806919 ..h
First column is zram's block index and 3rh one represents symbol (s:
same page w: written page to backing store h: huge page) of the block
state. Second column represents usec time unit of the block was last
accessed. So above example means the 300th block is accessed at
75.033851 second and it was huge so it was written to the backing store.
This patch (of 4):
ZRAM_ACCESS is used for locking a slot of zram so correct the name. It
is also not a common flag to indicate status of the block so move the
declare position on top of the flag. Lastly, let's move the function to
the top of source code to be able to use it easily without forward
declaration.
Link: http://lkml.kernel.org/r/20180416090946.63057-2-minchan@kernel.org
Signed-off-by: Minchan Kim <minchan@kernel.org>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit
|
||
|
|
ccca79acca |
UPSTREAM: zram: Delete gendisk before cleaning up the request queue
Remove the disk, partition and bdi sysfs attributes before cleaning up
the request queue associated with the disk.
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
(cherry picked from commit
|
||
|
|
0e0c0d4a4c |
UPSTREAM: drivers/block/zram/zram_drv.c: make zram_page_end_io() static
zram_page_end_io() is local to the source and does not need to be in
global scope, so make it static.
Cleans up sparse warning:
symbol 'zram_page_end_io' was not declared. Should it be static?
Link: http://lkml.kernel.org/r/20171016173336.20320-1-colin.king@canonical.com
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit
|
||
|
|
1b5bebae86 |
BACKPORT: zram: set BDI_CAP_STABLE_WRITES once
With fast swap storage, the platform wants to use swap more aggressively and swap-in is crucial to application latency. The rw_page() based synchronous devices like zram, pmem and btt are such fast storage. When I profile swapin performance with zram lz4 decompress test, S/W overhead is more than 70%. Maybe, it would be bigger in nvdimm. This patchset reduces swap-in latency by skipping swapcache if the swap device is a synchronous device like a rw_page() based device. It enhances by 45% my swapin test (5G sequential swapin, no readahead) from 2.41sec to 1.64sec. This patch (of 4): Commit |
||
|
|
a992749ff2 |
UPSTREAM: zram: fix null dereference of handle
In testing I found handle passed to zs_map_object in __zram_bvec_read is NULL so eh kernel goes oops in pin_object(). The reason is there is no routine to check the slot's freeing after getting the slot's lock. This patch fixes it. [minchan@kernel.org: v2] Link: http://lkml.kernel.org/r/1505887347-10881-1-git-send-email-minchan@kernel.org Link: http://lkml.kernel.org/r/1505788488-26723-1-git-send-email-minchan@kernel.org Fixes: |
||
|
|
1b89745386 |
UPSTREAM: zram: add config and doc file for writeback feature
This patch adds document and kconfig for using of writeback feature.
Link: http://lkml.kernel.org/r/1498459987-24562-10-git-send-email-minchan@kernel.org
Signed-off-by: Minchan Kim <minchan@kernel.org>
Cc: Juneho Choi <juno.choi@lge.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit
|
||
|
|
3249ca4a57 |
BACKPORT: zram: read page from backing device
This patch enables read IO from backing device. For the feature, it
implements two IO read functions to transfer data from backing storage.
One is asynchronous IO function and other is synchronous one.
A reason I need synchrnous IO is due to partial write which need to
complete read IO before the overwriting partial data.
We can make the partial IO's case asynchronous, too but at the moment, I
don't feel adding more complexity to support such rare use cases so want
to go with simple.
[xieyisheng1@huawei.com: read_from_bdev_async(): return 1 to avoid call page_endio() in zram_rw_page()]
Link: http://lkml.kernel.org/r/1502707447-6944-1-git-send-email-xieyisheng1@huawei.com
Link: http://lkml.kernel.org/r/1498459987-24562-9-git-send-email-minchan@kernel.org
Signed-off-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Yisheng Xie <xieyisheng1@huawei.com>
Cc: Juneho Choi <juno.choi@lge.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit
|
||
|
|
46ec4c3537 |
BACKPORT: zram: write incompressible pages to backing device
This patch enables write IO to transfer data to backing device. For
that, it implements write_to_bdev function which creates new bio and
chaining with parent bio to make the parent bio asynchrnous.
For rw_page which don't have parent bio, it submit owned bio and handle
IO completion by zram_page_end_io.
Also, this patch defines new flag ZRAM_WB to mark written page for later
read IO.
[xieyisheng1@huawei.com: fix typo in comment]
Link: http://lkml.kernel.org/r/1502707447-6944-2-git-send-email-xieyisheng1@huawei.com
Link: http://lkml.kernel.org/r/1498459987-24562-8-git-send-email-minchan@kernel.org
Signed-off-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Yisheng Xie <xieyisheng1@huawei.com>
Cc: Juneho Choi <juno.choi@lge.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit
|
||
|
|
ca9fe02221 |
BACKPORT: zram: identify asynchronous IO's return value
For upcoming asynchronous IO like writeback, zram_rw_page should be
aware of that whether requested IO was completed or submitted
successfully, otherwise error.
For the goal, zram_bvec_rw has three return values.
-errno: returns error number
0: IO request is done synchronously
1: IO request is issued successfully.
Link: http://lkml.kernel.org/r/1498459987-24562-7-git-send-email-minchan@kernel.org
Signed-off-by: Minchan Kim <minchan@kernel.org>
Cc: Juneho Choi <juno.choi@lge.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit
|
||
|
|
e75f621362 |
BACKPORT: zram: add free space management in backing device
With backing device, zram needs management of free space of backing
device.
This patch adds bitmap logic to manage free space which is very naive.
However, it would be simple enough as considering uncompressible pages's
frequenty in zram.
Link: http://lkml.kernel.org/r/1498459987-24562-6-git-send-email-minchan@kernel.org
Signed-off-by: Minchan Kim <minchan@kernel.org>
Cc: Juneho Choi <juno.choi@lge.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit
|
||
|
|
3973f69393 |
UPSTREAM: zram: add interface to specif backing device
For writeback feature, user should set up backing device before the zram
working.
This patch enables the interface via /sys/block/zramX/backing_dev.
Currently, it supports block device only but it could be enhanced for
file as well.
Link: http://lkml.kernel.org/r/1498459987-24562-5-git-send-email-minchan@kernel.org
Signed-off-by: Minchan Kim <minchan@kernel.org>
Cc: Juneho Choi <juno.choi@lge.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit
|
||
|
|
1e9f940545 |
UPSTREAM: zram: rename zram_decompress_page to __zram_bvec_read
zram_decompress_page naming is not proper because it doesn't decompress
if page was dedup hit or stored with compression.
Use more abstract term and consistent with write path function
__zram_bvec_write.
Link: http://lkml.kernel.org/r/1498459987-24562-4-git-send-email-minchan@kernel.org
Signed-off-by: Minchan Kim <minchan@kernel.org>
Cc: Juneho Choi <juno.choi@lge.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit
|
||
|
|
eda1379b4d |
UPSTREAM: zram: inline zram_compress
zram_compress does several things, compress, entry alloc and check
limitation. I did for just readbility but it hurts modulization.:(
So this patch removes zram_compress functions and inline it in
__zram_bvec_write for upcoming patches.
Link: http://lkml.kernel.org/r/1498459987-24562-3-git-send-email-minchan@kernel.org
Signed-off-by: Minchan Kim <minchan@kernel.org>
Cc: Juneho Choi <juno.choi@lge.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit
|
||
|
|
50436998d0 |
UPSTREAM: zram: clean up duplicated codes in __zram_bvec_write
Patch series "writeback incompressible pages to storage", v1.
zRam is useful for memory saving with compressible pages but sometime,
workload can be changed and system has lots of incompressible pages
which is very harmful for zram.
This patch supports writeback feature of zram so admin can set up a
block device and with it, zram can save the memory via writing out the
incompressile pages once it found it's incompressible pages (1/4 comp
ratio) instead of keeping the page in memory.
[1-3] is just clean up and [4-8] is step by step feature enablement.
[4-8] is logically not bisectable(ie, logical unit separation)
although I tried to compiled out without breaking but I think it would
be better to review.
This patch (of 9):
__zram_bvec_write has some of duplicated logic for zram meta data
handling of same_page|compressed_page. This patch aims to clean it up
without behavior change.
[xieyisheng1@huawei.com: fix compr_data_size stat]
Link: http://lkml.kernel.org/r/1502707447-6944-1-git-send-email-xieyisheng1@huawei.com
Link: http://lkml.kernel.org/r/1496019048-27016-1-git-send-email-minchan@kernel.org
Link: http://lkml.kernel.org/r/1498459987-24562-2-git-send-email-minchan@kernel.org
Signed-off-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Yisheng Xie <xieyisheng1@huawei.com>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Juneho Choi <juno.choi@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit
|
||
|
|
e4d5d60b23 |
ANDROID: x86_64_cuttlefish_defconfig: Enable zram and zstd
Signed-off-by: Peter Kalauskas <peskal@google.com> Bug: 112488418 Change-Id: Id53d213c1630f59cb7934309f0da4e9dae6545d8 Signed-off-by: Amit Pundir <amit.pundir@linaro.org> |
||
|
|
e9c01c2e60 |
BACKPORT: crypto: zstd - Add zstd support
Adds zstd support to crypto and scompress. Only supports the default
level.
Previously we held off on this patch, since there weren't any users.
Now zram is ready for zstd support, but depends on CONFIG_CRYPTO_ZSTD,
which isn't defined until this patch is in. I also see a patch adding
zstd to pstore [0], which depends on crypto zstd.
[0] lkml.kernel.org/r/9c9416b2dff19f05fb4c35879aaa83d11ff72c92.1521626182.git.geliangtang@gmail.com
Signed-off-by: Nick Terrell <terrelln@fb.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit
|
||
|
|
f0b5d42975 |
UPSTREAM: zram: add zstd to the supported algorithms list
Add ZSTD to the list of supported compression algorithms.
ZRAM fio perf test:
LZO DEFLATE ZSTD
WRITE: (2180MB/s) (77.2MB/s) (1429MB/s)
WRITE: (1617MB/s) (77.7MB/s) (1202MB/s)
READ: (426MB/s) (595MB/s) (1181MB/s)
READ: (422MB/s) (572MB/s) (1020MB/s)
READ: (318MB/s) (67.8MB/s) (563MB/s)
WRITE: (318MB/s) (67.9MB/s) (564MB/s)
READ: (336MB/s) (68.3MB/s) (583MB/s)
WRITE: (335MB/s) (68.2MB/s) (582MB/s)
WRITE: (3441MB/s) (152MB/s) (2141MB/s)
WRITE: (2507MB/s) (147MB/s) (1888MB/s)
READ: (801MB/s) (1146MB/s) (1890MB/s)
READ: (767MB/s) (1096MB/s) (2073MB/s)
READ: (621MB/s) (126MB/s) (1009MB/s)
WRITE: (621MB/s) (126MB/s) (1009MB/s)
READ: (656MB/s) (125MB/s) (1075MB/s)
WRITE: (657MB/s) (126MB/s) (1077MB/s)
WRITE: (4772MB/s) (225MB/s) (3394MB/s)
WRITE: (3905MB/s) (211MB/s) (2939MB/s)
READ: (1216MB/s) (1608MB/s) (3218MB/s)
READ: (1159MB/s) (1431MB/s) (2981MB/s)
READ: (906MB/s) (156MB/s) (1457MB/s)
WRITE: (907MB/s) (156MB/s) (1458MB/s)
READ: (953MB/s) (158MB/s) (1595MB/s)
WRITE: (952MB/s) (157MB/s) (1593MB/s)
WRITE: (6036MB/s) (265MB/s) (4469MB/s)
WRITE: (5059MB/s) (263MB/s) (3951MB/s)
READ: (1618MB/s) (2066MB/s) (4276MB/s)
READ: (1573MB/s) (1942MB/s) (3830MB/s)
READ: (1202MB/s) (227MB/s) (1971MB/s)
WRITE: (1200MB/s) (227MB/s) (1968MB/s)
READ: (1265MB/s) (226MB/s) (2116MB/s)
WRITE: (1264MB/s) (226MB/s) (2114MB/s)
WRITE: (5339MB/s) (233MB/s) (3781MB/s)
WRITE: (4298MB/s) (234MB/s) (3276MB/s)
READ: (1626MB/s) (2048MB/s) (4081MB/s)
READ: (1567MB/s) (1929MB/s) (3758MB/s)
READ: (1174MB/s) (205MB/s) (1747MB/s)
WRITE: (1173MB/s) (204MB/s) (1746MB/s)
READ: (1214MB/s) (208MB/s) (1890MB/s)
WRITE: (1215MB/s) (208MB/s) (1892MB/s)
WRITE: (5666MB/s) (270MB/s) (4338MB/s)
WRITE: (4828MB/s) (267MB/s) (3772MB/s)
READ: (1803MB/s) (2058MB/s) (4946MB/s)
READ: (1805MB/s) (2156MB/s) (4711MB/s)
READ: (1334MB/s) (235MB/s) (2135MB/s)
WRITE: (1335MB/s) (235MB/s) (2137MB/s)
READ: (1364MB/s) (236MB/s) (2268MB/s)
WRITE: (1365MB/s) (237MB/s) (2270MB/s)
WRITE: (5474MB/s) (270MB/s) (4300MB/s)
WRITE: (4666MB/s) (266MB/s) (3817MB/s)
READ: (2022MB/s) (2319MB/s) (5472MB/s)
READ: (1924MB/s) (2260MB/s) (5031MB/s)
READ: (1369MB/s) (242MB/s) (2153MB/s)
WRITE: (1370MB/s) (242MB/s) (2155MB/s)
READ: (1499MB/s) (246MB/s) (2310MB/s)
WRITE: (1497MB/s) (246MB/s) (2307MB/s)
WRITE: (5558MB/s) (273MB/s) (4439MB/s)
WRITE: (4763MB/s) (271MB/s) (3918MB/s)
READ: (2201MB/s) (2599MB/s) (6062MB/s)
READ: (2105MB/s) (2463MB/s) (5413MB/s)
READ: (1490MB/s) (252MB/s) (2238MB/s)
WRITE: (1488MB/s) (252MB/s) (2236MB/s)
READ: (1566MB/s) (254MB/s) (2434MB/s)
WRITE: (1568MB/s) (254MB/s) (2437MB/s)
WRITE: (5120MB/s) (264MB/s) (4035MB/s)
WRITE: (4531MB/s) (267MB/s) (3740MB/s)
READ: (1940MB/s) (2258MB/s) (4986MB/s)
READ: (2024MB/s) (2387MB/s) (4871MB/s)
READ: (1343MB/s) (246MB/s) (2038MB/s)
WRITE: (1342MB/s) (246MB/s) (2037MB/s)
READ: (1553MB/s) (238MB/s) (2243MB/s)
WRITE: (1552MB/s) (238MB/s) (2242MB/s)
WRITE: (5345MB/s) (271MB/s) (3988MB/s)
WRITE: (4750MB/s) (254MB/s) (3668MB/s)
READ: (1876MB/s) (2363MB/s) (5150MB/s)
READ: (1990MB/s) (2256MB/s) (5080MB/s)
READ: (1355MB/s) (250MB/s) (2019MB/s)
WRITE: (1356MB/s) (251MB/s) (2020MB/s)
READ: (1490MB/s) (252MB/s) (2202MB/s)
WRITE: (1488MB/s) (252MB/s) (2199MB/s)
jobs1 perfstat
instructions 52,065,555,710 ( 0.79) 855,731,114,587 ( 2.64) 54,280,709,944 ( 1.40)
branches 14,020,427,116 ( 725.847) 101,733,449,582 (1074.521) 11,170,591,067 ( 992.869)
branch-misses 22,626,174 ( 0.16%) 274,197,885 ( 0.27%) 25,915,805 ( 0.23%)
jobs2 perfstat
instructions 103,633,110,402 ( 0.75) 1,710,822,100,914 ( 2.59) 107,879,874,104 ( 1.28)
branches 27,931,237,282 ( 679.203) 203,298,267,479 (1037.326) 22,185,350,842 ( 884.427)
branch-misses 46,103,811 ( 0.17%) 533,747,204 ( 0.26%) 49,682,483 ( 0.22%)
jobs3 perfstat
instructions 154,857,283,657 ( 0.76) 2,565,748,974,197 ( 2.57) 161,515,435,813 ( 1.31)
branches 41,759,490,355 ( 670.529) 304,905,605,277 ( 978.765) 33,215,805,907 ( 888.003)
branch-misses 74,263,293 ( 0.18%) 759,746,240 ( 0.25%) 76,841,196 ( 0.23%)
jobs4 perfstat
instructions 206,215,849,076 ( 0.75) 3,420,169,460,897 ( 2.60) 215,003,061,664 ( 1.31)
branches 55,632,141,739 ( 666.501) 406,394,977,433 ( 927.241) 44,214,322,251 ( 883.532)
branch-misses 102,287,788 ( 0.18%) 1,098,617,314 ( 0.27%) 103,891,040 ( 0.23%)
jobs5 perfstat
instructions 258,711,315,588 ( 0.67) 4,275,657,533,244 ( 2.23) 269,332,235,685 ( 1.08)
branches 69,802,821,166 ( 588.823) 507,996,211,252 ( 797.036) 55,450,846,129 ( 735.095)
branch-misses 129,217,214 ( 0.19%) 1,243,284,991 ( 0.24%) 173,512,278 ( 0.31%)
jobs6 perfstat
instructions 312,796,166,008 ( 0.61) 5,133,896,344,660 ( 2.02) 323,658,769,588 ( 1.04)
branches 84,372,488,583 ( 520.541) 610,310,494,402 ( 697.642) 66,683,292,992 ( 693.939)
branch-misses 159,438,978 ( 0.19%) 1,396,368,563 ( 0.23%) 174,406,934 ( 0.26%)
jobs7 perfstat
instructions 363,211,372,930 ( 0.56) 5,988,205,600,879 ( 1.75) 377,824,674,156 ( 0.93)
branches 98,057,013,765 ( 463.117) 711,841,255,974 ( 598.762) 77,879,009,954 ( 600.443)
branch-misses 199,513,153 ( 0.20%) 1,507,651,077 ( 0.21%) 248,203,369 ( 0.32%)
jobs8 perfstat
instructions 413,960,354,615 ( 0.52) 6,842,918,558,378 ( 1.45) 431,938,486,581 ( 0.83)
branches 111,812,574,884 ( 414.224) 813,299,084,518 ( 491.173) 89,062,699,827 ( 517.795)
branch-misses 233,584,845 ( 0.21%) 1,531,593,921 ( 0.19%) 286,818,489 ( 0.32%)
jobs9 perfstat
instructions 465,976,220,300 ( 0.53) 7,698,467,237,372 ( 1.47) 486,352,600,321 ( 0.84)
branches 125,931,456,162 ( 424.063) 915,207,005,715 ( 498.192) 100,370,404,090 ( 517.439)
branch-misses 256,992,445 ( 0.20%) 1,782,809,816 ( 0.19%) 345,239,380 ( 0.34%)
jobs10 perfstat
instructions 517,406,372,715 ( 0.53) 8,553,527,312,900 ( 1.48) 540,732,653,094 ( 0.84)
branches 139,839,780,676 ( 427.732) 1,016,737,699,389 ( 503.172) 111,696,557,638 ( 516.750)
branch-misses 259,595,561 ( 0.19%) 1,952,570,279 ( 0.19%) 357,818,661 ( 0.32%)
seconds elapsed 20.630411534 96.084546565 12.743373571
seconds elapsed 22.292627625 100.984155001 14.407413560
seconds elapsed 22.396016966 110.344880848 14.032201392
seconds elapsed 22.517330949 113.351459170 14.243074935
seconds elapsed 28.548305104 156.515193765 19.159286861
seconds elapsed 30.453538116 164.559937678 19.362492717
seconds elapsed 33.467108086 188.486827481 21.492612173
seconds elapsed 35.617727591 209.602677783 23.256422492
seconds elapsed 42.584239509 243.959902566 28.458540338
seconds elapsed 47.683632526 269.635248851 31.542404137
Over all, ZSTD has slower WRITE, but much faster READ (perhaps
a static compression buffer used during the test helped ZSTD a
lot), which results in faster test results.
Memory consumption (zram mm_stat file):
zram LZO mm_stat
mm_stat (jobs1): 2147483648 23068672 33558528 0 33558528 0 0
mm_stat (jobs2): 2147483648 23068672 33558528 0 33558528 0 0
mm_stat (jobs3): 2147483648 23068672 33558528 0 33562624 0 0
mm_stat (jobs4): 2147483648 23068672 33558528 0 33558528 0 0
mm_stat (jobs5): 2147483648 23068672 33558528 0 33558528 0 0
mm_stat (jobs6): 2147483648 23068672 33558528 0 33562624 0 0
mm_stat (jobs7): 2147483648 23068672 33558528 0 33566720 0 0
mm_stat (jobs8): 2147483648 23068672 33558528 0 33558528 0 0
mm_stat (jobs9): 2147483648 23068672 33558528 0 33558528 0 0
mm_stat (jobs10): 2147483648 23068672 33558528 0 33562624 0 0
zram DEFLATE mm_stat
mm_stat (jobs1): 2147483648 16252928 25178112 0 25178112 0 0
mm_stat (jobs2): 2147483648 16252928 25178112 0 25178112 0 0
mm_stat (jobs3): 2147483648 16252928 25178112 0 25178112 0 0
mm_stat (jobs4): 2147483648 16252928 25178112 0 25178112 0 0
mm_stat (jobs5): 2147483648 16252928 25178112 0 25178112 0 0
mm_stat (jobs6): 2147483648 16252928 25178112 0 25178112 0 0
mm_stat (jobs7): 2147483648 16252928 25178112 0 25190400 0 0
mm_stat (jobs8): 2147483648 16252928 25178112 0 25190400 0 0
mm_stat (jobs9): 2147483648 16252928 25178112 0 25178112 0 0
mm_stat (jobs10): 2147483648 16252928 25178112 0 25178112 0 0
zram ZSTD mm_stat
mm_stat (jobs1): 2147483648 11010048 16781312 0 16781312 0 0
mm_stat (jobs2): 2147483648 11010048 16781312 0 16781312 0 0
mm_stat (jobs3): 2147483648 11010048 16781312 0 16785408 0 0
mm_stat (jobs4): 2147483648 11010048 16781312 0 16781312 0 0
mm_stat (jobs5): 2147483648 11010048 16781312 0 16781312 0 0
mm_stat (jobs6): 2147483648 11010048 16781312 0 16781312 0 0
mm_stat (jobs7): 2147483648 11010048 16781312 0 16781312 0 0
mm_stat (jobs8): 2147483648 11010048 16781312 0 16781312 0 0
mm_stat (jobs9): 2147483648 11010048 16781312 0 16785408 0 0
mm_stat (jobs10): 2147483648 11010048 16781312 0 16781312 0 0
==================================================================================
Official benchmarks [1]:
Compressor name Ratio Compression Decompress.
zstd 1.1.3 -1 2.877 430 MB/s 1110 MB/s
zlib 1.2.8 -1 2.743 110 MB/s 400 MB/s
brotli 0.5.2 -0 2.708 400 MB/s 430 MB/s
quicklz 1.5.0 -1 2.238 550 MB/s 710 MB/s
lzo1x 2.09 -1 2.108 650 MB/s 830 MB/s
lz4 1.7.5 2.101 720 MB/s 3600 MB/s
snappy 1.1.3 2.091 500 MB/s 1650 MB/s
lzf 3.6 -1 2.077 400 MB/s 860 MB/s
Minchan said:
: I did test with my sample data and compared zstd with deflate. zstd's
: compress ratio is lower a little bit but compression speed is much faster
: 3 times more and decompress speed is too 2 times more. With different
: data, it is different but overall, zstd would be better for speed at the
: cost of a little lower compress ratio(about 5%) so I believe it's worth to
: replace deflate.
[1] https://github.com/facebook/zstd
Link: http://lkml.kernel.org/r/20170912050005.3247-1-sergey.senozhatsky@gmail.com
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Tested-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit
|
||
|
|
e2aeff6578 |
UPSTREAM: lib: Add zstd modules
Add zstd compression and decompression kernel modules.
zstd offers a wide varity of compression speed and quality trade-offs.
It can compress at speeds approaching lz4, and quality approaching lzma.
zstd decompressions at speeds more than twice as fast as zlib, and
decompression speed remains roughly the same across all compression levels.
The code was ported from the upstream zstd source repository. The
`linux/zstd.h` header was modified to match linux kernel style.
The cross-platform and allocation code was stripped out. Instead zstd
requires the caller to pass a preallocated workspace. The source files
were clang-formatted [1] to match the Linux Kernel style as much as
possible. Otherwise, the code was unmodified. We would like to avoid
as much further manual modification to the source code as possible, so it
will be easier to keep the kernel zstd up to date.
I benchmarked zstd compression as a special character device. I ran zstd
and zlib compression at several levels, as well as performing no
compression, which measure the time spent copying the data to kernel space.
Data is passed to the compresser 4096 B at a time. The benchmark file is
located in the upstream zstd source repository under
`contrib/linux-kernel/zstd_compress_test.c` [2].
I ran the benchmarks on a Ubuntu 14.04 VM with 2 cores and 4 GiB of RAM.
The VM is running on a MacBook Pro with a 3.1 GHz Intel Core i7 processor,
16 GB of RAM, and a SSD. I benchmarked using `silesia.tar` [3], which is
211,988,480 B large. Run the following commands for the benchmark:
sudo modprobe zstd_compress_test
sudo mknod zstd_compress_test c 245 0
sudo cp silesia.tar zstd_compress_test
The time is reported by the time of the userland `cp`.
The MB/s is computed with
1,536,217,008 B / time(buffer size, hash)
which includes the time to copy from userland.
The Adjusted MB/s is computed with
1,536,217,088 B / (time(buffer size, hash) - time(buffer size, none)).
The memory reported is the amount of memory the compressor requests.
| Method | Size (B) | Time (s) | Ratio | MB/s | Adj MB/s | Mem (MB) |
|----------|----------|----------|-------|---------|----------|----------|
| none | 11988480 | 0.100 | 1 | 2119.88 | - | - |
| zstd -1 | 73645762 | 1.044 | 2.878 | 203.05 | 224.56 | 1.23 |
| zstd -3 | 66988878 | 1.761 | 3.165 | 120.38 | 127.63 | 2.47 |
| zstd -5 | 65001259 | 2.563 | 3.261 | 82.71 | 86.07 | 2.86 |
| zstd -10 | 60165346 | 13.242 | 3.523 | 16.01 | 16.13 | 13.22 |
| zstd -15 | 58009756 | 47.601 | 3.654 | 4.45 | 4.46 | 21.61 |
| zstd -19 | 54014593 | 102.835 | 3.925 | 2.06 | 2.06 | 60.15 |
| zlib -1 | 77260026 | 2.895 | 2.744 | 73.23 | 75.85 | 0.27 |
| zlib -3 | 72972206 | 4.116 | 2.905 | 51.50 | 52.79 | 0.27 |
| zlib -6 | 68190360 | 9.633 | 3.109 | 22.01 | 22.24 | 0.27 |
| zlib -9 | 67613382 | 22.554 | 3.135 | 9.40 | 9.44 | 0.27 |
I benchmarked zstd decompression using the same method on the same machine.
The benchmark file is located in the upstream zstd repo under
`contrib/linux-kernel/zstd_decompress_test.c` [4]. The memory reported is
the amount of memory required to decompress data compressed with the given
compression level. If you know the maximum size of your input, you can
reduce the memory usage of decompression irrespective of the compression
level.
| Method | Time (s) | MB/s | Adjusted MB/s | Memory (MB) |
|----------|----------|---------|---------------|-------------|
| none | 0.025 | 8479.54 | - | - |
| zstd -1 | 0.358 | 592.15 | 636.60 | 0.84 |
| zstd -3 | 0.396 | 535.32 | 571.40 | 1.46 |
| zstd -5 | 0.396 | 535.32 | 571.40 | 1.46 |
| zstd -10 | 0.374 | 566.81 | 607.42 | 2.51 |
| zstd -15 | 0.379 | 559.34 | 598.84 | 4.61 |
| zstd -19 | 0.412 | 514.54 | 547.77 | 8.80 |
| zlib -1 | 0.940 | 225.52 | 231.68 | 0.04 |
| zlib -3 | 0.883 | 240.08 | 247.07 | 0.04 |
| zlib -6 | 0.844 | 251.17 | 258.84 | 0.04 |
| zlib -9 | 0.837 | 253.27 | 287.64 | 0.04 |
Tested in userland using the test-suite in the zstd repo under
`contrib/linux-kernel/test/UserlandTest.cpp` [5] by mocking the kernel
functions. Fuzz tested using libfuzzer [6] with the fuzz harnesses under
`contrib/linux-kernel/test/{RoundTripCrash.c,DecompressCrash.c}` [7] [8]
with ASAN, UBSAN, and MSAN. Additionaly, it was tested while testing the
BtrFS and SquashFS patches coming next.
[1] https://clang.llvm.org/docs/ClangFormat.html
[2] https://github.com/facebook/zstd/blob/dev/contrib/linux-kernel/zstd_compress_test.c
[3] http://sun.aei.polsl.pl/~sdeor/index.php?page=silesia
[4] https://github.com/facebook/zstd/blob/dev/contrib/linux-kernel/zstd_decompress_test.c
[5] https://github.com/facebook/zstd/blob/dev/contrib/linux-kernel/test/UserlandTest.cpp
[6] http://llvm.org/docs/LibFuzzer.html
[7] https://github.com/facebook/zstd/blob/dev/contrib/linux-kernel/test/RoundTripCrash.c
[8] https://github.com/facebook/zstd/blob/dev/contrib/linux-kernel/test/DecompressCrash.c
zstd source repository: https://github.com/facebook/zstd
Signed-off-by: Nick Terrell <terrelln@fb.com>
Signed-off-by: Chris Mason <clm@fb.com>
(cherry picked from commit
|
||
|
|
34aef16229 |
UPSTREAM: lib: Add xxhash module
Adds xxhash kernel module with xxh32 and xxh64 hashes. xxhash is an
extremely fast non-cryptographic hash algorithm for checksumming.
The zstd compression and decompression modules added in the next patch
require xxhash. I extracted it out from zstd since it is useful on its
own. I copied the code from the upstream XXHash source repository and
translated it into kernel style. I ran benchmarks and tests in the kernel
and tests in userland.
I benchmarked xxhash as a special character device. I ran in four modes,
no-op, xxh32, xxh64, and crc32. The no-op mode simply copies the data to
kernel space and ignores it. The xxh32, xxh64, and crc32 modes compute
hashes on the copied data. I also ran it with four different buffer sizes.
The benchmark file is located in the upstream zstd source repository under
`contrib/linux-kernel/xxhash_test.c` [1].
I ran the benchmarks on a Ubuntu 14.04 VM with 2 cores and 4 GiB of RAM.
The VM is running on a MacBook Pro with a 3.1 GHz Intel Core i7 processor,
16 GB of RAM, and a SSD. I benchmarked using the file `filesystem.squashfs`
from `ubuntu-16.10-desktop-amd64.iso`, which is 1,536,217,088 B large.
Run the following commands for the benchmark:
modprobe xxhash_test
mknod xxhash_test c 245 0
time cp filesystem.squashfs xxhash_test
The time is reported by the time of the userland `cp`.
The GB/s is computed with
1,536,217,008 B / time(buffer size, hash)
which includes the time to copy from userland.
The Normalized GB/s is computed with
1,536,217,088 B / (time(buffer size, hash) - time(buffer size, none)).
| Buffer Size (B) | Hash | Time (s) | GB/s | Adjusted GB/s |
|-----------------|-------|----------|------|---------------|
| 1024 | none | 0.408 | 3.77 | - |
| 1024 | xxh32 | 0.649 | 2.37 | 6.37 |
| 1024 | xxh64 | 0.542 | 2.83 | 11.46 |
| 1024 | crc32 | 1.290 | 1.19 | 1.74 |
| 4096 | none | 0.380 | 4.04 | - |
| 4096 | xxh32 | 0.645 | 2.38 | 5.79 |
| 4096 | xxh64 | 0.500 | 3.07 | 12.80 |
| 4096 | crc32 | 1.168 | 1.32 | 1.95 |
| 8192 | none | 0.351 | 4.38 | - |
| 8192 | xxh32 | 0.614 | 2.50 | 5.84 |
| 8192 | xxh64 | 0.464 | 3.31 | 13.60 |
| 8192 | crc32 | 1.163 | 1.32 | 1.89 |
| 16384 | none | 0.346 | 4.43 | - |
| 16384 | xxh32 | 0.590 | 2.60 | 6.30 |
| 16384 | xxh64 | 0.466 | 3.30 | 12.80 |
| 16384 | crc32 | 1.183 | 1.30 | 1.84 |
Tested in userland using the test-suite in the zstd repo under
`contrib/linux-kernel/test/XXHashUserlandTest.cpp` [2] by mocking the
kernel functions. A line in each branch of every function in `xxhash.c`
was commented out to ensure that the test-suite fails. Additionally
tested while testing zstd and with SMHasher [3].
[1] https://phabricator.intern.facebook.com/P57526246
[2] https://github.com/facebook/zstd/blob/dev/contrib/linux-kernel/test/XXHashUserlandTest.cpp
[3] https://github.com/aappleby/smhasher
zstd source repository: https://github.com/facebook/zstd
XXHash source repository: https://github.com/cyan4973/xxhash
Signed-off-by: Nick Terrell <terrelln@fb.com>
Signed-off-by: Chris Mason <clm@fb.com>
(cherry picked from commit
|
||
|
|
0ef897677b |
UPSTREAM: zram: rework copy of compressor name in comp_algorithm_store()
comp_algorithm_store() passes the size of the source buffer to strlcpy()
instead of the destination buffer size. Make it explicit that the two
buffers have the same size and use strcpy() instead of strlcpy(). The
latter can be done safely since the function ensures that the string in
the source buffer is terminated.
Link: http://lkml.kernel.org/r/20170803163350.45245-1-mka@chromium.org
Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit
|
||
|
|
ca03d6eae2 |
UPSTREAM: zram: constify attribute_group structures.
attribute_groups are not supposed to change at runtime. All functions
working with attribute_groups provided by <linux/sysfs.h> work with
const attribute_group. So mark the non-const structs as const.
File size before:
text data bss dec hex filename
8293 841 4 9138 23b2 drivers/block/zram/zram_drv.o
File size After adding 'const':
text data bss dec hex filename
8357 777 4 9138 23b2 drivers/block/zram/zram_drv.o
Link: http://lkml.kernel.org/r/65680c1c4d85818f7094cbfa31c91bf28185ba1b.1499061182.git.arvind.yadav.cs@gmail.com
Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit
|
||
|
|
2b90f2dd08 |
UPSTREAM: zram: count same page write as page_stored
Regardless of whether it is same page or not, it's surely write and
stored to zram so we should increase pages_stored stat. Otherwise, user
can see zero value via mm_stats although he writes a lot of pages to
zram.
Link: http://lkml.kernel.org/r/1494834068-27004-1-git-send-email-minchan@kernel.org
Signed-off-by: Minchan Kim <minchan@kernel.org>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit
|
||
|
|
88befbacdb |
UPSTREAM: zram: reduce load operation in page_same_filled
In page_same_filled function, all elements in the page is compared with
next index value. The current comparison routine compares the (i)th and
(i+1)th values of the page.
In this case, two load operaions occur for each comparison. But if we
store first value of the page stores at 'val' variable and using it to
compare with others, the load opearation is reduced. It reduce load
operation per page by up to 64times.
Link: http://lkml.kernel.org/r/1488428104-7257-1-git-send-email-sangwoo2.park@lge.com
Signed-off-by: Sangwoo Park <sangwoo2.park@lge.com>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit
|
||
|
|
12306d58fe |
UPSTREAM: zram: use zram_free_page instead of open-coded
The zram_free_page already handles NULL handle case and same page so use
it to reduce error probability. (Acutaully, I made a mistake when I
handled same page feature)
Link: http://lkml.kernel.org/r/1492052365-16169-7-git-send-email-minchan@kernel.org
Signed-off-by: Minchan Kim <minchan@kernel.org>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit
|
||
|
|
84f48a884b |
UPSTREAM: zram: introduce zram data accessor
With element, sometime I got confused handle and element access. It
might be my bad but I think it's time to introduce accessor to prevent
future idiot like me. This patch is just clean-up patch so it shouldn't
change any behavior.
Link: http://lkml.kernel.org/r/1492052365-16169-6-git-send-email-minchan@kernel.org
Signed-off-by: Minchan Kim <minchan@kernel.org>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit
|
||
|
|
bb63c75ee7 |
UPSTREAM: zram: remove zram_meta structure
It's redundant now. Instead, remove it and use zram structure directly.
Link: http://lkml.kernel.org/r/1492052365-16169-5-git-send-email-minchan@kernel.org
Signed-off-by: Minchan Kim <minchan@kernel.org>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit
|
||
|
|
7fa2524c09 |
UPSTREAM: zram: use zram_slot_lock instead of raw bit_spin_lock op
With this clean-up phase, I want to use zram's wrapper function to lock
table access which is more consistent with other zram's functions.
Link: http://lkml.kernel.org/r/1492052365-16169-4-git-send-email-minchan@kernel.org
Signed-off-by: Minchan Kim <minchan@kernel.org>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit
|
||
|
|
ca119b7f8f |
BACKPORT: zram: partial IO refactoring
For architecture(PAGE_SIZE > 4K), zram have supported partial IO.
However, the mixed code for handling normal/partial IO is too mess,
error-prone to modify IO handler functions with upcoming feature so this
patch aims for cleaning up zram's IO handling functions.
Link: http://lkml.kernel.org/r/1492052365-16169-3-git-send-email-minchan@kernel.org
Signed-off-by: Minchan Kim <minchan@kernel.org>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit
|
||
|
|
49ab60b20f |
BACKPORT: zram: handle multiple pages attached bio's bvec
Patch series "zram clean up", v2.
This patchset aims to clean up zram .
[1] clean up multiple pages's bvec handling.
[2] clean up partial IO handling
[3-6] clean up zram via using accessor and removing pointless structure.
With [2-6] applied, we can get a few hundred bytes as well as huge
readibility enhance.
x86: 708 byte save
add/remove: 1/1 grow/shrink: 0/11 up/down: 478/-1186 (-708)
function old new delta
zram_special_page_read - 478 +478
zram_reset_device 317 314 -3
mem_used_max_store 131 128 -3
compact_store 96 93 -3
mm_stat_show 203 197 -6
zram_add 719 712 -7
zram_slot_free_notify 229 214 -15
zram_make_request 819 803 -16
zram_meta_free 128 111 -17
zram_free_page 180 151 -29
disksize_store 432 361 -71
zram_decompress_page.isra 504 - -504
zram_bvec_rw 2592 2080 -512
Total: Before=25350773, After=25350065, chg -0.00%
ppc64: 231 byte save
add/remove: 2/0 grow/shrink: 1/9 up/down: 681/-912 (-231)
function old new delta
zram_special_page_read - 480 +480
zram_slot_lock - 200 +200
vermagic 39 40 +1
mm_stat_show 256 248 -8
zram_meta_free 200 184 -16
zram_add 944 912 -32
zram_free_page 348 308 -40
disksize_store 572 492 -80
zram_decompress_page 664 564 -100
zram_slot_free_notify 292 160 -132
zram_make_request 1132 1000 -132
zram_bvec_rw 2768 2396 -372
Total: Before=17565825, After=17565594, chg -0.00%
This patch (of 6):
Johannes Thumshirn reported system goes the panic when using NVMe over
Fabrics loopback target with zram.
The reason is zram expects each bvec in bio contains a single page
but nvme can attach a huge bulk of pages attached to the bio's bvec
so that zram's index arithmetic could be wrong so that out-of-bound
access makes system panic.
[1] in mainline solved solved the problem by limiting max_sectors with
SECTORS_PER_PAGE but it makes zram slow because bio should split with
each pages so this patch makes zram aware of multiple pages in a bvec
so it could solve without any regression(ie, bio split).
[1]
|
||
|
|
b200c92fd3 |
UPSTREAM: zram: fix operator precedence to get offset
In zram_rw_page, the logic to get offset is wrong by operator precedence (i.e., "<<" is higher than "&"). With wrong offset, zram can corrupt the user's data. This patch fixes it. Fixes: |
||
|
|
7188698948 |
BACKPORT: zram: extend zero pages to same element pages
The idea is that without doing more calculations we extend zero pages to
same element pages for zram. zero page is special case of same element
page with zero element.
1. the test is done under android 7.0
2. startup too many applications circularly
3. sample the zero pages, same pages (none-zero element)
and total pages in function page_zero_filled
the result is listed as below:
ZERO SAME TOTAL
36214 17842 598196
ZERO/TOTAL SAME/TOTAL (ZERO+SAME)/TOTAL ZERO/SAME
AVERAGE 0.060631909 0.024990816 0.085622726 2.663825038
STDEV 0.00674612 0.005887625 0.009707034 2.115881328
MAX 0.069698422 0.030046087 0.094975336 7.56043956
MIN 0.03959586 0.007332205 0.056055193 1.928985507
from the above data, the benefit is about 2.5% and up to 3% of total
swapout pages.
The defect of the patch is that when we recovery a page from non-zero
element the operations are low efficient for partial read.
This patch extends zero_page to same_page so if there is any user to
have monitored zero_pages, he will be surprised if the number is
increased but it's not harmful, I believe.
[minchan@kernel.org: do not free same element pages in zram_meta_free]
Link: http://lkml.kernel.org/r/20170207065741.GA2567@bbox
Link: http://lkml.kernel.org/r/1483692145-75357-1-git-send-email-zhouxianrong@huawei.com
Link: http://lkml.kernel.org/r/1486307804-27903-1-git-send-email-minchan@kernel.org
Signed-off-by: zhouxianrong <zhouxianrong@huawei.com>
Signed-off-by: Minchan Kim <minchan@kernel.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit
|
||
|
|
ea830271ca |
BACKPORT: zram: remove waitqueue for IO done
zram_reset_device() waits for ongoing writepage pages to be completed by
zram->refcount logic. However, it's pointless because before the reset,
we prevent further opening of zram by zram->claim and flush all of
pending IO by fsync_bdev so there should be no pending IO at the
zram_reset_device().
So let's remove that code which is even broken due to the lack of
wake_up elsewhere.
Link: http://lkml.kernel.org/r/1485145031-11661-1-git-send-email-minchan@kernel.org
Signed-off-by: Minchan Kim <minchan@kernel.org>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit
|
||
|
|
ff26f4e4a0 |
UPSTREAM: zram: remove obsolete sysfs attrs
We had a deprecated_attr_warn() warning for 2 years and now the time has
come and we finally can do the cleanup.
The plan was as follows:
: per-stat sysfs attributes are considered to be deprecated.
: The basic strategy is:
: -- the existing RW nodes will be downgraded to WO nodes (in linux 4.11)
: -- deprecated RO sysfs nodes will eventually be removed (in linux 4.11)
:
: The list of deprecated attributes can be found here:
: Documentation/ABI/obsolete/sysfs-block-zram
:
: Basically, every attribute that has its own read accessible sysfs
: node (e.g. num_reads) *AND* is accessible via one of the stat files
: (zram<id>/stat or zram<id>/io_stat or zram<id>/mm_stat) is considered
: to be deprecated.
The patch also removes `obsolete/sysfs-block-zram', clean ups
`testing/sysfs-block-zram' and tweaks zram.txt files.
Link: http://lkml.kernel.org/r/20170118035838.11090-1-sergey.senozhatsky@gmail.com
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit
|
||
|
|
7032a283ca |
UPSTREAM: zram: support BDI_CAP_STABLE_WRITES
zram has used per-cpu stream feature from v4.7. It aims for increasing cache hit ratio of scratch buffer for compressing. Downside of that approach is that zram should ask memory space for compressed page in per-cpu context which requires stricted gfp flag which could be failed. If so, it retries to allocate memory space out of per-cpu context so it could get memory this time and compress the data again, copies it to the memory space. In this scenario, zram assumes the data should never be changed but it is not true without stable page support. So, If the data is changed under us, zram can make buffer overrun so that zsmalloc free object chain is broken so system goes crash like below https://bugzilla.suse.com/show_bug.cgi?id=997574 This patch adds BDI_CAP_STABLE_WRITES to zram for declaring "I am block device needing *stable write*". Fixes: |
||
|
|
ba6cca5f80 |
UPSTREAM: zram: revalidate disk under init_lock
Commit |
||
|
|
199c3c6ef2 |
BACKPORT: mm: support anonymous stable page
During developemnt for zram-swap asynchronous writeback, I found strange corruption of compressed page, resulting in: Modules linked in: zram(E) CPU: 3 PID: 1520 Comm: zramd-1 Tainted: G E 4.8.0-mm1-00320-ge0d4894c9c38-dirty #3274 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014 task: ffff88007620b840 task.stack: ffff880078090000 RIP: set_freeobj.part.43+0x1c/0x1f RSP: 0018:ffff880078093ca8 EFLAGS: 00010246 RAX: 0000000000000018 RBX: ffff880076798d88 RCX: ffffffff81c408c8 RDX: 0000000000000018 RSI: 0000000000000000 RDI: 0000000000000246 RBP: ffff880078093cb0 R08: 0000000000000000 R09: 0000000000000000 R10: ffff88005bc43030 R11: 0000000000001df3 R12: ffff880076798d88 R13: 000000000005bc43 R14: ffff88007819d1b8 R15: 0000000000000001 FS: 0000000000000000(0000) GS:ffff88007e380000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fc934048f20 CR3: 0000000077b01000 CR4: 00000000000406e0 Call Trace: obj_malloc+0x22b/0x260 zs_malloc+0x1e4/0x580 zram_bvec_rw+0x4cd/0x830 [zram] page_requests_rw+0x9c/0x130 [zram] zram_thread+0xe6/0x173 [zram] kthread+0xca/0xe0 ret_from_fork+0x25/0x30 With investigation, it reveals currently stable page doesn't support anonymous page. IOW, reuse_swap_page can reuse the page without waiting writeback completion so it can overwrite page zram is compressing. Unfortunately, zram has used per-cpu stream feature from v4.7. It aims for increasing cache hit ratio of scratch buffer for compressing. Downside of that approach is that zram should ask memory space for compressed page in per-cpu context which requires stricted gfp flag which could be failed. If so, it retries to allocate memory space out of per-cpu context so it could get memory this time and compress the data again, copies it to the memory space. In this scenario, zram assumes the data should never be changed but it is not true unless stable page supports. So, If the data is changed under us, zram can make buffer overrun because second compression size could be bigger than one we got in previous trial and blindly, copy bigger size object to smaller buffer which is buffer overrun. The overrun breaks zsmalloc free object chaining so system goes crash like above. I think below is same problem. https://bugzilla.suse.com/show_bug.cgi?id=997574 Unfortunately, reuse_swap_page should be atomic so that we cannot wait on writeback in there so the approach in this patch is simply return false if we found it needs stable page. Although it increases memory footprint temporarily, it happens rarely and it should be reclaimed easily althoug it happened. Also, It would be better than waiting of IO completion, which is critial path for application latency. Fixes: |
||
|
|
1cf7e4f3e9 |
UPSTREAM: zram: use __GFP_MOVABLE for memory allocation
Zsmalloc is ready for page migration so zram can use __GFP_MOVABLE from
now on.
I did test to see how it helps to make higher order pages. Test
scenario is as follows.
KVM guest, 1G memory, ext4 formated zram block device,
for i in `seq 1 8`;
do
dd if=/dev/vda1 of=mnt/test$i.txt bs=128M count=1 &
done
wait `pidof dd`
for i in `seq 1 2 8`;
do
rm -rf mnt/test$i.txt
done
fstrim -v mnt
echo "init"
cat /proc/buddyinfo
echo "compaction"
echo 1 > /proc/sys/vm/compact_memory
cat /proc/buddyinfo
old:
init
Node 0, zone DMA 208 120 51 41 11 0 0 0 0 0 0
Node 0, zone DMA32 16380 13777 9184 3805 789 54 3 0 0 0 0
compaction
Node 0, zone DMA 132 82 40 39 16 2 1 0 0 0 0
Node 0, zone DMA32 5219 5526 4969 3455 1831 677 139 15 0 0 0
new:
init
Node 0, zone DMA 379 115 97 19 2 0 0 0 0 0 0
Node 0, zone DMA32 18891 16774 10862 3947 637 21 0 0 0 0 0
compaction
Node 0, zone DMA 214 66 87 29 10 3 0 0 0 0 0
Node 0, zone DMA32 1612 3139 3154 2469 1745 990 384 94 7 0 0
As you can see, compaction made so many high-order pages. Yay!
Link: http://lkml.kernel.org/r/1464736881-24886-13-git-send-email-minchan@kernel.org
Signed-off-by: Minchan Kim <minchan@kernel.org>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit
|
||
|
|
ac99063cdf |
UPSTREAM: zram: drop gfp_t from zcomp_strm_alloc()
We now allocate streams from CPU_UP hot-plug path, there are no
context-dependent stream allocations anymore and we can schedule from
zcomp_strm_alloc(). Use GFP_KERNEL directly and drop a gfp_t parameter.
Link: http://lkml.kernel.org/r/20160531122017.2878-9-sergey.senozhatsky@gmail.com
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit
|
||
|
|
4225fb587b |
UPSTREAM: zram: add more compression algorithms
Add "deflate", "lz4hc", "842" algorithms to the list of known
compression backends. The real availability of those algorithms,
however, depends on the corresponding CONFIG_CRYPTO_FOO config options.
[sergey.senozhatsky@gmail.com: zram-add-more-compression-algorithms-v3]
Link: http://lkml.kernel.org/r/20160604024902.11778-7-sergey.senozhatsky@gmail.com
Link: http://lkml.kernel.org/r/20160531122017.2878-8-sergey.senozhatsky@gmail.com
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit
|