Commit Graph

575652 Commits

Author SHA1 Message Date
Sergey Senozhatsky
ff26f4e4a0 UPSTREAM: zram: remove obsolete sysfs attrs
We had a deprecated_attr_warn() warning for 2 years and now the time has
come and we finally can do the cleanup.

The plan was as follows:

: per-stat sysfs attributes are considered to be deprecated.
: The basic strategy is:
: -- the existing RW nodes will be downgraded to WO nodes (in linux 4.11)
: -- deprecated RO sysfs nodes will eventually be removed (in linux 4.11)
:
: The list of deprecated attributes can be found here:
: Documentation/ABI/obsolete/sysfs-block-zram
:
: Basically, every attribute that has its own read accessible sysfs
: node (e.g. num_reads) *AND* is accessible via one of the stat files
: (zram<id>/stat or zram<id>/io_stat or zram<id>/mm_stat) is considered
: to be deprecated.

The patch also removes `obsolete/sysfs-block-zram', clean ups
`testing/sysfs-block-zram' and tweaks zram.txt files.

Link: http://lkml.kernel.org/r/20170118035838.11090-1-sergey.senozhatsky@gmail.com
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

(cherry picked from commit c87d1655c2)
Signed-off-by: Peter Kalauskas <peskal@google.com>
Bug: 112488418
Change-Id: Idd86259230d6a4bf0feeee53b5b69f3f3df774d4
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
2018-10-04 14:16:10 +05:30
Minchan Kim
7032a283ca UPSTREAM: zram: support BDI_CAP_STABLE_WRITES
zram has used per-cpu stream feature from v4.7.  It aims for increasing
cache hit ratio of scratch buffer for compressing.  Downside of that
approach is that zram should ask memory space for compressed page in
per-cpu context which requires stricted gfp flag which could be failed.
If so, it retries to allocate memory space out of per-cpu context so it
could get memory this time and compress the data again, copies it to the
memory space.

In this scenario, zram assumes the data should never be changed but it is
not true without stable page support.  So, If the data is changed under
us, zram can make buffer overrun so that zsmalloc free object chain is
broken so system goes crash like below

   https://bugzilla.suse.com/show_bug.cgi?id=997574

This patch adds BDI_CAP_STABLE_WRITES to zram for declaring "I am block
device needing *stable write*".

Fixes: da9556a236 ("zram: user per-cpu compression streams")
Link: http://lkml.kernel.org/r/1482366980-3782-4-git-send-email-minchan@kernel.org
Signed-off-by: Minchan Kim <minchan@kernel.org>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Takashi Iwai <tiwai@suse.de>
Cc: Hyeoncheol Lee <cheol.lee@lge.com>
Cc: <yjay.kim@lge.com>
Cc: Sangseok Lee <sangseok.lee@lge.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Darrick J. Wong <darrick.wong@oracle.com>
Cc: <stable@vger.kernel.org> [4.7+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

(cherry picked from commit b09ab054b6)
Signed-off-by: Peter Kalauskas <peskal@google.com>
Bug: 112488418
Change-Id: I3134a5882c1939792ffa71b8f31f7ab642a0e9a3
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
2018-10-04 14:16:10 +05:30
Minchan Kim
ba6cca5f80 UPSTREAM: zram: revalidate disk under init_lock
Commit b4c5c60920 ("zram: avoid lockdep splat by revalidate_disk")
moved revalidate_disk call out of init_lock to avoid lockdep
false-positive splat.  However, commit 08eee69fcf ("zram: remove
init_lock in zram_make_request") removed init_lock in IO path so there
is no worry about lockdep splat.  So, let's restore it.

This patch is needed to set BDI_CAP_STABLE_WRITES atomically in next
patch.

Fixes: da9556a236 ("zram: user per-cpu compression streams")
Link: http://lkml.kernel.org/r/1482366980-3782-3-git-send-email-minchan@kernel.org
Signed-off-by: Minchan Kim <minchan@kernel.org>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Takashi Iwai <tiwai@suse.de>
Cc: Hyeoncheol Lee <cheol.lee@lge.com>
Cc: <yjay.kim@lge.com>
Cc: Sangseok Lee <sangseok.lee@lge.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Darrick J. Wong <darrick.wong@oracle.com>
Cc: <stable@vger.kernel.org> [4.7+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

(cherry picked from commit e7ccfc4ccb)
Signed-off-by: Peter Kalauskas <peskal@google.com>
Bug: 112488418
Change-Id: Iebb6f694e46a797f8ce34029857c01c0c71086c7
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
2018-10-04 14:16:10 +05:30
Minchan Kim
199c3c6ef2 BACKPORT: mm: support anonymous stable page
During developemnt for zram-swap asynchronous writeback, I found strange
corruption of compressed page, resulting in:

  Modules linked in: zram(E)
  CPU: 3 PID: 1520 Comm: zramd-1 Tainted: G            E   4.8.0-mm1-00320-ge0d4894c9c38-dirty #3274
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
  task: ffff88007620b840 task.stack: ffff880078090000
  RIP: set_freeobj.part.43+0x1c/0x1f
  RSP: 0018:ffff880078093ca8  EFLAGS: 00010246
  RAX: 0000000000000018 RBX: ffff880076798d88 RCX: ffffffff81c408c8
  RDX: 0000000000000018 RSI: 0000000000000000 RDI: 0000000000000246
  RBP: ffff880078093cb0 R08: 0000000000000000 R09: 0000000000000000
  R10: ffff88005bc43030 R11: 0000000000001df3 R12: ffff880076798d88
  R13: 000000000005bc43 R14: ffff88007819d1b8 R15: 0000000000000001
  FS:  0000000000000000(0000) GS:ffff88007e380000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 00007fc934048f20 CR3: 0000000077b01000 CR4: 00000000000406e0
  Call Trace:
    obj_malloc+0x22b/0x260
    zs_malloc+0x1e4/0x580
    zram_bvec_rw+0x4cd/0x830 [zram]
    page_requests_rw+0x9c/0x130 [zram]
    zram_thread+0xe6/0x173 [zram]
    kthread+0xca/0xe0
    ret_from_fork+0x25/0x30

With investigation, it reveals currently stable page doesn't support
anonymous page.  IOW, reuse_swap_page can reuse the page without waiting
writeback completion so it can overwrite page zram is compressing.

Unfortunately, zram has used per-cpu stream feature from v4.7.
It aims for increasing cache hit ratio of scratch buffer for
compressing. Downside of that approach is that zram should ask
memory space for compressed page in per-cpu context which requires
stricted gfp flag which could be failed. If so, it retries to
allocate memory space out of per-cpu context so it could get memory
this time and compress the data again, copies it to the memory space.

In this scenario, zram assumes the data should never be changed
but it is not true unless stable page supports. So, If the data is
changed under us, zram can make buffer overrun because second
compression size could be bigger than one we got in previous trial
and blindly, copy bigger size object to smaller buffer which is
buffer overrun. The overrun breaks zsmalloc free object chaining
so system goes crash like above.

I think below is same problem.
https://bugzilla.suse.com/show_bug.cgi?id=997574

Unfortunately, reuse_swap_page should be atomic so that we cannot wait on
writeback in there so the approach in this patch is simply return false if
we found it needs stable page.  Although it increases memory footprint
temporarily, it happens rarely and it should be reclaimed easily althoug
it happened.  Also, It would be better than waiting of IO completion,
which is critial path for application latency.

Fixes: da9556a236 ("zram: user per-cpu compression streams")
Link: http://lkml.kernel.org/r/20161120233015.GA14113@bbox
Link: http://lkml.kernel.org/r/1482366980-3782-2-git-send-email-minchan@kernel.org
Signed-off-by: Minchan Kim <minchan@kernel.org>
Acked-by: Hugh Dickins <hughd@google.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Darrick J. Wong <darrick.wong@oracle.com>
Cc: Takashi Iwai <tiwai@suse.de>
Cc: Hyeoncheol Lee <cheol.lee@lge.com>
Cc: <yjay.kim@lge.com>
Cc: Sangseok Lee <sangseok.lee@lge.com>
Cc: <stable@vger.kernel.org> [4.7+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

(cherry picked from commit f05714293a)
Signed-off-by: Peter Kalauskas <peskal@google.com>
Bug: 112488418
Change-Id: I0fa5012aff9daf614b2d1d04f35b86ff7043ff21
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
2018-10-04 14:16:10 +05:30
Minchan Kim
1cf7e4f3e9 UPSTREAM: zram: use __GFP_MOVABLE for memory allocation
Zsmalloc is ready for page migration so zram can use __GFP_MOVABLE from
now on.

I did test to see how it helps to make higher order pages.  Test
scenario is as follows.

KVM guest, 1G memory, ext4 formated zram block device,

  for i in `seq 1 8`;
  do
          dd if=/dev/vda1 of=mnt/test$i.txt bs=128M count=1 &
  done

  wait `pidof dd`

  for i in `seq 1 2 8`;
  do
          rm -rf mnt/test$i.txt
  done
  fstrim -v mnt

  echo "init"
  cat /proc/buddyinfo

  echo "compaction"
  echo 1 > /proc/sys/vm/compact_memory
  cat /proc/buddyinfo

old:

  init
  Node 0, zone      DMA    208    120     51     41     11      0      0      0      0      0      0
  Node 0, zone    DMA32  16380  13777   9184   3805    789     54      3      0      0      0      0
  compaction
  Node 0, zone      DMA    132     82     40     39     16      2      1      0      0      0      0
  Node 0, zone    DMA32   5219   5526   4969   3455   1831    677    139     15      0      0      0

new:

  init
  Node 0, zone      DMA    379    115     97     19      2      0      0      0      0      0      0
  Node 0, zone    DMA32  18891  16774  10862   3947    637     21      0      0      0      0      0
  compaction
  Node 0, zone      DMA    214     66     87     29     10      3      0      0      0      0      0
  Node 0, zone    DMA32   1612   3139   3154   2469   1745    990    384     94      7      0      0

As you can see, compaction made so many high-order pages. Yay!

Link: http://lkml.kernel.org/r/1464736881-24886-13-git-send-email-minchan@kernel.org
Signed-off-by: Minchan Kim <minchan@kernel.org>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

(cherry picked from commit 9bc482d346)
Signed-off-by: Peter Kalauskas <peskal@google.com>
Bug: 112488418
Change-Id: I5d7f6eaa4c2d8d3f4da30fc2bd21f4db1be95e50
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
2018-10-04 14:16:10 +05:30
Sergey Senozhatsky
ac99063cdf UPSTREAM: zram: drop gfp_t from zcomp_strm_alloc()
We now allocate streams from CPU_UP hot-plug path, there are no
context-dependent stream allocations anymore and we can schedule from
zcomp_strm_alloc().  Use GFP_KERNEL directly and drop a gfp_t parameter.

Link: http://lkml.kernel.org/r/20160531122017.2878-9-sergey.senozhatsky@gmail.com
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

(cherry picked from commit 16d37725a0)
Signed-off-by: Peter Kalauskas <peskal@google.com>
Bug: 112488418
Change-Id: If09c4a97f3d3e45ad578d2b1d64b26f65617774d
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
2018-10-04 14:16:10 +05:30
Sergey Senozhatsky
4225fb587b UPSTREAM: zram: add more compression algorithms
Add "deflate", "lz4hc", "842" algorithms to the list of known
compression backends.  The real availability of those algorithms,
however, depends on the corresponding CONFIG_CRYPTO_FOO config options.

[sergey.senozhatsky@gmail.com: zram-add-more-compression-algorithms-v3]
  Link: http://lkml.kernel.org/r/20160604024902.11778-7-sergey.senozhatsky@gmail.com
Link: http://lkml.kernel.org/r/20160531122017.2878-8-sergey.senozhatsky@gmail.com
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

(cherry picked from commit eb9f56d825)
Signed-off-by: Peter Kalauskas <peskal@google.com>
Bug: 112488418
Change-Id: Ie46c7676363ef13c559b45dab4968e2cc48a6cbe
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
2018-10-04 14:16:10 +05:30
Sergey Senozhatsky
30a850bbf1 UPSTREAM: zram: delete custom lzo/lz4
Remove lzo/lz4 backends, we use crypto API now.

[sergey.senozhatsky@gmail.com: zram-delete-custom-lzo-lz4-v3]
  Link: http://lkml.kernel.org/r/20160604024902.11778-6-sergey.senozhatsky@gmail.com
Link: http://lkml.kernel.org/r/20160531122017.2878-7-sergey.senozhatsky@gmail.com
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

(cherry picked from commit ce1ed9f98e)
Signed-off-by: Peter Kalauskas <peskal@google.com>
Bug: 112488418
Change-Id: Ic2aa300a1a66b61740da73833dab252dc0d4b74a
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
2018-10-04 14:16:10 +05:30
Sergey Senozhatsky
feed2aa2a6 UPSTREAM: zram: cosmetic: cleanup documentation
zram documentation is a mix of different styles: spaces, tabs, tabs +
spaces, etc.  Clean it up.

Link: http://lkml.kernel.org/r/20160531122017.2878-6-sergey.senozhatsky@gmail.com
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

(cherry picked from commit 69a30a8d2a)
Signed-off-by: Peter Kalauskas <peskal@google.com>
Bug: 112488418
Change-Id: Ib71a1933e3da12b3a9f29b805a458cdc9815c36b
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
2018-10-04 14:16:10 +05:30
Sergey Senozhatsky
4e6affffba UPSTREAM: zram: use crypto api to check alg availability
There is no way to get a string with all the crypto comp algorithms
supported by the crypto comp engine, so we need to maintain our own
backends list.  At the same time we additionally need to use
crypto_has_comp() to make sure that the user has requested a compression
algorithm that is recognized by the crypto comp engine.  Relying on
/proc/crypto is not an options here, because it does not show
not-yet-inserted compression modules.

Example:

 modprobe zram
 cat /proc/crypto | grep -i lz4
 modprobe lz4
 cat /proc/crypto | grep -i lz4
name         : lz4
driver       : lz4-generic
module       : lz4

So the user can't tell exactly if the lz4 is really supported from
/proc/crypto output, unless someone or something has loaded it.

This patch also adds crypto_has_comp() to zcomp_available_show().  We
store all the compression algorithms names in zcomp's `backends' array,
regardless the CONFIG_CRYPTO_FOO configuration, but show only those that
are also supported by crypto engine.  This helps user to know the exact
list of compression algorithms that can be used.

Example:
  module lz4 is not loaded yet, but is supported by the crypto
  engine. /proc/crypto has no information on this module, while
  zram's `comp_algorithm' lists it:

 cat /proc/crypto | grep -i lz4

 cat /sys/block/zram0/comp_algorithm
[lzo] lz4 deflate lz4hc 842

We still use the `backends' array to determine if the requested
compression backend is known to crypto api.  This array, however, may not
contain some entries, therefore as the last step we call crypto_has_comp()
function which attempts to insmod the requested compression algorithm to
determine if crypto api supports it.  The advantage of this method is that
now we permit the usage of out-of-tree crypto compression modules
(implementing S/W or H/W compression).

[sergey.senozhatsky@gmail.com: zram-use-crypto-api-to-check-alg-availability-v3]
  Link: http://lkml.kernel.org/r/20160604024902.11778-4-sergey.senozhatsky@gmail.com
Link: http://lkml.kernel.org/r/20160531122017.2878-5-sergey.senozhatsky@gmail.com
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

(cherry picked from commit 415403be37)
Signed-off-by: Peter Kalauskas <peskal@google.com>
Bug: 112488418
Change-Id: I7c823238329bd6e5180386507d16123228804cc5
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
2018-10-04 14:16:10 +05:30
Sergey Senozhatsky
0eeffba830 BACKPORT: zram: switch to crypto compress API
We don't have an idle zstreams list anymore and our write path now works
absolutely differently, preventing preemption during compression.  This
removes possibilities of read paths preempting writes at wrong places
(which could badly affect the performance of both paths) and at the same
time opens the door for a move from custom LZO/LZ4 compression backends
implementation to a more generic one, using crypto compress API.

Joonsoo Kim [1] attempted to do this a while ago, but faced with the
need of introducing a new crypto API interface.  The root cause was the
fact that crypto API compression algorithms require a compression stream
structure (in zram terminology) for both compression and decompression
ops, while in reality only several of compression algorithms really need
it.  This resulted in a concept of context-less crypto API compression
backends [2].  Both write and read paths, though, would have been
executed with the preemption enabled, which in the worst case could have
resulted in a decreased worst-case performance, e.g.  consider the
following case:

	CPU0

	zram_write()
	  spin_lock()
	    take the last idle stream
	  spin_unlock()

	<< preempted >>

		zram_read()
		  spin_lock()
		   no idle streams
			  spin_unlock()
			  schedule()

	resuming zram_write compression()

but it took me some time to realize that, and it took even longer to
evolve zram and to make it ready for crypto API.  The key turned out to be
-- drop the idle streams list entirely.  Without the idle streams list we
are free to use compression algorithms that require compression stream for
decompression (read), because streams are now placed in per-cpu data and
each write path has to disable preemption for compression op, almost
completely eliminating the aforementioned case (technically, we still have
a small chance, because write path has a fast and a slow paths and the
slow path is executed with the preemption enabled; but the frequency of
failed fast path is too low).

TEST
====

- 4 CPUs, x86_64 system
- 3G zram, lzo
- fio tests: read, randread, write, randwrite, rw, randrw

test script [3] command:
 ZRAM_SIZE=3G LOG_SUFFIX=XXXX FIO_LOOPS=5 ./zram-fio-test.sh

                   BASE           PATCHED
jobs1
READ:           2527.2MB/s	 2482.7MB/s
READ:           2102.7MB/s	 2045.0MB/s
WRITE:          1284.3MB/s	 1324.3MB/s
WRITE:          1080.7MB/s	 1101.9MB/s
READ:           430125KB/s	 437498KB/s
WRITE:          430538KB/s	 437919KB/s
READ:           399593KB/s	 403987KB/s
WRITE:          399910KB/s	 404308KB/s
jobs2
READ:           8133.5MB/s	 7854.8MB/s
READ:           7086.6MB/s	 6912.8MB/s
WRITE:          3177.2MB/s	 3298.3MB/s
WRITE:          2810.2MB/s	 2871.4MB/s
READ:           1017.6MB/s	 1023.4MB/s
WRITE:          1018.2MB/s	 1023.1MB/s
READ:           977836KB/s	 984205KB/s
WRITE:          979435KB/s	 985814KB/s
jobs3
READ:           13557MB/s	 13391MB/s
READ:           11876MB/s	 11752MB/s
WRITE:          4641.5MB/s	 4682.1MB/s
WRITE:          4164.9MB/s	 4179.3MB/s
READ:           1453.8MB/s	 1455.1MB/s
WRITE:          1455.1MB/s	 1458.2MB/s
READ:           1387.7MB/s	 1395.7MB/s
WRITE:          1386.1MB/s	 1394.9MB/s
jobs4
READ:           20271MB/s	 20078MB/s
READ:           18033MB/s	 17928MB/s
WRITE:          6176.8MB/s	 6180.5MB/s
WRITE:          5686.3MB/s	 5705.3MB/s
READ:           2009.4MB/s	 2006.7MB/s
WRITE:          2007.5MB/s	 2004.9MB/s
READ:           1929.7MB/s	 1935.6MB/s
WRITE:          1926.8MB/s	 1932.6MB/s
jobs5
READ:           18823MB/s	 19024MB/s
READ:           18968MB/s	 19071MB/s
WRITE:          6191.6MB/s	 6372.1MB/s
WRITE:          5818.7MB/s	 5787.1MB/s
READ:           2011.7MB/s	 1981.3MB/s
WRITE:          2011.4MB/s	 1980.1MB/s
READ:           1949.3MB/s	 1935.7MB/s
WRITE:          1940.4MB/s	 1926.1MB/s
jobs6
READ:           21870MB/s	 21715MB/s
READ:           19957MB/s	 19879MB/s
WRITE:          6528.4MB/s	 6537.6MB/s
WRITE:          6098.9MB/s	 6073.6MB/s
READ:           2048.6MB/s	 2049.9MB/s
WRITE:          2041.7MB/s	 2042.9MB/s
READ:           2013.4MB/s	 1990.4MB/s
WRITE:          2009.4MB/s	 1986.5MB/s
jobs7
READ:           21359MB/s	 21124MB/s
READ:           19746MB/s	 19293MB/s
WRITE:          6660.4MB/s	 6518.8MB/s
WRITE:          6211.6MB/s	 6193.1MB/s
READ:           2089.7MB/s	 2080.6MB/s
WRITE:          2085.8MB/s	 2076.5MB/s
READ:           2041.2MB/s	 2052.5MB/s
WRITE:          2037.5MB/s	 2048.8MB/s
jobs8
READ:           20477MB/s	 19974MB/s
READ:           18922MB/s	 18576MB/s
WRITE:          6851.9MB/s	 6788.3MB/s
WRITE:          6407.7MB/s	 6347.5MB/s
READ:           2134.8MB/s	 2136.1MB/s
WRITE:          2132.8MB/s	 2134.4MB/s
READ:           2074.2MB/s	 2069.6MB/s
WRITE:          2087.3MB/s	 2082.4MB/s
jobs9
READ:           19797MB/s	 19994MB/s
READ:           18806MB/s	 18581MB/s
WRITE:          6878.7MB/s	 6822.7MB/s
WRITE:          6456.8MB/s	 6447.2MB/s
READ:           2141.1MB/s	 2154.7MB/s
WRITE:          2144.4MB/s	 2157.3MB/s
READ:           2084.1MB/s	 2085.1MB/s
WRITE:          2091.5MB/s	 2092.5MB/s
jobs10
READ:           19794MB/s	 19784MB/s
READ:           18794MB/s	 18745MB/s
WRITE:          6984.4MB/s	 6676.3MB/s
WRITE:          6532.3MB/s	 6342.7MB/s
READ:           2150.6MB/s	 2155.4MB/s
WRITE:          2156.8MB/s	 2161.5MB/s
READ:           2106.4MB/s	 2095.6MB/s
WRITE:          2109.7MB/s	 2098.4MB/s

                                    BASE                       PATCHED
jobs1                              perfstat
stalled-cycles-frontend     102,480,595,419 (  41.53%)	  114,508,864,804 (  46.92%)
stalled-cycles-backend       51,941,417,832 (  21.05%)	   46,836,112,388 (  19.19%)
instructions                283,612,054,215 (    1.15)	  283,918,134,959 (    1.16)
branches                     56,372,560,385 ( 724.923)	   56,449,814,753 ( 733.766)
branch-misses                   374,826,000 (   0.66%)	      326,935,859 (   0.58%)
jobs2                              perfstat
stalled-cycles-frontend     155,142,745,777 (  40.99%)	  164,170,979,198 (  43.82%)
stalled-cycles-backend       70,813,866,387 (  18.71%)	   66,456,858,165 (  17.74%)
instructions                463,436,648,173 (    1.22)	  464,221,890,191 (    1.24)
branches                     91,088,733,902 ( 760.088)	   91,278,144,546 ( 769.133)
branch-misses                   504,460,363 (   0.55%)	      394,033,842 (   0.43%)
jobs3                              perfstat
stalled-cycles-frontend     201,300,397,212 (  39.84%)	  223,969,902,257 (  44.44%)
stalled-cycles-backend       87,712,593,974 (  17.36%)	   81,618,888,712 (  16.19%)
instructions                642,869,545,023 (    1.27)	  644,677,354,132 (    1.28)
branches                    125,724,560,594 ( 690.682)	  126,133,159,521 ( 694.542)
branch-misses                   527,941,798 (   0.42%)	      444,782,220 (   0.35%)
jobs4                              perfstat
stalled-cycles-frontend     246,701,197,429 (  38.12%)	  280,076,030,886 (  43.29%)
stalled-cycles-backend      119,050,341,112 (  18.40%)	  110,955,641,671 (  17.15%)
instructions                822,716,962,127 (    1.27)	  825,536,969,320 (    1.28)
branches                    160,590,028,545 ( 688.614)	  161,152,996,915 ( 691.068)
branch-misses                   650,295,287 (   0.40%)	      550,229,113 (   0.34%)
jobs5                              perfstat
stalled-cycles-frontend     298,958,462,516 (  38.30%)	  344,852,200,358 (  44.16%)
stalled-cycles-backend      137,558,742,122 (  17.62%)	  129,465,067,102 (  16.58%)
instructions              1,005,714,688,752 (    1.29)	1,007,657,999,432 (    1.29)
branches                    195,988,773,962 ( 697.730)	  196,446,873,984 ( 700.319)
branch-misses                   695,818,940 (   0.36%)	      624,823,263 (   0.32%)
jobs6                              perfstat
stalled-cycles-frontend     334,497,602,856 (  36.71%)	  387,590,419,779 (  42.38%)
stalled-cycles-backend      163,539,365,335 (  17.95%)	  152,640,193,639 (  16.69%)
instructions              1,184,738,177,851 (    1.30)	1,187,396,281,677 (    1.30)
branches                    230,592,915,640 ( 702.902)	  231,253,802,882 ( 702.356)
branch-misses                   747,934,786 (   0.32%)	      643,902,424 (   0.28%)
jobs7                              perfstat
stalled-cycles-frontend     396,724,684,187 (  37.71%)	  460,705,858,952 (  43.84%)
stalled-cycles-backend      188,096,616,496 (  17.88%)	  175,785,787,036 (  16.73%)
instructions              1,364,041,136,608 (    1.30)	1,366,689,075,112 (    1.30)
branches                    265,253,096,936 ( 700.078)	  265,890,524,883 ( 702.839)
branch-misses                   784,991,589 (   0.30%)	      729,196,689 (   0.27%)
jobs8                              perfstat
stalled-cycles-frontend     440,248,299,870 (  36.92%)	  509,554,793,816 (  42.46%)
stalled-cycles-backend      222,575,930,616 (  18.67%)	  213,401,248,432 (  17.78%)
instructions              1,542,262,045,114 (    1.29)	1,545,233,932,257 (    1.29)
branches                    299,775,178,439 ( 697.666)	  300,528,458,505 ( 694.769)
branch-misses                   847,496,084 (   0.28%)	      748,794,308 (   0.25%)
jobs9                              perfstat
stalled-cycles-frontend     506,269,882,480 (  37.86%)	  592,798,032,820 (  44.43%)
stalled-cycles-backend      253,192,498,861 (  18.93%)	  233,727,666,185 (  17.52%)
instructions              1,721,985,080,913 (    1.29)	1,724,666,236,005 (    1.29)
branches                    334,517,360,255 ( 694.134)	  335,199,758,164 ( 697.131)
branch-misses                   873,496,730 (   0.26%)	      815,379,236 (   0.24%)
jobs10                             perfstat
stalled-cycles-frontend     549,063,363,749 (  37.18%)	  651,302,376,662 (  43.61%)
stalled-cycles-backend      281,680,986,810 (  19.07%)	  277,005,235,582 (  18.55%)
instructions              1,901,859,271,180 (    1.29)	1,906,311,064,230 (    1.28)
branches                    369,398,536,153 ( 694.004)	  370,527,696,358 ( 688.409)
branch-misses                   967,929,335 (   0.26%)	      890,125,056 (   0.24%)

                            BASE           PATCHED
seconds elapsed        79.421641008	78.735285546
seconds elapsed        61.471246133	60.869085949
seconds elapsed        62.317058173	62.224188495
seconds elapsed        60.030739363	60.081102518
seconds elapsed        74.070398362	74.317582865
seconds elapsed        84.985953007	85.414364176
seconds elapsed        97.724553255	98.173311344
seconds elapsed        109.488066758	110.268399318
seconds elapsed        122.768189405	122.967164498
seconds elapsed        135.130035105	136.934770801

On my other system (8 x86_64 CPUs, short version of test results):

                            BASE           PATCHED
seconds elapsed        19.518065994	19.806320662
seconds elapsed        15.172772749	15.594718291
seconds elapsed        13.820925970	13.821708564
seconds elapsed        13.293097816	14.585206405
seconds elapsed        16.207284118	16.064431606
seconds elapsed        17.958376158	17.771825767
seconds elapsed        19.478009164	19.602961508
seconds elapsed        21.347152811	21.352318709
seconds elapsed        24.478121126	24.171088735
seconds elapsed        26.865057442	26.767327618

So performance-wise the numbers are quite similar.

Also update zcomp interface to be more aligned with the crypto API.

[1] http://marc.info/?l=linux-kernel&m=144480832108927&w=2
[2] http://marc.info/?l=linux-kernel&m=145379613507518&w=2
[3] https://github.com/sergey-senozhatsky/zram-perf-test

Link: http://lkml.kernel.org/r/20160531122017.2878-3-sergey.senozhatsky@gmail.com
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Suggested-by: Minchan Kim <minchan@kernel.org>
Suggested-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

(cherry picked from commit ebaf9ab56d)
Signed-off-by: Peter Kalauskas <peskal@google.com>
Bug: 112488418
Change-Id: Ia0c362b7419de59e6c6ea81c37f99ef1d22c2b4b
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
2018-10-04 14:16:10 +05:30
Sergey Senozhatsky
a54133ad2e UPSTREAM: zram: rename zstrm find-release functions
This has started as a 'add zlib support' work, but after some thinking I
saw no blockers for a bigger change -- a switch to crypto API.

We don't have an idle zstreams list anymore and our write path now works
absolutely differently, preventing preemption during compression.  This
removes possibilities of read paths preempting writes at wrong places
and opens the door for a move from custom LZO/LZ4 compression backends
implementation to a more generic one, using crypto compress API.

This patch set also eliminates the need of a new context-less crypto API
interface, which was quite hard to sell, so we can move along faster.

benchmarks:

(x86_64, 4GB, zram-perf script)

perf reported run-time fio (max jobs=3).  I performed fio test with the
increasing number of parallel jobs (max to 3) on a 3G zram device, using
`static' data and the following crypto comp algorithms:

	842, deflate, lz4, lz4hc, lzo

the output was:

 - test running time (which can tell us what algorithms performs faster)

and

 - zram mm_stat (which tells the compressed memory size, max used memory, etc).

It's just for information.  for example, LZ4HC has twice the running
time of LZO, but the compressed memory size is: 23592960 vs 34603008
bytes.

  test-fio-zram-842
     197.907655282 seconds time elapsed
     201.623142884 seconds time elapsed
     226.854291345 seconds time elapsed
  test-fio-zram-DEFLATE
     253.259516155 seconds time elapsed
     258.148563401 seconds time elapsed
     290.251909365 seconds time elapsed
  test-fio-zram-LZ4
      27.022598717 seconds time elapsed
      29.580522717 seconds time elapsed
      33.293463430 seconds time elapsed
  test-fio-zram-LZ4HC
      56.393954615 seconds time elapsed
      74.904659747 seconds time elapsed
     101.940998564 seconds time elapsed
  test-fio-zram-LZO
      28.155948075 seconds time elapsed
      30.390036330 seconds time elapsed
      34.455773159 seconds time elapsed

zram mm_stat-s (max fio jobs=3)

  test-fio-zram-842
  mm_stat (jobs1): 3221225472 673185792 690266112        0 690266112        0        0
  mm_stat (jobs2): 3221225472 673185792 690266112        0 690266112        0        0
  mm_stat (jobs3): 3221225472 673185792 690266112        0 690266112        0        0
  test-fio-zram-DEFLATE
  mm_stat (jobs1): 3221225472  24379392  37761024        0  37761024        0        0
  mm_stat (jobs2): 3221225472  24379392  37761024        0  37761024        0        0
  mm_stat (jobs3): 3221225472  24379392  37761024        0  37761024        0        0
  test-fio-zram-LZ4
  mm_stat (jobs1): 3221225472  23592960  37761024        0  37761024        0        0
  mm_stat (jobs2): 3221225472  23592960  37761024        0  37761024        0        0
  mm_stat (jobs3): 3221225472  23592960  37761024        0  37761024        0        0
  test-fio-zram-LZ4HC
  mm_stat (jobs1): 3221225472  23592960  37761024        0  37761024        0        0
  mm_stat (jobs2): 3221225472  23592960  37761024        0  37761024        0        0
  mm_stat (jobs3): 3221225472  23592960  37761024        0  37761024        0        0
  test-fio-zram-LZO
  mm_stat (jobs1): 3221225472  34603008  50335744        0  50335744        0        0
  mm_stat (jobs2): 3221225472  34603008  50335744        0  50335744        0        0
  mm_stat (jobs3): 3221225472  34603008  50335744        0  50339840        0        0

This patch (of 8):

We don't perform any zstream idle list lookup anymore, so
zcomp_strm_find()/zcomp_strm_release() names are not representative.

Rename to zcomp_stream_get()/zcomp_stream_put().

Link: http://lkml.kernel.org/r/20160531122017.2878-2-sergey.senozhatsky@gmail.com
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

(cherry picked from commit 2aea8493d3)
Signed-off-by: Peter Kalauskas <peskal@google.com>
Bug: 112488418
Change-Id: I2f4c9e215bca73ba5adb1354aaec6e32e420920d
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
2018-10-04 14:16:10 +05:30
Sergey Senozhatsky
9bf02241b0 UPSTREAM: zram: introduce per-device debug_stat sysfs node
debug_stat sysfs is read-only and represents various debugging data that
zram developers may need.  This file is not meant to be used by anyone
else: its content is not documented and will change any time w/o any
notice.  Therefore, the output of debug_stat file contains a version
string.  To avoid any confusion, we will increase the version number
every time we modify the output.

At the moment this file exports only one value -- the number of
re-compressions, IOW, the number of times compression fast path has
failed.  This stat is temporary any will be useful in case if any
per-cpu compression streams regressions will be reported.

Link: http://lkml.kernel.org/r/20160513230834.GB26763@bbox
Link: http://lkml.kernel.org/r/20160511134553.12655-1-sergey.senozhatsky@gmail.com
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

(cherry picked from commit 623e47fc64)
Signed-off-by: Peter Kalauskas <peskal@google.com>
Bug: 112488418
Change-Id: Ie0ef61db7aa0b2c713de1d8bf48e8a545b4276e9
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
2018-10-04 14:16:10 +05:30
Sergey Senozhatsky
6cbf231390 UPSTREAM: zram: remove max_comp_streams internals
Remove the internal part of max_comp_streams interface, since we
switched to per-cpu streams.  We will keep RW max_comp_streams attr
around, because:

a) we may (silently) switch back to idle compression streams list and
   don't want to disturb user space

b) max_comp_streams attr must wait for the next 'lay off cycle'; we
   give user space 2 years to adjust before we remove/downgrade the attr,
   and there are already several attrs scheduled for removal in 4.11, so
   it's too late for max_comp_streams.

This slightly change a user visible behaviour:

- First, reading from max_comp_stream file now will always return the
  number of online CPUs.

- Second, writing to max_comp_stream will not take any effect.

Link: http://lkml.kernel.org/r/20160503165546.25201-1-sergey.senozhatsky@gmail.com
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

(cherry picked from commit 43209ea2d1)
Signed-off-by: Peter Kalauskas <peskal@google.com>
Bug: 112488418
Change-Id: I1902e741b4d3b83c5bd0d66bf1bae021dbfe2056
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
2018-10-04 14:16:10 +05:30
Sergey Senozhatsky
c331c792c0 UPSTREAM: zram: user per-cpu compression streams
Remove idle streams list and keep compression streams in per-cpu data.
This removes two contented spin_lock()/spin_unlock() calls from write
path and also prevent write OP from being preempted while holding the
compression stream, which can cause slow downs.

For instance, let's assume that we have N cpus and N-2
max_comp_streams.TASK1 owns the last idle stream, TASK2-TASK3 come in
with the write requests:

  TASK1            TASK2              TASK3
 zram_bvec_write()
  spin_lock
  find stream
  spin_unlock

  compress

  <<preempted>>   zram_bvec_write()
                   spin_lock
                   find stream
                   spin_unlock
                     no_stream
                       schedule
                                     zram_bvec_write()
                                      spin_lock
                                      find_stream
                                      spin_unlock
                                        no_stream
                                          schedule
   spin_lock
   release stream
   spin_unlock
     wake up TASK2

not only TASK2 and TASK3 will not get the stream, TASK1 will be
preempted in the middle of its operation; while we would prefer it to
finish compression and release the stream.

Test environment: x86_64, 4 CPU box, 3G zram, lzo

The following fio tests were executed:
      read, randread, write, randwrite, rw, randrw
with the increasing number of jobs from 1 to 10.

                  4 streams        8 streams       per-cpu
  ===========================================================
  jobs1
  READ:           2520.1MB/s       2566.5MB/s      2491.5MB/s
  READ:           2102.7MB/s       2104.2MB/s      2091.3MB/s
  WRITE:          1355.1MB/s       1320.2MB/s      1378.9MB/s
  WRITE:          1103.5MB/s       1097.2MB/s      1122.5MB/s
  READ:           434013KB/s       435153KB/s      439961KB/s
  WRITE:          433969KB/s       435109KB/s      439917KB/s
  READ:           403166KB/s       405139KB/s      403373KB/s
  WRITE:          403223KB/s       405197KB/s      403430KB/s
  jobs2
  READ:           7958.6MB/s       8105.6MB/s      8073.7MB/s
  READ:           6864.9MB/s       6989.8MB/s      7021.8MB/s
  WRITE:          2438.1MB/s       2346.9MB/s      3400.2MB/s
  WRITE:          1994.2MB/s       1990.3MB/s      2941.2MB/s
  READ:           981504KB/s       973906KB/s      1018.8MB/s
  WRITE:          981659KB/s       974060KB/s      1018.1MB/s
  READ:           937021KB/s       938976KB/s      987250KB/s
  WRITE:          934878KB/s       936830KB/s      984993KB/s
  jobs3
  READ:           13280MB/s        13553MB/s       13553MB/s
  READ:           11534MB/s        11785MB/s       11755MB/s
  WRITE:          3456.9MB/s       3469.9MB/s      4810.3MB/s
  WRITE:          3029.6MB/s       3031.6MB/s      4264.8MB/s
  READ:           1363.8MB/s       1362.6MB/s      1448.9MB/s
  WRITE:          1361.9MB/s       1360.7MB/s      1446.9MB/s
  READ:           1309.4MB/s       1310.6MB/s      1397.5MB/s
  WRITE:          1307.4MB/s       1308.5MB/s      1395.3MB/s
  jobs4
  READ:           20244MB/s        20177MB/s       20344MB/s
  READ:           17886MB/s        17913MB/s       17835MB/s
  WRITE:          4071.6MB/s       4046.1MB/s      6370.2MB/s
  WRITE:          3608.9MB/s       3576.3MB/s      5785.4MB/s
  READ:           1824.3MB/s       1821.6MB/s      1997.5MB/s
  WRITE:          1819.8MB/s       1817.4MB/s      1992.5MB/s
  READ:           1765.7MB/s       1768.3MB/s      1937.3MB/s
  WRITE:          1767.5MB/s       1769.1MB/s      1939.2MB/s
  jobs5
  READ:           18663MB/s        18986MB/s       18823MB/s
  READ:           16659MB/s        16605MB/s       16954MB/s
  WRITE:          3912.4MB/s       3888.7MB/s      6126.9MB/s
  WRITE:          3506.4MB/s       3442.5MB/s      5519.3MB/s
  READ:           1798.2MB/s       1746.5MB/s      1935.8MB/s
  WRITE:          1792.7MB/s       1740.7MB/s      1929.1MB/s
  READ:           1727.6MB/s       1658.2MB/s      1917.3MB/s
  WRITE:          1726.5MB/s       1657.2MB/s      1916.6MB/s
  jobs6
  READ:           21017MB/s        20922MB/s       21162MB/s
  READ:           19022MB/s        19140MB/s       18770MB/s
  WRITE:          3968.2MB/s       4037.7MB/s      6620.8MB/s
  WRITE:          3643.5MB/s       3590.2MB/s      6027.5MB/s
  READ:           1871.8MB/s       1880.5MB/s      2049.9MB/s
  WRITE:          1867.8MB/s       1877.2MB/s      2046.2MB/s
  READ:           1755.8MB/s       1710.3MB/s      1964.7MB/s
  WRITE:          1750.5MB/s       1705.9MB/s      1958.8MB/s
  jobs7
  READ:           21103MB/s        20677MB/s       21482MB/s
  READ:           18522MB/s        18379MB/s       19443MB/s
  WRITE:          4022.5MB/s       4067.4MB/s      6755.9MB/s
  WRITE:          3691.7MB/s       3695.5MB/s      5925.6MB/s
  READ:           1841.5MB/s       1933.9MB/s      2090.5MB/s
  WRITE:          1842.7MB/s       1935.3MB/s      2091.9MB/s
  READ:           1832.4MB/s       1856.4MB/s      1971.5MB/s
  WRITE:          1822.3MB/s       1846.2MB/s      1960.6MB/s
  jobs8
  READ:           20463MB/s        20194MB/s       20862MB/s
  READ:           18178MB/s        17978MB/s       18299MB/s
  WRITE:          4085.9MB/s       4060.2MB/s      7023.8MB/s
  WRITE:          3776.3MB/s       3737.9MB/s      6278.2MB/s
  READ:           1957.6MB/s       1944.4MB/s      2109.5MB/s
  WRITE:          1959.2MB/s       1946.2MB/s      2111.4MB/s
  READ:           1900.6MB/s       1885.7MB/s      2082.1MB/s
  WRITE:          1896.2MB/s       1881.4MB/s      2078.3MB/s
  jobs9
  READ:           19692MB/s        19734MB/s       19334MB/s
  READ:           17678MB/s        18249MB/s       17666MB/s
  WRITE:          4004.7MB/s       4064.8MB/s      6990.7MB/s
  WRITE:          3724.7MB/s       3772.1MB/s      6193.6MB/s
  READ:           1953.7MB/s       1967.3MB/s      2105.6MB/s
  WRITE:          1953.4MB/s       1966.7MB/s      2104.1MB/s
  READ:           1860.4MB/s       1897.4MB/s      2068.5MB/s
  WRITE:          1858.9MB/s       1895.9MB/s      2066.8MB/s
  jobs10
  READ:           19730MB/s        19579MB/s       19492MB/s
  READ:           18028MB/s        18018MB/s       18221MB/s
  WRITE:          4027.3MB/s       4090.6MB/s      7020.1MB/s
  WRITE:          3810.5MB/s       3846.8MB/s      6426.8MB/s
  READ:           1956.1MB/s       1994.6MB/s      2145.2MB/s
  WRITE:          1955.9MB/s       1993.5MB/s      2144.8MB/s
  READ:           1852.8MB/s       1911.6MB/s      2075.8MB/s
  WRITE:          1855.7MB/s       1914.6MB/s      2078.1MB/s

perf stat

                                  4 streams                       8 streams                       per-cpu
  ====================================================================================================================
  jobs1
  stalled-cycles-frontend      23,174,811,209 (  38.21%)     23,220,254,188 (  38.25%)       23,061,406,918 (  38.34%)
  stalled-cycles-backend       11,514,174,638 (  18.98%)     11,696,722,657 (  19.27%)       11,370,852,810 (  18.90%)
  instructions                 73,925,005,782 (    1.22)     73,903,177,632 (    1.22)       73,507,201,037 (    1.22)
  branches                     14,455,124,835 ( 756.063)     14,455,184,779 ( 755.281)       14,378,599,509 ( 758.546)
  branch-misses                    69,801,336 (   0.48%)         80,225,529 (   0.55%)           72,044,726 (   0.50%)
  jobs2
  stalled-cycles-frontend      49,912,741,782 (  46.11%)     50,101,189,290 (  45.95%)       32,874,195,633 (  35.11%)
  stalled-cycles-backend       27,080,366,230 (  25.02%)     27,949,970,232 (  25.63%)       16,461,222,706 (  17.58%)
  instructions                122,831,629,690 (    1.13)    122,919,846,419 (    1.13)      121,924,786,775 (    1.30)
  branches                     23,725,889,239 ( 692.663)     23,733,547,140 ( 688.062)       23,553,950,311 ( 794.794)
  branch-misses                    90,733,041 (   0.38%)         96,320,895 (   0.41%)           84,561,092 (   0.36%)
  jobs3
  stalled-cycles-frontend      66,437,834,608 (  45.58%)     63,534,923,344 (  43.69%)       42,101,478,505 (  33.19%)
  stalled-cycles-backend       34,940,799,661 (  23.97%)     34,774,043,148 (  23.91%)       21,163,324,388 (  16.68%)
  instructions                171,692,121,862 (    1.18)    171,775,373,044 (    1.18)      170,353,542,261 (    1.34)
  branches                     32,968,962,622 ( 628.723)     32,987,739,894 ( 630.512)       32,729,463,918 ( 717.027)
  branch-misses                   111,522,732 (   0.34%)        110,472,894 (   0.33%)           99,791,291 (   0.30%)
  jobs4
  stalled-cycles-frontend      98,741,701,675 (  49.72%)     94,797,349,965 (  47.59%)       54,535,655,381 (  33.53%)
  stalled-cycles-backend       54,642,609,615 (  27.51%)     55,233,554,408 (  27.73%)       27,882,323,541 (  17.14%)
  instructions                220,884,807,851 (    1.11)    220,930,887,273 (    1.11)      218,926,845,851 (    1.35)
  branches                     42,354,518,180 ( 592.105)     42,362,770,587 ( 590.452)       41,955,552,870 ( 716.154)
  branch-misses                   138,093,449 (   0.33%)        131,295,286 (   0.31%)          121,794,771 (   0.29%)
  jobs5
  stalled-cycles-frontend     116,219,747,212 (  48.14%)    110,310,397,012 (  46.29%)       66,373,082,723 (  33.70%)
  stalled-cycles-backend       66,325,434,776 (  27.48%)     64,157,087,914 (  26.92%)       32,999,097,299 (  16.76%)
  instructions                270,615,008,466 (    1.12)    270,546,409,525 (    1.14)      268,439,910,948 (    1.36)
  branches                     51,834,046,557 ( 599.108)     51,811,867,722 ( 608.883)       51,412,576,077 ( 729.213)
  branch-misses                   158,197,086 (   0.31%)        142,639,805 (   0.28%)          133,425,455 (   0.26%)
  jobs6
  stalled-cycles-frontend     138,009,414,492 (  48.23%)    139,063,571,254 (  48.80%)       75,278,568,278 (  32.80%)
  stalled-cycles-backend       79,211,949,650 (  27.68%)     79,077,241,028 (  27.75%)       37,735,797,899 (  16.44%)
  instructions                319,763,993,731 (    1.12)    319,937,782,834 (    1.12)      316,663,600,784 (    1.38)
  branches                     61,219,433,294 ( 595.056)     61,250,355,540 ( 598.215)       60,523,446,617 ( 733.706)
  branch-misses                   169,257,123 (   0.28%)        154,898,028 (   0.25%)          141,180,587 (   0.23%)
  jobs7
  stalled-cycles-frontend     162,974,812,119 (  49.20%)    159,290,061,987 (  48.43%)       88,046,641,169 (  33.21%)
  stalled-cycles-backend       92,223,151,661 (  27.84%)     91,667,904,406 (  27.87%)       44,068,454,971 (  16.62%)
  instructions                369,516,432,430 (    1.12)    369,361,799,063 (    1.12)      365,290,380,661 (    1.38)
  branches                     70,795,673,950 ( 594.220)     70,743,136,124 ( 597.876)       69,803,996,038 ( 732.822)
  branch-misses                   181,708,327 (   0.26%)        165,767,821 (   0.23%)          150,109,797 (   0.22%)
  jobs8
  stalled-cycles-frontend     185,000,017,027 (  49.30%)    182,334,345,473 (  48.37%)       99,980,147,041 (  33.26%)
  stalled-cycles-backend      105,753,516,186 (  28.18%)    107,937,830,322 (  28.63%)       51,404,177,181 (  17.10%)
  instructions                418,153,161,055 (    1.11)    418,308,565,828 (    1.11)      413,653,475,581 (    1.38)
  branches                     80,035,882,398 ( 592.296)     80,063,204,510 ( 589.843)       79,024,105,589 ( 730.530)
  branch-misses                   199,764,528 (   0.25%)        177,936,926 (   0.22%)          160,525,449 (   0.20%)
  jobs9
  stalled-cycles-frontend     210,941,799,094 (  49.63%)    204,714,679,254 (  48.55%)      114,251,113,756 (  33.96%)
  stalled-cycles-backend      122,640,849,067 (  28.85%)    122,188,553,256 (  28.98%)       58,360,041,127 (  17.35%)
  instructions                468,151,025,415 (    1.10)    467,354,869,323 (    1.11)      462,665,165,216 (    1.38)
  branches                     89,657,067,510 ( 585.628)     89,411,550,407 ( 588.990)       88,360,523,943 ( 730.151)
  branch-misses                   218,292,301 (   0.24%)        191,701,247 (   0.21%)          178,535,678 (   0.20%)
  jobs10
  stalled-cycles-frontend     233,595,958,008 (  49.81%)    227,540,615,689 (  49.11%)      160,341,979,938 (  43.07%)
  stalled-cycles-backend      136,153,676,021 (  29.03%)    133,635,240,742 (  28.84%)       65,909,135,465 (  17.70%)
  instructions                517,001,168,497 (    1.10)    516,210,976,158 (    1.11)      511,374,038,613 (    1.37)
  branches                     98,911,641,329 ( 585.796)     98,700,069,712 ( 591.583)       97,646,761,028 ( 728.712)
  branch-misses                   232,341,823 (   0.23%)        199,256,308 (   0.20%)          183,135,268 (   0.19%)

per-cpu streams tend to cause significantly less stalled cycles; execute
less branches and hit less branch-misses.

perf stat reported execution time

                          4 streams        8 streams       per-cpu
  ====================================================================
  jobs1
  seconds elapsed        20.909073870     20.875670495    20.817838540
  jobs2
  seconds elapsed        18.529488399     18.720566469    16.356103108
  jobs3
  seconds elapsed        18.991159531     18.991340812    16.766216066
  jobs4
  seconds elapsed        19.560643828     19.551323547    16.246621715
  jobs5
  seconds elapsed        24.746498464     25.221646740    20.696112444
  jobs6
  seconds elapsed        28.258181828     28.289765505    22.885688857
  jobs7
  seconds elapsed        32.632490241     31.909125381    26.272753738
  jobs8
  seconds elapsed        35.651403851     36.027596308    29.108024711
  jobs9
  seconds elapsed        40.569362365     40.024227989    32.898204012
  jobs10
  seconds elapsed        44.673112304     43.874898137    35.632952191

Please see
  Link: http://marc.info/?l=linux-kernel&m=146166970727530
  Link: http://marc.info/?l=linux-kernel&m=146174716719650
for more test results (under low memory conditions).

Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Suggested-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

(cherry picked from commit da9556a236)
Signed-off-by: Peter Kalauskas <peskal@google.com>
Bug: 112488418
Change-Id: I1af1a466f0ac3f74f9c36f06685111ccef0f4ec4
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
2018-10-04 14:16:10 +05:30
Sergey Senozhatsky
055114890d BACKPORT: zsmalloc: require GFP in zs_malloc()
Pass GFP flags to zs_malloc() instead of using a fixed mask supplied to
zs_create_pool(), so we can be more flexible, but, more importantly, we
need this to switch zram to per-cpu compression streams -- zram will try
to allocate handle with preemption disabled in a fast path and switch to
a slow path (using different gfp mask) if the fast one has failed.

Apart from that, this also align zs_malloc() interface with zspool/zbud.

[sergey.senozhatsky@gmail.com: pass GFP flags to zs_malloc() instead of using a fixed mask]
  Link: http://lkml.kernel.org/r/20160429150942.GA637@swordfish
Link: http://lkml.kernel.org/r/20160429150942.GA637@swordfish
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

(cherry picked from commit d0d8da2dc4)
Signed-off-by: Peter Kalauskas <peskal@google.com>
Bug: 112488418
Change-Id: I31276c9351be21a4ed588681b332e98142b76526
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
2018-10-04 14:16:10 +05:30
Sergey Senozhatsky
2fafbdf79e UPSTREAM: zram/zcomp: do not zero out zcomp private pages
Do not __GFP_ZERO allocated zcomp ->private pages.  We keep allocated
streams around and use them for read/write requests, so we supply a
zeroed out ->private to compression algorithm as a scratch buffer only
once -- the first time we use that stream.  For the rest of IO requests
served by this stream ->private usually contains some temporarily data
from the previous requests.

Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

(cherry picked from commit e02d238c98)
Signed-off-by: Peter Kalauskas <peskal@google.com>
Bug: 112488418
Change-Id: I911832da703f596998a4139d6033ef1564848c9e
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
2018-10-04 14:16:10 +05:30
Minchan Kim
711c8927f8 UPSTREAM: zram: pass gfp from zcomp frontend to backend
Each zcomp backend uses own gfp flag but it's pointless because the
context they could be called is driven by upper layer(ie, zcomp
frontend).  As well, zcomp frondend could call them in different
context.  One context(ie, zram init part) is it should be better to make
sure successful allocation other context(ie, further stream allocation
part for accelarating I/O speed) is just optional so let's pass gfp down
from driver (ie, zcomp frontend) like normal MM convention.

[sergey.senozhatsky@gmail.com: add missing __vmalloc zero and highmem gfps]
Signed-off-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

(cherry picked from commit 75d8947a36)
Signed-off-by: Peter Kalauskas <peskal@google.com>
Bug: 112488418
Change-Id: I572d0565de5aff94ebe0782eba9d34f9c9862060
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
2018-10-04 14:16:10 +05:30
Cong Wang
8e1d8cd24d UPSTREAM: socket: close race condition between sock_close() and sockfs_setattr()
fchownat() doesn't even hold refcnt of fd until it figures out
fd is really needed (otherwise is ignored) and releases it after
it resolves the path. This means sock_close() could race with
sockfs_setattr(), which leads to a NULL pointer dereference
since typically we set sock->sk to NULL in ->release().

As pointed out by Al, this is unique to sockfs. So we can fix this
in socket layer by acquiring inode_lock in sock_close() and
checking against NULL in sockfs_setattr().

sock_release() is called in many places, only the sock_close()
path matters here. And fortunately, this should not affect normal
sock_close() as it is only called when the last fd refcnt is gone.
It only affects sock_close() with a parallel sockfs_setattr() in
progress, which is not common.

Fixes: 86741ec254 ("net: core: Add a UID field to struct sock.")
Reported-by: shankarapailoor <shankarapailoor@gmail.com>
Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Cc: Lorenzo Colitti <lorenzo@google.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

(cherry picked from commit 6d8c50dcb0)
Signed-off-by: Chenbo Feng <fengc@google.com>

Bug: 112220999
Test: syzcaller reproducer doesn't trigger the crash anymore
Change-Id: I90bec1515889e0dfd23f94e3f29b366c7bbfcd11
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
2018-10-04 14:00:54 +05:30
Alistair Strachan
688f9b4e1a ANDROID: Refresh x86_64_cuttlefish_defconfig
An LTS change removed the need to set a config option. This broke the
comparison validation with the output of "make savedefconfig".

Change-Id: Id7ed6c6546d0efe88b67c0d1b92183152406e6f6
Signed-off-by: Alistair Strachan <astrachan@google.com>
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
2018-10-04 14:00:20 +05:30
Amit Pundir
8ede64c159 Merge remote-tracking branch 'origin/upstream-f2fs-stable-linux-4.4.y' into android-4.4
6944da0a68 treewide: Use array_size in f2fs_kvzalloc()
f15443db99 treewide: Use array_size() in f2fs_kzalloc()
3ea03ea4bd treewide: Use array_size() in f2fs_kmalloc()
c41203299a overflow.h: Add allocation size calculation helpers
d400752f54 f2fs: fix to clear FI_VOLATILE_FILE correctly
853e7339b6 f2fs: let sync node IO interrupt async one
6a4540cf19 f2fs: don't change wbc->sync_mode
588ecdfd7d f2fs: fix to update mtime correctly
1ae5aadab1 fs: f2fs: insert space around that ':' and ', '
39ee53e223 fs: f2fs: add missing blank lines after declarations
d5b4710fcf fs: f2fs: changed variable type of offset "unsigned" to "loff_t"
c35da89531 f2fs: clean up symbol namespace
fcf37e16f3 f2fs: make set_de_type() static
5d1633aa10 f2fs: make __f2fs_write_data_pages() static
cc8093af7c f2fs: fix to avoid accessing cross the boundary
b7f5594670 f2fs: fix to let caller retry allocating block address
e48fcd8576 disable loading f2fs module on PAGE_SIZE > 4KB
02afc275a5 f2fs: fix error path of move_data_page
0291bd36d0 f2fs: don't drop dentry pages after fs shutdown
a1259450b6 f2fs: fix to avoid race during access gc_thread pointer
d2e0f2f786 f2fs: clean up with clear_radix_tree_dirty_tag
c74034518f f2fs: fix to don't trigger writeback during recovery
e72a2cca82 f2fs: clear discard_wake earlier
b25a1872e9 f2fs: let discard thread wait a little longer if dev is busy
b125dfb20d f2fs: avoid stucking GC due to atomic write
405909e7f5 f2fs: introduce sbi->gc_mode to determine the policy
1f62e4702a f2fs: keep migration IO order in LFS mode
c4408c2387 f2fs: fix to wait page writeback during revoking atomic write
9db5be4af8 f2fs: Fix deadlock in shutdown ioctl
ed74404955 f2fs: detect synchronous writeback more earlier
91e7d9d2dd mm: remove nr_pages argument from pagevec_lookup_{,range}_tag()
feb94dc829 ceph: use pagevec_lookup_range_nr_tag()
f3aa4a25b8 mm: add variant of pagevec_lookup_range_tag() taking number of pages
8914877e37 mm: use pagevec_lookup_range_tag() in write_cache_pages()
26778b87a0 mm: use pagevec_lookup_range_tag() in __filemap_fdatawait_range()
94f1b99298 nilfs2: use pagevec_lookup_range_tag()
160355d69f gfs2: use pagevec_lookup_range_tag()
564108e83a f2fs: use find_get_pages_tag() for looking up single page
6cf6fb8645 f2fs: simplify page iteration loops
a05d8a6a2b f2fs: use pagevec_lookup_range_tag()
18a4848ffd ext4: use pagevec_lookup_range_tag()
1c7be24f65 ceph: use pagevec_lookup_range_tag()
e25fadabb5 btrfs: use pagevec_lookup_range_tag()
bf9510b162 mm: implement find_get_pages_range_tag()
461247b21f f2fs: clean up with is_valid_blkaddr()
a5d0ccbc18 f2fs: fix to initialize min_mtime with ULLONG_MAX
9bb4d22cf5 f2fs: fix to let checkpoint guarantee atomic page persistence
cdcf2b3e25 f2fs: fix to initialize i_current_depth according to inode type
331ae0c25b Revert "f2fs: add ovp valid_blocks check for bg gc victim to fg_gc"
2494cc7c0b f2fs: don't drop any page on f2fs_cp_error() case
0037c639e6 f2fs: fix spelling mistake: "extenstion" -> "extension"
2bba5b8eb8 f2fs: enhance sanity_check_raw_super() to avoid potential overflows
9bb86b63dc f2fs: treat volatile file's data as hot one
2cf6459036 f2fs: introduce release_discard_addr() for cleanup
03279ce90b f2fs: fix potential overflow
f46eddc4da f2fs: rename dio_rwsem to i_gc_rwsem
bb01582453 f2fs: move mnt_want_write_file after range check
8bb9a8da75 f2fs: fix missing clear FI_NO_PREALLOC in some error case
cb38cc4e1d f2fs: enforce fsync_mode=strict for renamed directory
26bf4e8a96 f2fs: sanity check for total valid node blocks
78f8b0f46f f2fs: sanity check on sit entry
ab758ada22 f2fs: avoid bug_on on corrupted inode
1a5d1966c0 f2fs: give message and set need_fsck given broken node id
b025f6dfc0 f2fs: clean up commit_inmem_pages()
7aff5c69da f2fs: do not check F2FS_INLINE_DOTS in recover
23d00b0287 f2fs: remove duplicated dquot_initialize and fix error handling
937f4ef79e f2fs: stop issue discard if something wrong with f2fs
a6d74bb282 f2fs: fix return value in f2fs_ioc_commit_atomic_write
258489ec52 f2fs: allocate hot_data for atomic write more strictly
aa857e0f3b f2fs: check if inmem_pages list is empty correctly
9d77ded0a7 f2fs: fix race in between GC and atomic open
0d17eb90b5 f2fs: change le32 to le16 of f2fs_inode->i_extra_size
ea2813111f f2fs: check cur_valid_map_mir & raw_sit block count when flush sit entries
9190cadf38 f2fs: correct return value of f2fs_trim_fs
17f85d0708 f2fs: fix to show missing bits in FS_IOC_GETFLAGS
3e90db63fc f2fs: remove unneeded F2FS_PROJINHERIT_FL
298032d4d4 f2fs: don't use GFP_ZERO for page caches
fdf61219dc f2fs: issue all big range discards in umount process
cd79eb2b5e f2fs: remove redundant block plug
ec034d0f14 f2fs: remove unmatched zero_user_segment when convert inline dentry
71aaced0e1 f2fs: introduce private inode status mapping
e7724207f7 fscrypt: log the crypto algorithm implementations
4cbda579cd crypto: api - Add crypto_type_has_alg helper
b24dcaae87 crypto: skcipher - Add low-level skcipher interface
a9146e4235 crypto: skcipher - Add helper to retrieve driver name
a0ca4bdf47 crypto: skcipher - Add default key size helper
eb13e0b692 fscrypt: add Speck128/256 support
27a0e77380 fscrypt: only derive the needed portion of the key
f68a71fa8f fscrypt: separate key lookup from key derivation
52359cf4fd fscrypt: use a common logging function
ff8e7c745e fscrypt: remove internal key size constants
7149dd4d39 fscrypt: remove unnecessary check for non-logon key type
56446c9142 fscrypt: make fscrypt_operations.max_namelen an integer
f572a22ef9 fscrypt: drop empty name check from fname_decrypt()
0077eff1d2 fscrypt: drop max_namelen check from fname_decrypt()
3f7af9d27f fscrypt: don't special-case EOPNOTSUPP from fscrypt_get_encryption_info()
52c51f7b7b fscrypt: don't clear flags on crypto transform
89b7fb8298 fscrypt: remove stale comment from fscrypt_d_revalidate()
d56de4e926 fscrypt: remove error messages for skcipher_request_alloc() failure
f68d3b84ae fscrypt: remove unnecessary NULL check when allocating skcipher
fb10231825 fscrypt: clean up after fscrypt_prepare_lookup() conversions
39b1444906 fscrypt: use unbound workqueue for decryption

Change-Id: Ied79ecd97385c05ef26e6b7b24d250eee9ec4e47
Signed-off-by: Jaegeuk Kim <jaegeuk@google.com>

Conflicts:
    fs/crypto/keyinfo.c
    fs/f2fs/inline.c
        Resolved conflicts based on android-4.4:fs/f2fs codebase.

Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
2018-10-04 13:56:15 +05:30
Amit Pundir
84e1d70f12 Merge commit '73450231ffff' into android-4.4
Synced to the latest commit for merge:
73450231ff f2fs: run fstrim asynchronously if runtime discard is on

Change-Id: Icec09d14f2768dfaa1a0691eac275f68adbf470b
Signed-off-by: Jaegeuk Kim <jaegeuk@google.com>

Conflicts:
    fs/crypto/fscrypt_private.h
    fs/crypto/keyinfo.c
    fs/f2fs/data.c
    fs/f2fs/f2fs.h
    fs/f2fs/namei.c
    fs/f2fs/segment.c
    fs/f2fs/super.c
    include/uapi/linux/fs.h
        Resolved conflicts based on android-4.4:fs/f2fs codebase.

Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
2018-10-04 13:44:14 +05:30
Daniel Rosenberg
072304e803 ANDROID: sdcardfs: Check stacked filesystem depth
bug: 111860541
Change-Id: Ia0a30b2b8956c4ada28981584cd8647713a1e993
Signed-off-by: Daniel Rosenberg <drosen@google.com>
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
2018-10-03 18:33:05 +05:30
Alistair Strachan
97b9031454 x86_64_cuttlefish_defconfig: Enable android-verity
Bug: 72722987
Test: Build & boot with x86_64_cuttlefish_defconfig
Change-Id: I961e6aaa944b5ab0c005cb39604a52f8dc98fb06
Signed-off-by: Alistair Strachan <astrachan@google.com>
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
2018-10-03 18:29:03 +05:30
Alistair Strachan
dc5836a4da x86_64_cuttlefish_defconfig: enable verity cert
Bug: 72722987
Test: Build, boot and verify in /proc/keys
Change-Id: Ia55b94d56827003a88cb6083a75340ee31347470
Signed-off-by: Alistair Strachan <astrachan@google.com>
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
2018-10-03 18:28:52 +05:30
Amit Pundir
c8f435d8fd Merge branch 'linux-linaro-lsk-v4.4' into linux-linaro-lsk-v4.4-android
* linux-linaro-lsk-v4.4: (783 commits)
  Linux 4.4.159
  iw_cxgb4: only allow 1 flush on user qps
  HID: sony: Support DS4 dongle
  HID: sony: Update device ids
  arm64: Add trace_hardirqs_off annotation in ret_to_user
  ext4: don't mark mmp buffer head dirty
  ext4: fix online resizing for bigalloc file systems with a 1k block size
  ext4: fix online resize's handling of a too-small final block group
  ext4: recalucate superblock checksum after updating free blocks/inodes
  ext4: avoid divide by zero fault when deleting corrupted inline directories
  tty: vt_ioctl: fix potential Spectre v1
  drm/nouveau/drm/nouveau: Use pm_runtime_get_noresume() in connector_detect()
  ocfs2: fix ocfs2 read block panic
  scsi: target: iscsi: Use hex2bin instead of a re-implementation
  neighbour: confirm neigh entries when ARP packet is received
  net: hp100: fix always-true check for link up state
  net/appletalk: fix minor pointer leak to userspace in SIOCFINDIPDDPRT
  ipv6: fix possible use-after-free in ip6_xmit()
  gso_segment: Reset skb->mac_len after modifying network header
  mm: shmem.c: Correctly annotate new inodes for lockdep
  ...

Conflicts:
    Makefile
    fs/squashfs/block.c
    include/uapi/linux/prctl.h
    kernel/fork.c
    kernel/sys.c
        Trivial merge conflicts in above files. Resolved by rebasing
        corresponding AOSP changes.

    arch/arm64/mm/init.c
        Pick the changes from upstream version of AOSP patch
        "arm64: check for upper PAGE_SHIFT bits in pfn_valid" instead.

Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
2018-10-03 15:00:28 +05:30
Nick Desaulniers
d36d92bca7 compiler-gcc.h: Add __attribute__((gnu_inline)) to all inline declarations
Functions marked extern inline do not emit an externally visible
function when the gnu89 C standard is used. Some KBUILD Makefiles
overwrite KBUILD_CFLAGS. This is an issue for GCC 5.1+ users as without
an explicit C standard specified, the default is gnu11. Since c99, the
semantics of extern inline have changed such that an externally visible
function is always emitted. This can lead to multiple definition errors
of extern inline functions at link time of compilation units whose build
files have removed an explicit C standard compiler flag for users of GCC
5.1+ or Clang.

Suggested-by: Arnd Bergmann <arnd@arndb.de>
Suggested-by: H. Peter Anvin <hpa@zytor.com>
Suggested-by: Joe Perches <joe@perches.com>
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Acked-by: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: acme@redhat.com
Cc: akataria@vmware.com
Cc: akpm@linux-foundation.org
Cc: andrea.parri@amarulasolutions.com
Cc: ard.biesheuvel@linaro.org
Cc: aryabinin@virtuozzo.com
Cc: astrachan@google.com
Cc: boris.ostrovsky@oracle.com
Cc: brijesh.singh@amd.com
Cc: caoj.fnst@cn.fujitsu.com
Cc: geert@linux-m68k.org
Cc: ghackmann@google.com
Cc: gregkh@linuxfoundation.org
Cc: jan.kiszka@siemens.com
Cc: jarkko.sakkinen@linux.intel.com
Cc: jpoimboe@redhat.com
Cc: keescook@google.com
Cc: kirill.shutemov@linux.intel.com
Cc: kstewart@linuxfoundation.org
Cc: linux-efi@vger.kernel.org
Cc: linux-kbuild@vger.kernel.org
Cc: manojgupta@google.com
Cc: mawilcox@microsoft.com
Cc: michal.lkml@markovi.net
Cc: mjg59@google.com
Cc: mka@chromium.org
Cc: pombredanne@nexb.com
Cc: rientjes@google.com
Cc: rostedt@goodmis.org
Cc: sedat.dilek@gmail.com
Cc: thomas.lendacky@amd.com
Cc: tstellar@redhat.com
Cc: tweek@google.com
Cc: virtualization@lists.linux-foundation.org
Cc: will.deacon@arm.com
Cc: yamada.masahiro@socionext.com
Link: http://lkml.kernel.org/r/20180621162324.36656-2-ndesaulniers@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit d03db2bc26)
Signed-off-by: Mark Brown <broonie@kernel.org>
2018-10-01 14:09:14 +01:00
Mark Brown
c6ee9e74b2 Merge tag 'v4.4.159' into linux-linaro-lsk-v4.4
This is the 4.4.159 stable release

# gpg: Signature made Sat 29 Sep 2018 11:08:56 BST
# gpg:                using RSA key 647F28654894E3BD457199BE38DBBDC86092693E
# gpg: Good signature from "Greg Kroah-Hartman <gregkh@linuxfoundation.org>" [unknown]
# gpg:                 aka "Greg Kroah-Hartman (Linux kernel stable release signing key) <greg@kroah.com>" [unknown]
# gpg:                 aka "Greg Kroah-Hartman <gregkh@kernel.org>" [unknown]
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg:          There is no indication that the signature belongs to the owner.
# Primary key fingerprint: 647F 2865 4894 E3BD 4571  99BE 38DB BDC8 6092 693E
2018-10-01 12:13:41 +01:00
Greg Kroah-Hartman
9c6cd3f3a4 Linux 4.4.159 2018-09-29 03:08:55 -07:00
Steve Wise
82ea790afe iw_cxgb4: only allow 1 flush on user qps
commit 308aa2b8f7 upstream.

Once the qp has been flushed, it cannot be flushed again.  The user qp
flush logic wasn't enforcing it however.  The bug can cause
touch-after-free crashes like:

Unable to handle kernel paging request for data at address 0x000001ec
Faulting instruction address: 0xc008000016069100
Oops: Kernel access of bad area, sig: 11 [#1]
...
NIP [c008000016069100] flush_qp+0x80/0x480 [iw_cxgb4]
LR [c00800001606cd6c] c4iw_modify_qp+0x71c/0x11d0 [iw_cxgb4]
Call Trace:
[c00800001606cd6c] c4iw_modify_qp+0x71c/0x11d0 [iw_cxgb4]
[c00800001606e868] c4iw_ib_modify_qp+0x118/0x200 [iw_cxgb4]
[c0080000119eae80] ib_security_modify_qp+0xd0/0x3d0 [ib_core]
[c0080000119c4e24] ib_modify_qp+0xc4/0x2c0 [ib_core]
[c008000011df0284] iwcm_modify_qp_err+0x44/0x70 [iw_cm]
[c008000011df0fec] destroy_cm_id+0xcc/0x370 [iw_cm]
[c008000011ed4358] rdma_destroy_id+0x3c8/0x520 [rdma_cm]
[c0080000134b0540] ucma_close+0x90/0x1b0 [rdma_ucm]
[c000000000444da4] __fput+0xe4/0x2f0

So fix flush_qp() to only flush the wq once.

Cc: stable@vger.kernel.org
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-09-29 03:08:55 -07:00
Roderick Colenbrander
44c2e8a568 HID: sony: Support DS4 dongle
commit de66a1a04c upstream.

Add support for USB based DS4 dongle device, which allows connecting
a DS4 through Bluetooth, but hides Bluetooth from the host system.

Signed-off-by: Roderick Colenbrander <roderick.colenbrander@sony.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-09-29 03:08:54 -07:00
Roderick Colenbrander
ce144dbfb4 HID: sony: Update device ids
commit cf1015d65d upstream.

Support additional DS4 model.

Signed-off-by: Roderick Colenbrander <roderick.colenbrander@sony.com>
Reviewed-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-09-29 03:08:54 -07:00
Catalin Marinas
d2e646c723 arm64: Add trace_hardirqs_off annotation in ret_to_user
commit db3899a647 upstream.

When a kernel is built with CONFIG_TRACE_IRQFLAGS the following warning
is produced when entering userspace for the first time:

  WARNING: at /work/Linux/linux-2.6-aarch64/kernel/locking/lockdep.c:3519
  Modules linked in:
  CPU: 1 PID: 1 Comm: systemd Not tainted 4.4.0-rc3+ #639
  Hardware name: Juno (DT)
  task: ffffffc9768a0000 ti: ffffffc9768a8000 task.ti: ffffffc9768a8000
  PC is at check_flags.part.22+0x19c/0x1a8
  LR is at check_flags.part.22+0x19c/0x1a8
  pc : [<ffffffc0000fba6c>] lr : [<ffffffc0000fba6c>] pstate: 600001c5
  sp : ffffffc9768abe10
  x29: ffffffc9768abe10 x28: ffffffc9768a8000
  x27: 0000000000000000 x26: 0000000000000001
  x25: 00000000000000a6 x24: ffffffc00064be6c
  x23: ffffffc0009f249e x22: ffffffc9768a0000
  x21: ffffffc97fea5480 x20: 00000000000001c0
  x19: ffffffc00169a000 x18: 0000005558cc7b58
  x17: 0000007fb78e3180 x16: 0000005558d2e238
  x15: ffffffffffffffff x14: 0ffffffffffffffd
  x13: 0000000000000008 x12: 0101010101010101
  x11: 7f7f7f7f7f7f7f7f x10: fefefefefefeff63
  x9 : 7f7f7f7f7f7f7f7f x8 : 6e655f7371726964
  x7 : 0000000000000001 x6 : ffffffc0001079c4
  x5 : 0000000000000000 x4 : 0000000000000001
  x3 : ffffffc001698438 x2 : 0000000000000000
  x1 : ffffffc9768a0000 x0 : 000000000000002e
  Call trace:
  [<ffffffc0000fba6c>] check_flags.part.22+0x19c/0x1a8
  [<ffffffc0000fc440>] lock_is_held+0x80/0x98
  [<ffffffc00064bafc>] __schedule+0x404/0x730
  [<ffffffc00064be6c>] schedule+0x44/0xb8
  [<ffffffc000085bb0>] ret_to_user+0x0/0x24
  possible reason: unannotated irqs-off.
  irq event stamp: 502169
  hardirqs last  enabled at (502169): [<ffffffc000085a98>] el0_irq_naked+0x1c/0x24
  hardirqs last disabled at (502167): [<ffffffc0000bb3bc>] __do_softirq+0x17c/0x298
  softirqs last  enabled at (502168): [<ffffffc0000bb43c>] __do_softirq+0x1fc/0x298
  softirqs last disabled at (502143): [<ffffffc0000bb830>] irq_exit+0xa0/0xf0

This happens because we disable interrupts in ret_to_user before calling
schedule() in work_resched. This patch adds the necessary
trace_hardirqs_off annotation.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Reported-by: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Cc: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-09-29 03:08:54 -07:00
Li Dongyang
e77dd99d4b ext4: don't mark mmp buffer head dirty
commit fe18d64989 upstream.

Marking mmp bh dirty before writing it will make writeback
pick up mmp block later and submit a write, we don't want the
duplicate write as kmmpd thread should have full control of
reading and writing the mmp block.
Another reason is we will also have random I/O error on
the writeback request when blk integrity is enabled, because
kmmpd could modify the content of the mmp block(e.g. setting
new seq and time) while the mmp block is under I/O requested
by writeback.

Signed-off-by: Li Dongyang <dongyangli@ddn.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-09-29 03:08:54 -07:00
Theodore Ts'o
47af99763a ext4: fix online resizing for bigalloc file systems with a 1k block size
commit 5f8c10936f upstream.

An online resize of a file system with the bigalloc feature enabled
and a 1k block size would be refused since ext4_resize_begin() did not
understand s_first_data_block is 0 for all bigalloc file systems, even
when the block size is 1k.

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-09-29 03:08:54 -07:00
Theodore Ts'o
70083af592 ext4: fix online resize's handling of a too-small final block group
commit f0a459dec5 upstream.

Avoid growing the file system to an extent so that the last block
group is too small to hold all of the metadata that must be stored in
the block group.

This problem can be triggered with the following reproducer:

umount /mnt
mke2fs -F -m0 -b 4096 -t ext4 -O resize_inode,^has_journal \
	-E resize=1073741824 /tmp/foo.img 128M
mount /tmp/foo.img /mnt
truncate --size 1708M /tmp/foo.img
resize2fs /dev/loop0 295400
umount /mnt
e2fsck -fy /tmp/foo.img

Reported-by: Torsten Hilbrich <torsten.hilbrich@secunet.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-09-29 03:08:54 -07:00
Theodore Ts'o
66671ee85a ext4: recalucate superblock checksum after updating free blocks/inodes
commit 4274f516d4 upstream.

When mounting the superblock, ext4_fill_super() calculates the free
blocks and free inodes and stores them in the superblock.  It's not
strictly necessary, since we don't use them any more, but it's nice to
keep them roughly aligned to reality.

Since it's not critical for file system correctness, the code doesn't
call ext4_commit_super().  The problem is that it's in
ext4_commit_super() that we recalculate the superblock checksum.  So
if we're not going to call ext4_commit_super(), we need to call
ext4_superblock_csum_set() to make sure the superblock checksum is
consistent.

Most of the time, this doesn't matter, since we end up calling
ext4_commit_super() very soon thereafter, and definitely by the time
the file system is unmounted.  However, it doesn't work in this
sequence:

mke2fs -Fq -t ext4 /dev/vdc 128M
mount /dev/vdc /vdc
cp xfstests/git-versions /vdc
godown /vdc
umount /vdc
mount /dev/vdc
tune2fs -l /dev/vdc

With this commit, the "tune2fs -l" no longer fails.

Reported-by: Chengguang Xu <cgxu519@gmx.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-09-29 03:08:54 -07:00
Theodore Ts'o
7619c7f660 ext4: avoid divide by zero fault when deleting corrupted inline directories
commit 4d982e25d0 upstream.

A specially crafted file system can trick empty_inline_dir() into
reading past the last valid entry in a inline directory, and then run
into the end of xattr marker. This will trigger a divide by zero
fault.  Fix this by using the size of the inline directory instead of
dir->i_size.

Also clean up error reporting in __ext4_check_dir_entry so that the
message is clearer and more understandable --- and avoids the division
by zero trap if the size passed in is zero.  (I'm not sure why we
coded it that way in the first place; printing offset % size is
actually more confusing and less useful.)

https://bugzilla.kernel.org/show_bug.cgi?id=200933

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reported-by: Wen Xu <wen.xu@gatech.edu>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-09-29 03:08:53 -07:00
Gustavo A. R. Silva
1aa698b651 tty: vt_ioctl: fix potential Spectre v1
commit e97267cb4d upstream.

vsa.console is indirectly controlled by user-space, hence leading to
a potential exploitation of the Spectre variant 1 vulnerability.

This issue was detected with the help of Smatch:

drivers/tty/vt/vt_ioctl.c:711 vt_ioctl() warn: potential spectre issue
'vc_cons' [r]

Fix this by sanitizing vsa.console before using it to index vc_cons

Notice that given that speculation windows are large, the policy is
to kill the speculation on the first load and not worry if it can be
completed with a dependent load/store [1].

[1] https://marc.info/?l=linux-kernel&m=152449131114778&w=2

Cc: stable@vger.kernel.org
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Reviewed-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-09-29 03:08:53 -07:00
Lyude Paul
64436716c3 drm/nouveau/drm/nouveau: Use pm_runtime_get_noresume() in connector_detect()
commit 6833fb1ec1 upstream.

It's true we can't resume the device from poll workers in
nouveau_connector_detect(). We can however, prevent the autosuspend
timer from elapsing immediately if it hasn't already without risking any
sort of deadlock with the runtime suspend/resume operations. So do that
instead of entirely avoiding grabbing a power reference.

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Acked-by: Daniel Vetter <daniel@ffwll.ch>
Cc: stable@vger.kernel.org
Cc: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-09-29 03:08:53 -07:00
Junxiao Bi
98e14c520f ocfs2: fix ocfs2 read block panic
commit 234b69e3e0 upstream.

While reading block, it is possible that io error return due to underlying
storage issue, in this case, BH_NeedsValidate was left in the buffer head.
Then when reading the very block next time, if it was already linked into
journal, that will trigger the following panic.

[203748.702517] kernel BUG at fs/ocfs2/buffer_head_io.c:342!
[203748.702533] invalid opcode: 0000 [#1] SMP
[203748.702561] Modules linked in: ocfs2 ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm ocfs2_nodemanager ocfs2_stackglue configfs sunrpc dm_switch dm_queue_length dm_multipath bonding be2iscsi iscsi_boot_sysfs bnx2i cnic uio cxgb4i iw_cxgb4 cxgb4 cxgb3i libcxgbi iw_cxgb3 cxgb3 mdio ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr ipv6 iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ipmi_devintf iTCO_wdt iTCO_vendor_support dcdbas ipmi_ssif i2c_core ipmi_si ipmi_msghandler acpi_pad pcspkr sb_edac edac_core lpc_ich mfd_core shpchp sg tg3 ptp pps_core ext4 jbd2 mbcache2 sr_mod cdrom sd_mod ahci libahci megaraid_sas wmi dm_mirror dm_region_hash dm_log dm_mod
[203748.703024] CPU: 7 PID: 38369 Comm: touch Not tainted 4.1.12-124.18.6.el6uek.x86_64 #2
[203748.703045] Hardware name: Dell Inc. PowerEdge R620/0PXXHP, BIOS 2.5.2 01/28/2015
[203748.703067] task: ffff880768139c00 ti: ffff88006ff48000 task.ti: ffff88006ff48000
[203748.703088] RIP: 0010:[<ffffffffa05e9f09>]  [<ffffffffa05e9f09>] ocfs2_read_blocks+0x669/0x7f0 [ocfs2]
[203748.703130] RSP: 0018:ffff88006ff4b818  EFLAGS: 00010206
[203748.703389] RAX: 0000000008620029 RBX: ffff88006ff4b910 RCX: 0000000000000000
[203748.703885] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 00000000023079fe
[203748.704382] RBP: ffff88006ff4b8d8 R08: 0000000000000000 R09: ffff8807578c25b0
[203748.704877] R10: 000000000f637376 R11: 000000003030322e R12: 0000000000000000
[203748.705373] R13: ffff88006ff4b910 R14: ffff880732fe38f0 R15: 0000000000000000
[203748.705871] FS:  00007f401992c700(0000) GS:ffff880bfebc0000(0000) knlGS:0000000000000000
[203748.706370] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[203748.706627] CR2: 00007f4019252440 CR3: 00000000a621e000 CR4: 0000000000060670
[203748.707124] Stack:
[203748.707371]  ffff88006ff4b828 ffffffffa0609f52 ffff88006ff4b838 0000000000000001
[203748.707885]  0000000000000000 0000000000000000 ffff880bf67c3800 ffffffffa05eca00
[203748.708399]  00000000023079ff ffffffff81c58b80 0000000000000000 0000000000000000
[203748.708915] Call Trace:
[203748.709175]  [<ffffffffa0609f52>] ? ocfs2_inode_cache_io_unlock+0x12/0x20 [ocfs2]
[203748.709680]  [<ffffffffa05eca00>] ? ocfs2_empty_dir_filldir+0x80/0x80 [ocfs2]
[203748.710185]  [<ffffffffa05ec0cb>] ocfs2_read_dir_block_direct+0x3b/0x200 [ocfs2]
[203748.710691]  [<ffffffffa05f0fbf>] ocfs2_prepare_dx_dir_for_insert.isra.57+0x19f/0xf60 [ocfs2]
[203748.711204]  [<ffffffffa065660f>] ? ocfs2_metadata_cache_io_unlock+0x1f/0x30 [ocfs2]
[203748.711716]  [<ffffffffa05f4f3a>] ocfs2_prepare_dir_for_insert+0x13a/0x890 [ocfs2]
[203748.712227]  [<ffffffffa05f442e>] ? ocfs2_check_dir_for_entry+0x8e/0x140 [ocfs2]
[203748.712737]  [<ffffffffa061b2f2>] ocfs2_mknod+0x4b2/0x1370 [ocfs2]
[203748.713003]  [<ffffffffa061c385>] ocfs2_create+0x65/0x170 [ocfs2]
[203748.713263]  [<ffffffff8121714b>] vfs_create+0xdb/0x150
[203748.713518]  [<ffffffff8121b225>] do_last+0x815/0x1210
[203748.713772]  [<ffffffff812192e9>] ? path_init+0xb9/0x450
[203748.714123]  [<ffffffff8121bca0>] path_openat+0x80/0x600
[203748.714378]  [<ffffffff811bcd45>] ? handle_pte_fault+0xd15/0x1620
[203748.714634]  [<ffffffff8121d7ba>] do_filp_open+0x3a/0xb0
[203748.714888]  [<ffffffff8122a767>] ? __alloc_fd+0xa7/0x130
[203748.715143]  [<ffffffff81209ffc>] do_sys_open+0x12c/0x220
[203748.715403]  [<ffffffff81026ddb>] ? syscall_trace_enter_phase1+0x11b/0x180
[203748.715668]  [<ffffffff816f0c9f>] ? system_call_after_swapgs+0xe9/0x190
[203748.715928]  [<ffffffff8120a10e>] SyS_open+0x1e/0x20
[203748.716184]  [<ffffffff816f0d5e>] system_call_fastpath+0x18/0xd7
[203748.716440] Code: 00 00 48 8b 7b 08 48 83 c3 10 45 89 f8 44 89 e1 44 89 f2 4c 89 ee e8 07 06 11 e1 48 8b 03 48 85 c0 75 df 8b 5d c8 e9 4d fa ff ff <0f> 0b 48 8b 7d a0 e8 dc c6 06 00 48 b8 00 00 00 00 00 00 00 10
[203748.717505] RIP  [<ffffffffa05e9f09>] ocfs2_read_blocks+0x669/0x7f0 [ocfs2]
[203748.717775]  RSP <ffff88006ff4b818>

Joesph ever reported a similar panic.
Link: https://oss.oracle.com/pipermail/ocfs2-devel/2013-May/008931.html

Link: http://lkml.kernel.org/r/20180912063207.29484-1-junxiao.bi@oracle.com
Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joseph Qi <jiangqi903@gmail.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Changwei Ge <ge.changwei@h3c.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-09-29 03:08:53 -07:00
Vincent Pelletier
afba6121b3 scsi: target: iscsi: Use hex2bin instead of a re-implementation
commit 1816494330 upstream.

This change has the following effects, in order of descreasing importance:

1) Prevent a stack buffer overflow

2) Do not append an unnecessary NULL to an anyway binary buffer, which
   is writing one byte past client_digest when caller is:
   chap_string_to_hex(client_digest, chap_r, strlen(chap_r));

The latter was found by KASAN (see below) when input value hes expected size
(32 hex chars), and further analysis revealed a stack buffer overflow can
happen when network-received value is longer, allowing an unauthenticated
remote attacker to smash up to 17 bytes after destination buffer (16 bytes
attacker-controlled and one null).  As switching to hex2bin requires
specifying destination buffer length, and does not internally append any null,
it solves both issues.

This addresses CVE-2018-14633.

Beyond this:

- Validate received value length and check hex2bin accepted the input, to log
  this rejection reason instead of just failing authentication.

- Only log received CHAP_R and CHAP_C values once they passed sanity checks.

==================================================================
BUG: KASAN: stack-out-of-bounds in chap_string_to_hex+0x32/0x60 [iscsi_target_mod]
Write of size 1 at addr ffff8801090ef7c8 by task kworker/0:0/1021

CPU: 0 PID: 1021 Comm: kworker/0:0 Tainted: G           O      4.17.8kasan.sess.connops+ #2
Hardware name: To be filled by O.E.M. To be filled by O.E.M./Aptio CRB, BIOS 5.6.5 05/19/2014
Workqueue: events iscsi_target_do_login_rx [iscsi_target_mod]
Call Trace:
 dump_stack+0x71/0xac
 print_address_description+0x65/0x22e
 ? chap_string_to_hex+0x32/0x60 [iscsi_target_mod]
 kasan_report.cold.6+0x241/0x2fd
 chap_string_to_hex+0x32/0x60 [iscsi_target_mod]
 chap_server_compute_md5.isra.2+0x2cb/0x860 [iscsi_target_mod]
 ? chap_binaryhex_to_asciihex.constprop.5+0x50/0x50 [iscsi_target_mod]
 ? ftrace_caller_op_ptr+0xe/0xe
 ? __orc_find+0x6f/0xc0
 ? unwind_next_frame+0x231/0x850
 ? kthread+0x1a0/0x1c0
 ? ret_from_fork+0x35/0x40
 ? ret_from_fork+0x35/0x40
 ? iscsi_target_do_login_rx+0x3bc/0x4c0 [iscsi_target_mod]
 ? deref_stack_reg+0xd0/0xd0
 ? iscsi_target_do_login_rx+0x3bc/0x4c0 [iscsi_target_mod]
 ? is_module_text_address+0xa/0x11
 ? kernel_text_address+0x4c/0x110
 ? __save_stack_trace+0x82/0x100
 ? ret_from_fork+0x35/0x40
 ? save_stack+0x8c/0xb0
 ? 0xffffffffc1660000
 ? iscsi_target_do_login+0x155/0x8d0 [iscsi_target_mod]
 ? iscsi_target_do_login_rx+0x3bc/0x4c0 [iscsi_target_mod]
 ? process_one_work+0x35c/0x640
 ? worker_thread+0x66/0x5d0
 ? kthread+0x1a0/0x1c0
 ? ret_from_fork+0x35/0x40
 ? iscsi_update_param_value+0x80/0x80 [iscsi_target_mod]
 ? iscsit_release_cmd+0x170/0x170 [iscsi_target_mod]
 chap_main_loop+0x172/0x570 [iscsi_target_mod]
 ? chap_server_compute_md5.isra.2+0x860/0x860 [iscsi_target_mod]
 ? rx_data+0xd6/0x120 [iscsi_target_mod]
 ? iscsit_print_session_params+0xd0/0xd0 [iscsi_target_mod]
 ? cyc2ns_read_begin.part.2+0x90/0x90
 ? _raw_spin_lock_irqsave+0x25/0x50
 ? memcmp+0x45/0x70
 iscsi_target_do_login+0x875/0x8d0 [iscsi_target_mod]
 ? iscsi_target_check_first_request.isra.5+0x1a0/0x1a0 [iscsi_target_mod]
 ? del_timer+0xe0/0xe0
 ? memset+0x1f/0x40
 ? flush_sigqueue+0x29/0xd0
 iscsi_target_do_login_rx+0x3bc/0x4c0 [iscsi_target_mod]
 ? iscsi_target_nego_release+0x80/0x80 [iscsi_target_mod]
 ? iscsi_target_restore_sock_callbacks+0x130/0x130 [iscsi_target_mod]
 process_one_work+0x35c/0x640
 worker_thread+0x66/0x5d0
 ? flush_rcu_work+0x40/0x40
 kthread+0x1a0/0x1c0
 ? kthread_bind+0x30/0x30
 ret_from_fork+0x35/0x40

The buggy address belongs to the page:
page:ffffea0004243bc0 count:0 mapcount:0 mapping:0000000000000000 index:0x0
flags: 0x17fffc000000000()
raw: 017fffc000000000 0000000000000000 0000000000000000 00000000ffffffff
raw: ffffea0004243c20 ffffea0004243ba0 0000000000000000 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
 ffff8801090ef680: f2 f2 f2 f2 f2 f2 f2 01 f2 f2 f2 f2 f2 f2 f2 00
 ffff8801090ef700: f2 f2 f2 f2 f2 f2 f2 00 02 f2 f2 f2 f2 f2 f2 00
>ffff8801090ef780: 00 f2 f2 f2 f2 f2 f2 00 00 f2 f2 f2 f2 f2 f2 00
                                              ^
 ffff8801090ef800: 00 f2 f2 f2 f2 f2 f2 00 00 00 00 02 f2 f2 f2 f2
 ffff8801090ef880: f2 f2 f2 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00
==================================================================

Signed-off-by: Vincent Pelletier <plr.vincent@gmail.com>
Reviewed-by: Mike Christie <mchristi@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-09-29 03:08:53 -07:00
Vasily Khoruzhick
c6e3864253 neighbour: confirm neigh entries when ARP packet is received
[ Upstream commit f0e0d04413 ]

Update 'confirmed' timestamp when ARP packet is received. It shouldn't
affect locktime logic and anyway entry can be confirmed by any higher-layer
protocol. Thus it makes sense to confirm it when ARP packet is received.

Fixes: 77d7123342 ("neighbour: update neigh timestamps iff update is effective")
Signed-off-by: Vasily Khoruzhick <vasilykh@arista.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-09-29 03:08:52 -07:00
Colin Ian King
b8214c557c net: hp100: fix always-true check for link up state
[ Upstream commit a7f38002fb ]

The operation ~(p100_inb(VG_LAN_CFG_1) & HP100_LINK_UP) returns a value
that is always non-zero and hence the wait for the link to drop always
terminates prematurely.  Fix this by using a logical not operator instead
of a bitwise complement.  This issue has been in the driver since
pre-2.6.12-rc2.

Detected by CoverityScan, CID#114157 ("Logical vs. bitwise operator")

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-09-29 03:08:52 -07:00
Willy Tarreau
fee0d23441 net/appletalk: fix minor pointer leak to userspace in SIOCFINDIPDDPRT
[ Upstream commit 9824dfae57 ]

Fields ->dev and ->next of struct ipddp_route may be copied to
userspace on the SIOCFINDIPDDPRT ioctl. This is only accessible
to CAP_NET_ADMIN though. Let's manually copy the relevant fields
instead of using memcpy().

BugLink: http://blog.infosectcbr.com.au/2018/09/linux-kernel-infoleaks.html
Cc: Jann Horn <jannh@google.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-09-29 03:08:52 -07:00
Eric Dumazet
2ec3b47a78 ipv6: fix possible use-after-free in ip6_xmit()
[ Upstream commit bbd6528d28 ]

In the unlikely case ip6_xmit() has to call skb_realloc_headroom(),
we need to call skb_set_owner_w() before consuming original skb,
otherwise we risk a use-after-free.

Bring IPv6 in line with what we do in IPv4 to fix this.

Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-09-29 03:08:52 -07:00
Toke Høiland-Jørgensen
cb66016b7b gso_segment: Reset skb->mac_len after modifying network header
[ Upstream commit c56cae23c6 ]

When splitting a GSO segment that consists of encapsulated packets, the
skb->mac_len of the segments can end up being set wrong, causing packet
drops in particular when using act_mirred and ifb interfaces in
combination with a qdisc that splits GSO packets.

This happens because at the time skb_segment() is called, network_header
will point to the inner header, throwing off the calculation in
skb_reset_mac_len(). The network_header is subsequently adjust by the
outer IP gso_segment handlers, but they don't set the mac_len.

Fix this by adding skb_reset_mac_len() calls to both the IPv4 and IPv6
gso_segment handlers, after they modify the network_header.

Many thanks to Eric Dumazet for his help in identifying the cause of
the bug.

Acked-by: Dave Taht <dave.taht@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-09-29 03:08:52 -07:00
Joel Fernandes (Google)
4da7f35b06 mm: shmem.c: Correctly annotate new inodes for lockdep
commit b45d71fb89 upstream.

Directories and inodes don't necessarily need to be in the same lockdep
class.  For ex, hugetlbfs splits them out too to prevent false positives
in lockdep.  Annotate correctly after new inode creation.  If its a
directory inode, it will be put into a different class.

This should fix a lockdep splat reported by syzbot:

> ======================================================
> WARNING: possible circular locking dependency detected
> 4.18.0-rc8-next-20180810+ #36 Not tainted
> ------------------------------------------------------
> syz-executor900/4483 is trying to acquire lock:
> 00000000d2bfc8fe (&sb->s_type->i_mutex_key#9){++++}, at: inode_lock
> include/linux/fs.h:765 [inline]
> 00000000d2bfc8fe (&sb->s_type->i_mutex_key#9){++++}, at:
> shmem_fallocate+0x18b/0x12e0 mm/shmem.c:2602
>
> but task is already holding lock:
> 0000000025208078 (ashmem_mutex){+.+.}, at: ashmem_shrink_scan+0xb4/0x630
> drivers/staging/android/ashmem.c:448
>
> which lock already depends on the new lock.
>
> -> #2 (ashmem_mutex){+.+.}:
>        __mutex_lock_common kernel/locking/mutex.c:925 [inline]
>        __mutex_lock+0x171/0x1700 kernel/locking/mutex.c:1073
>        mutex_lock_nested+0x16/0x20 kernel/locking/mutex.c:1088
>        ashmem_mmap+0x55/0x520 drivers/staging/android/ashmem.c:361
>        call_mmap include/linux/fs.h:1844 [inline]
>        mmap_region+0xf27/0x1c50 mm/mmap.c:1762
>        do_mmap+0xa10/0x1220 mm/mmap.c:1535
>        do_mmap_pgoff include/linux/mm.h:2298 [inline]
>        vm_mmap_pgoff+0x213/0x2c0 mm/util.c:357
>        ksys_mmap_pgoff+0x4da/0x660 mm/mmap.c:1585
>        __do_sys_mmap arch/x86/kernel/sys_x86_64.c:100 [inline]
>        __se_sys_mmap arch/x86/kernel/sys_x86_64.c:91 [inline]
>        __x64_sys_mmap+0xe9/0x1b0 arch/x86/kernel/sys_x86_64.c:91
>        do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
>        entry_SYSCALL_64_after_hwframe+0x49/0xbe
>
> -> #1 (&mm->mmap_sem){++++}:
>        __might_fault+0x155/0x1e0 mm/memory.c:4568
>        _copy_to_user+0x30/0x110 lib/usercopy.c:25
>        copy_to_user include/linux/uaccess.h:155 [inline]
>        filldir+0x1ea/0x3a0 fs/readdir.c:196
>        dir_emit_dot include/linux/fs.h:3464 [inline]
>        dir_emit_dots include/linux/fs.h:3475 [inline]
>        dcache_readdir+0x13a/0x620 fs/libfs.c:193
>        iterate_dir+0x48b/0x5d0 fs/readdir.c:51
>        __do_sys_getdents fs/readdir.c:231 [inline]
>        __se_sys_getdents fs/readdir.c:212 [inline]
>        __x64_sys_getdents+0x29f/0x510 fs/readdir.c:212
>        do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
>        entry_SYSCALL_64_after_hwframe+0x49/0xbe
>
> -> #0 (&sb->s_type->i_mutex_key#9){++++}:
>        lock_acquire+0x1e4/0x540 kernel/locking/lockdep.c:3924
>        down_write+0x8f/0x130 kernel/locking/rwsem.c:70
>        inode_lock include/linux/fs.h:765 [inline]
>        shmem_fallocate+0x18b/0x12e0 mm/shmem.c:2602
>        ashmem_shrink_scan+0x236/0x630 drivers/staging/android/ashmem.c:455
>        ashmem_ioctl+0x3ae/0x13a0 drivers/staging/android/ashmem.c:797
>        vfs_ioctl fs/ioctl.c:46 [inline]
>        file_ioctl fs/ioctl.c:501 [inline]
>        do_vfs_ioctl+0x1de/0x1720 fs/ioctl.c:685
>        ksys_ioctl+0xa9/0xd0 fs/ioctl.c:702
>        __do_sys_ioctl fs/ioctl.c:709 [inline]
>        __se_sys_ioctl fs/ioctl.c:707 [inline]
>        __x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:707
>        do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
>        entry_SYSCALL_64_after_hwframe+0x49/0xbe
>
> other info that might help us debug this:
>
> Chain exists of:
>   &sb->s_type->i_mutex_key#9 --> &mm->mmap_sem --> ashmem_mutex
>
>  Possible unsafe locking scenario:
>
>        CPU0                    CPU1
>        ----                    ----
>   lock(ashmem_mutex);
>                                lock(&mm->mmap_sem);
>                                lock(ashmem_mutex);
>   lock(&sb->s_type->i_mutex_key#9);
>
>  *** DEADLOCK ***
>
> 1 lock held by syz-executor900/4483:
>  #0: 0000000025208078 (ashmem_mutex){+.+.}, at:
> ashmem_shrink_scan+0xb4/0x630 drivers/staging/android/ashmem.c:448

Link: http://lkml.kernel.org/r/20180821231835.166639-1-joel@joelfernandes.org
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Reported-by: syzbot <syzkaller@googlegroups.com>
Reviewed-by: NeilBrown <neilb@suse.com>
Suggested-by: NeilBrown <neilb@suse.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-09-29 03:08:52 -07:00
Vaibhav Nagarnaik
fed4d566a8 ring-buffer: Allow for rescheduling when removing pages
commit 83f365554e upstream.

When reducing ring buffer size, pages are removed by scheduling a work
item on each CPU for the corresponding CPU ring buffer. After the pages
are removed from ring buffer linked list, the pages are free()d in a
tight loop. The loop does not give up CPU until all pages are removed.
In a worst case behavior, when lot of pages are to be freed, it can
cause system stall.

After the pages are removed from the list, the free() can happen while
the work is rescheduled. Call cond_resched() in the loop to prevent the
system hangup.

Link: http://lkml.kernel.org/r/20180907223129.71994-1-vnagarnaik@google.com

Cc: stable@vger.kernel.org
Fixes: 83f40318da ("ring-buffer: Make removal of ring buffer pages atomic")
Reported-by: Jason Behmer <jbehmer@google.com>
Signed-off-by: Vaibhav Nagarnaik <vnagarnaik@google.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-09-29 03:08:52 -07:00
Boris Ostrovsky
28ca9ed1c9 xen/x86/vpmu: Zero struct pt_regs before calling into sample handling code
commit 70513d5875 upstream.

Otherwise we may leak kernel stack for events that sample user
registers.

Reported-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-09-29 03:08:51 -07:00