Commit Graph

84047 Commits

Author SHA1 Message Date
Hannes Reinecke
f1fcb9d292 scsi: sg: disable SET_FORCE_LOW_DMA
commit 745dfa0d8e upstream.

The ioctl SET_FORCE_LOW_DMA has never worked since the initial git
check-in, and the respective setting is nowadays handled correctly. So
disable it entirely.

Signed-off-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Tested-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-23 19:50:14 +01:00
Masahiro Yamada
675901851f kconfig.h: use __is_defined() to check if MODULE is defined
commit 4f920843d2 upstream.

The macro MODULE is not a config option, it is a per-file build
option.  So, config_enabled(MODULE) is not sensible.  (There is
another case in include/linux/export.h, where config_enabled() is
used against a non-config option.)

This commit renames some macros in include/linux/kconfig.h for the
use for non-config macros and replaces config_enabled(MODULE) with
__is_defined(MODULE).

I am keeping config_enabled() because it is still referenced from
some places, but I expect it would be deprecated in the future.

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Michal Marek <mmarek@suse.com>
Signed-off-by: Razvan Ghitulete <rga@amazon.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-23 19:50:12 +01:00
Al Viro
a88693d006 EXPORT_SYMBOL() for asm
commit 22823ab419 upstream.

Add asm-usable variants of EXPORT_SYMBOL/EXPORT_SYMBOL_GPL.  This
commit just adds the default implementation; most of the architectures
can simply add export.h to asm/Kbuild and start using <asm/export.h>
from assembler.  The rest needs to have their <asm/export.h> define
everal macros and then explicitly include <asm-generic/export.h>

One area where the things might diverge from default is the alignment;
normally it's 8 bytes on 64bit targets and 4 on 32bit ones, both for
unsigned long and for struct kernel_symbol.  Unfortunately, amd64 and
m68k are unusual - m68k aligns to 2 bytes (for both) and amd64 aligns
struct kernel_symbol to 16 bytes.  For those we'll need asm/export.h to
override the constants used by generic version - KSYM_ALIGN and KCRC_ALIGN
for kernel_symbol and unsigned long resp.  And no, __alignof__ would not
do the trick - on amd64 __alignof__ of struct kernel_symbol is 8, not 16.

More serious source of unpleasantness is treatment of function
descriptors on architectures that have those.  Things like ppc64,
parisc, ia64, etc.  need more than the address of the first insn to
call an arbitrary function.  As the result, their representation of
pointers to functions is not the typical "address of the entry point" -
it's an address of a small static structure containing all the required
information (including the entry point, of course).  Sadly, the asm-side
conventions differ in what the function name refers to - entry point or
the function descriptor.  On ppc64 we do the latter;
	bar: .quad foo
is what void (*bar)(void) = foo; turns into and the rare places where
we need to explicitly work with the label of entry point are dealt with
as DOTSYM(foo).  For our purposes it's ideal - generic macros are usable.
However, parisc would have foo and P%foo used for label of entry point
and address of the function descriptor and
	bar: .long P%foo
woudl be used instead.	ia64 goes similar to parisc in that respect,
except that there it's @fptr(foo) rather than P%foo.  Such architectures
need to define KSYM_FUNC that would turn a function name into whatever
is needed to refer to function descriptor.

What's more, on such architectures we need to know whether we are exporting
a function or an object - in assembler we have to tell that explicitly, to
decide whether we want EXPORT_SYMBOL(foo) produce e.g.
	__ksymtab_foo: .quad foo
or
	__ksymtab_foo: .quad @fptr(foo)

For that reason we introduce EXPORT_DATA_SYMBOL{,_GPL}(), to be used for
exports of data objects.  On normal architectures it's the same thing
as EXPORT_SYMBOL{,_GPL}(), but on parisc-like ones they differ and the
right one needs to be used.  Most of the exports are functions, so we
keep EXPORT_SYMBOL for those...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Razvan Ghitulete <rga@amazon.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-23 19:50:12 +01:00
Adam Borowski
b76ac90af3 x86/kbuild: enable modversions for symbols exported from asm
commit 334bb77387 upstream.

Commit 4efca4ed ("kbuild: modversions for EXPORT_SYMBOL() for asm") adds
modversion support for symbols exported from asm files. Architectures
must include C-style declarations for those symbols in asm/asm-prototypes.h
in order for them to be versioned.

Add these declarations for x86, and an architecture-independent file that
can be used for common symbols.

With f27c2f6 reverting 8ab2ae6 ("default exported asm symbols to zero") we
produce a scary warning on x86, this commit fixes that.

Signed-off-by: Adam Borowski <kilobyte@angband.pl>
Tested-by: Kalle Valo <kvalo@codeaurora.org>
Acked-by: Nicholas Piggin <npiggin@gmail.com>
Tested-by: Peter Wu <peter@lekensteyn.nl>
Tested-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Michal Marek <mmarek@suse.com>
Signed-off-by: Razvan Ghitulete <rga@amazon.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-23 19:50:11 +01:00
Noralf Trønnes
9728403491 UPSTREAM: drm/atomic: Add drm_atomic_helper_best_encoder()
Add (struct drm_connector_helper_funcs *)->best_encoder callback helper
for connectors that support exactly 1 encoder, statically determined at
driver init time.

Change-Id: Iddcf190e2a08a768dc8d5bb52a1e4572b1ce51d1
Signed-off-by: Noralf Trønnes <noralf@tronnes.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1462982962-10530-6-git-send-email-noralf@tronnes.org
Signed-off-by: Jerry Xu <xbl@rock-chips.com>
(cherry picked from commit 9ecb549867)
2018-01-23 14:27:05 +08:00
Finley Xiao
db52b49619 clk: rockchip: px30: Fix parent clk for nand, sdio and emmc
Change-Id: I5723e114871d03d271a398a55af97474e08a61e1
Signed-off-by: Finley Xiao <finley.xiao@rock-chips.com>
2018-01-23 14:22:18 +08:00
Hyojun Kim
15b3c1dac0 blkdev: Refactoring block io latency histogram codes
The current io_latency_state structure includes entries for read
and write requests. There are special types of write commands
such as sync and discard (trim) commands, and the current
implementation is not general enough if we want to separate
latency histogram for such special commands.

This change makes io_latency_state structure request-type neutral.
It also changes to print the latency average.

Signed-off-by: Hyojun Kim <hyojun@google.com>
2018-01-22 13:16:20 +05:30
Jaegeuk Kim
51d59def78 fscrypt: updates on 4.15-rc4
Cherry-picked from origin/upstream-f2fs-stable-linux-4.4.y:

ba1ade7101 fscrypt: resolve some cherry-pick bugs
9e32f17d24 fscrypt: move to generic async completion
4ecacbed6e crypto: introduce crypto wait for async op
42d89da82b fscrypt: lock mutex before checking for bounce page pool
2286508d17 fscrypt: new helper function - fscrypt_prepare_setattr()
5cbdd42ad2 fscrypt: new helper function - fscrypt_prepare_lookup()
a31feba5c1 fscrypt: new helper function - fscrypt_prepare_rename()
95efafb623 fscrypt: new helper function - fscrypt_prepare_link()
2b4b4f98dd fscrypt: new helper function - fscrypt_file_open()
8c815f381c fscrypt: new helper function - fscrypt_require_key()
272e435025 fscrypt: remove unneeded empty fscrypt_operations structs
1034eeec51 fscrypt: remove ->is_encrypted()
32c0d3ae9d fscrypt: switch from ->is_encrypted() to IS_ENCRYPTED()
a4781dd1f1 fs, fscrypt: add an S_ENCRYPTED inode flag
ff0a3dbc93 fscrypt: clean up include file mess
bc4a61c60b fscrypt: fix dereference of NULL user_key_payload
a53dc7e005 fscrypt: make ->dummy_context() return bool

Change-Id: I461d742adc7b77177df91429a1fd9c8624a698d6
Signed-off-by: Jaegeuk Kim <jaegeuk@google.com>
2018-01-22 13:16:20 +05:30
Jens Wiklander
0640930301 BACKPORT: tee: indicate privileged dev in gen_caps
Mirrors the TEE_DESC_PRIVILEGED bit of struct tee_desc:flags into struct
tee_ioctl_version_data:gen_caps as TEE_GEN_CAP_PRIVILEGED in
tee_ioctl_version()

Change-Id: Iebd281e36b45181325da6b7982f045b4642e72d4
Reviewed-by: Jerome Forissier <jerome.forissier@linaro.org>
Signed-off-by: Jens Wiklander <jens.wiklander@linaro.org>
(cherry picked from commit 059cf566e1)
Signed-off-by: Victor Chong <victor.chong@linaro.org>
2018-01-22 13:16:20 +05:30
Jerome Forissier
4a3efa5012 BACKPORT: tee: add forward declaration for struct device
tee_drv.h references struct device, but does not include device.h nor
platform_device.h. Therefore, if tee_drv.h is included by some file
that does not pull device.h nor platform_device.h beforehand, we have a
compile warning. Fix this by adding a forward declaration.

Change-Id: Iadb9563a540c95064774c577f679e0d630b939c8
Signed-off-by: Jerome Forissier <jerome.forissier@linaro.org>
Signed-off-by: Jens Wiklander <jens.wiklander@linaro.org>
(cherry picked from commit 999616b853)
Signed-off-by: Victor Chong <victor.chong@linaro.org>
2018-01-22 13:16:20 +05:30
Victor Chibotaru
4b4c3b015e UPSTREAM: kcov: support comparison operands collection
Enables kcov to collect comparison operands from instrumented code.
This is done by using Clang's -fsanitize=trace-cmp instrumentation
(currently not available for GCC).

The comparison operands help a lot in fuzz testing.  E.g.  they are used
in Syzkaller to cover the interiors of conditional statements with way
less attempts and thus make previously unreachable code reachable.

To allow separate collection of coverage and comparison operands two
different work modes are implemented.  Mode selection is now done via a
KCOV_ENABLE ioctl call with corresponding argument value.

Link: http://lkml.kernel.org/r/20171011095459.70721-1-glider@google.com
Signed-off-by: Victor Chibotaru <tchibo@google.com>
Signed-off-by: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Andrey Konovalov <andreyknvl@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Alexander Popov <alex.popov@linux.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Vegard Nossum <vegard.nossum@oracle.com>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: <syzkaller@googlegroups.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Bug: 64145065
(cherry-picked from ded97d2c2b)
Change-Id: Iaba700a3f4786048be14a5e764ccabceae114eb7
Signed-off-by: Paul Lawrence <paullawrence@google.com>
2018-01-22 13:15:43 +05:30
Dmitry Vyukov
8c8a5ee21a BACKPORT: kernel: add kcov code coverage
kcov provides code coverage collection for coverage-guided fuzzing
(randomized testing).  Coverage-guided fuzzing is a testing technique
that uses coverage feedback to determine new interesting inputs to a
system.  A notable user-space example is AFL
(http://lcamtuf.coredump.cx/afl/).  However, this technique is not
widely used for kernel testing due to missing compiler and kernel
support.

kcov does not aim to collect as much coverage as possible.  It aims to
collect more or less stable coverage that is function of syscall inputs.
To achieve this goal it does not collect coverage in soft/hard
interrupts and instrumentation of some inherently non-deterministic or
non-interesting parts of kernel is disbled (e.g.  scheduler, locking).

Currently there is a single coverage collection mode (tracing), but the
API anticipates additional collection modes.  Initially I also
implemented a second mode which exposes coverage in a fixed-size hash
table of counters (what Quentin used in his original patch).  I've
dropped the second mode for simplicity.

This patch adds the necessary support on kernel side.  The complimentary
compiler support was added in gcc revision 231296.

We've used this support to build syzkaller system call fuzzer, which has
found 90 kernel bugs in just 2 months:

  https://github.com/google/syzkaller/wiki/Found-Bugs

We've also found 30+ bugs in our internal systems with syzkaller.
Another (yet unexplored) direction where kcov coverage would greatly
help is more traditional "blob mutation".  For example, mounting a
random blob as a filesystem, or receiving a random blob over wire.

Why not gcov.  Typical fuzzing loop looks as follows: (1) reset
coverage, (2) execute a bit of code, (3) collect coverage, repeat.  A
typical coverage can be just a dozen of basic blocks (e.g.  an invalid
input).  In such context gcov becomes prohibitively expensive as
reset/collect coverage steps depend on total number of basic
blocks/edges in program (in case of kernel it is about 2M).  Cost of
kcov depends only on number of executed basic blocks/edges.  On top of
that, kernel requires per-thread coverage because there are always
background threads and unrelated processes that also produce coverage.
With inlined gcov instrumentation per-thread coverage is not possible.

kcov exposes kernel PCs and control flow to user-space which is
insecure.  But debugfs should not be mapped as user accessible.

Based on a patch by Quentin Casasnovas.

[akpm@linux-foundation.org: make task_struct.kcov_mode have type `enum kcov_mode']
[akpm@linux-foundation.org: unbreak allmodconfig]
[akpm@linux-foundation.org: follow x86 Makefile layout standards]
Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Cc: syzkaller <syzkaller@googlegroups.com>
Cc: Vegard Nossum <vegard.nossum@oracle.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Tavis Ormandy <taviso@google.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: Kostya Serebryany <kcc@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Kees Cook <keescook@google.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: David Drysdale <drysdale@google.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Jiri Slaby <jslaby@suse.cz>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Bug: 64145065
(cherry-picked from 5c9a8750a6)
Change-Id: I17b5e04f6e89b241924e78ec32ead79c38b860ce
Signed-off-by: Paul Lawrence <paullawrence@google.com>
2018-01-22 13:15:43 +05:30
Masami Hiramatsu
2d76d88697 BACKPORT: irq: Make the irqentry text section unconditional
Generate irqentry and softirqentry text sections without
any Kconfig dependencies. This will add extra sections, but
there should be no performace impact.

Suggested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: David S . Miller <davem@davemloft.net>
Cc: Francis Deslauriers <francis.deslauriers@efficios.com>
Cc: Jesper Nilsson <jesper.nilsson@axis.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Mikael Starvik <starvik@axis.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: linux-arch@vger.kernel.org
Cc: linux-cris-kernel@axis.com
Cc: mathieu.desnoyers@efficios.com
Link: http://lkml.kernel.org/r/150172789110.27216.3955739126693102122.stgit@devbox
Signed-off-by: Ingo Molnar <mingo@kernel.org>

Bug: 64145065
(cherry-picked from 229a718605)
Change-Id: I8f10ad59f16d637834a9dcacebdf087a028e995d
Signed-off-by: Paul Lawrence <paullawrence@google.com>
2018-01-22 13:15:43 +05:30
Alexander Potapenko
cc19018a12 UPSTREAM: arch, ftrace: for KASAN put hard/soft IRQ entries into separate sections
KASAN needs to know whether the allocation happens in an IRQ handler.
This lets us strip everything below the IRQ entry point to reduce the
number of unique stack traces needed to be stored.

Move the definition of __irq_entry to <linux/interrupt.h> so that the
users don't need to pull in <linux/ftrace.h>.  Also introduce the
__softirq_entry macro which is similar to __irq_entry, but puts the
corresponding functions to the .softirqentry.text section.

Signed-off-by: Alexander Potapenko <glider@google.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Andrey Konovalov <adech.fo@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Konstantin Serebryany <kcc@google.com>
Cc: Dmitry Chernenkov <dmitryc@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Bug: 64145065
(cherry-picked from be7635e728)
Change-Id: Ib321eb9c2b76ef4785cf3fd522169f524348bd9a
Signed-off-by: Paul Lawrence <paullawrence@google.com>
2018-01-22 13:15:43 +05:30
Mark Rutland
b6d435cf89 BACKPORT: kasan: report only the first error by default
Disable kasan after the first report.  There are several reasons for
this:

 - Single bug quite often has multiple invalid memory accesses causing
   storm in the dmesg.

 - Write OOB access might corrupt metadata so the next report will print
   bogus alloc/free stacktraces.

 - Reports after the first easily could be not bugs by itself but just
   side effects of the first one.

Given that multiple reports usually only do harm, it makes sense to
disable kasan after the first one.  If user wants to see all the
reports, the boot-time parameter kasan_multi_shot must be used.

[aryabinin@virtuozzo.com: wrote changelog and doc, added missing include]
Link: http://lkml.kernel.org/r/20170323154416.30257-1-aryabinin@virtuozzo.com
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Andrey Konovalov <andreyknvl@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Bug: 64145065
(cherry-picked from b0845ce583)
Change-Id: Ia8c6d40dd0d4f5b944bf3501c08d7a825070b116
Signed-off-by: Paul Lawrence <paullawrence@google.com>
2018-01-22 13:15:43 +05:30
Ingo Molnar
b2667168ca BACKPORT: kasan, sched/headers: Uninline kasan_enable/disable_current()
<linux/kasan.h> is a low level header that is included early
in affected kernel headers. But it includes <linux/sched.h>
which complicates the cleanup of sched.h dependencies.

But kasan.h has almost no need for sched.h: its only use of
scheduler functionality is in two inline functions which are
not used very frequently - so uninline kasan_enable_current()
and kasan_disable_current().

Also add a <linux/sched.h> dependency to a .c file that depended
on kasan.h including it.

This paves the way to remove the <linux/sched.h> include from kasan.h.

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>

Bug: 64145065
(cherry-picked from af8601ad42)
Change-Id: I13fd2d3927f663d694ea0d5bf44f18e2c62ae013
Signed-off-by: Paul Lawrence <paullawrence@google.com>
2018-01-22 13:15:43 +05:30
Greg Thelen
29efb55f0a BACKPORT: kasan: drain quarantine of memcg slab objects
Per memcg slab accounting and kasan have a problem with kmem_cache
destruction.
 - kmem_cache_create() allocates a kmem_cache, which is used for
   allocations from processes running in root (top) memcg.
 - Processes running in non root memcg and allocating with either
   __GFP_ACCOUNT or from a SLAB_ACCOUNT cache use a per memcg
   kmem_cache.
 - Kasan catches use-after-free by having kfree() and kmem_cache_free()
   defer freeing of objects. Objects are placed in a quarantine.
 - kmem_cache_destroy() destroys root and non root kmem_caches. It takes
   care to drain the quarantine of objects from the root memcg's
   kmem_cache, but ignores objects associated with non root memcg. This
   causes leaks because quarantined per memcg objects refer to per memcg
   kmem cache being destroyed.

To see the problem:

 1) create a slab cache with kmem_cache_create(,,,SLAB_ACCOUNT,)
 2) from non root memcg, allocate and free a few objects from cache
 3) dispose of the cache with kmem_cache_destroy() kmem_cache_destroy()
    will trigger a "Slab cache still has objects" warning indicating
    that the per memcg kmem_cache structure was leaked.

Fix the leak by draining kasan quarantined objects allocated from non
root memcg.

Racing memcg deletion is tricky, but handled.  kmem_cache_destroy() =>
shutdown_memcg_caches() => __shutdown_memcg_cache() => shutdown_cache()
flushes per memcg quarantined objects, even if that memcg has been
rmdir'd and gone through memcg_deactivate_kmem_caches().

This leak only affects destroyed SLAB_ACCOUNT kmem caches when kasan is
enabled.  So I don't think it's worth patching stable kernels.

Link: http://lkml.kernel.org/r/1482257462-36948-1-git-send-email-gthelen@google.com
Signed-off-by: Greg Thelen <gthelen@google.com>
Reviewed-by: Vladimir Davydov <vdavydov.dev@gmail.com>
Acked-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Bug: 64145065
(cherry-picked from f9fa1d919c)
Change-Id: Ie054d9cde7fb1ce62e65776bff5a70f72925d037
Signed-off-by: Paul Lawrence <paullawrence@google.com>
2018-01-22 13:15:43 +05:30
Dmitry Vyukov
59ab55f0aa BACKPORT: kprobes: Unpoison stack in jprobe_return() for KASAN
I observed false KSAN positives in the sctp code, when
sctp uses jprobe_return() in jsctp_sf_eat_sack().

The stray 0xf4 in shadow memory are stack redzones:

[     ] ==================================================================
[     ] BUG: KASAN: stack-out-of-bounds in memcmp+0xe9/0x150 at addr ffff88005e48f480
[     ] Read of size 1 by task syz-executor/18535
[     ] page:ffffea00017923c0 count:0 mapcount:0 mapping:          (null) index:0x0
[     ] flags: 0x1fffc0000000000()
[     ] page dumped because: kasan: bad access detected
[     ] CPU: 1 PID: 18535 Comm: syz-executor Not tainted 4.8.0+ #28
[     ] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
[     ]  ffff88005e48f2d0 ffffffff82d2b849 ffffffff0bc91e90 fffffbfff10971e8
[     ]  ffffed000bc91e90 ffffed000bc91e90 0000000000000001 0000000000000000
[     ]  ffff88005e48f480 ffff88005e48f350 ffffffff817d3169 ffff88005e48f370
[     ] Call Trace:
[     ]  [<ffffffff82d2b849>] dump_stack+0x12e/0x185
[     ]  [<ffffffff817d3169>] kasan_report+0x489/0x4b0
[     ]  [<ffffffff817d31a9>] __asan_report_load1_noabort+0x19/0x20
[     ]  [<ffffffff82d49529>] memcmp+0xe9/0x150
[     ]  [<ffffffff82df7486>] depot_save_stack+0x176/0x5c0
[     ]  [<ffffffff817d2031>] save_stack+0xb1/0xd0
[     ]  [<ffffffff817d27f2>] kasan_slab_free+0x72/0xc0
[     ]  [<ffffffff817d05b8>] kfree+0xc8/0x2a0
[     ]  [<ffffffff85b03f19>] skb_free_head+0x79/0xb0
[     ]  [<ffffffff85b0900a>] skb_release_data+0x37a/0x420
[     ]  [<ffffffff85b090ff>] skb_release_all+0x4f/0x60
[     ]  [<ffffffff85b11348>] consume_skb+0x138/0x370
[     ]  [<ffffffff8676ad7b>] sctp_chunk_put+0xcb/0x180
[     ]  [<ffffffff8676ae88>] sctp_chunk_free+0x58/0x70
[     ]  [<ffffffff8677fa5f>] sctp_inq_pop+0x68f/0xef0
[     ]  [<ffffffff8675ee36>] sctp_assoc_bh_rcv+0xd6/0x4b0
[     ]  [<ffffffff8677f2c1>] sctp_inq_push+0x131/0x190
[     ]  [<ffffffff867bad69>] sctp_backlog_rcv+0xe9/0xa20
[ ... ]
[     ] Memory state around the buggy address:
[     ]  ffff88005e48f380: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[     ]  ffff88005e48f400: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[     ] >ffff88005e48f480: f4 f4 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[     ]                    ^
[     ]  ffff88005e48f500: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[     ]  ffff88005e48f580: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[     ] ==================================================================

KASAN stack instrumentation poisons stack redzones on function entry
and unpoisons them on function exit. If a function exits abnormally
(e.g. with a longjmp like jprobe_return()), stack redzones are left
poisoned. Later this leads to random KASAN false reports.

Unpoison stack redzones in the frames we are going to jump over
before doing actual longjmp in jprobe_return().

Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: kasan-dev@googlegroups.com
Cc: surovegin@google.com
Cc: rostedt@goodmis.org
Link: http://lkml.kernel.org/r/1476454043-101898-1-git-send-email-dvyukov@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>

Bug: 64145065
(cherry-picked from 9f7d416c36)
Change-Id: I84e4fac44265a69f615601266b3415147dade633
Signed-off-by: Paul Lawrence <paullawrence@google.com>
2018-01-22 13:15:43 +05:30
Andrey Ryabinin
73345d079c BACKPORT: mm/kasan: get rid of ->state in struct kasan_alloc_meta
The state of object currently tracked in two places - shadow memory, and
the ->state field in struct kasan_alloc_meta.  We can get rid of the
latter.  The will save us a little bit of memory.  Also, this allow us
to move free stack into struct kasan_alloc_meta, without increasing
memory consumption.  So now we should always know when the last time the
object was freed.  This may be useful for long delayed use-after-free
bugs.

As a side effect this fixes following UBSAN warning:
	UBSAN: Undefined behaviour in mm/kasan/quarantine.c:102:13
	member access within misaligned address ffff88000d1efebc for type 'struct qlist_node'
	which requires 8 byte alignment

Link: http://lkml.kernel.org/r/1470062715-14077-5-git-send-email-aryabinin@virtuozzo.com
Reported-by: kernel test robot <xiaolong.ye@intel.com>
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Bug: 64145065
(cherry-picked from b3cbd9bf77)
Change-Id: Iaa4959a78ffd2e49f9060099df1fb32483df3085
Signed-off-by: Paul Lawrence <paullawrence@google.com>
2018-01-22 13:15:43 +05:30
Paul Lawrence
785fd027b0 BACKPORT: mm, kasan: switch SLUB to stackdepot, enable memory quarantine for SLUB
For KASAN builds:
 - switch SLUB allocator to using stackdepot instead of storing the
   allocation/deallocation stacks in the objects;
 - change the freelist hook so that parts of the freelist can be put
   into the quarantine.

[aryabinin@virtuozzo.com: fixes]
  Link: http://lkml.kernel.org/r/1468601423-28676-1-git-send-email-aryabinin@virtuozzo.com
Link: http://lkml.kernel.org/r/1468347165-41906-3-git-send-email-glider@google.com
Signed-off-by: Alexander Potapenko <glider@google.com>
Cc: Andrey Konovalov <adech.fo@gmail.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Steven Rostedt (Red Hat) <rostedt@goodmis.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Kostya Serebryany <kcc@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Kuthonuzo Luruo <kuthonuzo.luruo@hpe.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Bug: 64145065
(cherry-picked from 80a9201a59)
Change-Id: I2b59c6d50d0db62d3609edfdc7be54e48f8afa5c
Signed-off-by: Paul Lawrence <paullawrence@google.com>
2018-01-22 13:15:43 +05:30
Andrey Ryabinin
197c34e95e UPSTREAM: mm: mempool: kasan: don't poot mempool objects in quarantine
Currently we may put reserved by mempool elements into quarantine via
kasan_kfree().  This is totally wrong since quarantine may really free
these objects.  So when mempool will try to use such element,
use-after-free will happen.  Or mempool may decide that it no longer
need that element and double-free it.

So don't put object into quarantine in kasan_kfree(), just poison it.
Rename kasan_kfree() to kasan_poison_kfree() to respect that.

Also, we shouldn't use kasan_slab_alloc()/kasan_krealloc() in
kasan_unpoison_element() because those functions may update allocation
stacktrace.  This would be wrong for the most of the remove_element call
sites.

(The only call site where we may want to update alloc stacktrace is
 in mempool_alloc(). Kmemleak solves this by calling
 kmemleak_update_trace(), so we could make something like that too.
 But this is out of scope of this patch).

Fixes: 55834c5909 ("mm: kasan: initial memory quarantine implementation")
Link: http://lkml.kernel.org/r/575977C3.1010905@virtuozzo.com
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Reported-by: Kuthonuzo Luruo <kuthonuzo.luruo@hpe.com>
Acked-by: Alexander Potapenko <glider@google.com>
Cc: Dmitriy Vyukov <dvyukov@google.com>
Cc: Kostya Serebryany <kcc@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Bug: 64145065
(cherry-picked from 9b75a867cc)
Change-Id: Idb6c152dae8f8f2975dbe6acb7165315be8b465b
Signed-off-by: Paul Lawrence <paullawrence@google.com>
2018-01-22 13:15:43 +05:30
Andrey Ryabinin
bf484b21a3 BACKPORT: mm/kasan: add API to check memory regions
Memory access coded in an assembly won't be seen by KASAN as a compiler
can instrument only C code.  Add kasan_check_[read,write]() API which is
going to be used to check a certain memory range.

Link: http://lkml.kernel.org/r/1462538722-1574-3-git-send-email-aryabinin@virtuozzo.com
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Acked-by: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Bug: 64145065
(cherry-picked from 64f8ebaf11)
Change-Id: I3e75c7c22e77d390c55ca1b86ec58a6d6ea1da87
Signed-off-by: Paul Lawrence <paullawrence@google.com>
2018-01-22 13:15:43 +05:30
Alexander Potapenko
64d89f7804 UPSTREAM: mm: kasan: initial memory quarantine implementation
Quarantine isolates freed objects in a separate queue.  The objects are
returned to the allocator later, which helps to detect use-after-free
errors.

When the object is freed, its state changes from KASAN_STATE_ALLOC to
KASAN_STATE_QUARANTINE.  The object is poisoned and put into quarantine
instead of being returned to the allocator, therefore every subsequent
access to that object triggers a KASAN error, and the error handler is
able to say where the object has been allocated and deallocated.

When it's time for the object to leave quarantine, its state becomes
KASAN_STATE_FREE and it's returned to the allocator.  From now on the
allocator may reuse it for another allocation.  Before that happens,
it's still possible to detect a use-after free on that object (it
retains the allocation/deallocation stacks).

When the allocator reuses this object, the shadow is unpoisoned and old
allocation/deallocation stacks are wiped.  Therefore a use of this
object, even an incorrect one, won't trigger ASan warning.

Without the quarantine, it's not guaranteed that the objects aren't
reused immediately, that's why the probability of catching a
use-after-free is lower than with quarantine in place.

Quarantine isolates freed objects in a separate queue.  The objects are
returned to the allocator later, which helps to detect use-after-free
errors.

Freed objects are first added to per-cpu quarantine queues.  When a
cache is destroyed or memory shrinking is requested, the objects are
moved into the global quarantine queue.  Whenever a kmalloc call allows
memory reclaiming, the oldest objects are popped out of the global queue
until the total size of objects in quarantine is less than 3/4 of the
maximum quarantine size (which is a fraction of installed physical
memory).

As long as an object remains in the quarantine, KASAN is able to report
accesses to it, so the chance of reporting a use-after-free is
increased.  Once the object leaves quarantine, the allocator may reuse
it, in which case the object is unpoisoned and KASAN can't detect
incorrect accesses to it.

Right now quarantine support is only enabled in SLAB allocator.
Unification of KASAN features in SLAB and SLUB will be done later.

This patch is based on the "mm: kasan: quarantine" patch originally
prepared by Dmitry Chernenkov.  A number of improvements have been
suggested by Andrey Ryabinin.

[glider@google.com: v9]
  Link: http://lkml.kernel.org/r/1462987130-144092-1-git-send-email-glider@google.com
Signed-off-by: Alexander Potapenko <glider@google.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Andrey Konovalov <adech.fo@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Konstantin Serebryany <kcc@google.com>
Cc: Dmitry Chernenkov <dmitryc@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Bug: 64145065
(cherry-picked from 55834c5909)
Change-Id: Ib808d72a40f2e5137961d93dad540e85f8bbd2c4
Signed-off-by: Paul Lawrence <paullawrence@google.com>
2018-01-22 13:15:43 +05:30
Alexander Potapenko
21aae3f038 BACKPORT: mm, kasan: stackdepot implementation. Enable stackdepot for SLAB
Implement the stack depot and provide CONFIG_STACKDEPOT.  Stack depot
will allow KASAN store allocation/deallocation stack traces for memory
chunks.  The stack traces are stored in a hash table and referenced by
handles which reside in the kasan_alloc_meta and kasan_free_meta
structures in the allocated memory chunks.

IRQ stack traces are cut below the IRQ entry point to avoid unnecessary
duplication.

Right now stackdepot support is only enabled in SLAB allocator.  Once
KASAN features in SLAB are on par with those in SLUB we can switch SLUB
to stackdepot as well, thus removing the dependency on SLUB stack
bookkeeping, which wastes a lot of memory.

This patch is based on the "mm: kasan: stack depots" patch originally
prepared by Dmitry Chernenkov.

Joonsoo has said that he plans to reuse the stackdepot code for the
mm/page_owner.c debugging facility.

[akpm@linux-foundation.org: s/depot_stack_handle/depot_stack_handle_t]
[aryabinin@virtuozzo.com: comment style fixes]
Signed-off-by: Alexander Potapenko <glider@google.com>
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Andrey Konovalov <adech.fo@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Konstantin Serebryany <kcc@google.com>
Cc: Dmitry Chernenkov <dmitryc@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Bug: 64145065
(cherry-picked from cd11016e5f)
Change-Id: Ic804318410823b95d84e264a6334e018f21ef943
Signed-off-by: Paul Lawrence <paullawrence@google.com>
2018-01-22 13:15:43 +05:30
Alexander Potapenko
65be081bc4 BACKPORT: mm, kasan: add GFP flags to KASAN API
Add GFP flags to KASAN hooks for future patches to use.

This patch is based on the "mm: kasan: unified support for SLUB and SLAB
allocators" patch originally prepared by Dmitry Chernenkov.

Signed-off-by: Alexander Potapenko <glider@google.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Andrey Konovalov <adech.fo@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Konstantin Serebryany <kcc@google.com>
Cc: Dmitry Chernenkov <dmitryc@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Bug: 64145065
(cherry-picked from 505f5dcb1c)
Change-Id: I7c5539f59e6969e484a6ff4f104dce2390669cfd
Signed-off-by: Paul Lawrence <paullawrence@google.com>
2018-01-22 13:15:43 +05:30
Alexander Potapenko
7e712c1c7f UPSTREAM: mm, kasan: SLAB support
Add KASAN hooks to SLAB allocator.

This patch is based on the "mm: kasan: unified support for SLUB and SLAB
allocators" patch originally prepared by Dmitry Chernenkov.

Signed-off-by: Alexander Potapenko <glider@google.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Andrey Konovalov <adech.fo@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Konstantin Serebryany <kcc@google.com>
Cc: Dmitry Chernenkov <dmitryc@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Bug: 64145065
(cherry-picked from 7ed2f9e663)
Change-Id: I131fdafc1c27a25732475f5bbd1653b66954e1b7
Signed-off-by: Paul Lawrence <paullawrence@google.com>
2018-01-22 13:15:43 +05:30
Joonsoo Kim
d93421f6c9 UPSTREAM: mm/slab: alternative implementation for DEBUG_SLAB_LEAK
DEBUG_SLAB_LEAK is a debug option.  It's current implementation requires
status buffer so we need more memory to use it.  And, it cause
kmem_cache initialization step more complex.

To remove this extra memory usage and to simplify initialization step,
this patch implement this feature with another way.

When user requests to get slab object owner information, it marks that
getting information is started.  And then, all free objects in caches
are flushed to corresponding slab page.  Now, we can distinguish all
freed object so we can know all allocated objects, too.  After
collecting slab object owner information on allocated objects, mark is
checked that there is no free during the processing.  If true, we can be
sure that our information is correct so information is returned to user.

Although this way is rather complex, it has two important benefits
mentioned above.  So, I think it is worth changing.

There is one drawback that it takes more time to get slab object owner
information but it is just a debug option so it doesn't matter at all.

To help review, this patch implements new way only.  Following patch
will remove useless code.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Bug: 64145065
(cherry-picked from d31676dfde)
Change-Id: I204ea0dd5553577d17c93f32f0d5a797ba0304af
Signed-off-by: Paul Lawrence <paullawrence@google.com>
2018-01-22 13:15:43 +05:30
Mark Rutland
63e2298591 UPSTREAM: kasan: add functions to clear stack poison
Functions which the compiler has instrumented for ASAN place poison on
the stack shadow upon entry and remove this poison prior to returning.

In some cases (e.g. hotplug and idle), CPUs may exit the kernel a
number of levels deep in C code.  If there are any instrumented
functions on this critical path, these will leave portions of the idle
thread stack shadow poisoned.

If a CPU returns to the kernel via a different path (e.g. a cold
entry), then depending on stack frame layout subsequent calls to
instrumented functions may use regions of the stack with stale poison,
resulting in (spurious) KASAN splats to the console.

Contemporary GCCs always add stack shadow poisoning when ASAN is
enabled, even when asked to not instrument a function [1], so we can't
simply annotate functions on the critical path to avoid poisoning.

Instead, this series explicitly removes any stale poison before it can
be hit.  In the common hotplug case we clear the entire stack shadow in
common code, before a CPU is brought online.

On architectures which perform a cold return as part of cpu idle may
retain an architecture-specific amount of stack contents.  To retain the
poison for this retained context, the arch code must call the core KASAN
code, passing a "watermark" stack pointer value beyond which shadow will
be cleared.  Architectures which don't perform a cold return as part of
idle do not need any additional code.

This patch (of 3):

Functions which the compiler has instrumented for KASAN place poison on
the stack shadow upon entry and remove this poision prior to returning.

In some cases (e.g.  hotplug and idle), CPUs may exit the kernel a number
of levels deep in C code.  If there are any instrumented functions on this
critical path, these will leave portions of the stack shadow poisoned.

If a CPU returns to the kernel via a different path (e.g.  a cold entry),
then depending on stack frame layout subsequent calls to instrumented
functions may use regions of the stack with stale poison, resulting in
(spurious) KASAN splats to the console.

To avoid this, we must clear stale poison from the stack prior to
instrumented functions being called.  This patch adds functions to the
KASAN core for removing poison from (portions of) a task's stack.  These
will be used by subsequent patches to avoid problems with hotplug and
idle.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Bug: 64145065
(cherry-picked from e3ae116339)
Change-Id: I9be31b714d5bdaec94a2dad3f0e468c094fe5fa2
Signed-off-by: Paul Lawrence <paullawrence@google.com>
2018-01-22 13:15:43 +05:30
John Stultz
aafcb0f503 BACKPORT: time: Clean up CLOCK_MONOTONIC_RAW time handling
(cherry pick from commit fc6eead7c1)

Now that we fixed the sub-ns handling for CLOCK_MONOTONIC_RAW,
remove the duplicitive tk->raw_time.tv_nsec, which can be
stored in tk->tkr_raw.xtime_nsec (similarly to how its handled
for monotonic time).

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Miroslav Lichvar <mlichvar@redhat.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Stephen Boyd <stephen.boyd@linaro.org>
Cc: Kevin Brodsky <kevin.brodsky@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Daniel Mentz <danielmentz@google.com>
Tested-by: Daniel Mentz <danielmentz@google.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Bug: 20045882
Bug: 63737556
Change-Id: I243827d21b08703a09d2d2fe738a9258be224582
2018-01-22 13:15:43 +05:30
John Stultz
581a337227 BACKPORT: time: Fix CLOCK_MONOTONIC_RAW sub-nanosecond accounting
(cherry pick from commit 3d88d56c58)

Due to how the MONOTONIC_RAW accumulation logic was handled,
there is the potential for a 1ns discontinuity when we do
accumulations. This small discontinuity has for the most part
gone un-noticed, but since ARM64 enabled CLOCK_MONOTONIC_RAW
in their vDSO clock_gettime implementation, we've seen failures
with the inconsistency-check test in kselftest.

This patch addresses the issue by using the same sub-ns
accumulation handling that CLOCK_MONOTONIC uses, which avoids
the issue for in-kernel users.

Since the ARM64 vDSO implementation has its own clock_gettime
calculation logic, this patch reduces the frequency of errors,
but failures are still seen. The ARM64 vDSO will need to be
updated to include the sub-nanosecond xtime_nsec values in its
calculation for this issue to be completely fixed.

Signed-off-by: John Stultz <john.stultz@linaro.org>
Tested-by: Daniel Mentz <danielmentz@google.com>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Kevin Brodsky <kevin.brodsky@arm.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Stephen Boyd <stephen.boyd@linaro.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: "stable #4 . 8+" <stable@vger.kernel.org>
Cc: Miroslav Lichvar <mlichvar@redhat.com>
Link: http://lkml.kernel.org/r/1496965462-20003-3-git-send-email-john.stultz@linaro.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Bug: 20045882
Bug: 63737556
Change-Id: I6c55dd7685f6bd212c6af9d09c527528e1dd5fa1
2018-01-22 13:15:43 +05:30
Jaegeuk Kim
6522a6d452 f2fs: updates on 4.15-rc1
Pull f2fs updates from Jaegeuk Kim:
 "In this round, we introduce sysfile-based quota support which is
  required for Android by default. In addition, we allow that users are
  able to reserve some blocks in runtime to mitigate performance drops
  in low free space.

 Enhancements:
  - assign proper data segments according to write_hints given by user
  - issue cache_flush on dirty devices only among multiple devices
  - exploit cp_error flag and add more faults to enhance fault
    injection test
  - conduct more readaheads during f2fs_readdir
  - add a range for discard commands

Bug fixes:
 - fix zero stat->st_blocks when inline_data is set
 - drop crypto key and free stale memory pointer while evict_inode is
   failing
 - fix some corner cases in free space and segment management
 - fix wrong last_disk_size

This series includes lots of clean-ups and code enhancement in terms
of xattr operations, discard/flush command control. In addition, it
adds versatile debugfs entries to monitor f2fs status"

Cherry-picked from origin/upstream-f2fs-stable-linux-4.4.y:

56a07b0705 f2fs: deny accessing encryption policy if encryption is off
c394842e26 f2fs: inject fault in inc_valid_node_count
9262922510 f2fs: fix to clear FI_NO_PREALLOC
e6cfc5de2d f2fs: expose quota information in debugfs
c4cd2efe83 f2fs: separate nat entry mem alloc from nat_tree_lock
48c72b4c8c f2fs: validate before set/clear free nat bitmap
baf9275a4b f2fs: avoid opened loop codes in __add_ino_entry
47af6c72d9 f2fs: apply write hints to select the type of segments for buffered write
ac98191605 f2fs: introduce scan_curseg_cache for cleanup
ca28e9670e f2fs: optimize the way of traversing free_nid_bitmap
460688b59e f2fs: keep scanning until enough free nids are acquired
0186182c0c f2fs: trace checkpoint reason in fsync()
5d4b6efcfd f2fs: keep isize once block is reserved cross EOF
3c8f767e13 f2fs: avoid race in between GC and block exchange
4423778adf f2fs: save a multiplication for last_nid calculation
3e3b405575 f2fs: fix summary info corruption
44889e4879 f2fs: remove dead code in update_meta_page
55c7b9595b f2fs: remove unneeded semicolon
8b92814117 f2fs: don't bother with inode->i_version
42c7c71824 f2fs: check curseg space before foreground GC
c5470498e5 f2fs: use rw_semaphore to protect SIT cache
82750d346a f2fs: support quota sys files
26dfec49b2 f2fs: add quota_ino feature infra
ddb8e2ae98 f2fs: optimize __update_nat_bits
f46ae958c7 f2fs: modify for accurate fggc node io stat
c713fdb5a2 Revert "f2fs: handle dirty segments inside refresh_sit_entry"
873ec505cb f2fs: add a function to move nid
ae66786296 f2fs: export SSR allocation threshold
90c28a18d2 f2fs: give correct trimmed blocks in fstrim
5612922fb0 f2fs: support bio allocation error injection
583b7a274c f2fs: support get_page error injection
09a073cc8c f2fs: add missing sysfs description
e945474a9c f2fs: support soft block reservation
b7b2e629b6 f2fs: handle error case when adding xattr entry
7368e30495 f2fs: support flexible inline xattr size
ada4061e19 f2fs: show current cp state
5b8ff1301a f2fs: add missing quota_initialize
46d4a691f0 f2fs: show # of dirty segments via sysfs
fc13f9d7ce f2fs: stop all the operations by cp_error flag
91bea0c391 f2fs: remove several redundant assignments
807486c795 f2fs: avoid using timespec
03b1cb0bb4 f2fs: fix to correct no_fggc_candidate
5c15033cea Revert "f2fs: return wrong error number on f2fs_quota_write"
5f5f593222 f2fs: remove obsolete pointer for truncate_xattr_node
032a690682 f2fs: retry ENOMEM for quota_read|write
171b638fc4 f2fs: limit # of inmemory pages
83ed7a615f f2fs: update ctx->pos correctly when hitting hole in directory
4d6e68be25 f2fs: relocate readahead codes in readdir()
c8be47b540 f2fs: allow readdir() to be interrupted
2b903fe94c f2fs: trace f2fs_readdir
bb0db666d4 f2fs: trace f2fs_lookup
40d6250f04 f2fs: skip searching non-exist range in truncate_hole
8e84f379df f2fs: expose some sectors to user in inline data or dentry case
cb98f70dea f2fs: avoid stale fi->gdirty_list pointer
5562a3c539 f2fs/crypto: drop crypto key at evict_inode only
85853e7e38 f2fs: fix to avoid race when accessing last_disk_size
0c47a892d5 f2fs: Fix bool initialization/comparison
68e801abc5 f2fs: give up CP_TRIMMED_FLAG if it drops discards
df74eacb20 f2fs: trace f2fs_remove_discard
bd502c6e3e f2fs: reduce cmd_lock coverage in __issue_discard_cmd
a34ab5ca4f f2fs: split discard policy
1e65afd14d f2fs: wrap discard policy
684447dad1 f2fs: support issuing/waiting discard in range
27eaad0938 f2fs: fix to flush multiple device in checkpoint
08bb9d68d5 f2fs: enhance multiple device flush
9c2526ac2e f2fs: fix to show ino management cache size correctly
814b463d26 f2fs: drop FI_UPDATE_WRITE tag after f2fs_issue_flush
f555b0a117 f2fs: obsolete ALLOC_NID_LIST list
75d3164ae1 f2fs: convert inline data for direct I/O & FI_NO_PREALLOC
4de0ceb6b7 f2fs: allow readpages with NULL file pointer
322a45d172 f2fs: show flush list status in sysfs
6d625a93b4 f2fs: introduce read_xattr_block
8ea6e1c327 f2fs: introduce read_inline_xattr
dbce11e9ee Revert "f2fs: reuse nids more aggressively"
131bc9f6b7 Revert "f2fs: node segment is prior to data segment selected victim"

Change-Id: I93b9cd867b859a667a448b39299ff44a2b841b8c
Signed-off-by: Jaegeuk Kim <jaegeuk@google.com>
2018-01-22 13:15:43 +05:30
Laura Abbott
5c42cda2b4 UPSTREAM: mm: Introduce lm_alias
(cherry-pick from commit 568c5fe5a5)

Certain architectures may have the kernel image mapped separately to
alias the linear map. Introduce a macro lm_alias to translate a kernel
image symbol into its linear alias. This is used in part with work to
add CONFIG_DEBUG_VIRTUAL support for arm64.

Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Laura Abbott <labbott@redhat.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Bug: 20045882
Bug: 63737556
Change-Id: I93ba4ef8df77f7e52717ab6e208d06122ac3a72b
2018-01-22 13:15:43 +05:30
Amit Pundir
395ae9f4e6 Merge branch 'linux-linaro-lsk-v4.4' into linux-linaro-lsk-v4.4-android
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>

Conflicts:
    kernel/fork.c
        Conflict due to Kaiser implementation in LTS 4.4.110.
    net/ipv4/raw.c
        Minor conflict due to LTS commit
        be27b620a8 ("net: ipv4: fix for a race condition in raw_sendmsg")
2018-01-22 11:50:22 +05:30
Jacob Chen
c1df7dec16 media: rk-isp10: grab a copy of userspace headers
Change-Id: I75ceb19f0f4c9a420440ae73538f030aaf49fb95
Signed-off-by: Jacob Chen <cc@rock-chips.com>
2018-01-20 15:26:04 +08:00
Finley Xiao
6d64d2dc10 soc: rockchip: power-domain: Modify power domain driver for rk3228
This driver is modified to support RK3228 SoC.
RK3228 SoC is only support idle.

Change-Id: I1c3e942de13fd241e3893818e2584c25bad69b27
Signed-off-by: Finley Xiao <finley.xiao@rock-chips.com>
2018-01-19 14:44:34 +08:00
Algea Cao
cef745a23a media: cec: Add an interface for hdmi report HPD
Add cec-notifier interface that hdmi can call it when
HPD is occurred.

Change-Id: I0087a879bf3ba65e300c3db3a67ddaa7e22f16ad
Signed-off-by: Algea Cao <algea.cao@rock-chips.com>
2018-01-18 16:53:37 +08:00
Finley Xiao
8b31d4d2da clk: rockchip: px30: Modify clk tree according to latest document
Change-Id: Ib8d983509792b13c1cc84c78af0f572b89053cc7
Signed-off-by: Finley Xiao <finley.xiao@rock-chips.com>
2018-01-18 16:52:22 +08:00
Alex Shi
13b48cd3ac Merge tag 'v4.4.112' into linux-linaro-lsk-v4.4
This is the 4.4.112 stable release
2018-01-18 12:02:00 +08:00
Thomas Gleixner
73492b6860 sysfs/cpu: Add vulnerability folder
commit 87590ce6e3 upstream.

As the meltdown/spectre problem affects several CPU architectures, it makes
sense to have common way to express whether a system is affected by a
particular vulnerability or not. If affected the way to express the
mitigation should be common as well.

Create /sys/devices/system/cpu/vulnerabilities folder and files for
meltdown, spectre_v1 and spectre_v2.

Allow architectures to override the show function.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Linus Torvalds <torvalds@linuxfoundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Link: https://lkml.kernel.org/r/20180107214913.096657732@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-17 09:35:33 +01:00
Nicholas Bellinger
531a2595ce target: Avoid early CMD_T_PRE_EXECUTE failures during ABORT_TASK
commit 1c21a48055 upstream.

This patch fixes bug where early se_cmd exceptions that occur
before backend execution can result in use-after-free if/when
a subsequent ABORT_TASK occurs for the same tag.

Since an early se_cmd exception will have had se_cmd added to
se_session->sess_cmd_list via target_get_sess_cmd(), it will
not have CMD_T_COMPLETE set by the usual target_complete_cmd()
backend completion path.

This causes a subsequent ABORT_TASK + __target_check_io_state()
to signal ABORT_TASK should proceed.  As core_tmr_abort_task()
executes, it will bring the outstanding se_cmd->cmd_kref count
down to zero releasing se_cmd, after se_cmd has already been
queued with error status into fabric driver response path code.

To address this bug, introduce a CMD_T_PRE_EXECUTE bit that is
set at target_get_sess_cmd() time, and cleared immediately before
backend driver dispatch in target_execute_cmd() once CMD_T_ACTIVE
is set.

Then, check CMD_T_PRE_EXECUTE within __target_check_io_state() to
determine when an early exception has occured, and avoid aborting
this se_cmd since it will have already been queued into fabric
driver response path code.

Reported-by: Donald White <dew@datera.io>
Cc: Donald White <dew@datera.io>
Cc: Mike Christie <mchristi@redhat.com>
Cc: Hannes Reinecke <hare@suse.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-17 09:35:31 +01:00
Alexei Starovoitov
9a7fad4c0e bpf: prevent out-of-bounds speculation
commit b2157399cc upstream.

Under speculation, CPUs may mis-predict branches in bounds checks. Thus,
memory accesses under a bounds check may be speculated even if the
bounds check fails, providing a primitive for building a side channel.

To avoid leaking kernel data round up array-based maps and mask the index
after bounds check, so speculated load with out of bounds index will load
either valid value from the array or zero from the padded area.

Unconditionally mask index for all array types even when max_entries
are not rounded to power of 2 for root user.
When map is created by unpriv user generate a sequence of bpf insns
that includes AND operation to make sure that JITed code includes
the same 'index & index_mask' operation.

If prog_array map is created by unpriv user replace
  bpf_tail_call(ctx, map, index);
with
  if (index >= max_entries) {
    index &= map->index_mask;
    bpf_tail_call(ctx, map, index);
  }
(along with roundup to power 2) to prevent out-of-bounds speculation.
There is secondary redundant 'if (index >= max_entries)' in the interpreter
and in all JITs, but they can be optimized later if necessary.

Other array-like maps (cpumap, devmap, sockmap, perf_event_array, cgroup_array)
cannot be used by unpriv, so no changes there.

That fixes bpf side of "Variant 1: bounds check bypass (CVE-2017-5753)" on
all architectures with and without JIT.

v2->v3:
Daniel noticed that attack potentially can be crafted via syscall commands
without loading the program, so add masking to those paths as well.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-17 09:35:31 +01:00
Daniel Borkmann
087a92287d bpf: add bpf_patch_insn_single helper
commit c237ee5eb3 upstream.

Move the functionality to patch instructions out of the verifier
code and into the core as the new bpf_patch_insn_single() helper
will be needed later on for blinding as well. No changes in
functionality.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-17 09:35:30 +01:00
Jerome Brunet
b188ce30cc net: stmmac: enable EEE in MII, GMII or RGMII only
[ Upstream commit 879626e3a5 ]

Note in the databook - Section 4.4 - EEE :
" The EEE feature is not supported when the MAC is configured to use the
TBI, RTBI, SMII, RMII or SGMII single PHY interface. Even if the MAC
supports multiple PHY interfaces, you should activate the EEE mode only
when the MAC is operating with GMII, MII, or RGMII interface."

Applying this restriction solves a stability issue observed on Amlogic
gxl platforms operating with RMII interface and the internal PHY.

Fixes: 83bf79b6bb ("stmmac: disable at run-time the EEE if not supported")
Signed-off-by: Jerome Brunet <jbrunet@baylibre.com>
Tested-by: Arnaud Patard <arnaud.patard@rtp-net.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-17 09:35:29 +01:00
Sergei Shtylyov
f574e2e8c6 sh_eth: fix SH7757 GEther initialization
[ Upstream commit 5133550296 ]

Renesas  SH7757 has 2 Fast and 2 Gigabit Ether controllers, while the
'sh_eth' driver can only reset and initialize TSU of the first controller
pair. Shimoda-san tried to solve that adding the 'needs_init' member to the
'struct sh_eth_plat_data', however the platform code still never sets this
flag. I think  that we can infer this information from the 'devno' variable
(set  to 'platform_device::id') and reset/init the Ether controller pair
only for an even 'devno'; therefore 'sh_eth_plat_data::needs_init' can be
removed...

Fixes: 150647fb2c ("net: sh_eth: change the condition of initialization")
Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-17 09:35:29 +01:00
Wanpeng Li
eb91461daa KVM: Fix stack-out-of-bounds read in write_mmio
commit e39d200fa5 upstream.

Reported by syzkaller:

  BUG: KASAN: stack-out-of-bounds in write_mmio+0x11e/0x270 [kvm]
  Read of size 8 at addr ffff8803259df7f8 by task syz-executor/32298

  CPU: 6 PID: 32298 Comm: syz-executor Tainted: G           OE    4.15.0-rc2+ #18
  Hardware name: LENOVO ThinkCentre M8500t-N000/SHARKBAY, BIOS FBKTC1AUS 02/16/2016
  Call Trace:
   dump_stack+0xab/0xe1
   print_address_description+0x6b/0x290
   kasan_report+0x28a/0x370
   write_mmio+0x11e/0x270 [kvm]
   emulator_read_write_onepage+0x311/0x600 [kvm]
   emulator_read_write+0xef/0x240 [kvm]
   emulator_fix_hypercall+0x105/0x150 [kvm]
   em_hypercall+0x2b/0x80 [kvm]
   x86_emulate_insn+0x2b1/0x1640 [kvm]
   x86_emulate_instruction+0x39a/0xb90 [kvm]
   handle_exception+0x1b4/0x4d0 [kvm_intel]
   vcpu_enter_guest+0x15a0/0x2640 [kvm]
   kvm_arch_vcpu_ioctl_run+0x549/0x7d0 [kvm]
   kvm_vcpu_ioctl+0x479/0x880 [kvm]
   do_vfs_ioctl+0x142/0x9a0
   SyS_ioctl+0x74/0x80
   entry_SYSCALL_64_fastpath+0x23/0x9a

The path of patched vmmcall will patch 3 bytes opcode 0F 01 C1(vmcall)
to the guest memory, however, write_mmio tracepoint always prints 8 bytes
through *(u64 *)val since kvm splits the mmio access into 8 bytes. This
leaks 5 bytes from the kernel stack (CVE-2017-17741).  This patch fixes
it by just accessing the bytes which we operate on.

Before patch:

syz-executor-5567  [007] .... 51370.561696: kvm_mmio: mmio write len 3 gpa 0x10 val 0x1ffff10077c1010f

After patch:

syz-executor-13416 [002] .... 51302.299573: kvm_mmio: mmio write len 3 gpa 0x10 val 0xc1010f

Reported-by: Dmitry Vyukov <dvyukov@google.com>
Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Tested-by: Marc Zyngier <marc.zyngier@arm.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-17 09:35:24 +01:00
xuhuicong
87b681f969 drm/bridge: dw-hdmi: fix display shaking when uboot to kernel show
Change-Id: I899bb0dde7111fe97dd2c89d20afb09562d31300
Signed-off-by: xuhuicong <xhc@rock-chips.com>
2018-01-15 14:11:16 +08:00
Zhou weixin
05ba6262c6 Rockchip: vip: add gc0312 driver
Change-Id: I2cc81469f48b35833832ff102b45221ea340f5a9
Signed-off-by: Zhou weixin <zwx@rock-chips.com>
2018-01-12 19:31:29 +08:00
Alex Shi
921884d92b Merge tag 'v4.4.111' into linux-linaro-lsk-v4.4
This is the 4.4.111 stable release
2018-01-11 12:01:16 +08:00
Thiago Rafael Becker
58330ec2fe kernel: make groups_sort calling a responsibility group_info allocators
commit bdcf0a423e upstream.

In testing, we found that nfsd threads may call set_groups in parallel
for the same entry cached in auth.unix.gid, racing in the call of
groups_sort, corrupting the groups for that entry and leading to
permission denials for the client.

This patch:
 - Make groups_sort globally visible.
 - Move the call to groups_sort to the modifiers of group_info
 - Remove the call to groups_sort from set_groups

Link: http://lkml.kernel.org/r/20171211151420.18655-1-thiago.becker@gmail.com
Signed-off-by: Thiago Rafael Becker <thiago.becker@gmail.com>
Reviewed-by: Matthew Wilcox <mawilcox@microsoft.com>
Reviewed-by: NeilBrown <neilb@suse.com>
Acked-by: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-10 09:27:10 +01:00
David Howells
d1698dc8a5 fscache: Fix the default for fscache_maybe_release_page()
commit 9880150655 upstream.

Fix the default for fscache_maybe_release_page() for when the cookie isn't
valid or the page isn't cached.  It mustn't return false as that indicates
the page cannot yet be freed.

The problem with the default is that if, say, there's no cache, but a
network filesystem's pages are using up almost all the available memory, a
system can OOM because the filesystem ->releasepage() op will not allow
them to be released as fscache_maybe_release_page() incorrectly prevents
it.

This can be tested by writing a sequence of 512MiB files to an AFS mount.
It does not affect NFS or CIFS because both of those wrap the call in a
check of PG_fscache and it shouldn't bother Ceph as that only has
PG_private set whilst writeback is in progress.  This might be an issue for
9P, however.

Note that the pages aren't entirely stuck.  Removing a file or unmounting
will clear things because that uses ->invalidatepage() instead.

Fixes: 201a15428b ("FS-Cache: Handle pages pending storage that get evicted under OOM conditions")
Reported-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Tested-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-10 09:27:10 +01:00