Commit Graph

575890 Commits

Author SHA1 Message Date
Kees Cook
aa8893750a UPSTREAM: seccomp: Fix tracer exit notifications during fatal signals
This fixes a ptrace vs fatal pending signals bug as manifested in
seccomp now that seccomp was reordered to happen after ptrace. The
short version is that seccomp should not attempt to call do_exit()
while fatal signals are pending under a tracer. The existing code was
trying to be as defensively paranoid as possible, but it now ends up
confusing ptrace. Instead, the syscall can just be skipped (which solves
the original concern that the do_exit() was addressing) and normal signal
handling, tracer notification, and process death can happen.

Paraphrasing from the original bug report:

If a tracee task is in a PTRACE_EVENT_SECCOMP trap, or has been resumed
after such a trap but not yet been scheduled, and another task in the
thread-group calls exit_group(), then the tracee task exits without the
ptracer receiving a PTRACE_EVENT_EXIT notification. Test case here:
https://gist.github.com/khuey/3c43ac247c72cef8c956ca73281c9be7

The bug happens because when __seccomp_filter() detects
fatal_signal_pending(), it calls do_exit() without dequeuing the fatal
signal. When do_exit() sends the PTRACE_EVENT_EXIT notification and
that task is descheduled, __schedule() notices that there is a fatal
signal pending and changes its state from TASK_TRACED to TASK_RUNNING.
That prevents the ptracer's waitpid() from returning the ptrace event.
A more detailed analysis is here:
https://github.com/mozilla/rr/issues/1762#issuecomment-237396255.

Reported-by: Robert O'Callahan <robert@ocallahan.org>
Reported-by: Kyle Huey <khuey@kylehuey.com>
Tested-by: Kyle Huey <khuey@kylehuey.com>
Fixes: 93e35efb8d ("x86/ptrace: run seccomp after ptrace")
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: James Morris <james.l.morris@oracle.com>
(cherry picked from commit 485a252a55)

Bug: 119769499
Change-Id: I444e69093e88d58587b4d5c4f2d777985591c32d
Signed-off-by: Greg Hackmann <ghackmann@google.com>
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
2018-12-10 23:47:10 +05:30
Kees Cook
edc64cccc5 UPSTREAM: arm64/ptrace: run seccomp after ptrace
Close the hole where ptrace can change a syscall out from under seccomp.

Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
(cherry picked from commit a5cd110cb8)

Bug: 119769499
Change-Id: I9fd3e8e6d38122866df434b2676bf7ba0e808e32
Signed-off-by: Greg Hackmann <ghackmann@google.com>
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
2018-12-10 23:47:00 +05:30
Kees Cook
ee785cf928 UPSTREAM: arm/ptrace: run seccomp after ptrace
Close the hole where ptrace can change a syscall out from under seccomp.

Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: linux-arm-kernel@lists.infradead.org
(cherry picked from commit 0f3912fd93)

Bug: 119769499
Change-Id: Id82e4137207db42a8af31b2745581c53eaaf1f89
Signed-off-by: Greg Hackmann <ghackmann@google.com>
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
2018-12-10 23:46:49 +05:30
Kees Cook
9782f1fbb8 BACKPORT: x86/ptrace: run seccomp after ptrace
This moves seccomp after ptrace on x86 to that seccomp can catch changes
made by ptrace. Emulation should skip the rest of processing too.

We can get rid of test_thread_flag because there's no longer any
opportunity for seccomp to mess with ptrace state before invoking
ptrace.

Suggested-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: x86@kernel.org
Cc: Andy Lutomirski <luto@kernel.org>
(cherry picked from commit 93e35efb8d)

Bug: 119769499
Change-Id: Ie1b9a18360799e68e22f67ce6a819c93433fdeaa
[ghackmann@google.com: adjust context]
Signed-off-by: Greg Hackmann <ghackmann@google.com>

Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
2018-12-10 23:46:40 +05:30
Kees Cook
9cf8a2391a UPSTREAM: seccomp: recheck the syscall after RET_TRACE
When RET_TRACE triggers, a tracer may change a syscall into something that
should be filtered by seccomp. This re-runs seccomp after a trace event
to make sure things continue to pass.

Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Andy Lutomirski <luto@kernel.org>
(cherry picked from commit ce6526e8af)

Bug: 119769499
Change-Id: Ib67732df3c2ac8c6b1de87e75f96aaed02f4627d
Signed-off-by: Greg Hackmann <ghackmann@google.com>
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
2018-12-10 23:46:31 +05:30
Kees Cook
ceb1e1cda7 UPSTREAM: seccomp: remove 2-phase API
Since nothing is using the 2-phase API, and it adds more complexity than
benefit, remove it.

Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Andy Lutomirski <luto@kernel.org>
(cherry picked from commit 8112c4f140)

Bug: 119769499
Change-Id: Iff6246c1e6e9dd0161b80b666a5e796f78a5c785
Signed-off-by: Greg Hackmann <ghackmann@google.com>
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
2018-12-10 23:46:22 +05:30
Andy Lutomirski
5b053daef3 BACKPORT: x86/entry: Get rid of two-phase syscall entry work
I added two-phase syscall entry work back when the entry slow path
was very slow.  Nowadays, the entry slow path is fast and two-phase
entry work serves no purpose.  Remove it.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
(cherry picked from commit c87a85177e)

Bug: 119769499
Change-Id: Ieac4470411f88ca8830794d0322d8d8bb348039e
[ghackmann@google.com:
 - adjust for post-4.4 is_ia32_task() -> in_ia32_syscall() renaming
 - preserve TF flags fixup in syscall_trace_enter()
 - keep syscall_trace_enter() exported, since we haven't taken
   patches to move the calling code from entry_64.S to common.c]
Signed-off-by: Greg Hackmann <ghackmann@google.com>

Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
2018-12-10 23:46:12 +05:30
Andy Lutomirski
ec8e12f60f BACKPORT: seccomp: Add a seccomp_data parameter secure_computing()
Currently, if arch code wants to supply seccomp_data directly to
seccomp (which is generally much faster than having seccomp do it
using the syscall_get_xyz() API), it has to use the two-phase
seccomp hooks. Add it to the easy hooks, too.

Cc: linux-arch@vger.kernel.org
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
(cherry picked from commit 2f275de5d1)

Bug: 119769499
Change-Id: I96876ecd8d1743c289ecef6d2deb65361d1f5baa
[ghackmann@google.com: drop changes to parisc, tile, and um, which
 didn't implement seccomp support in this kernel version]
Signed-off-by: Greg Hackmann <ghackmann@google.com>

Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
2018-12-10 23:46:03 +05:30
Andy Lutomirski
6b2efe86a1 BACKPORT: x86/entry/64: Always run ptregs-using syscalls on the slow path
64-bit syscalls currently have an optimization in which they are
called with partial pt_regs.  A small handful require full
pt_regs.

In the 32-bit and compat cases, I cleaned this up by forcing
full pt_regs for all syscalls.  The performance hit doesn't
really matter as the affected system calls are fundamentally
heavy and this is the 32-bit compat case.

I want to clean up the 64-bit case as well, but I don't want to
hurt fast path performance.  To do that, I want to force the
syscalls that use pt_regs onto the slow path.  This will enable
us to make slow path syscalls be real ABI-compliant C functions.

Use the new syscall entry qualification machinery for this.
'stub_clone' is now 'stub_clone/ptregs'.

The next patch will eliminate the stubs, and we'll just have
'sys_clone/ptregs'.

As of this patch, two-phase entry tracing is no longer used.  It
has served its purpose (namely a huge speedup on some workloads
prior to more general opportunistic SYSRET support), and once
the dust settles I'll send patches to back it out.

The implementation is heavily based on a patch from Brian Gerst:

  http://lkml.kernel.org/g/1449666173-15366-1-git-send-email-brgerst@gmail.com

Originally-From: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Frédéric Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/b9beda88460bcefec6e7d792bd44eca9b760b0c4.1454022279.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit 302f5b260c)

Bug: 119769499
Change-Id: I3e5ac760ef9ca8dcecd8075564118bd10a8be91f
[ghackmann@google.com: adjust context]
Signed-off-by: Greg Hackmann <ghackmann@google.com>

Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
2018-12-10 23:45:54 +05:30
Andy Lutomirski
42c427d727 UPSTREAM: x86/syscalls: Add syscall entry qualifiers
This will let us specify something like 'sys_xyz/foo' instead of
'sys_xyz' in the syscall table, where the 'foo' qualifier conveys
some extra information to the C code.

The intent is to allow things like sys_execve/ptregs to indicate
that sys_execve() touches pt_regs.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/2de06e33dce62556b3ec662006fcb295504e296e.1454022279.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit cfcbadb49d)

Bug: 119769499
Change-Id: I39c3b052526991d7958861712f1e3e9bf453225e
Signed-off-by: Greg Hackmann <ghackmann@google.com>
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
2018-12-10 23:45:45 +05:30
Andy Lutomirski
db6bbaf0e6 UPSTREAM: x86/syscalls: Move compat syscall entry handling into syscalltbl.sh
Rather than duplicating the compat entry handling in all
consumers of syscalls_BITS.h, handle it directly in
syscalltbl.sh.  Now we generate entries in syscalls_32.h like:

__SYSCALL_I386(5, sys_open)
__SYSCALL_I386(5, compat_sys_open)

and all of its consumers implicitly get the right entry point.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/b7c2b501dc0e6e43050e916b95807c3e2e16e9bb.1454022279.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit 3e65654e3d)

Bug: 119769499
Change-Id: I7b2b8206f243e33458fe6cc69affe043aaf177ce
Signed-off-by: Greg Hackmann <ghackmann@google.com>
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
2018-12-10 23:45:35 +05:30
Andy Lutomirski
30bff6461c UPSTREAM: x86/syscalls: Remove __SYSCALL_COMMON and __SYSCALL_X32
The common/64/x32 distinction has no effect other than
determining which kernels actually support the syscall.  Move
the logic into syscalltbl.sh.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/58d4a95f40e43b894f93288b4a3633963d0ee22e.1454022279.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit 32324ce15e)

Bug: 119769499
Change-Id: Ib994586ac47f8f4cbc3f746492c2b47b22e03d39
Signed-off-by: Greg Hackmann <ghackmann@google.com>
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
2018-12-10 23:45:26 +05:30
Andy Lutomirski
298668fe9d UPSTREAM: x86/syscalls: Refactor syscalltbl.sh
This splits out the code to emit a syscall line.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1bfcbba991f5cfaa9291ff950a593daa972a205f.1454022279.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit fba324744b)

Bug: 119769499
Change-Id: Ie36f49882c4c3a69d87288795e4525353bb05ec5
Signed-off-by: Greg Hackmann <ghackmann@google.com>
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
2018-12-10 23:45:16 +05:30
Peter Kalauskas
41ee54318f ANDROID: zram: set comp_len to PAGE_SIZE when page is huge
This bug was introduced when two patches were applied out of order.

* zram: drop max_zpage_size and use zs_huge_class_size()
* zram: mark incompressible page as ZRAM_HUGE

Signed-off-by: Peter Kalauskas <peskal@google.com>
Bug: 119260394
Change-Id: I437d35c8d23c15237ad9c2d5bd7f99d7bff42872
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
2018-12-10 23:42:30 +05:30
Benedict Wong
2734aadd19 BACKPORT: xfrm: Allow Output Mark to be Updated Using UPDSA
Allow UPDSA to change "output mark" to permit
policy separation of packet routing decisions from
SA keying in systems that use mark-based routing.

The set mark, used as a routing and firewall mark
for outbound packets, is made update-able which
allows routing decisions to be handled independently
of keying/SA creation. To maintain consistency with
other optional attributes, the output mark is only
updated if sent with a non-zero value.

The per-SA lock and the xfrm_state_lock are taken in
that order to avoid a deadlock with
xfrm_timer_handler(), which also takes the locks in
that order.

Signed-off-by: Nathan Harold <nharold@google.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>

(cherry picked from commit 6d8e85ffe1)
Backport resolution required using props.output_mark instead of
props.smark

Change-Id: I08c7bfc114ac9826a8a18f5ac1c3ff17a4e0940b
Signed-off-by: Benedict Wong <benedictwong@google.com>
Bug: 114060045
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
2018-12-10 23:42:16 +05:30
Daniel Rosenberg
da81903e96 ANDROID: sdcardfs: Add option to drop unused dentries
This adds the nocache mount option, which will cause sdcardfs to always
drop dentries that are not in use, preventing cached entries from
holding on to lower dentries, which could  cause strange behavior when
bypassing the sdcardfs layer and directly changing the lower fs.

Change-Id: I70268584a20b989ae8cfdd278a2e4fa1605217fb
Signed-off-by: Daniel Rosenberg <drosen@google.com>
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
2018-12-10 23:41:47 +05:30
Amit Pundir
fbbf1a527a Merge 4.20-rc1-4.4 into android-4.4
* origin/upstream-f2fs-stable-linux-4.4.y:
  f2fs: guarantee journalled quota data by checkpoint
  f2fs: cleanup dirty pages if recover failed
  f2fs: fix data corruption issue with hardware encryption
  f2fs: fix to recover inode->i_flags of inode block during POR
  f2fs: spread f2fs_set_inode_flags()
  f2fs: fix to spread clear_cold_data()
  Revert "f2fs: fix to clear PG_checked flag in set_page_dirty()"
  f2fs: account read IOs and use IO counts for is_idle
  f2fs: fix to account IO correctly for cgroup writeback
  f2fs: fix to account IO correctly
  f2fs: remove request_list check in is_idle()
  f2fs: allow to mount, if quota is failed
  f2fs: update REQ_TIME in f2fs_cross_rename()
  f2fs: do not update REQ_TIME in case of error conditions
  f2fs: remove unneeded disable_nat_bits()
  f2fs: remove unused sbi->trigger_ssr_threshold
  f2fs: shrink sbi->sb_lock coverage in set_file_temperature()
  f2fs: fix to recover cold bit of inode block during POR
  f2fs: submit cached bio to avoid endless PageWriteback
  f2fs: checkpoint disabling
  ...

 Conflicts:
	fs/f2fs/data.c

Change-Id: I95097a969bbd23c2009106b07be8a1eeec675b1c
Signed-off-by: Jaegeuk Kim <jaegeuk@google.com>
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
2018-12-10 23:39:12 +05:30
Alistair Strachan
f111b61338 Revert "BACKPORT, FROMLIST: fscrypt: add Speck128/256 support"
This reverts commit eafbcb2354.

Resolves conflicts with a later CL, e7724207f7 "fscrypt: log the
crypto algorithm implementations".

[astrachan: This is an update to 7977ade69d which reverted only
            the fscrypt changes, not the ext4 changes, due to a
            historical bad merge.]
Bug: 116008047
Change-Id: I772d78d6e4bd126dcd2b25049c719f8522a1c157
Signed-off-by: Alistair Strachan <astrachan@google.com>
[AmitP: Updated commit message for correct SHA1 hash]
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
2018-12-10 23:34:12 +05:30
Alistair Strachan
00428ff5a2 Build fix for 7977ade69d.
Bug: 116008047
Change-Id: I053866f49e1ec090fde9a6fbf19a34c5e4e6e8e3
Signed-off-by: Alistair Strachan <astrachan@google.com>
[AmitP: Updated commit message for correct SHA1 hash]
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
2018-12-10 23:33:50 +05:30
Alistair Strachan
4b20ec3e75 Revert "BACKPORT, FROMGIT: crypto: speck - add support for the Speck block cipher"
This reverts commit 18954d93f3.

Bug: 116008047
Change-Id: If9192b30cdb4212fb6c8111d70c532a109695fbd
Signed-off-by: Alistair Strachan <astrachan@google.com>
[AmitP: Updated commit message for correct SHA1 hash]
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
2018-12-10 23:33:27 +05:30
Alistair Strachan
881ff5a2fd Revert "FROMGIT: crypto: speck - export common helpers"
This reverts commit bc84402781.

Bug: 116008047
Change-Id: I9d0a8357be1ab090a793646716771015299fb7fe
Signed-off-by: Alistair Strachan <astrachan@google.com>
[AmitP: Updated commit message for correct SHA1 hash]
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
2018-12-10 23:32:50 +05:30
Alistair Strachan
3c95c088da Revert "BACKPORT, FROMGIT: crypto: arm/speck - add NEON-accelerated implementation of Speck-XTS"
This reverts commit 8fce82e266.

Bug: 116008047
Change-Id: I3c76a77dfae5894b46e39266f9f8d7c7294ab615
Signed-off-by: Alistair Strachan <astrachan@google.com>
[AmitP: Updated commit message for correct SHA1 hash]
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
2018-12-10 23:32:11 +05:30
Alistair Strachan
87174b9852 Revert "BACKPORT, FROMGIT: crypto: speck - add test vectors for Speck128-XTS"
This reverts commit 239fd75203.

Bug: 116008047
Change-Id: Id68b333c66c89bc69e61bdd96d21fa6bc6dbf3b8
Signed-off-by: Alistair Strachan <astrachan@google.com>
[AmitP: Updated commit message for correct SHA1 hash]
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
2018-12-10 23:31:38 +05:30
Alistair Strachan
cddb52b89b Revert "BACKPORT, FROMGIT: crypto: speck - add test vectors for Speck64-XTS"
This reverts commit c21d7a0568.

Bug: 116008047
Change-Id: I96897223f5daced94e3d62ae817c2de2b59b417f
Signed-off-by: Alistair Strachan <astrachan@google.com>
[AmitP: Updated commit message for correct SHA1 hash]
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
2018-12-10 23:30:39 +05:30
Alistair Strachan
eb75269732 Revert "BACKPORT, FROMLIST: crypto: arm64/speck - add NEON-accelerated implementation of Speck-XTS"
This reverts commit cf055c4ac0.

Bug: 116008047
Change-Id: Ic47509910c162a35f6fba10a196f4369299451ac
Signed-off-by: Alistair Strachan <astrachan@google.com>
[AmitP: Updated commit message for correct SHA1 hash]
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
2018-12-10 23:26:24 +05:30
Alistair Strachan
7977ade69d Revert "fscrypt: add Speck128/256 support"
This reverts commit eb13e0b692.

Resolves conflicts with a later CL, e7724207f7 "fscrypt: log the
crypto algorithm implementations".

Also leaves the include/uapi/linux/fs.h constants in place to prevent
future accidental re-use.

Bug: 116008047
Change-Id: I2d64d8d3e384400b7bdfc06a353c3844d4ebb377
Signed-off-by: Alistair Strachan <astrachan@google.com>
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
2018-12-10 23:21:29 +05:30
Evan Green
697a80ff00 UPSTREAM: loop: Add LOOP_SET_BLOCK_SIZE in compat ioctl
This change adds LOOP_SET_BLOCK_SIZE as one of the supported ioctls
in lo_compat_ioctl. It only takes an unsigned long argument, and
in practice a 32-bit value works fine.

Bug: 117823094
Change-Id: I0061a082eb2632c47b7d66f35f2c909d33ff1653
Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Evan Green <evgreen@chromium.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
(cherry picked from commit 9fea4b3952)
Signed-off-by: Martijn Coenen <maco@android.com>
(cherry picked from commit c82807c7dd)
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
2018-12-10 22:31:55 +05:30
Shaohua Li
3c24e1d194 BACKPORT: block/loop: set hw_sectors
Loop can handle any size of request. Limiting it to 255 sectors just
burns the CPU for bio split and request merge for underlayer disk and
also cause bad fs block allocation in directio mode.

Bug: 117823094
Change-Id: Ic4957181433c5a0d15f4cfdbf69dc5558d6dc5bd
Reviewed-by: Omar Sandoval <osandov@fb.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
(cherry picked from commit 54bb0ade66)
Signed-off-by: Martijn Coenen <maco@android.com>
(cherry picked from commit 8567ea359c)
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
2018-12-10 22:31:43 +05:30
Omar Sandoval
3881718492 UPSTREAM: loop: add ioctl for changing logical block size
This is a different approach from the first attempt in f2c6df7dbf
("loop: support 4k physical blocksize"). Rather than extending
LOOP_{GET,SET}_STATUS, add a separate ioctl just for setting the block
size.

Bug: 117823094
Change-Id: I8e69b8839d7fee3be564cbfce1797ce108e1aa1e
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
(cherry picked from commit 89e4fdecb5)
Signed-off-by: Martijn Coenen <maco@android.com>
(cherry picked from commit 6edf1ad773)
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
2018-12-10 22:31:33 +05:30
Jerry Zhang
afe4c8d03b ANDROID: usb: gadget: f_mtp: Return error if count is negative
If the user passes in a negative file size in a int64,
this will compare to be smaller than buffer length,
and it will get truncated to form a read length that
is larger than the buffer length.

To fix, return -EINVAL if the count argument is negative,
so the loop will never happen.

Bug: 37429972
Test: Test with PoC
Change-Id: I5d52e38e6fbe2c17eb8c493f9eb81df6cfd780a4
Signed-off-by: Jerry Zhang <zhangjerry@google.com>
(cherry picked from commit 34e65b671b)
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
2018-12-10 22:30:28 +05:30
Steve Muckle
8f2aa58a67 ANDROID: x86_64_cuttlefish_defconfig: disable CONFIG_MEMORY_STATE_TIME
Bug: 117847156
Change-Id: Idfbac9c1f0dc2617642c30ddb65400083da44b49
Signed-off-by: Steve Muckle <smuckle@google.com>
(cherry picked from commit 7a95540418)
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
2018-12-10 22:20:45 +05:30
Chao Yu
0eea3276ee f2fs: guarantee journalled quota data by checkpoint
For journalled quota mode, let checkpoint to flush dquot dirty data
and quota file data to guarntee persistence of all quota sysfile in
last checkpoint, by this way, we can avoid corrupting quota sysfile
when encountering SPO.

The implementation is as below:

1. add a global state SBI_QUOTA_NEED_FLUSH to indicate that there is
cached dquot metadata changes in quota subsystem, and later checkpoint
should:
 a) flush dquot metadata into quota file.
 b) flush quota file to storage to keep file usage be consistent.

2. add a global state SBI_QUOTA_NEED_REPAIR to indicate that quota
operation failed due to -EIO or -ENOSPC, so later,
 a) checkpoint will skip syncing dquot metadata.
 b) CP_QUOTA_NEED_FSCK_FLAG will be set in last cp pack to give a
    hint for fsck repairing.

3. add a global state SBI_QUOTA_SKIP_FLUSH, in checkpoint, if quota
data updating is very heavy, it may cause hungtask in block_operation().
To avoid this, if our retry time exceed threshold, let's just skip
flushing and retry in next checkpoint().

Signed-off-by: Weichao Guo <guoweichao@huawei.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
[Jaegeuk Kim: avoid warnings and set fsck flag]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-10-29 18:46:25 -07:00
Sheng Yong
ba406b6308 f2fs: cleanup dirty pages if recover failed
During recover, we will try to create new dentries for inodes with
dentry_mark. But if the parent is missing (e.g. killed by fsck),
recover will break. But those recovered dirty pages are not cleanup.
This will hit f2fs_bug_on:

[   53.519566] F2FS-fs (loop0): Found nat_bits in checkpoint
[   53.539354] F2FS-fs (loop0): recover_inode: ino = 5, name = file, inline = 3
[   53.539402] F2FS-fs (loop0): recover_dentry: ino = 5, name = file, dir = 0, err = -2
[   53.545760] F2FS-fs (loop0): Cannot recover all fsync data errno=-2
[   53.546105] F2FS-fs (loop0): access invalid blkaddr:4294967295
[   53.546171] WARNING: CPU: 1 PID: 1798 at fs/f2fs/checkpoint.c:163 f2fs_is_valid_blkaddr+0x26c/0x320
[   53.546174] Modules linked in:
[   53.546183] CPU: 1 PID: 1798 Comm: mount Not tainted 4.19.0-rc2+ #1
[   53.546186] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[   53.546191] RIP: 0010:f2fs_is_valid_blkaddr+0x26c/0x320
[   53.546195] Code: 85 bb 00 00 00 48 89 df 88 44 24 07 e8 ad a8 db ff 48 8b 3b 44 89 e1 48 c7 c2 40 03 72 a9 48 c7 c6 e0 01 72 a9 e8 84 3c ff ff <0f> 0b 0f b6 44 24 07 e9 8a 00 00 00 48 8d bf 38 01 00 00 e8 7c a8
[   53.546201] RSP: 0018:ffff88006c067768 EFLAGS: 00010282
[   53.546208] RAX: 0000000000000000 RBX: ffff880068844200 RCX: ffffffffa83e1a33
[   53.546211] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88006d51e590
[   53.546215] RBP: 0000000000000005 R08: ffffed000daa3cb3 R09: ffffed000daa3cb3
[   53.546218] R10: 0000000000000001 R11: ffffed000daa3cb2 R12: 00000000ffffffff
[   53.546221] R13: ffff88006a1f8000 R14: 0000000000000200 R15: 0000000000000009
[   53.546226] FS:  00007fb2f3646840(0000) GS:ffff88006d500000(0000) knlGS:0000000000000000
[   53.546229] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   53.546234] CR2: 00007f0fd77f0008 CR3: 00000000687e6002 CR4: 00000000000206e0
[   53.546237] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   53.546240] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   53.546242] Call Trace:
[   53.546248]  f2fs_submit_page_bio+0x95/0x740
[   53.546253]  read_node_page+0x161/0x1e0
[   53.546271]  ? truncate_node+0x650/0x650
[   53.546283]  ? add_to_page_cache_lru+0x12c/0x170
[   53.546288]  ? pagecache_get_page+0x262/0x2d0
[   53.546292]  __get_node_page+0x200/0x660
[   53.546302]  f2fs_update_inode_page+0x4a/0x160
[   53.546306]  f2fs_write_inode+0x86/0xb0
[   53.546317]  __writeback_single_inode+0x49c/0x620
[   53.546322]  writeback_single_inode+0xe4/0x1e0
[   53.546326]  sync_inode_metadata+0x93/0xd0
[   53.546330]  ? sync_inode+0x10/0x10
[   53.546342]  ? do_raw_spin_unlock+0xed/0x100
[   53.546347]  f2fs_sync_inode_meta+0xe0/0x130
[   53.546351]  f2fs_fill_super+0x287d/0x2d10
[   53.546367]  ? vsnprintf+0x742/0x7a0
[   53.546372]  ? f2fs_commit_super+0x180/0x180
[   53.546379]  ? up_write+0x20/0x40
[   53.546385]  ? set_blocksize+0x5f/0x140
[   53.546391]  ? f2fs_commit_super+0x180/0x180
[   53.546402]  mount_bdev+0x181/0x200
[   53.546406]  mount_fs+0x94/0x180
[   53.546411]  vfs_kern_mount+0x6c/0x1e0
[   53.546415]  do_mount+0xe5e/0x1510
[   53.546420]  ? fs_reclaim_release+0x9/0x30
[   53.546424]  ? copy_mount_string+0x20/0x20
[   53.546428]  ? fs_reclaim_acquire+0xd/0x30
[   53.546435]  ? __might_sleep+0x2c/0xc0
[   53.546440]  ? ___might_sleep+0x53/0x170
[   53.546453]  ? __might_fault+0x4c/0x60
[   53.546468]  ? _copy_from_user+0x95/0xa0
[   53.546474]  ? memdup_user+0x39/0x60
[   53.546478]  ksys_mount+0x88/0xb0
[   53.546482]  __x64_sys_mount+0x5d/0x70
[   53.546495]  do_syscall_64+0x65/0x130
[   53.546503]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[   53.547639] ---[ end trace b804d1ea2fec893e ]---

So if recover fails, we need to drop all recovered data.

Signed-off-by: Sheng Yong <shengyong1@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-10-29 18:46:25 -07:00
Sahitya Tummala
f8a7408381 f2fs: fix data corruption issue with hardware encryption
Direct IO can be used in case of hardware encryption. The following
scenario results into data corruption issue in this path -

Thread A -                          Thread B-
-> write file#1 in direct IO
                                    -> GC gets kicked in
                                    -> GC submitted bio on meta mapping
				       for file#1, but pending completion
-> write file#1 again with new data
   in direct IO
                                    -> GC bio gets completed now
                                    -> GC writes old data to the new
                                       location and thus file#1 is
				       corrupted.

Fix this by submitting and waiting for pending io on meta mapping
for direct IO case in f2fs_map_blocks().

Signed-off-by: Sahitya Tummala <stummala@codeaurora.org>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-10-29 18:46:25 -07:00
Chao Yu
797d94b07e f2fs: fix to recover inode->i_flags of inode block during POR
Testcase to reproduce this bug:
1. mkfs.f2fs /dev/sdd
2. mount -t f2fs /dev/sdd /mnt/f2fs
3. touch /mnt/f2fs/file
4. sync
5. chattr +a /mnt/f2fs/file
6. xfs_io -a /mnt/f2fs/file -c "fsync"
7. godown /mnt/f2fs
8. umount /mnt/f2fs
9. mount -t f2fs /dev/sdd /mnt/f2fs
10. xfs_io /mnt/f2fs/file

There is no error when opening this file w/o O_APPEND, but actually,
we expect the correct result should be:

/mnt/f2fs/file: Operation not permitted

The root cause is, in recover_inode(), we recover inode->i_flags more
than F2FS_I(inode)->i_flags, so fix it.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-10-29 18:46:24 -07:00
Chao Yu
6a23a85581 f2fs: spread f2fs_set_inode_flags()
This patch changes codes as below:
- use f2fs_set_inode_flags() to update i_flags atomically to avoid
potential race.
- synchronize F2FS_I(inode)->i_flags to inode->i_flags in
f2fs_new_inode().
- use f2fs_set_inode_flags() to simply codes in f2fs_quota_{on,off}.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-10-29 18:46:24 -07:00
Chao Yu
fbfc2e102c f2fs: fix to spread clear_cold_data()
We need to drop PG_checked flag on page as well when we clear PG_uptodate
flag, in order to avoid treating the page as GCing one later.

Signed-off-by: Weichao Guo <guoweichao@huawei.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-10-29 18:46:24 -07:00
Jaegeuk Kim
0246fe9d2e Revert "f2fs: fix to clear PG_checked flag in set_page_dirty()"
This reverts commit 66110abc4c.

If we clear the cold data flag out of the writeback flow, we can miscount
-1 by end_io, which incurs a deadlock caused by all I/Os being blocked during
heavy GC.

Balancing F2FS Async:
 - IO (CP:    1, Data:   -1, Flush: (   0    0    1), Discard: (   ...

GC thread:                              IRQ
- move_data_page()
 - set_page_dirty()
  - clear_cold_data()
                                        - f2fs_write_end_io()
                                         - type = WB_DATA_TYPE(page);
                                           here, we get wrong type
                                         - dec_page_count(sbi, type);
 - f2fs_wait_on_page_writeback()

Cc: <stable@vger.kernel.org>
Reported-and-Tested-by: Park Ju Hyung <qkrwngud825@gmail.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-10-29 18:46:24 -07:00
Jaegeuk Kim
003e915c01 f2fs: account read IOs and use IO counts for is_idle
This patch adds issued read IO counts which is under block layer.

Chao modified a bit, since:

Below race can cause reversed reference on F2FS_RD_DATA, there is
the same issue in f2fs_submit_page_bio(), fix them by relocate
__submit_bio() and inc_page_count.

Thread A			Thread B
- f2fs_write_begin
 - f2fs_submit_page_read
 - __submit_bio
				- f2fs_read_end_io
				 - __read_end_io
				 - dec_page_count(, F2FS_RD_DATA)
 - inc_page_count(, F2FS_RD_DATA)

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-10-29 18:46:24 -07:00
Chao Yu
9204be8a48 f2fs: fix to account IO correctly for cgroup writeback
Now, we have supported cgroup writeback, it depends on correctly IO
account of specified filesystem.

But in commit d1b3e72d54 ("f2fs: submit bio of in-place-update pages"),
we split write paths from f2fs_submit_page_mbio() to two:
- f2fs_submit_page_bio() for IPU path
- f2fs_submit_page_bio() for OPU path

But still we account write IO only in f2fs_submit_page_mbio(), result in
incorrect IO account, fix it by adding missing IO account in IPU path.

Fixes: d1b3e72d54 ("f2fs: submit bio of in-place-update pages")
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-10-29 18:46:23 -07:00
Chao Yu
fb363c5db4 f2fs: fix to account IO correctly
Below race can cause reversed reference on dirty count, fix it by
relocating __submit_bio() and inc_page_count().

Thread A				Thread B
- f2fs_inplace_write_data
 - f2fs_submit_page_bio
  - __submit_bio
					- f2fs_write_end_io
					 - dec_page_count
  - inc_page_count

Cc: <stable@vger.kernel.org>
Fixes: d1b3e72d54 ("f2fs: submit bio of in-place-update pages")
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-10-29 18:46:23 -07:00
Jens Axboe
c4b87d0c1c f2fs: remove request_list check in is_idle()
This doesn't work on stacked devices, and it doesn't work on
blk-mq devices. The request_list is only used on legacy, which
we don't have much of anymore, and soon won't have any of.

Kill the check.

Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Cc: linux-f2fs-devel@lists.sourceforge.net
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-10-29 18:46:23 -07:00
Jaegeuk Kim
05d4dcf63d f2fs: allow to mount, if quota is failed
Since we can use the filesystem without quotas till next boot.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-10-29 18:46:23 -07:00
Sahitya Tummala
fe2b3bc0fc f2fs: update REQ_TIME in f2fs_cross_rename()
Update REQ_TIME in the missing path - f2fs_cross_rename().

Signed-off-by: Sahitya Tummala <stummala@codeaurora.org>
[Jaegeuk Kim: add it in f2fs_rename()]
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-10-29 18:46:23 -07:00
Sahitya Tummala
4aa5ef7fb1 f2fs: do not update REQ_TIME in case of error conditions
The REQ_TIME should be updated only in case of success cases
as followed at all other places in the file system.

Signed-off-by: Sahitya Tummala <stummala@codeaurora.org>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-10-29 18:46:23 -07:00
Chao Yu
2dc22029a7 f2fs: remove unneeded disable_nat_bits()
Commit 7735730d39 ("f2fs: fix to propagate error from __get_meta_page()")
added disable_nat_bits() in error path of __get_nat_bitmaps(), but it's
unneeded, beause we will fail mount, we won't have chance to change nid
usage status w/o nat full/empty bitmaps.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-10-29 18:46:22 -07:00
Chao Yu
3dd704fff9 f2fs: remove unused sbi->trigger_ssr_threshold
Commit a2a12b679f ("f2fs: export SSR allocation threshold") introduced
two threshold .min_ssr_sections and .trigger_ssr_threshold, but only
.min_ssr_sections is used, so just remove redundant one for cleanup.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-10-29 18:46:22 -07:00
Chao Yu
9f2917f2bb f2fs: shrink sbi->sb_lock coverage in set_file_temperature()
file_set_{cold,hot} doesn't need holding sbi->sb_lock, so moving them
out of the lock.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-10-29 18:46:22 -07:00
Chao Yu
3296764733 f2fs: fix to recover cold bit of inode block during POR
Testcase to reproduce this bug:
1. mkfs.f2fs /dev/sdd
2. mount -t f2fs /dev/sdd /mnt/f2fs
3. touch /mnt/f2fs/file
4. sync
5. chattr +A /mnt/f2fs/file
6. xfs_io -f /mnt/f2fs/file -c "fsync"
7. godown /mnt/f2fs
8. umount /mnt/f2fs
9. mount -t f2fs /dev/sdd /mnt/f2fs
10. chattr -A /mnt/f2fs/file
11. xfs_io -f /mnt/f2fs/file -c "fsync"
12. umount /mnt/f2fs
13. mount -t f2fs /dev/sdd /mnt/f2fs
14. lsattr /mnt/f2fs/file

-----------------N- /mnt/f2fs/file

But actually, we expect the corrct result is:

-------A---------N- /mnt/f2fs/file

The reason is in step 9) we missed to recover cold bit flag in inode
block, so later, in fsync, we will skip write inode block due to below
condition check, result in lossing data in another SPOR.

f2fs_fsync_node_pages()
	if (!IS_DNODE(page) || !is_cold_node(page))
		continue;

Note that, I guess that some non-dir inode has already lost cold bit
during POR, so in order to reenable recovery for those inode, let's
try to recover cold bit in f2fs_iget() to save more fsynced data.

Fixes: c56675750d ("f2fs: remove unneeded set_cold_node()")
Cc: <stable@vger.kernel.org> 4.17+
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-10-29 18:46:22 -07:00
Chao Yu
4bb9f775d5 f2fs: submit cached bio to avoid endless PageWriteback
When migrating encrypted block from background GC thread, we only add
them into f2fs inner bio cache, but forget to submit the cached bio, it
may cause potential deadlock when we are waiting page writebacked, fix
it.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-10-29 18:46:22 -07:00