Commit Graph

80283 Commits

Author SHA1 Message Date
Mark Rutland
63e2298591 UPSTREAM: kasan: add functions to clear stack poison
Functions which the compiler has instrumented for ASAN place poison on
the stack shadow upon entry and remove this poison prior to returning.

In some cases (e.g. hotplug and idle), CPUs may exit the kernel a
number of levels deep in C code.  If there are any instrumented
functions on this critical path, these will leave portions of the idle
thread stack shadow poisoned.

If a CPU returns to the kernel via a different path (e.g. a cold
entry), then depending on stack frame layout subsequent calls to
instrumented functions may use regions of the stack with stale poison,
resulting in (spurious) KASAN splats to the console.

Contemporary GCCs always add stack shadow poisoning when ASAN is
enabled, even when asked to not instrument a function [1], so we can't
simply annotate functions on the critical path to avoid poisoning.

Instead, this series explicitly removes any stale poison before it can
be hit.  In the common hotplug case we clear the entire stack shadow in
common code, before a CPU is brought online.

On architectures which perform a cold return as part of cpu idle may
retain an architecture-specific amount of stack contents.  To retain the
poison for this retained context, the arch code must call the core KASAN
code, passing a "watermark" stack pointer value beyond which shadow will
be cleared.  Architectures which don't perform a cold return as part of
idle do not need any additional code.

This patch (of 3):

Functions which the compiler has instrumented for KASAN place poison on
the stack shadow upon entry and remove this poision prior to returning.

In some cases (e.g.  hotplug and idle), CPUs may exit the kernel a number
of levels deep in C code.  If there are any instrumented functions on this
critical path, these will leave portions of the stack shadow poisoned.

If a CPU returns to the kernel via a different path (e.g.  a cold entry),
then depending on stack frame layout subsequent calls to instrumented
functions may use regions of the stack with stale poison, resulting in
(spurious) KASAN splats to the console.

To avoid this, we must clear stale poison from the stack prior to
instrumented functions being called.  This patch adds functions to the
KASAN core for removing poison from (portions of) a task's stack.  These
will be used by subsequent patches to avoid problems with hotplug and
idle.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Bug: 64145065
(cherry-picked from e3ae116339)
Change-Id: I9be31b714d5bdaec94a2dad3f0e468c094fe5fa2
Signed-off-by: Paul Lawrence <paullawrence@google.com>
2018-01-22 13:15:43 +05:30
John Stultz
aafcb0f503 BACKPORT: time: Clean up CLOCK_MONOTONIC_RAW time handling
(cherry pick from commit fc6eead7c1)

Now that we fixed the sub-ns handling for CLOCK_MONOTONIC_RAW,
remove the duplicitive tk->raw_time.tv_nsec, which can be
stored in tk->tkr_raw.xtime_nsec (similarly to how its handled
for monotonic time).

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Miroslav Lichvar <mlichvar@redhat.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Stephen Boyd <stephen.boyd@linaro.org>
Cc: Kevin Brodsky <kevin.brodsky@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Daniel Mentz <danielmentz@google.com>
Tested-by: Daniel Mentz <danielmentz@google.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Bug: 20045882
Bug: 63737556
Change-Id: I243827d21b08703a09d2d2fe738a9258be224582
2018-01-22 13:15:43 +05:30
John Stultz
581a337227 BACKPORT: time: Fix CLOCK_MONOTONIC_RAW sub-nanosecond accounting
(cherry pick from commit 3d88d56c58)

Due to how the MONOTONIC_RAW accumulation logic was handled,
there is the potential for a 1ns discontinuity when we do
accumulations. This small discontinuity has for the most part
gone un-noticed, but since ARM64 enabled CLOCK_MONOTONIC_RAW
in their vDSO clock_gettime implementation, we've seen failures
with the inconsistency-check test in kselftest.

This patch addresses the issue by using the same sub-ns
accumulation handling that CLOCK_MONOTONIC uses, which avoids
the issue for in-kernel users.

Since the ARM64 vDSO implementation has its own clock_gettime
calculation logic, this patch reduces the frequency of errors,
but failures are still seen. The ARM64 vDSO will need to be
updated to include the sub-nanosecond xtime_nsec values in its
calculation for this issue to be completely fixed.

Signed-off-by: John Stultz <john.stultz@linaro.org>
Tested-by: Daniel Mentz <danielmentz@google.com>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Kevin Brodsky <kevin.brodsky@arm.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Stephen Boyd <stephen.boyd@linaro.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: "stable #4 . 8+" <stable@vger.kernel.org>
Cc: Miroslav Lichvar <mlichvar@redhat.com>
Link: http://lkml.kernel.org/r/1496965462-20003-3-git-send-email-john.stultz@linaro.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Bug: 20045882
Bug: 63737556
Change-Id: I6c55dd7685f6bd212c6af9d09c527528e1dd5fa1
2018-01-22 13:15:43 +05:30
Jaegeuk Kim
6522a6d452 f2fs: updates on 4.15-rc1
Pull f2fs updates from Jaegeuk Kim:
 "In this round, we introduce sysfile-based quota support which is
  required for Android by default. In addition, we allow that users are
  able to reserve some blocks in runtime to mitigate performance drops
  in low free space.

 Enhancements:
  - assign proper data segments according to write_hints given by user
  - issue cache_flush on dirty devices only among multiple devices
  - exploit cp_error flag and add more faults to enhance fault
    injection test
  - conduct more readaheads during f2fs_readdir
  - add a range for discard commands

Bug fixes:
 - fix zero stat->st_blocks when inline_data is set
 - drop crypto key and free stale memory pointer while evict_inode is
   failing
 - fix some corner cases in free space and segment management
 - fix wrong last_disk_size

This series includes lots of clean-ups and code enhancement in terms
of xattr operations, discard/flush command control. In addition, it
adds versatile debugfs entries to monitor f2fs status"

Cherry-picked from origin/upstream-f2fs-stable-linux-4.4.y:

56a07b0705 f2fs: deny accessing encryption policy if encryption is off
c394842e26 f2fs: inject fault in inc_valid_node_count
9262922510 f2fs: fix to clear FI_NO_PREALLOC
e6cfc5de2d f2fs: expose quota information in debugfs
c4cd2efe83 f2fs: separate nat entry mem alloc from nat_tree_lock
48c72b4c8c f2fs: validate before set/clear free nat bitmap
baf9275a4b f2fs: avoid opened loop codes in __add_ino_entry
47af6c72d9 f2fs: apply write hints to select the type of segments for buffered write
ac98191605 f2fs: introduce scan_curseg_cache for cleanup
ca28e9670e f2fs: optimize the way of traversing free_nid_bitmap
460688b59e f2fs: keep scanning until enough free nids are acquired
0186182c0c f2fs: trace checkpoint reason in fsync()
5d4b6efcfd f2fs: keep isize once block is reserved cross EOF
3c8f767e13 f2fs: avoid race in between GC and block exchange
4423778adf f2fs: save a multiplication for last_nid calculation
3e3b405575 f2fs: fix summary info corruption
44889e4879 f2fs: remove dead code in update_meta_page
55c7b9595b f2fs: remove unneeded semicolon
8b92814117 f2fs: don't bother with inode->i_version
42c7c71824 f2fs: check curseg space before foreground GC
c5470498e5 f2fs: use rw_semaphore to protect SIT cache
82750d346a f2fs: support quota sys files
26dfec49b2 f2fs: add quota_ino feature infra
ddb8e2ae98 f2fs: optimize __update_nat_bits
f46ae958c7 f2fs: modify for accurate fggc node io stat
c713fdb5a2 Revert "f2fs: handle dirty segments inside refresh_sit_entry"
873ec505cb f2fs: add a function to move nid
ae66786296 f2fs: export SSR allocation threshold
90c28a18d2 f2fs: give correct trimmed blocks in fstrim
5612922fb0 f2fs: support bio allocation error injection
583b7a274c f2fs: support get_page error injection
09a073cc8c f2fs: add missing sysfs description
e945474a9c f2fs: support soft block reservation
b7b2e629b6 f2fs: handle error case when adding xattr entry
7368e30495 f2fs: support flexible inline xattr size
ada4061e19 f2fs: show current cp state
5b8ff1301a f2fs: add missing quota_initialize
46d4a691f0 f2fs: show # of dirty segments via sysfs
fc13f9d7ce f2fs: stop all the operations by cp_error flag
91bea0c391 f2fs: remove several redundant assignments
807486c795 f2fs: avoid using timespec
03b1cb0bb4 f2fs: fix to correct no_fggc_candidate
5c15033cea Revert "f2fs: return wrong error number on f2fs_quota_write"
5f5f593222 f2fs: remove obsolete pointer for truncate_xattr_node
032a690682 f2fs: retry ENOMEM for quota_read|write
171b638fc4 f2fs: limit # of inmemory pages
83ed7a615f f2fs: update ctx->pos correctly when hitting hole in directory
4d6e68be25 f2fs: relocate readahead codes in readdir()
c8be47b540 f2fs: allow readdir() to be interrupted
2b903fe94c f2fs: trace f2fs_readdir
bb0db666d4 f2fs: trace f2fs_lookup
40d6250f04 f2fs: skip searching non-exist range in truncate_hole
8e84f379df f2fs: expose some sectors to user in inline data or dentry case
cb98f70dea f2fs: avoid stale fi->gdirty_list pointer
5562a3c539 f2fs/crypto: drop crypto key at evict_inode only
85853e7e38 f2fs: fix to avoid race when accessing last_disk_size
0c47a892d5 f2fs: Fix bool initialization/comparison
68e801abc5 f2fs: give up CP_TRIMMED_FLAG if it drops discards
df74eacb20 f2fs: trace f2fs_remove_discard
bd502c6e3e f2fs: reduce cmd_lock coverage in __issue_discard_cmd
a34ab5ca4f f2fs: split discard policy
1e65afd14d f2fs: wrap discard policy
684447dad1 f2fs: support issuing/waiting discard in range
27eaad0938 f2fs: fix to flush multiple device in checkpoint
08bb9d68d5 f2fs: enhance multiple device flush
9c2526ac2e f2fs: fix to show ino management cache size correctly
814b463d26 f2fs: drop FI_UPDATE_WRITE tag after f2fs_issue_flush
f555b0a117 f2fs: obsolete ALLOC_NID_LIST list
75d3164ae1 f2fs: convert inline data for direct I/O & FI_NO_PREALLOC
4de0ceb6b7 f2fs: allow readpages with NULL file pointer
322a45d172 f2fs: show flush list status in sysfs
6d625a93b4 f2fs: introduce read_xattr_block
8ea6e1c327 f2fs: introduce read_inline_xattr
dbce11e9ee Revert "f2fs: reuse nids more aggressively"
131bc9f6b7 Revert "f2fs: node segment is prior to data segment selected victim"

Change-Id: I93b9cd867b859a667a448b39299ff44a2b841b8c
Signed-off-by: Jaegeuk Kim <jaegeuk@google.com>
2018-01-22 13:15:43 +05:30
Laura Abbott
5c42cda2b4 UPSTREAM: mm: Introduce lm_alias
(cherry-pick from commit 568c5fe5a5)

Certain architectures may have the kernel image mapped separately to
alias the linear map. Introduce a macro lm_alias to translate a kernel
image symbol into its linear alias. This is used in part with work to
add CONFIG_DEBUG_VIRTUAL support for arm64.

Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Laura Abbott <labbott@redhat.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Bug: 20045882
Bug: 63737556
Change-Id: I93ba4ef8df77f7e52717ab6e208d06122ac3a72b
2018-01-22 13:15:43 +05:30
Amit Pundir
395ae9f4e6 Merge branch 'linux-linaro-lsk-v4.4' into linux-linaro-lsk-v4.4-android
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>

Conflicts:
    kernel/fork.c
        Conflict due to Kaiser implementation in LTS 4.4.110.
    net/ipv4/raw.c
        Minor conflict due to LTS commit
        be27b620a8 ("net: ipv4: fix for a race condition in raw_sendmsg")
2018-01-22 11:50:22 +05:30
Alex Shi
13b48cd3ac Merge tag 'v4.4.112' into linux-linaro-lsk-v4.4
This is the 4.4.112 stable release
2018-01-18 12:02:00 +08:00
Thomas Gleixner
73492b6860 sysfs/cpu: Add vulnerability folder
commit 87590ce6e3 upstream.

As the meltdown/spectre problem affects several CPU architectures, it makes
sense to have common way to express whether a system is affected by a
particular vulnerability or not. If affected the way to express the
mitigation should be common as well.

Create /sys/devices/system/cpu/vulnerabilities folder and files for
meltdown, spectre_v1 and spectre_v2.

Allow architectures to override the show function.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Linus Torvalds <torvalds@linuxfoundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Link: https://lkml.kernel.org/r/20180107214913.096657732@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-17 09:35:33 +01:00
Nicholas Bellinger
531a2595ce target: Avoid early CMD_T_PRE_EXECUTE failures during ABORT_TASK
commit 1c21a48055 upstream.

This patch fixes bug where early se_cmd exceptions that occur
before backend execution can result in use-after-free if/when
a subsequent ABORT_TASK occurs for the same tag.

Since an early se_cmd exception will have had se_cmd added to
se_session->sess_cmd_list via target_get_sess_cmd(), it will
not have CMD_T_COMPLETE set by the usual target_complete_cmd()
backend completion path.

This causes a subsequent ABORT_TASK + __target_check_io_state()
to signal ABORT_TASK should proceed.  As core_tmr_abort_task()
executes, it will bring the outstanding se_cmd->cmd_kref count
down to zero releasing se_cmd, after se_cmd has already been
queued with error status into fabric driver response path code.

To address this bug, introduce a CMD_T_PRE_EXECUTE bit that is
set at target_get_sess_cmd() time, and cleared immediately before
backend driver dispatch in target_execute_cmd() once CMD_T_ACTIVE
is set.

Then, check CMD_T_PRE_EXECUTE within __target_check_io_state() to
determine when an early exception has occured, and avoid aborting
this se_cmd since it will have already been queued into fabric
driver response path code.

Reported-by: Donald White <dew@datera.io>
Cc: Donald White <dew@datera.io>
Cc: Mike Christie <mchristi@redhat.com>
Cc: Hannes Reinecke <hare@suse.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-17 09:35:31 +01:00
Alexei Starovoitov
9a7fad4c0e bpf: prevent out-of-bounds speculation
commit b2157399cc upstream.

Under speculation, CPUs may mis-predict branches in bounds checks. Thus,
memory accesses under a bounds check may be speculated even if the
bounds check fails, providing a primitive for building a side channel.

To avoid leaking kernel data round up array-based maps and mask the index
after bounds check, so speculated load with out of bounds index will load
either valid value from the array or zero from the padded area.

Unconditionally mask index for all array types even when max_entries
are not rounded to power of 2 for root user.
When map is created by unpriv user generate a sequence of bpf insns
that includes AND operation to make sure that JITed code includes
the same 'index & index_mask' operation.

If prog_array map is created by unpriv user replace
  bpf_tail_call(ctx, map, index);
with
  if (index >= max_entries) {
    index &= map->index_mask;
    bpf_tail_call(ctx, map, index);
  }
(along with roundup to power 2) to prevent out-of-bounds speculation.
There is secondary redundant 'if (index >= max_entries)' in the interpreter
and in all JITs, but they can be optimized later if necessary.

Other array-like maps (cpumap, devmap, sockmap, perf_event_array, cgroup_array)
cannot be used by unpriv, so no changes there.

That fixes bpf side of "Variant 1: bounds check bypass (CVE-2017-5753)" on
all architectures with and without JIT.

v2->v3:
Daniel noticed that attack potentially can be crafted via syscall commands
without loading the program, so add masking to those paths as well.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-17 09:35:31 +01:00
Daniel Borkmann
087a92287d bpf: add bpf_patch_insn_single helper
commit c237ee5eb3 upstream.

Move the functionality to patch instructions out of the verifier
code and into the core as the new bpf_patch_insn_single() helper
will be needed later on for blinding as well. No changes in
functionality.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-17 09:35:30 +01:00
Jerome Brunet
b188ce30cc net: stmmac: enable EEE in MII, GMII or RGMII only
[ Upstream commit 879626e3a5 ]

Note in the databook - Section 4.4 - EEE :
" The EEE feature is not supported when the MAC is configured to use the
TBI, RTBI, SMII, RMII or SGMII single PHY interface. Even if the MAC
supports multiple PHY interfaces, you should activate the EEE mode only
when the MAC is operating with GMII, MII, or RGMII interface."

Applying this restriction solves a stability issue observed on Amlogic
gxl platforms operating with RMII interface and the internal PHY.

Fixes: 83bf79b6bb ("stmmac: disable at run-time the EEE if not supported")
Signed-off-by: Jerome Brunet <jbrunet@baylibre.com>
Tested-by: Arnaud Patard <arnaud.patard@rtp-net.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-17 09:35:29 +01:00
Sergei Shtylyov
f574e2e8c6 sh_eth: fix SH7757 GEther initialization
[ Upstream commit 5133550296 ]

Renesas  SH7757 has 2 Fast and 2 Gigabit Ether controllers, while the
'sh_eth' driver can only reset and initialize TSU of the first controller
pair. Shimoda-san tried to solve that adding the 'needs_init' member to the
'struct sh_eth_plat_data', however the platform code still never sets this
flag. I think  that we can infer this information from the 'devno' variable
(set  to 'platform_device::id') and reset/init the Ether controller pair
only for an even 'devno'; therefore 'sh_eth_plat_data::needs_init' can be
removed...

Fixes: 150647fb2c ("net: sh_eth: change the condition of initialization")
Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-17 09:35:29 +01:00
Wanpeng Li
eb91461daa KVM: Fix stack-out-of-bounds read in write_mmio
commit e39d200fa5 upstream.

Reported by syzkaller:

  BUG: KASAN: stack-out-of-bounds in write_mmio+0x11e/0x270 [kvm]
  Read of size 8 at addr ffff8803259df7f8 by task syz-executor/32298

  CPU: 6 PID: 32298 Comm: syz-executor Tainted: G           OE    4.15.0-rc2+ #18
  Hardware name: LENOVO ThinkCentre M8500t-N000/SHARKBAY, BIOS FBKTC1AUS 02/16/2016
  Call Trace:
   dump_stack+0xab/0xe1
   print_address_description+0x6b/0x290
   kasan_report+0x28a/0x370
   write_mmio+0x11e/0x270 [kvm]
   emulator_read_write_onepage+0x311/0x600 [kvm]
   emulator_read_write+0xef/0x240 [kvm]
   emulator_fix_hypercall+0x105/0x150 [kvm]
   em_hypercall+0x2b/0x80 [kvm]
   x86_emulate_insn+0x2b1/0x1640 [kvm]
   x86_emulate_instruction+0x39a/0xb90 [kvm]
   handle_exception+0x1b4/0x4d0 [kvm_intel]
   vcpu_enter_guest+0x15a0/0x2640 [kvm]
   kvm_arch_vcpu_ioctl_run+0x549/0x7d0 [kvm]
   kvm_vcpu_ioctl+0x479/0x880 [kvm]
   do_vfs_ioctl+0x142/0x9a0
   SyS_ioctl+0x74/0x80
   entry_SYSCALL_64_fastpath+0x23/0x9a

The path of patched vmmcall will patch 3 bytes opcode 0F 01 C1(vmcall)
to the guest memory, however, write_mmio tracepoint always prints 8 bytes
through *(u64 *)val since kvm splits the mmio access into 8 bytes. This
leaks 5 bytes from the kernel stack (CVE-2017-17741).  This patch fixes
it by just accessing the bytes which we operate on.

Before patch:

syz-executor-5567  [007] .... 51370.561696: kvm_mmio: mmio write len 3 gpa 0x10 val 0x1ffff10077c1010f

After patch:

syz-executor-13416 [002] .... 51302.299573: kvm_mmio: mmio write len 3 gpa 0x10 val 0xc1010f

Reported-by: Dmitry Vyukov <dvyukov@google.com>
Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Tested-by: Marc Zyngier <marc.zyngier@arm.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-17 09:35:24 +01:00
Alex Shi
921884d92b Merge tag 'v4.4.111' into linux-linaro-lsk-v4.4
This is the 4.4.111 stable release
2018-01-11 12:01:16 +08:00
Thiago Rafael Becker
58330ec2fe kernel: make groups_sort calling a responsibility group_info allocators
commit bdcf0a423e upstream.

In testing, we found that nfsd threads may call set_groups in parallel
for the same entry cached in auth.unix.gid, racing in the call of
groups_sort, corrupting the groups for that entry and leading to
permission denials for the client.

This patch:
 - Make groups_sort globally visible.
 - Move the call to groups_sort to the modifiers of group_info
 - Remove the call to groups_sort from set_groups

Link: http://lkml.kernel.org/r/20171211151420.18655-1-thiago.becker@gmail.com
Signed-off-by: Thiago Rafael Becker <thiago.becker@gmail.com>
Reviewed-by: Matthew Wilcox <mawilcox@microsoft.com>
Reviewed-by: NeilBrown <neilb@suse.com>
Acked-by: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-10 09:27:10 +01:00
David Howells
d1698dc8a5 fscache: Fix the default for fscache_maybe_release_page()
commit 9880150655 upstream.

Fix the default for fscache_maybe_release_page() for when the cookie isn't
valid or the page isn't cached.  It mustn't return false as that indicates
the page cannot yet be freed.

The problem with the default is that if, say, there's no cache, but a
network filesystem's pages are using up almost all the available memory, a
system can OOM because the filesystem ->releasepage() op will not allow
them to be released as fscache_maybe_release_page() incorrectly prevents
it.

This can be tested by writing a sequence of 512MiB files to an AFS mount.
It does not affect NFS or CIFS because both of those wrap the call in a
check of PG_fscache and it shouldn't bother Ceph as that only has
PG_private set whilst writeback is in progress.  This might be an issue for
9P, however.

Note that the pages aren't entirely stuck.  Removing a file or unmounting
will clear things because that uses ->invalidatepage() instead.

Fixes: 201a15428b ("FS-Cache: Handle pages pending storage that get evicted under OOM conditions")
Reported-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Tested-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-10 09:27:10 +01:00
Alex Shi
9c95ff4010 Merge tag 'v4.4.110' into linux-linaro-lsk-v4.4
This is the 4.4.110 stable release
2018-01-08 12:01:03 +08:00
Kees Cook
3e1457d6bf KPTI: Rename to PAGE_TABLE_ISOLATION
This renames CONFIG_KAISER to CONFIG_PAGE_TABLE_ISOLATION.

Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-05 15:44:26 +01:00
Hugh Dickins
3e3d38fd98 kaiser: vmstat show NR_KAISERTABLE as nr_overhead
The kaiser update made an interesting choice, never to free any shadow
page tables.  Contention on global spinlock was worrying, particularly
with it held across page table scans when freeing.  Something had to be
done: I was going to add refcounting; but simply never to free them is
an appealing choice, minimizing contention without complicating the code
(the more a page table is found already, the less the spinlock is used).

But leaking pages in this way is also a worry: can we get away with it?
At the very least, we need a count to show how bad it actually gets:
in principle, one might end up wasting about 1/256 of memory that way
(1/512 for when direct-mapped pages have to be user-mapped, plus 1/512
for when they are user-mapped from the vmalloc area on another occasion
(but we don't have vmalloc'ed stacks, so only large ldts are vmalloc'ed).

Add per-cpu stat NR_KAISERTABLE: including 256 at startup for the
shared pgd entries, and 1 for each intermediate page table added
thereafter for user-mapping - but leave out the 1 per mm, for its
shadow pgd, because that distracts from the monotonic increase.
Shown in /proc/vmstat as nr_overhead (0 if kaiser not enabled).

In practice, it doesn't look so bad so far: more like 1/12000 after
nine hours of gtests below; and movable pageblock segregation should
tend to cluster the kaiser tables into a subset of the address space
(if not, they will be bad for compaction too).  But production may
tell a different story: keep an eye on this number, and bring back
lighter freeing if it gets out of control (maybe a shrinker).

Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-05 15:44:24 +01:00
Hugh Dickins
c52e55a2a8 kaiser: cleanups while trying for gold link
While trying to get our gold link to work, four cleanups:
matched the gdt_page declaration to its definition;
in fiddling unsuccessfully with PERCPU_INPUT(), lined up backslashes;
lined up the backslashes according to convention in percpu-defs.h;
deleted the unused irq_stack_pointer addition to irq_stack_union.

Sad to report that aligning backslashes does not appear to help gold
align to 8192: but while these did not help, they are worth keeping.

Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-05 15:44:24 +01:00
Hugh Dickins
003e476716 kaiser: stack map PAGE_SIZE at THREAD_SIZE-PAGE_SIZE
Kaiser only needs to map one page of the stack; and
kernel/fork.c did not build on powerpc (no __PAGE_KERNEL).
It's all cleaner if linux/kaiser.h provides kaiser_map_thread_stack()
and kaiser_unmap_thread_stack() wrappers around asm/kaiser.h's
kaiser_add_mapping() and kaiser_remove_mapping().  And use
linux/kaiser.h in init/main.c to avoid the #ifdefs there.

Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-05 15:44:23 +01:00
Dave Hansen
bed9bb7f3e kaiser: merged update
Merged fixes and cleanups, rebased to 4.4.89 tree (no 5-level paging).

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-05 15:44:23 +01:00
Richard Fellner
8a43ddfb93 KAISER: Kernel Address Isolation
This patch introduces our implementation of KAISER (Kernel Address Isolation to
have Side-channels Efficiently Removed), a kernel isolation technique to close
hardware side channels on kernel address information.

More information about the patch can be found on:

        https://github.com/IAIK/KAISER

From: Richard Fellner <richard.fellner@student.tugraz.at>
From: Daniel Gruss <daniel.gruss@iaik.tugraz.at>
X-Subject: [RFC, PATCH] x86_64: KAISER - do not map kernel in user mode
Date: Thu, 4 May 2017 14:26:50 +0200
Link: http://marc.info/?l=linux-kernel&m=149390087310405&w=2
Kaiser-4.10-SHA1: c4b1831d44c6144d3762ccc72f0c4e71a0c713e5

To: <linux-kernel@vger.kernel.org>
To: <kernel-hardening@lists.openwall.com>
Cc: <clementine.maurice@iaik.tugraz.at>
Cc: <moritz.lipp@iaik.tugraz.at>
Cc: Michael Schwarz <michael.schwarz@iaik.tugraz.at>
Cc: Richard Fellner <richard.fellner@student.tugraz.at>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: <kirill.shutemov@linux.intel.com>
Cc: <anders.fogh@gdata-adan.de>

After several recent works [1,2,3] KASLR on x86_64 was basically
considered dead by many researchers. We have been working on an
efficient but effective fix for this problem and found that not mapping
the kernel space when running in user mode is the solution to this
problem [4] (the corresponding paper [5] will be presented at ESSoS17).

With this RFC patch we allow anybody to configure their kernel with the
flag CONFIG_KAISER to add our defense mechanism.

If there are any questions we would love to answer them.
We also appreciate any comments!

Cheers,
Daniel (+ the KAISER team from Graz University of Technology)

[1] http://www.ieee-security.org/TC/SP2013/papers/4977a191.pdf
[2] https://www.blackhat.com/docs/us-16/materials/us-16-Fogh-Using-Undocumented-CPU-Behaviour-To-See-Into-Kernel-Mode-And-Break-KASLR-In-The-Process.pdf
[3] https://www.blackhat.com/docs/us-16/materials/us-16-Jang-Breaking-Kernel-Address-Space-Layout-Randomization-KASLR-With-Intel-TSX.pdf
[4] https://github.com/IAIK/KAISER
[5] https://gruss.cc/files/kaiser.pdf

[patch based also on
https://raw.githubusercontent.com/IAIK/KAISER/master/KAISER/0001-KAISER-Kernel-Address-Isolation.patch]

Signed-off-by: Richard Fellner <richard.fellner@student.tugraz.at>
Signed-off-by: Moritz Lipp <moritz.lipp@iaik.tugraz.at>
Signed-off-by: Daniel Gruss <daniel.gruss@iaik.tugraz.at>
Signed-off-by: Michael Schwarz <michael.schwarz@iaik.tugraz.at>
Acked-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-05 15:44:23 +01:00
Alex Shi
95d4cbea02 Merge remote-tracking branch 'lts/linux-4.4.y' into linux-linaro-lsk-v4.4
Conflicts:
	early call high_memory in arch/arm64/mm/init.c
2018-01-05 14:45:31 +08:00
Andy Lutomirski
5d67dbef74 mm/vmstat: Make NR_TLB_REMOTE_FLUSH_RECEIVED available even on UP
commit 5dd0b16cda upstream.

This fixes CONFIG_SMP=n, CONFIG_DEBUG_TLBFLUSH=y without introducing
further #ifdef soup.  Caught by a Kbuild bot randconfig build.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: ce4a4e565f ("x86/mm: Remove the UP asm/tlbflush.h code, always use the (formerly) SMP code")
Link: http://lkml.kernel.org/r/76da9a3cc4415996f2ad2c905b93414add322021.1496673616.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-02 20:33:28 +01:00
Shaohua Li
e0bdd21a86 net: reevalulate autoflowlabel setting after sysctl setting
[ Upstream commit 513674b5a2 ]

sysctl.ip6.auto_flowlabels is default 1. In our hosts, we set it to 2.
If sockopt doesn't set autoflowlabel, outcome packets from the hosts are
supposed to not include flowlabel. This is true for normal packet, but
not for reset packet.

The reason is ipv6_pinfo.autoflowlabel is set in sock creation. Later if
we change sysctl.ip6.auto_flowlabels, the ipv6_pinfo.autoflowlabel isn't
changed, so the sock will keep the old behavior in terms of auto
flowlabel. Reset packet is suffering from this problem, because reset
packet is sent from a special control socket, which is created at boot
time. Since sysctl.ipv6.auto_flowlabels is 1 by default, the control
socket will always have its ipv6_pinfo.autoflowlabel set, even after
user set sysctl.ipv6.auto_flowlabels to 1, so reset packset will always
have flowlabel. Normal sock created before sysctl setting suffers from
the same issue. We can't even turn off autoflowlabel unless we kill all
socks in the hosts.

To fix this, if IPV6_AUTOFLOWLABEL sockopt is used, we use the
autoflowlabel setting from user, otherwise we always call
ip6_default_np_autolabel() which has the new settings of sysctl.

Note, this changes behavior a little bit. Before commit 42240901f7
(ipv6: Implement different admin modes for automatic flow labels), the
autoflowlabel behavior of a sock isn't sticky, eg, if sysctl changes,
existing connection will change autoflowlabel behavior. After that
commit, autoflowlabel behavior is sticky in the whole life of the sock.
With this patch, the behavior isn't sticky again.

Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Tom Herbert <tom@quantonium.net>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-02 20:33:25 +01:00
Eric Dumazet
3d406a18bd ipv4: igmp: guard against silly MTU values
[ Upstream commit b5476022bb ]

IPv4 stack reacts to changes to small MTU, by disabling itself under
RTNL.

But there is a window where threads not using RTNL can see a wrong
device mtu. This can lead to surprises, in igmp code where it is
assumed the mtu is suitable.

Fix this by reading device mtu once and checking IPv4 minimal MTU.

This patch adds missing IPV4_MIN_MTU define, to not abuse
ETH_MIN_MTU anymore.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-02 20:33:24 +01:00
Sebastian Andrzej Siewior
2e234e707f crypto: mcryptd - protect the per-CPU queue with a lock
commit 9abffc6f2e upstream.

mcryptd_enqueue_request() grabs the per-CPU queue struct and protects
access to it with disabled preemption. Then it schedules a worker on the
same CPU. The worker in mcryptd_queue_worker() guards access to the same
per-CPU variable with disabled preemption.

If we take CPU-hotplug into account then it is possible that between
queue_work_on() and the actual invocation of the worker the CPU goes
down and the worker will be scheduled on _another_ CPU. And here the
preempt_disable() protection does not work anymore. The easiest thing is
to add a spin_lock() to guard access to the list.

Another detail: mcryptd_queue_worker() is not processing more than
MCRYPTD_BATCH invocation in a row. If there are still items left, then
it will invoke queue_work() to proceed with more later. *I* would
suggest to simply drop that check because it does not use a system
workqueue and the workqueue is already marked as "CPU_INTENSIVE". And if
preemption is required then the scheduler should do it.
However if queue_work() is used then the work item is marked as CPU
unbound. That means it will try to run on the local CPU but it may run
on another CPU as well. Especially with CONFIG_DEBUG_WQ_FORCE_RR_CPU=y.
Again, the preempt_disable() won't work here but lock which was
introduced will help.
In order to keep work-item on the local CPU (and avoid RR) I changed it
to queue_work_on().

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-02 20:33:19 +01:00
Andy Lutomirski
425f13a366 sched/core: Add switch_mm_irqs_off() and use it in the scheduler
commit f98db6013c upstream.

By default, this is the same thing as switch_mm().

x86 will override it as an optimization.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/df401df47bdd6be3e389c6f1e3f5310d70e81b2c.1461688545.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-12-25 14:22:09 +01:00
Alex Shi
317c5652a7 Merge branch 'linux-linaro-lsk-v4.4' into linux-linaro-lsk-v4.4-android 2017-12-21 12:06:53 +08:00
Alex Shi
1b29aa698d Merge tag 'v4.4.107' into linux-linaro-lsk-v4.4
This is the 4.4.107 stable release
2017-12-21 12:06:47 +08:00
Jan Kara
b59614cfd2 mm: Handle 0 flags in _calc_vm_trans() macro
[ Upstream commit 592e254502 ]

_calc_vm_trans() does not handle the situation when some of the passed
flags are 0 (which can happen if these VM flags do not make sense for
the architecture). Improve the _calc_vm_trans() macro to return 0 in
such situation. Since all passed flags are constant, this does not add
any runtime overhead.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-12-20 10:04:59 +01:00
Mike Christie
dfd6deed83 target: fix ALUA transition timeout handling
[ Upstream commit d7175373f2 ]

The implicit transition time tells initiators the min time
to wait before timing out a transition. We currently schedule
the transition to occur in tg_pt_gp_implicit_trans_secs
seconds so there is no room for delays. If
core_alua_do_transition_tg_pt_work->core_alua_update_tpg_primary_metadata
needs to write out info to a remote file, then the initiator can
easily time out the operation.

Signed-off-by: Mike Christie <mchristi@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-12-20 10:04:57 +01:00
Jack Morgenstein
acc7d1bd90 net/mlx4_core: Avoid delays during VF driver device shutdown
[ Upstream commit 4cbe4dac82 ]

Some Hypervisors detach VFs from VMs by instantly causing an FLR event
to be generated for a VF.

In the mlx4 case, this will cause that VF's comm channel to be disabled
before the VM has an opportunity to invoke the VF device's "shutdown"
method.

For such Hypervisors, there is a race condition between the VF's
shutdown method and its internal-error detection/reset thread.

The internal-error detection/reset thread (which runs every 5 seconds) also
detects a disabled comm channel. If the internal-error detection/reset
flow wins the race, we still get delays (while that flow tries repeatedly
to detect comm-channel recovery).

The cited commit fixed the command timeout problem when the
internal-error detection/reset flow loses the race.

This commit avoids the unneeded delays when the internal-error
detection/reset flow wins.

Fixes: d585df1c5c ("net/mlx4_core: Avoid command timeouts during VF driver device shutdown")
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Reported-by: Simon Xiao <sixiao@microsoft.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-12-20 10:04:56 +01:00
Eric Biggers
43cd7f3861 crypto: hmac - require that the underlying hash algorithm is unkeyed
commit af3ff8045b upstream.

Because the HMAC template didn't check that its underlying hash
algorithm is unkeyed, trying to use "hmac(hmac(sha3-512-generic))"
through AF_ALG or through KEYCTL_DH_COMPUTE resulted in the inner HMAC
being used without having been keyed, resulting in sha3_update() being
called without sha3_init(), causing a stack buffer overflow.

This is a very old bug, but it seems to have only started causing real
problems when SHA-3 support was added (requires CONFIG_CRYPTO_SHA3)
because the innermost hash's state is ->import()ed from a zeroed buffer,
and it just so happens that other hash algorithms are fine with that,
but SHA-3 is not.  However, there could be arch or hardware-dependent
hash algorithms also affected; I couldn't test everything.

Fix the bug by introducing a function crypto_shash_alg_has_setkey()
which tests whether a shash algorithm is keyed.  Then update the HMAC
template to require that its underlying hash algorithm is unkeyed.

Here is a reproducer:

    #include <linux/if_alg.h>
    #include <sys/socket.h>

    int main()
    {
        int algfd;
        struct sockaddr_alg addr = {
            .salg_type = "hash",
            .salg_name = "hmac(hmac(sha3-512-generic))",
        };
        char key[4096] = { 0 };

        algfd = socket(AF_ALG, SOCK_SEQPACKET, 0);
        bind(algfd, (const struct sockaddr *)&addr, sizeof(addr));
        setsockopt(algfd, SOL_ALG, ALG_SET_KEY, key, sizeof(key));
    }

Here was the KASAN report from syzbot:

    BUG: KASAN: stack-out-of-bounds in memcpy include/linux/string.h:341  [inline]
    BUG: KASAN: stack-out-of-bounds in sha3_update+0xdf/0x2e0  crypto/sha3_generic.c:161
    Write of size 4096 at addr ffff8801cca07c40 by task syzkaller076574/3044

    CPU: 1 PID: 3044 Comm: syzkaller076574 Not tainted 4.14.0-mm1+ #25
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS  Google 01/01/2011
    Call Trace:
      __dump_stack lib/dump_stack.c:17 [inline]
      dump_stack+0x194/0x257 lib/dump_stack.c:53
      print_address_description+0x73/0x250 mm/kasan/report.c:252
      kasan_report_error mm/kasan/report.c:351 [inline]
      kasan_report+0x25b/0x340 mm/kasan/report.c:409
      check_memory_region_inline mm/kasan/kasan.c:260 [inline]
      check_memory_region+0x137/0x190 mm/kasan/kasan.c:267
      memcpy+0x37/0x50 mm/kasan/kasan.c:303
      memcpy include/linux/string.h:341 [inline]
      sha3_update+0xdf/0x2e0 crypto/sha3_generic.c:161
      crypto_shash_update+0xcb/0x220 crypto/shash.c:109
      shash_finup_unaligned+0x2a/0x60 crypto/shash.c:151
      crypto_shash_finup+0xc4/0x120 crypto/shash.c:165
      hmac_finup+0x182/0x330 crypto/hmac.c:152
      crypto_shash_finup+0xc4/0x120 crypto/shash.c:165
      shash_digest_unaligned+0x9e/0xd0 crypto/shash.c:172
      crypto_shash_digest+0xc4/0x120 crypto/shash.c:186
      hmac_setkey+0x36a/0x690 crypto/hmac.c:66
      crypto_shash_setkey+0xad/0x190 crypto/shash.c:64
      shash_async_setkey+0x47/0x60 crypto/shash.c:207
      crypto_ahash_setkey+0xaf/0x180 crypto/ahash.c:200
      hash_setkey+0x40/0x90 crypto/algif_hash.c:446
      alg_setkey crypto/af_alg.c:221 [inline]
      alg_setsockopt+0x2a1/0x350 crypto/af_alg.c:254
      SYSC_setsockopt net/socket.c:1851 [inline]
      SyS_setsockopt+0x189/0x360 net/socket.c:1830
      entry_SYSCALL_64_fastpath+0x1f/0x96

Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-12-20 10:04:51 +01:00
Alex Shi
2cea360fea Merge branch 'linux-linaro-lsk-v4.4' into linux-linaro-lsk-v4.4-android 2017-12-19 13:08:40 +08:00
Alex Shi
51f34b615f Merge remote-tracking branch 'lts/linux-4.4.y' into linux-linaro-lsk-v4.4
Conflicts:
	use a5fa9efe4e arm: KVM: Fix VTTBR_BADDR_MASK BUG_ON off-by-one
	in arch/arm/include/asm/kvm_arm.h
2017-12-19 13:06:55 +08:00
Stephen Bates
24c98ec494 lib/genalloc.c: make the avail variable an atomic_long_t
[ Upstream commit 36a3d1dd4e ]

If the amount of resources allocated to a gen_pool exceeds 2^32 then the
avail atomic overflows and this causes problems when clients try and
borrow resources from the pool.  This is only expected to be an issue on
64 bit systems.

Add the <linux/atomic.h> header to pull in atomic_long* operations.  So
that 32 bit systems continue to use atomic32_t but 64 bit systems can
use atomic64_t.

Link: http://lkml.kernel.org/r/1509033843-25667-1-git-send-email-sbates@raithlin.com
Signed-off-by: Stephen Bates <sbates@raithlin.com>
Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Reviewed-by: Daniel Mentz <danielmentz@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-12-16 10:33:54 +01:00
Ladislav Michl
0500c6d352 ARM: OMAP2+: gpmc-onenand: propagate error on initialization failure
[ Upstream commit 7807e086a2 ]

gpmc_probe_onenand_child returns success even on gpmc_onenand_init
failure. Fix that.

Signed-off-by: Ladislav Michl <ladis@linux-mips.org>
Acked-by: Roger Quadros <rogerq@ti.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-12-16 10:33:51 +01:00
Kirill A. Shutemov
5327f9bada mm: drop unused pmdp_huge_get_and_clear_notify()
commit c0c379e293 upstream.

Dave noticed that after fixing MADV_DONTNEED vs numa balancing race the
last pmdp_huge_get_and_clear_notify() user is gone.

Let's drop the helper.

Link: http://lkml.kernel.org/r/20170306112047.24809-1-kirill.shutemov@linux.intel.com
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[jwang: adjust context for 4.4]
Signed-off-by: Jack Wang <jinpu.wang@profitbricks.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-12-16 10:33:50 +01:00
Dave Gordon
74b6030355 drm: extra printk() wrapper macros
commit 30b0da8d55 upstream.

We had only DRM_INFO() and DRM_ERROR(), whereas the underlying printk()
provides several other useful intermediate levels such as NOTICE and
WARNING. So this patch fills out the set by providing both regular and
once-only macros for each of the levels INFO, NOTICE, and WARNING, using
a common underlying macro that does all the token-pasting.

DRM_ERROR is unchanged, as it's not just a printk wrapper.

v2:
    Fix whitespace, missing ## (Eric Engestrom)

Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Cc: dri-devel@lists.freedesktop.org
Acked-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-12-16 10:33:49 +01:00
Greg Kroah-Hartman
481efb4c72 efi: Move some sysfs files to be read-only by root
commit af97a77bc0 upstream.

Thanks to the scripts/leaking_addresses.pl script, it was found that
some EFI values should not be readable by non-root users.

So make them root-only, and to do that, add a __ATTR_RO_MODE() macro to
make this easier, and use it in other places at the same time.

Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Tested-by: Dave Young <dyoung@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/20171206095010.24170-2-ard.biesheuvel@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-12-16 10:33:48 +01:00
Huacai Chen
7e0091c295 scsi: libsas: align sata_device's rps_resp on a cacheline
commit c2e8fbf908 upstream.

The rps_resp buffer in ata_device is a DMA target, but it isn't
explicitly cacheline aligned. Due to this, adjacent fields can be
overwritten with stale data from memory on non-coherent architectures.
As a result, the kernel is sometimes unable to communicate with an SATA
device behind a SAS expander.

Fix this by ensuring that the rps_resp buffer is cacheline aligned.

This issue is similar to that fixed by Commit 84bda12af3 ("libata:
align ap->sector_buf") and Commit 4ee34ea3a1 ("libata: Align
ata_device's id on a cacheline").

Signed-off-by: Huacai Chen <chenhc@lemote.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-12-16 10:33:48 +01:00
Alex Shi
fa33cf591e Merge branch 'linux-linaro-lsk-v4.4' into linux-linaro-lsk-v4.4-android 2017-12-10 13:02:04 +08:00
Alex Shi
d879d2bd4b Merge tag 'v4.4.105' into linux-linaro-lsk-v4.4
This is the 4.4.105 stable release
2017-12-10 13:01:58 +08:00
Masakazu Mokuno
9d812f7bef USB: core: Add type-specific length check of BOS descriptors
commit 81cf4a4536 upstream.

As most of BOS descriptors are longer in length than their header
'struct usb_dev_cap_header', comparing solely with it is not sufficient
to avoid out-of-bounds access to BOS descriptors.

This patch adds descriptor type specific length check in
usb_get_bos_descriptor() to fix the issue.

Signed-off-by: Masakazu Mokuno <masakazu.mokuno@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-12-09 18:42:44 +01:00
John Youn
f044af8c1d usb: ch9: Add size macro for SSP dev cap descriptor
commit 446fa3a95d upstream.

The SuperspeedPlus Device Capability Descriptor has a variable size
depending on the number of sublink speed attributes.

This patch adds a macro to calculate that size. The macro takes one
argument, the Sublink Speed Attribute Count (SSAC) as reported by the
descriptor in bmAttributes[4:0].

See USB 3.1 9.6.2.5, Table 9-19.

Signed-off-by: John Youn <johnyoun@synopsys.com>
Signed-off-by: Felipe Balbi <balbi@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-12-09 18:42:44 +01:00
Mathias Nyman
ddaa1ae2eb usb: Add USB 3.1 Precision time measurement capability descriptor support
commit faee822c5a upstream.

USB 3.1 devices that support precision time measurement have an
additional PTM cabaility descriptor as part of the full BOS descriptor

Look for this descriptor while parsing the BOS descriptor, and store it in
struct usb_hub_bos if it exists.

Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-12-09 18:42:43 +01:00
Jan Kara
cc4add3897 mm: avoid returning VM_FAULT_RETRY from ->page_mkwrite handlers
[ Upstream commit 0911d0041c ]

Some ->page_mkwrite handlers may return VM_FAULT_RETRY as its return
code (GFS2 or Lustre can definitely do this).  However VM_FAULT_RETRY
from ->page_mkwrite is completely unhandled by the mm code and results
in locking and writeably mapping the page which definitely is not what
the caller wanted.

Fix Lustre and block_page_mkwrite_ret() used by other filesystems
(notably GFS2) to return VM_FAULT_NOPAGE instead which results in
bailing out from the fault code, the CPU then retries the access, and we
fault again effectively doing what the handler wanted.

Link: http://lkml.kernel.org/r/20170203150729.15863-1-jack@suse.cz
Signed-off-by: Jan Kara <jack@suse.cz>
Reported-by: Al Viro <viro@ZenIV.linux.org.uk>
Reviewed-by: Jinshan Xiong <jinshan.xiong@intel.com>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-12-09 18:42:43 +01:00