Commit Graph

1630 Commits

Author SHA1 Message Date
Mauro (mdrjr) Ribeiro
59e87eafe3 Merge tag 'v4.9.292' of git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable into odroidg12-4.9.y
This is the 4.9.292 stable release

Change-Id: I2ba8788797a5c36a55061dfca4c3a6cf4e656ed2
2022-04-27 16:20:06 -03:00
David Hildenbrand
33a7d698f3 proc/vmcore: fix clearing user buffer by properly using clear_user()
commit c1e6311771 upstream.

To clear a user buffer we cannot simply use memset, we have to use
clear_user().  With a virtio-mem device that registers a vmcore_cb and
has some logically unplugged memory inside an added Linux memory block,
I can easily trigger a BUG by copying the vmcore via "cp":

  systemd[1]: Starting Kdump Vmcore Save Service...
  kdump[420]: Kdump is using the default log level(3).
  kdump[453]: saving to /sysroot/var/crash/127.0.0.1-2021-11-11-14:59:22/
  kdump[458]: saving vmcore-dmesg.txt to /sysroot/var/crash/127.0.0.1-2021-11-11-14:59:22/
  kdump[465]: saving vmcore-dmesg.txt complete
  kdump[467]: saving vmcore
  BUG: unable to handle page fault for address: 00007f2374e01000
  #PF: supervisor write access in kernel mode
  #PF: error_code(0x0003) - permissions violation
  PGD 7a523067 P4D 7a523067 PUD 7a528067 PMD 7a525067 PTE 800000007048f867
  Oops: 0003 [#1] PREEMPT SMP NOPTI
  CPU: 0 PID: 468 Comm: cp Not tainted 5.15.0+ #6
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.14.0-27-g64f37cc530f1-prebuilt.qemu.org 04/01/2014
  RIP: 0010:read_from_oldmem.part.0.cold+0x1d/0x86
  Code: ff ff ff e8 05 ff fe ff e9 b9 e9 7f ff 48 89 de 48 c7 c7 38 3b 60 82 e8 f1 fe fe ff 83 fd 08 72 3c 49 8d 7d 08 4c 89 e9 89 e8 <49> c7 45 00 00 00 00 00 49 c7 44 05 f8 00 00 00 00 48 83 e7 f81
  RSP: 0018:ffffc9000073be08 EFLAGS: 00010212
  RAX: 0000000000001000 RBX: 00000000002fd000 RCX: 00007f2374e01000
  RDX: 0000000000000001 RSI: 00000000ffffdfff RDI: 00007f2374e01008
  RBP: 0000000000001000 R08: 0000000000000000 R09: ffffc9000073bc50
  R10: ffffc9000073bc48 R11: ffffffff829461a8 R12: 000000000000f000
  R13: 00007f2374e01000 R14: 0000000000000000 R15: ffff88807bd421e8
  FS:  00007f2374e12140(0000) GS:ffff88807f000000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 00007f2374e01000 CR3: 000000007a4aa000 CR4: 0000000000350eb0
  Call Trace:
   read_vmcore+0x236/0x2c0
   proc_reg_read+0x55/0xa0
   vfs_read+0x95/0x190
   ksys_read+0x4f/0xc0
   do_syscall_64+0x3b/0x90
   entry_SYSCALL_64_after_hwframe+0x44/0xae

Some x86-64 CPUs have a CPU feature called "Supervisor Mode Access
Prevention (SMAP)", which is used to detect wrong access from the kernel
to user buffers like this: SMAP triggers a permissions violation on
wrong access.  In the x86-64 variant of clear_user(), SMAP is properly
handled via clac()+stac().

To fix, properly use clear_user() when we're dealing with a user buffer.

Link: https://lkml.kernel.org/r/20211112092750.6921-1-david@redhat.com
Fixes: 997c136f51 ("fs/proc/vmcore.c: add hook to read_from_oldmem() to check for non-ram pages")
Signed-off-by: David Hildenbrand <david@redhat.com>
Acked-by: Baoquan He <bhe@redhat.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Philipp Rudo <prudo@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-08 08:45:04 +01:00
Mauro (mdrjr) Ribeiro
24e57ad8b9 Merge tag 'v4.9.277' of git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable into odroidg12-4.9.y
This is the 4.9.277 stable release

Change-Id: If6b64078940d789b507791c7bbe010b5de2765cb
2021-07-30 21:09:08 -03:00
Mauro (mdrjr) Ribeiro
5bdadab9f1 Merge tag 'v4.9.273' of git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable into odroidg12-4.9.y
This is the 4.9.273 stable release

Change-Id: I4c404d66b978c06106bd4ae293a155b83733756e
2021-07-30 20:56:39 -03:00
Mauro (mdrjr) Ribeiro
fe479912b5 Merge tag 'v4.9.271' of git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable into odroidg12-4.9.y
This is the 4.9.271 stable release

Change-Id: I3a31919a9500297b1d6adb076ef69b45f6b79201
2021-07-30 20:56:30 -03:00
Mauro (mdrjr) Ribeiro
577983c0ab Merge tag 'v4.9.267' of git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable into odroidg12-4.9.y
This is the 4.9.267 stable release

Change-Id: Iee6d9a8c35574205717f88cfc7f9c89260f53f39
2021-07-30 20:48:39 -03:00
Mauro (mdrjr) Ribeiro
a93f2ac12d Merge tag 'v4.9.247' of git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable into odroidg12-4.9.y
This is the 4.9.247 stable release

Change-Id: I8f82b355f6010f618633fcf380ddf1f1f37d103c
2021-07-30 20:13:53 -03:00
Marcelo Henrique Cerri
947644647b proc: Avoid mixing integer types in mem_rw()
[ Upstream commit d238692b4b ]

Use size_t when capping the count argument received by mem_rw(). Since
count is size_t, using min_t(int, ...) can lead to a negative value
that will later be passed to access_remote_vm(), which can cause
unexpected behavior.

Since we are capping the value to at maximum PAGE_SIZE, the conversion
from size_t to int when passing it to access_remote_vm() as "len"
shouldn't be a problem.

Link: https://lkml.kernel.org/r/20210512125215.3348316-1-marcelo.cerri@canonical.com
Reviewed-by: David Disseldorp <ddiss@suse.de>
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Signed-off-by: Marcelo Henrique Cerri <marcelo.cerri@canonical.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Souza Cascardo <cascardo@canonical.com>
Cc: Christian Brauner <christian.brauner@ubuntu.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Lorenzo Stoakes <lstoakes@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-07-28 09:14:27 +02:00
Linus Torvalds
b08854de43 proc: only require mm_struct for writing
commit 94f0b2d4a1 upstream.

Commit 591a22c14d ("proc: Track /proc/$pid/attr/ opener mm_struct") we
started using __mem_open() to track the mm_struct at open-time, so that
we could then check it for writes.

But that also ended up making the permission checks at open time much
stricter - and not just for writes, but for reads too.  And that in turn
caused a regression for at least Fedora 29, where NIC interfaces fail to
start when using NetworkManager.

Since only the write side wanted the mm_struct test, ignore any failures
by __mem_open() at open time, leaving reads unaffected.  The write()
time verification of the mm_struct pointer will then catch the failure
case because a NULL pointer will not match a valid 'current->mm'.

Link: https://lore.kernel.org/netdev/YMjTlp2FSJYvoyFa@unreal/
Fixes: 591a22c14d ("proc: Track /proc/$pid/attr/ opener mm_struct")
Reported-and-tested-by: Leon Romanovsky <leon@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Christian Brauner <christian.brauner@ubuntu.com>
Cc: Andrea Righi <andrea.righi@canonical.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-06-16 11:36:36 +02:00
Kees Cook
63ac7652ec proc: Track /proc/$pid/attr/ opener mm_struct
commit 591a22c14d upstream.

Commit bfb819ea20 ("proc: Check /proc/$pid/attr/ writes against file opener")
tried to make sure that there could not be a confusion between the opener of
a /proc/$pid/attr/ file and the writer. It used struct cred to make sure
the privileges didn't change. However, there were existing cases where a more
privileged thread was passing the opened fd to a differently privileged thread
(during container setup). Instead, use mm_struct to track whether the opener
and writer are still the same process. (This is what several other proc files
already do, though for different reasons.)

Reported-by: Christian Brauner <christian.brauner@ubuntu.com>
Reported-by: Andrea Righi <andrea.righi@canonical.com>
Tested-by: Andrea Righi <andrea.righi@canonical.com>
Fixes: bfb819ea20 ("proc: Check /proc/$pid/attr/ writes against file opener")
Cc: stable@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-06-16 11:36:32 +02:00
Kees Cook
3cfc8ca698 proc: Check /proc/$pid/attr/ writes against file opener
commit bfb819ea20 upstream.

Fix another "confused deputy" weakness[1]. Writes to /proc/$pid/attr/
files need to check the opener credentials, since these fds do not
transition state across execve(). Without this, it is possible to
trick another process (which may have different credentials) to write
to its own /proc/$pid/attr/ files, leading to unexpected and possibly
exploitable behaviors.

[1] https://www.kernel.org/doc/html/latest/security/credentials.html?highlight=confused#open-file-credentials

Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Cc: stable@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-06-03 08:23:27 +02:00
Hugh Dickins
700344f145 mm: add cond_resched() in gather_pte_stats()
commit a66c0410b9 upstream.

The other pagetable walks in task_mmu.c have a cond_resched() after
walking their ptes: add a cond_resched() in gather_pte_stats() too, for
reading /proc/<id>/numa_maps.  Only pagemap_pmd_range() has a
cond_resched() in its (unusually expensive) pmd_trans_huge case: more
should probably be added, but leave them unchanged for now.

Link: http://lkml.kernel.org/r/alpine.LSU.2.11.1612052157400.13021@eggly.anvils
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Reported-by: Chen si <cici.cs@alibaba-inc.com>
Signed-off-by: Baoyou Xie <baoyou.xie@aliyun.com>
Signed-off-by: Wen Yang <wenyang@linux.alibaba.com>
Signed-off-by: Zijiang Huang <zijiang.hzj@alibaba-inc.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-04-16 11:59:09 +02:00
Jens Axboe
61cda68902 proc: don't allow async path resolution of /proc/self components
[ Upstream commit 8d4c3e76e3 ]

If this is attempted by a kthread, then return -EOPNOTSUPP as we don't
currently support that. Once we can get task_pid_ptr() doing the right
thing, then this can go away again.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-12-02 08:31:26 +01:00
Mauro (mdrjr) Ribeiro
026b455712 Merge tag 'v4.9.228' of git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable into odroidg12-4.9.y
This is the 4.9.228 stable release
2020-07-13 21:26:47 -03:00
Mauro (mdrjr) Ribeiro
da5525fbe6 Merge tag 'v4.9.221' of git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable into odroidg12-4.9.y
This is the 4.9.221 stable release

Change-Id: I78dee45beadfec5998da618b58f0bf592fc346c7
2020-07-13 17:56:46 -03:00
Eric W. Biederman
5b85bf5cf3 proc: Use new_inode not new_inode_pseudo
commit ef1548adad upstream.

Recently syzbot reported that unmounting proc when there is an ongoing
inotify watch on the root directory of proc could result in a use
after free when the watch is removed after the unmount of proc
when the watcher exits.

Commit 69879c01a0 ("proc: Remove the now unnecessary internal mount
of proc") made it easier to unmount proc and allowed syzbot to see the
problem, but looking at the code it has been around for a long time.

Looking at the code the fsnotify watch should have been removed by
fsnotify_sb_delete in generic_shutdown_super.  Unfortunately the inode
was allocated with new_inode_pseudo instead of new_inode so the inode
was not on the sb->s_inodes list.  Which prevented
fsnotify_unmount_inodes from finding the inode and removing the watch
as well as made it so the "VFS: Busy inodes after unmount" warning
could not find the inodes to warn about them.

Make all of the inodes in proc visible to generic_shutdown_super,
and fsnotify_sb_delete by using new_inode instead of new_inode_pseudo.
The only functional difference is that new_inode places the inodes
on the sb->s_inodes list.

I wrote a small test program and I can verify that without changes it
can trigger this issue, and by replacing new_inode_pseudo with
new_inode the issues goes away.

Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/000000000000d788c905a7dfa3f4@google.com
Reported-by: syzbot+7d2debdcdb3cb93c1e5e@syzkaller.appspotmail.com
Fixes: 0097875bd4 ("proc: Implement /proc/thread-self to point at the directory of the current thread")
Fixes: 021ada7dff ("procfs: switch /proc/self away from proc_dir_entry")
Fixes: 51f0885e54 ("vfs,proc: guarantee unique inodes in /proc")
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-06-20 10:24:11 +02:00
Jann Horn
f8e84d7a94 vmalloc: fix remap_vmalloc_range() bounds checks
commit bdebd6a283 upstream.

remap_vmalloc_range() has had various issues with the bounds checks it
promises to perform ("This function checks that addr is a valid
vmalloc'ed area, and that it is big enough to cover the vma") over time,
e.g.:

 - not detecting pgoff<<PAGE_SHIFT overflow

 - not detecting (pgoff<<PAGE_SHIFT)+usize overflow

 - not checking whether addr and addr+(pgoff<<PAGE_SHIFT) are the same
   vmalloc allocation

 - comparing a potentially wildly out-of-bounds pointer with the end of
   the vmalloc region

In particular, since commit fc9702273e ("bpf: Add mmap() support for
BPF_MAP_TYPE_ARRAY"), unprivileged users can cause kernel null pointer
dereferences by calling mmap() on a BPF map with a size that is bigger
than the distance from the start of the BPF map to the end of the
address space.

This could theoretically be used as a kernel ASLR bypass, by using
whether mmap() with a given offset oopses or returns an error code to
perform a binary search over the possible address range.

To allow remap_vmalloc_range_partial() to verify that addr and
addr+(pgoff<<PAGE_SHIFT) are in the same vmalloc region, pass the offset
to remap_vmalloc_range_partial() instead of adding it to the pointer in
remap_vmalloc_range().

In remap_vmalloc_range_partial(), fix the check against
get_vm_area_size() by using size comparisons instead of pointer
comparisons, and add checks for pgoff.

Fixes: 833423143c ("[PATCH] mm: introduce remap_vmalloc_range()")
Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: stable@vger.kernel.org
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Song Liu <songliubraving@fb.com>
Cc: Yonghong Song <yhs@fb.com>
Cc: Andrii Nakryiko <andriin@fb.com>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: KP Singh <kpsingh@chromium.org>
Link: http://lkml.kernel.org/r/20200415222312.236431-1-jannh@google.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-05-02 17:23:10 +02:00
Mauro (mdrjr) Ribeiro
435b02176f Merge tag 'v4.9.203' of git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable into odroidg12-4.9.y
This is the 4.9.203 stable release
2020-04-07 21:15:12 -03:00
Mauro (mdrjr) Ribeiro
ca365bd32c Merge tag 'v4.9.188' of git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable into odroidg12-4.9.y
This is the 4.9.188 stable release
2020-04-07 20:10:31 -03:00
Mauro (mdrjr) Ribeiro
33417c4f2f Merge tag 'v4.9.187' of git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable into odroidg12-4.9.y
This is the 4.9.187 stable release
2020-04-07 20:10:04 -03:00
Mauro (mdrjr) Ribeiro
a0510d6c1b Merge tag 'v4.9.185' of git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable into odroidg12-4.9.y
This is the 4.9.185 stable release
2020-04-07 20:08:40 -03:00
Mauro (mdrjr) Ribeiro
bf2f047532 Merge tag 'v4.9.172' of git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable into odroidg12-4.9.y
This is the 4.9.172 stable release
2020-04-07 15:05:26 -03:00
Mauro (mdrjr) Ribeiro
effa03aa83 Merge tag 'v4.9.167' of git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable into odroidg12-4.9.y
This is the 4.9.167 stable release
2020-04-07 14:57:01 -03:00
Mauro (mdrjr) Ribeiro
a210829d46 Merge tag 'v4.9.161' of git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable into odroidg12-4.9.y
This is the 4.9.161 stable release
2020-04-07 14:55:59 -03:00
Mauro (mdrjr) Ribeiro
027a9d7211 Merge tag 'v4.9.152' of git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable into odroidg12-4.9.y
This is the 4.9.152 stable release
2020-04-07 14:46:28 -03:00
Mauro (mdrjr) Ribeiro
f95d762cbe Merge tag 'v4.9.148' of git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable into odroidg12-4.9.y
This is the 4.9.148 stable release
2020-04-07 14:42:47 -03:00
Mauro (mdrjr) Ribeiro
d58d3f14f0 Merge tag 'v4.9.135' of git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable into odroidg12-4.9.y
This is the 4.9.135 stable release
2020-04-07 13:40:04 -03:00
Mauro (mdrjr) Ribeiro
bb28e4bf23 Merge tag 'v4.9.132' of git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable into odroidg12-4.9.y
This is the 4.9.132 stable release
2020-04-07 13:29:55 -03:00
Mauro (mdrjr) Ribeiro
77b94557ac Merge tag 'v4.9.120' of git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable into odroidg12-4.9.y
This is the 4.9.120 stable release
2020-04-07 11:11:08 -03:00
Borislav Petkov
45d90b93ce proc/vmcore: Fix i386 build error of missing copy_oldmem_page_encrypted()
[ Upstream commit cf089611f4 ]

Lianbo reported a build error with a particular 32-bit config, see Link
below for details.

Provide a weak copy_oldmem_page_encrypted() function which architectures
can override, in the same manner other functionality in that file is
supplied.

Reported-by: Lianbo Jiang <lijiang@redhat.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
CC: x86@kernel.org
Link: http://lkml.kernel.org/r/710b9d95-2f70-eadf-c4a1-c3dc80ee4ebb@redhat.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-11-25 09:53:44 +01:00
Tao Zeng
89fd94cefb vmap: fix wrong mmu setting in check sp funciton [1/1]
PD#TV-9668

Problem:
If sp address is in linear mapping range, check_sp_fault_again
function in vmap fault handler will still map a new page for it.
This will cause some data in R/W section polluted.

Solution:
Avoid map page if sp is in linear range.

Verify:
TL1 x301

Change-Id: I0e02a2048b586854c528cd3eeafb725751b9dc82
Signed-off-by: Tao Zeng <tao.zeng@amlogic.com>
2019-09-19 19:46:28 -07:00
Andrea Arcangeli
16903f1a5b coredump: fix race condition between mmget_not_zero()/get_task_mm() and core dumping
commit 04f5866e41 upstream.

The core dumping code has always run without holding the mmap_sem for
writing, despite that is the only way to ensure that the entire vma
layout will not change from under it.  Only using some signal
serialization on the processes belonging to the mm is not nearly enough.
This was pointed out earlier.  For example in Hugh's post from Jul 2017:

  https://lkml.kernel.org/r/alpine.LSU.2.11.1707191716030.2055@eggly.anvils

  "Not strictly relevant here, but a related note: I was very surprised
   to discover, only quite recently, how handle_mm_fault() may be called
   without down_read(mmap_sem) - when core dumping. That seems a
   misguided optimization to me, which would also be nice to correct"

In particular because the growsdown and growsup can move the
vm_start/vm_end the various loops the core dump does around the vma will
not be consistent if page faults can happen concurrently.

Pretty much all users calling mmget_not_zero()/get_task_mm() and then
taking the mmap_sem had the potential to introduce unexpected side
effects in the core dumping code.

Adding mmap_sem for writing around the ->core_dump invocation is a
viable long term fix, but it requires removing all copy user and page
faults and to replace them with get_dump_page() for all binary formats
which is not suitable as a short term fix.

For the time being this solution manually covers the places that can
confuse the core dump either by altering the vma layout or the vma flags
while it runs.  Once ->core_dump runs under mmap_sem for writing the
function mmget_still_valid() can be dropped.

Allowing mmap_sem protected sections to run in parallel with the
coredump provides some minor parallelism advantage to the swapoff code
(which seems to be safe enough by never mangling any vma field and can
keep doing swapins in parallel to the core dumping) and to some other
corner case.

In order to facilitate the backporting I added "Fixes: 86039bd3b4e6"
however the side effect of this same race condition in /proc/pid/mem
should be reproducible since before 2.6.12-rc2 so I couldn't add any
other "Fixes:" because there's no hash beyond the git genesis commit.

Because find_extend_vma() is the only location outside of the process
context that could modify the "mm" structures under mmap_sem for
reading, by adding the mmget_still_valid() check to it, all other cases
that take the mmap_sem for reading don't need the new check after
mmget_not_zero()/get_task_mm().  The expand_stack() in page fault
context also doesn't need the new check, because all tasks under core
dumping are frozen.

Link: http://lkml.kernel.org/r/20190325224949.11068-1-aarcange@redhat.com
Fixes: 86039bd3b4 ("userfaultfd: add new syscall to provide memory externalization")
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Reported-by: Jann Horn <jannh@google.com>
Suggested-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Jann Horn <jannh@google.com>
Acked-by: Jason Gunthorpe <jgg@mellanox.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[akaher@vmware.com: stable 4.9 backport
-  handle binder_update_page_range - mhocko@suse.com]
Signed-off-by: Ajay Kaher <akaher@vmware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-08-06 18:29:41 +02:00
Radoslaw Burny
e83234d7ef fs/proc/proc_sysctl.c: fix the default values of i_uid/i_gid on /proc/sys inodes.
commit 5ec27ec735 upstream.

Normally, the inode's i_uid/i_gid are translated relative to s_user_ns,
but this is not a correct behavior for proc.  Since sysctl permission
check in test_perm is done against GLOBAL_ROOT_[UG]ID, it makes more
sense to use these values in u_[ug]id of proc inodes.  In other words:
although uid/gid in the inode is not read during test_perm, the inode
logically belongs to the root of the namespace.  I have confirmed this
with Eric Biederman at LPC and in this thread:
  https://lore.kernel.org/lkml/87k1kzjdff.fsf@xmission.com

Consequences
============

Since the i_[ug]id values of proc nodes are not used for permissions
checks, this change usually makes no functional difference.  However, it
causes an issue in a setup where:

 * a namespace container is created without root user in container -
   hence the i_[ug]id of proc nodes are set to INVALID_[UG]ID

 * container creator tries to configure it by writing /proc/sys files,
   e.g. writing /proc/sys/kernel/shmmax to configure shared memory limit

Kernel does not allow to open an inode for writing if its i_[ug]id are
invalid, making it impossible to write shmmax and thus - configure the
container.

Using a container with no root mapping is apparently rare, but we do use
this configuration at Google.  Also, we use a generic tool to configure
the container limits, and the inability to write any of them causes a
failure.

History
=======

The invalid uids/gids in inodes first appeared due to 8175435777 (fs:
Update i_[ug]id_(read|write) to translate relative to s_user_ns).
However, AFAIK, this did not immediately cause any issues.  The
inability to write to these "invalid" inodes was only caused by a later
commit 0bd23d09b8 (vfs: Don't modify inodes with a uid or gid unknown
to the vfs).

Tested: Used a repro program that creates a user namespace without any
mapping and stat'ed /proc/$PID/root/proc/sys/kernel/shmmax from outside.
Before the change, it shows the overflow uid, with the change it's 0.
The overflow uid indicates that the uid in the inode is not correct and
thus it is not possible to open the file for writing.

Link: http://lkml.kernel.org/r/20190708115130.250149-1-rburny@google.com
Fixes: 0bd23d09b8 ("vfs: Don't modify inodes with a uid or gid unknown to the vfs")
Signed-off-by: Radoslaw Burny <rburny@google.com>
Acked-by: Luis Chamberlain <mcgrof@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: "Eric W . Biederman" <ebiederm@xmission.com>
Cc: Seth Forshee <seth.forshee@canonical.com>
Cc: John Sperbeck <jsperbeck@google.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: <stable@vger.kernel.org>	[4.8+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-08-04 09:33:28 +02:00
John Ogness
730855b0f4 fs/proc/array.c: allow reporting eip/esp for all coredumping threads
commit cb8f381f16 upstream.

0a1eb2d474 ("fs/proc: Stop reporting eip and esp in /proc/PID/stat")
stopped reporting eip/esp and fd7d56270b ("fs/proc: Report eip/esp in
/prod/PID/stat for coredumping") reintroduced the feature to fix a
regression with userspace core dump handlers (such as minicoredumper).

Because PF_DUMPCORE is only set for the primary thread, this didn't fix
the original problem for secondary threads.  Allow reporting the eip/esp
for all threads by checking for PF_EXITING as well.  This is set for all
the other threads when they are killed.  coredump_wait() waits for all the
tasks to become inactive before proceeding to invoke a core dumper.

Link: http://lkml.kernel.org/r/87y32p7i7a.fsf@linutronix.de
Link: http://lkml.kernel.org/r/20190522161614.628-1-jlu@pengutronix.de
Fixes: fd7d56270b ("fs/proc: Report eip/esp in /prod/PID/stat for coredumping")
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Reported-by: Jan Luebbe <jlu@pengutronix.de>
Tested-by: Jan Luebbe <jlu@pengutronix.de>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-10 09:55:38 +02:00
YueHaibing
9f3a14bebe fs/proc/proc_sysctl.c: Fix a NULL pointer dereference
commit 89189557b4 upstream.

Syzkaller report this:

  sysctl could not get directory: /net//bridge -12
  kasan: CONFIG_KASAN_INLINE enabled
  kasan: GPF could be caused by NULL-ptr deref or user memory access
  general protection fault: 0000 [#1] SMP KASAN PTI
  CPU: 1 PID: 7027 Comm: syz-executor.0 Tainted: G         C        5.1.0-rc3+ #8
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
  RIP: 0010:__write_once_size include/linux/compiler.h:220 [inline]
  RIP: 0010:__rb_change_child include/linux/rbtree_augmented.h:144 [inline]
  RIP: 0010:__rb_erase_augmented include/linux/rbtree_augmented.h:186 [inline]
  RIP: 0010:rb_erase+0x5f4/0x19f0 lib/rbtree.c:459
  Code: 00 0f 85 60 13 00 00 48 89 1a 48 83 c4 18 5b 5d 41 5c 41 5d 41 5e 41 5f c3 48 89 f2 48 b8 00 00 00 00 00 fc ff df 48 c1 ea 03 <80> 3c 02 00 0f 85 75 0c 00 00 4d 85 ed 4c 89 2e 74 ce 4c 89 ea 48
  RSP: 0018:ffff8881bb507778 EFLAGS: 00010206
  RAX: dffffc0000000000 RBX: ffff8881f224b5b8 RCX: ffffffff818f3f6a
  RDX: 000000000000000a RSI: 0000000000000050 RDI: ffff8881f224b568
  RBP: 0000000000000000 R08: ffffed10376a0ef4 R09: ffffed10376a0ef4
  R10: 0000000000000001 R11: ffffed10376a0ef4 R12: ffff8881f224b558
  R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
  FS:  00007f3e7ce13700(0000) GS:ffff8881f7300000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 00007fd60fbe9398 CR3: 00000001cb55c001 CR4: 00000000007606e0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  PKRU: 55555554
  Call Trace:
   erase_entry fs/proc/proc_sysctl.c:178 [inline]
   erase_header+0xe3/0x160 fs/proc/proc_sysctl.c:207
   start_unregistering fs/proc/proc_sysctl.c:331 [inline]
   drop_sysctl_table+0x558/0x880 fs/proc/proc_sysctl.c:1631
   get_subdir fs/proc/proc_sysctl.c:1022 [inline]
   __register_sysctl_table+0xd65/0x1090 fs/proc/proc_sysctl.c:1335
   br_netfilter_init+0x68/0x1000 [br_netfilter]
   do_one_initcall+0xbc/0x47d init/main.c:901
   do_init_module+0x1b5/0x547 kernel/module.c:3456
   load_module+0x6405/0x8c10 kernel/module.c:3804
   __do_sys_finit_module+0x162/0x190 kernel/module.c:3898
   do_syscall_64+0x9f/0x450 arch/x86/entry/common.c:290
   entry_SYSCALL_64_after_hwframe+0x49/0xbe
  Modules linked in: br_netfilter(+) backlight comedi(C) hid_sensor_hub max3100 ti_ads8688 udc_core fddi snd_mona leds_gpio rc_streamzap mtd pata_netcell nf_log_common rc_winfast udp_tunnel snd_usbmidi_lib snd_usb_toneport snd_usb_line6 snd_rawmidi snd_seq_device snd_hwdep videobuf2_v4l2 videobuf2_common videodev media videobuf2_vmalloc videobuf2_memops rc_gadmei_rm008z 8250_of smm665 hid_tmff hid_saitek hwmon_vid rc_ati_tv_wonder_hd_600 rc_core pata_pdc202xx_old dn_rtmsg as3722 ad714x_i2c ad714x snd_soc_cs4265 hid_kensington panel_ilitek_ili9322 drm drm_panel_orientation_quirks ipack cdc_phonet usbcore phonet hid_jabra hid extcon_arizona can_dev industrialio_triggered_buffer kfifo_buf industrialio adm1031 i2c_mux_ltc4306 i2c_mux ipmi_msghandler mlxsw_core snd_soc_cs35l34 snd_soc_core snd_pcm_dmaengine snd_pcm snd_timer ac97_bus snd_compress snd soundcore gpio_da9055 uio ecdh_generic mdio_thunder of_mdio fixed_phy libphy mdio_cavium iptable_security iptable_raw iptable_mangle
   iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_filter bpfilter ip6_vti ip_vti ip_gre ipip sit tunnel4 ip_tunnel hsr veth netdevsim vxcan batman_adv cfg80211 rfkill chnl_net caif nlmon dummy team bonding vcan bridge stp llc ip6_gre gre ip6_tunnel tunnel6 tun joydev mousedev ppdev tpm kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel ide_pci_generic piix aes_x86_64 crypto_simd cryptd ide_core glue_helper input_leds psmouse intel_agp intel_gtt serio_raw ata_generic i2c_piix4 agpgart pata_acpi parport_pc parport floppy rtc_cmos sch_fq_codel ip_tables x_tables sha1_ssse3 sha1_generic ipv6 [last unloaded: br_netfilter]
  Dumping ftrace buffer:
     (ftrace buffer empty)
  ---[ end trace 68741688d5fbfe85 ]---

commit 23da958803 ("fs/proc/proc_sysctl.c: fix NULL pointer
dereference in put_links") forgot to handle start_unregistering() case,
while header->parent is NULL, it calls erase_header() and as seen in the
above syzkaller call trace, accessing &header->parent->root will trigger
a NULL pointer dereference.

As that commit explained, there is also no need to call
start_unregistering() if header->parent is NULL.

Link: http://lkml.kernel.org/r/20190409153622.28112-1-yuehaibing@huawei.com
Fixes: 23da958803 ("fs/proc/proc_sysctl.c: fix NULL pointer dereference in put_links")
Fixes: 0e47c99d7f ("sysctl: Replace root_list with links between sysctl_table_sets")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Reported-by: Hulk Robot <hulkci@huawei.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-02 09:32:04 +02:00
YueHaibing
28f0641fba fs/proc/proc_sysctl.c: fix NULL pointer dereference in put_links
commit 23da958803 upstream.

Syzkaller reports:

kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault: 0000 [#1] SMP KASAN PTI
CPU: 1 PID: 5373 Comm: syz-executor.0 Not tainted 5.0.0-rc8+ #3
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
RIP: 0010:put_links+0x101/0x440 fs/proc/proc_sysctl.c:1599
Code: 00 0f 85 3a 03 00 00 48 8b 43 38 48 89 44 24 20 48 83 c0 38 48 89 c2 48 89 44 24 28 48 b8 00 00 00 00 00 fc ff df 48 c1 ea 03 <80> 3c 02 00 0f 85 fe 02 00 00 48 8b 74 24 20 48 c7 c7 60 2a 9d 91
RSP: 0018:ffff8881d828f238 EFLAGS: 00010202
RAX: dffffc0000000000 RBX: ffff8881e01b1140 RCX: ffffffff8ee98267
RDX: 0000000000000007 RSI: ffffc90001479000 RDI: ffff8881e01b1178
RBP: dffffc0000000000 R08: ffffed103ee27259 R09: ffffed103ee27259
R10: 0000000000000001 R11: ffffed103ee27258 R12: fffffffffffffff4
R13: 0000000000000006 R14: ffff8881f59838c0 R15: dffffc0000000000
FS:  00007f072254f700(0000) GS:ffff8881f7100000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fff8b286668 CR3: 00000001f0542002 CR4: 00000000007606e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
 drop_sysctl_table+0x152/0x9f0 fs/proc/proc_sysctl.c:1629
 get_subdir fs/proc/proc_sysctl.c:1022 [inline]
 __register_sysctl_table+0xd65/0x1090 fs/proc/proc_sysctl.c:1335
 br_netfilter_init+0xbc/0x1000 [br_netfilter]
 do_one_initcall+0xfa/0x5ca init/main.c:887
 do_init_module+0x204/0x5f6 kernel/module.c:3460
 load_module+0x66b2/0x8570 kernel/module.c:3808
 __do_sys_finit_module+0x238/0x2a0 kernel/module.c:3902
 do_syscall_64+0x147/0x600 arch/x86/entry/common.c:290
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x462e99
Code: f7 d8 64 89 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f072254ec58 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
RAX: ffffffffffffffda RBX: 000000000073bf00 RCX: 0000000000462e99
RDX: 0000000000000000 RSI: 0000000020000280 RDI: 0000000000000003
RBP: 00007f072254ec70 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f072254f6bc
R13: 00000000004bcefa R14: 00000000006f6fb0 R15: 0000000000000004
Modules linked in: br_netfilter(+) dvb_usb_dibusb_mc_common dib3000mc dibx000_common dvb_usb_dibusb_common dvb_usb_dw2102 dvb_usb classmate_laptop palmas_regulator cn videobuf2_v4l2 v4l2_common snd_soc_bd28623 mptbase snd_usb_usx2y snd_usbmidi_lib snd_rawmidi wmi libnvdimm lockd sunrpc grace rc_kworld_pc150u rc_core rtc_da9063 sha1_ssse3 i2c_cros_ec_tunnel adxl34x_spi adxl34x nfnetlink lib80211 i5500_temp dvb_as102 dvb_core videobuf2_common videodev media videobuf2_vmalloc videobuf2_memops udc_core lnbp22 leds_lp3952 hid_roccat_ryos s1d13xxxfb mtd vport_geneve openvswitch nf_conncount nf_nat_ipv6 nsh geneve udp_tunnel ip6_udp_tunnel snd_soc_mt6351 sis_agp phylink snd_soc_adau1761_spi snd_soc_adau1761 snd_soc_adau17x1 snd_soc_core snd_pcm_dmaengine ac97_bus snd_compress snd_soc_adau_utils snd_soc_sigmadsp_regmap snd_soc_sigmadsp raid_class hid_roccat_konepure hid_roccat_common hid_roccat c2port_duramar2150 core mdio_bcm_unimac iptable_security iptable_raw iptable_mangle
 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_filter bpfilter ip6_vti ip_vti ip_gre ipip sit tunnel4 ip_tunnel hsr veth netdevsim devlink vxcan batman_adv cfg80211 rfkill chnl_net caif nlmon dummy team bonding vcan bridge stp llc ip6_gre gre ip6_tunnel tunnel6 tun crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel joydev mousedev ide_pci_generic piix aesni_intel aes_x86_64 ide_core crypto_simd atkbd cryptd glue_helper serio_raw ata_generic pata_acpi i2c_piix4 floppy sch_fq_codel ip_tables x_tables ipv6 [last unloaded: lm73]
Dumping ftrace buffer:
   (ftrace buffer empty)
---[ end trace 770020de38961fd0 ]---

A new dir entry can be created in get_subdir and its 'header->parent' is
set to NULL.  Only after insert_header success, it will be set to 'dir',
otherwise 'header->parent' is set to NULL and drop_sysctl_table is called.
However in err handling path of get_subdir, drop_sysctl_table also be
called on 'new->header' regardless its value of parent pointer.  Then
put_links is called, which triggers NULL-ptr deref when access member of
header->parent.

In fact we have multiple error paths which call drop_sysctl_table() there,
upon failure on insert_links() we also call drop_sysctl_table().And even
in the successful case on __register_sysctl_table() we still always call
drop_sysctl_table().This patch fix it.

Link: http://lkml.kernel.org/r/20190314085527.13244-1-yuehaibing@huawei.com
Fixes: 0e47c99d7f ("sysctl: Replace root_list with links between sysctl_table_sets")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Reported-by: Hulk Robot <hulkci@huawei.com>
Acked-by: Luis Chamberlain <mcgrof@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: <stable@vger.kernel.org>    [3.4+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-04-03 06:24:19 +02:00
Michal Hocko
2d182ba434 proc, oom: do not report alien mms when setting oom_score_adj
commit b2b469939e upstream.

Tetsuo has reported that creating a thousands of processes sharing MM
without SIGHAND (aka alien threads) and setting
/proc/<pid>/oom_score_adj will swamp the kernel log and takes ages [1]
to finish.  This is especially worrisome that all that printing is done
under RCU lock and this can potentially trigger RCU stall or softlockup
detector.

The primary reason for the printk was to catch potential users who might
depend on the behavior prior to 44a70adec9 ("mm, oom_adj: make sure
processes sharing mm have same view of oom_score_adj") but after more
than 2 years without a single report I guess it is safe to simply remove
the printk altogether.

The next step should be moving oom_score_adj over to the mm struct and
remove all the tasks crawling as suggested by [2]

[1] http://lkml.kernel.org/r/97fce864-6f75-bca5-14bc-12c9f890e740@i-love.sakura.ne.jp
[2] http://lkml.kernel.org/r/20190117155159.GA4087@dhcp22.suse.cz

Link: http://lkml.kernel.org/r/20190212102129.26288-1-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reported-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Yong-Taek Lee <ytk.lee@samsung.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-02-27 10:06:58 +01:00
Guenter Roeck
f311b6cd35 proc: Remove empty line in /proc/self/status
If CONFIG_SECCOMP=n, /proc/self/status includes an empty line. This causes
the iotop application to bail out with an error message.

File "/usr/local/lib64/python2.7/site-packages/iotop/data.py", line 196,
	in parse_proc_pid_status
key, value = line.split(':\t', 1)
ValueError: need more than 1 value to unpack

The problem is seen in v4.9.y but not upstream because commit af884cd4a5
("proc: report no_new_privs state") has not been backported to v4.9.y.
The backport of commit fae1fa0fc6 ("proc: Provide details on speculation
flaw mitigations") tried to address the resulting differences but was
wrong, introducing the problem.

Fixes: 51ef9af2a3 ("proc: Provide details on speculation flaw mitigations")
Cc: Kees Cook <keescook@chromium.org>
Cc: Gwendal Grignou <gwendal@chromium.org>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Acked-by: Kees Cook <keescook@chromium.org>
2019-01-23 08:10:53 +01:00
Ivan Delalande
4d5741aa2a proc/sysctl: don't return ENOMEM on lookup when a table is unregistering
commit ea5751ccd6 upstream.

proc_sys_lookup can fail with ENOMEM instead of ENOENT when the
corresponding sysctl table is being unregistered. In our case we see
this upon opening /proc/sys/net/*/conf files while network interfaces
are being deleted, which confuses our configuration daemon.

The problem was successfully reproduced and this fix tested on v4.9.122
and v4.20-rc6.

v2: return ERR_PTRs in all cases when proc_sys_make_inode fails instead
of mixing them with NULL. Thanks Al Viro for the feedback.

Fixes: ace0c791e6 ("proc/sysctl: Don't grab i_lock under sysctl_lock.")
Cc: stable@vger.kernel.org
Signed-off-by: Ivan Delalande <colona@arista.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-12-29 13:40:16 +01:00
tao zeng
0d3f347e10 mm: optimize thread stack usage on arm64 [1/1]
PD#SWPL-1219

Problem:
On arm64, thread stack is 16KB for each task. If running task number
is large, this type of memory may over 40MB. It's a large amount on
small memory platform. But most case thread only use less 4KB stack.
It's waste of memory and we need optimize it.

Solution:
1. Pre-allocate a vmalloc address space for task stack;
2. Only map 1st page for stack and handle page fault in EL1
   when stack growth triggered exception;
3. handle stack switch for exception.

Verify:
p212

Change-Id: I47f511ccfa2868d982bc10a820ed6435b6d52ba9
Signed-off-by: tao zeng <tao.zeng@amlogic.com>
2018-12-06 02:57:50 -08:00
Frederic Weisbecker
dbf9a0532e sched/cputime: Convert kcpustat to nsecs
commit 7fb1327ee9 upstream.

Kernel CPU stats are stored in cputime_t which is an architecture
defined type, and hence a bit opaque and requiring accessors and mutators
for any operation.

Converting them to nsecs simplifies the code and is one step toward
the removal of cputime_t in the core code.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Stanislaw Gruszka <sgruszka@redhat.com>
Cc: Wanpeng Li <wanpeng.li@hotmail.com>
Link: http://lkml.kernel.org/r/1485832191-26889-4-git-send-email-fweisbec@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
[colona: minor conflict as 527b0a76f4 ("sched/cpuacct: Avoid %lld seq_printf
 warning") is missing from v4.9]
Signed-off-by: Ivan Delalande <colona@arista.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-10-20 09:51:32 +02:00
Jann Horn
3c5dc3f313 proc: restrict kernel stack dumps to root
commit f8a00cef17 upstream.

Currently, you can use /proc/self/task/*/stack to cause a stack walk on
a task you control while it is running on another CPU.  That means that
the stack can change under the stack walker.  The stack walker does
have guards against going completely off the rails and into random
kernel memory, but it can interpret random data from your kernel stack
as instruction pointers and stack pointers.  This can cause exposure of
kernel stack contents to userspace.

Restrict the ability to inspect kernel stacks of arbitrary tasks to root
in order to prevent a local attacker from exploiting racy stack unwinding
to leak kernel task stack contents.  See the added comment for a longer
rationale.

There don't seem to be any users of this userspace API that can't
gracefully bail out if reading from the file fails.  Therefore, I believe
that this change is unlikely to break things.  In the case that this patch
does end up needing a revert, the next-best solution might be to fake a
single-entry stack based on wchan.

Link: http://lkml.kernel.org/r/20180927153316.200286-1-jannh@google.com
Fixes: 2ec220e27f ("proc: add /proc/*/stack")
Signed-off-by: Jann Horn <jannh@google.com>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Ken Chen <kenchen@google.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Laura Abbott <labbott@redhat.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H . Peter Anvin" <hpa@zytor.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-10-10 08:53:23 +02:00
Eric W. Biederman
a3a7b992b2 proc: Fix proc_sys_prune_dcache to hold a sb reference
commit 2fd1d2c4ce upstream.

Andrei Vagin writes:
FYI: This bug has been reproduced on 4.11.7
> BUG: Dentry ffff895a3dd01240{i=4e7c09a,n=lo}  still in use (1) [unmount of proc proc]
> ------------[ cut here ]------------
> WARNING: CPU: 1 PID: 13588 at fs/dcache.c:1445 umount_check+0x6e/0x80
> CPU: 1 PID: 13588 Comm: kworker/1:1 Not tainted 4.11.7-200.fc25.x86_64 #1
> Hardware name: CompuLab sbc-flt1/fitlet, BIOS SBCFLT_0.08.04 06/27/2015
> Workqueue: events proc_cleanup_work
> Call Trace:
>  dump_stack+0x63/0x86
>  __warn+0xcb/0xf0
>  warn_slowpath_null+0x1d/0x20
>  umount_check+0x6e/0x80
>  d_walk+0xc6/0x270
>  ? dentry_free+0x80/0x80
>  do_one_tree+0x26/0x40
>  shrink_dcache_for_umount+0x2d/0x90
>  generic_shutdown_super+0x1f/0xf0
>  kill_anon_super+0x12/0x20
>  proc_kill_sb+0x40/0x50
>  deactivate_locked_super+0x43/0x70
>  deactivate_super+0x5a/0x60
>  cleanup_mnt+0x3f/0x90
>  mntput_no_expire+0x13b/0x190
>  kern_unmount+0x3e/0x50
>  pid_ns_release_proc+0x15/0x20
>  proc_cleanup_work+0x15/0x20
>  process_one_work+0x197/0x450
>  worker_thread+0x4e/0x4a0
>  kthread+0x109/0x140
>  ? process_one_work+0x450/0x450
>  ? kthread_park+0x90/0x90
>  ret_from_fork+0x2c/0x40
> ---[ end trace e1c109611e5d0b41 ]---
> VFS: Busy inodes after unmount of proc. Self-destruct in 5 seconds.  Have a nice day...
> BUG: unable to handle kernel NULL pointer dereference at           (null)
> IP: _raw_spin_lock+0xc/0x30
> PGD 0

Fix this by taking a reference to the super block in proc_sys_prune_dcache.

The superblock reference is the core of the fix however the sysctl_inodes
list is converted to a hlist so that hlist_del_init_rcu may be used.  This
allows proc_sys_prune_dache to remove inodes the sysctl_inodes list, while
not causing problems for proc_sys_evict_inode when if it later choses to
remove the inode from the sysctl_inodes list.  Removing inodes from the
sysctl_inodes list allows proc_sys_prune_dcache to have a progress
guarantee, while still being able to drop all locks.  The fact that
head->unregistering is set in start_unregistering ensures that no more
inodes will be added to the the sysctl_inodes list.

Previously the code did a dance where it delayed calling iput until the
next entry in the list was being considered to ensure the inode remained on
the sysctl_inodes list until the next entry was walked to.  The structure
of the loop in this patch does not need that so is much easier to
understand and maintain.

Cc: stable@vger.kernel.org
Reported-by: Andrei Vagin <avagin@gmail.com>
Tested-by: Andrei Vagin <avagin@openvz.org>
Fixes: ace0c791e6 ("proc/sysctl: Don't grab i_lock under sysctl_lock.")
Fixes: d6cffbbe9a ("proc/sysctl: prune stale dentries during unregistering")
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-08-15 18:14:43 +02:00
Eric W. Biederman
631f93a6fe proc/sysctl: Don't grab i_lock under sysctl_lock.
commit ace0c791e6 upstream.

Konstantin Khlebnikov <khlebnikov@yandex-team.ru> writes:
> This patch has locking problem. I've got lockdep splat under LTP.
>
> [ 6633.115456] ======================================================
> [ 6633.115502] [ INFO: possible circular locking dependency detected ]
> [ 6633.115553] 4.9.10-debug+ #9 Tainted: G             L
> [ 6633.115584] -------------------------------------------------------
> [ 6633.115627] ksm02/284980 is trying to acquire lock:
> [ 6633.115659]  (&sb->s_type->i_lock_key#4){+.+...}, at: [<ffffffff816bc1ce>] igrab+0x1e/0x80
> [ 6633.115834] but task is already holding lock:
> [ 6633.115882]  (sysctl_lock){+.+...}, at: [<ffffffff817e379b>] unregister_sysctl_table+0x6b/0x110
> [ 6633.116026] which lock already depends on the new lock.
> [ 6633.116026]
> [ 6633.116080]
> [ 6633.116080] the existing dependency chain (in reverse order) is:
> [ 6633.116117]
> -> #2 (sysctl_lock){+.+...}:
> -> #1 (&(&dentry->d_lockref.lock)->rlock){+.+...}:
> -> #0 (&sb->s_type->i_lock_key#4){+.+...}:
>
> d_lock nests inside i_lock
> sysctl_lock nests inside d_lock in d_compare
>
> This patch adds i_lock nesting inside sysctl_lock.

Al Viro <viro@ZenIV.linux.org.uk> replied:
> Once ->unregistering is set, you can drop sysctl_lock just fine.  So I'd
> try something like this - use rcu_read_lock() in proc_sys_prune_dcache(),
> drop sysctl_lock() before it and regain after.  Make sure that no inodes
> are added to the list ones ->unregistering has been set and use RCU list
> primitives for modifying the inode list, with sysctl_lock still used to
> serialize its modifications.
>
> Freeing struct inode is RCU-delayed (see proc_destroy_inode()), so doing
> igrab() is safe there.  Since we don't drop inode reference until after we'd
> passed beyond it in the list, list_for_each_entry_rcu() should be fine.

I agree with Al Viro's analsysis of the situtation.

Fixes: d6cffbbe9a ("proc/sysctl: prune stale dentries during unregistering")
Reported-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Tested-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Suggested-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-08-15 18:14:43 +02:00
Konstantin Khlebnikov
b96e215e53 proc/sysctl: prune stale dentries during unregistering
commit d6cffbbe9a upstream.

Currently unregistering sysctl table does not prune its dentries.
Stale dentries could slowdown sysctl operations significantly.

For example, command:

 # for i in {1..100000} ; do unshare -n -- sysctl -a &> /dev/null ; done
 creates a millions of stale denties around sysctls of loopback interface:

 # sysctl fs.dentry-state
 fs.dentry-state = 25812579  24724135        45      0       0       0

 All of them have matching names thus lookup have to scan though whole
 hash chain and call d_compare (proc_sys_compare) which checks them
 under system-wide spinlock (sysctl_lock).

 # time sysctl -a > /dev/null
 real    1m12.806s
 user    0m0.016s
 sys     1m12.400s

Currently only memory reclaimer could remove this garbage.
But without significant memory pressure this never happens.

This patch collects sysctl inodes into list on sysctl table header and
prunes all their dentries once that table unregisters.

Konstantin Khlebnikov <khlebnikov@yandex-team.ru> writes:
> On 10.02.2017 10:47, Al Viro wrote:
>> how about >> the matching stats *after* that patch?
>
> dcache size doesn't grow endlessly, so stats are fine
>
> # sysctl fs.dentry-state
> fs.dentry-state = 92712	58376	45	0	0	0
>
> # time sysctl -a &>/dev/null
>
> real	0m0.013s
> user	0m0.004s
> sys	0m0.008s

Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Suggested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-08-15 18:14:43 +02:00
Greg Kroah-Hartman
9797dcb8c7 Merge 4.9.104 into android-4.9
Changes in 4.9.104
	MIPS: c-r4k: Fix data corruption related to cache coherence
	MIPS: ptrace: Expose FIR register through FP regset
	MIPS: Fix ptrace(2) PTRACE_PEEKUSR and PTRACE_POKEUSR accesses to o32 FGRs
	KVM: Fix spelling mistake: "cop_unsuable" -> "cop_unusable"
	affs_lookup(): close a race with affs_remove_link()
	aio: fix io_destroy(2) vs. lookup_ioctx() race
	ALSA: timer: Fix pause event notification
	do d_instantiate/unlock_new_inode combinations safely
	mmc: sdhci-iproc: remove hard coded mmc cap 1.8v
	mmc: sdhci-iproc: fix 32bit writes for TRANSFER_MODE register
	libata: Blacklist some Sandisk SSDs for NCQ
	libata: blacklist Micron 500IT SSD with MU01 firmware
	xen-swiotlb: fix the check condition for xen_swiotlb_free_coherent
	drm/vmwgfx: Fix 32-bit VMW_PORT_HB_[IN|OUT] macros
	IB/hfi1: Use after free race condition in send context error path
	Revert "ipc/shm: Fix shmat mmap nil-page protection"
	ipc/shm: fix shmat() nil address after round-down when remapping
	kasan: fix memory hotplug during boot
	kernel/sys.c: fix potential Spectre v1 issue
	kernel/signal.c: avoid undefined behaviour in kill_something_info
	KVM/VMX: Expose SSBD properly to guests
	KVM: s390: vsie: fix < 8k check for the itdba
	KVM: x86: Update cpuid properly when CR4.OSXAVE or CR4.PKE is changed
	kvm: x86: IA32_ARCH_CAPABILITIES is always supported
	firewire-ohci: work around oversized DMA reads on JMicron controllers
	x86/tsc: Allow TSC calibration without PIT
	NFSv4: always set NFS_LOCK_LOST when a lock is lost.
	ALSA: hda - Use IS_REACHABLE() for dependency on input
	kvm: x86: fix KVM_XEN_HVM_CONFIG ioctl
	netfilter: ipv6: nf_defrag: Pass on packets to stack per RFC2460
	tracing/hrtimer: Fix tracing bugs by taking all clock bases and modes into account
	PCI: Add function 1 DMA alias quirk for Marvell 9128
	Input: psmouse - fix Synaptics detection when protocol is disabled
	i40iw: Zero-out consumer key on allocate stag for FMR
	tools lib traceevent: Simplify pointer print logic and fix %pF
	perf callchain: Fix attr.sample_max_stack setting
	tools lib traceevent: Fix get_field_str() for dynamic strings
	perf record: Fix failed memory allocation for get_cpuid_str
	iommu/vt-d: Use domain instead of cache fetching
	dm thin: fix documentation relative to low water mark threshold
	net: stmmac: dwmac-meson8b: fix setting the RGMII TX clock on Meson8b
	net: stmmac: dwmac-meson8b: propagate rate changes to the parent clock
	nfs: Do not convert nfs_idmap_cache_timeout to jiffies
	watchdog: sp5100_tco: Fix watchdog disable bit
	kconfig: Don't leak main menus during parsing
	kconfig: Fix automatic menu creation mem leak
	kconfig: Fix expr_free() E_NOT leak
	mac80211_hwsim: fix possible memory leak in hwsim_new_radio_nl()
	ipmi/powernv: Fix error return code in ipmi_powernv_probe()
	Btrfs: set plug for fsync
	btrfs: Fix out of bounds access in btrfs_search_slot
	Btrfs: fix scrub to repair raid6 corruption
	btrfs: fail mount when sb flag is not in BTRFS_SUPER_FLAG_SUPP
	HID: roccat: prevent an out of bounds read in kovaplus_profile_activated()
	fm10k: fix "failed to kill vid" message for VF
	device property: Define type of PROPERTY_ENRTY_*() macros
	jffs2: Fix use-after-free bug in jffs2_iget()'s error handling path
	powerpc/numa: Use ibm,max-associativity-domains to discover possible nodes
	powerpc/numa: Ensure nodes initialized for hotplug
	RDMA/mlx5: Avoid memory leak in case of XRCD dealloc failure
	ntb_transport: Fix bug with max_mw_size parameter
	gianfar: prevent integer wrapping in the rx handler
	tcp_nv: fix potential integer overflow in tcpnv_acked
	kvm: Map PFN-type memory regions as writable (if possible)
	ocfs2: return -EROFS to mount.ocfs2 if inode block is invalid
	ocfs2/acl: use 'ip_xattr_sem' to protect getting extended attribute
	ocfs2: return error when we attempt to access a dirty bh in jbd2
	mm/mempolicy: fix the check of nodemask from user
	mm/mempolicy: add nodes_empty check in SYSC_migrate_pages
	asm-generic: provide generic_pmdp_establish()
	sparc64: update pmdp_invalidate() to return old pmd value
	mm: thp: use down_read_trylock() in khugepaged to avoid long block
	mm: pin address_space before dereferencing it while isolating an LRU page
	mm/fadvise: discard partial page if endbyte is also EOF
	openvswitch: Remove padding from packet before L3+ conntrack processing
	IB/ipoib: Fix for potential no-carrier state
	drm/nouveau/pmu/fuc: don't use movw directly anymore
	netfilter: ipv6: nf_defrag: Kill frag queue on RFC2460 failure
	x86/power: Fix swsusp_arch_resume prototype
	firmware: dmi_scan: Fix handling of empty DMI strings
	ACPI: processor_perflib: Do not send _PPC change notification if not ready
	ACPI / scan: Use acpi_bus_get_status() to initialize ACPI_TYPE_DEVICE devs
	bpf: fix selftests/bpf test_kmod.sh failure when CONFIG_BPF_JIT_ALWAYS_ON=y
	MIPS: generic: Fix machine compatible matching
	MIPS: TXx9: use IS_BUILTIN() for CONFIG_LEDS_CLASS
	xen-netfront: Fix race between device setup and open
	xen/grant-table: Use put_page instead of free_page
	RDS: IB: Fix null pointer issue
	arm64: spinlock: Fix theoretical trylock() A-B-A with LSE atomics
	proc: fix /proc/*/map_files lookup
	cifs: silence compiler warnings showing up with gcc-8.0.0
	bcache: properly set task state in bch_writeback_thread()
	bcache: fix for allocator and register thread race
	bcache: fix for data collapse after re-attaching an attached device
	bcache: return attach error when no cache set exist
	tools/libbpf: handle issues with bpf ELF objects containing .eh_frames
	bpf: fix rlimit in reuseport net selftest
	vfs/proc/kcore, x86/mm/kcore: Fix SMAP fault when dumping vsyscall user page
	locking/qspinlock: Ensure node->count is updated before initialising node
	irqchip/gic-v3: Ignore disabled ITS nodes
	cpumask: Make for_each_cpu_wrap() available on UP as well
	irqchip/gic-v3: Change pr_debug message to pr_devel
	ARC: Fix malformed ARC_EMUL_UNALIGNED default
	ptr_ring: prevent integer overflow when calculating size
	libata: Fix compile warning with ATA_DEBUG enabled
	selftests: pstore: Adding config fragment CONFIG_PSTORE_RAM=m
	selftests: memfd: add config fragment for fuse
	ARM: OMAP2+: timer: fix a kmemleak caused in omap_get_timer_dt
	ARM: OMAP3: Fix prm wake interrupt for resume
	ARM: OMAP1: clock: Fix debugfs_create_*() usage
	ibmvnic: Free RX socket buffer in case of adapter error
	iwlwifi: mvm: fix security bug in PN checking
	iwlwifi: mvm: always init rs with 20mhz bandwidth rates
	NFC: llcp: Limit size of SDP URI
	rxrpc: Work around usercopy check
	mac80211: round IEEE80211_TX_STATUS_HEADROOM up to multiple of 4
	mac80211: fix a possible leak of station stats
	mac80211: fix calling sleeping function in atomic context
	mac80211: Do not disconnect on invalid operating class
	md raid10: fix NULL deference in handle_write_completed()
	drm/exynos: g2d: use monotonic timestamps
	drm/exynos: fix comparison to bitshift when dealing with a mask
	locking/xchg/alpha: Add unconditional memory barrier to cmpxchg()
	md: raid5: avoid string overflow warning
	kernel/relay.c: limit kmalloc size to KMALLOC_MAX_SIZE
	powerpc/bpf/jit: Fix 32-bit JIT for seccomp_data access
	s390/cio: fix ccw_device_start_timeout API
	s390/cio: fix return code after missing interrupt
	s390/cio: clear timer when terminating driver I/O
	PKCS#7: fix direct verification of SignerInfo signature
	ARM: OMAP: Fix dmtimer init for omap1
	smsc75xx: fix smsc75xx_set_features()
	regulatory: add NUL to request alpha2
	integrity/security: fix digsig.c build error with header file
	locking/xchg/alpha: Fix xchg() and cmpxchg() memory ordering bugs
	x86/topology: Update the 'cpu cores' field in /proc/cpuinfo correctly across CPU hotplug operations
	mac80211: drop frames with unexpected DS bits from fast-rx to slow path
	arm64: fix unwind_frame() for filtered out fn for function graph tracing
	macvlan: fix use-after-free in macvlan_common_newlink()
	kvm: fix warning for CONFIG_HAVE_KVM_EVENTFD builds
	fs: dcache: Avoid livelock between d_alloc_parallel and __d_add
	fs: dcache: Use READ_ONCE when accessing i_dir_seq
	md: fix a potential deadlock of raid5/raid10 reshape
	md/raid1: fix NULL pointer dereference
	batman-adv: fix packet checksum in receive path
	batman-adv: invalidate checksum on fragment reassembly
	netfilter: ebtables: convert BUG_ONs to WARN_ONs
	batman-adv: Ignore invalid batadv_iv_gw during netlink send
	batman-adv: Ignore invalid batadv_v_gw during netlink send
	batman-adv: Fix netlink dumping of BLA claims
	batman-adv: Fix netlink dumping of BLA backbones
	nvme-pci: Fix nvme queue cleanup if IRQ setup fails
	clocksource/drivers/fsl_ftm_timer: Fix error return checking
	ceph: fix dentry leak when failing to init debugfs
	ARM: orion5x: Revert commit 4904dbda41.
	qrtr: add MODULE_ALIAS macro to smd
	r8152: fix tx packets accounting
	virtio-gpu: fix ioctl and expose the fixed status to userspace.
	dmaengine: rcar-dmac: fix max_chunk_size for R-Car Gen3
	bcache: fix kcrashes with fio in RAID5 backend dev
	ip6_tunnel: fix IFLA_MTU ignored on NEWLINK
	sit: fix IFLA_MTU ignored on NEWLINK
	ARM: dts: NSP: Fix amount of RAM on BCM958625HR
	powerpc/boot: Fix random libfdt related build errors
	gianfar: Fix Rx byte accounting for ndev stats
	net/tcp/illinois: replace broken algorithm reference link
	nvmet: fix PSDT field check in command format
	xen/pirq: fix error path cleanup when binding MSIs
	drm/sun4i: Fix dclk_set_phase
	Btrfs: send, fix issuing write op when processing hole in no data mode
	selftests/powerpc: Skip the subpage_prot tests if the syscall is unavailable
	KVM: PPC: Book3S HV: Fix VRMA initialization with 2MB or 1GB memory backing
	iwlwifi: mvm: fix TX of CCMP 256
	watchdog: f71808e_wdt: Fix magic close handling
	watchdog: sbsa: use 32-bit read for WCV
	batman-adv: Fix multicast packet loss with a single WANT_ALL_IPV4/6 flag
	e1000e: Fix check_for_link return value with autoneg off
	e1000e: allocate ring descriptors with dma_zalloc_coherent
	ia64/err-inject: Use get_user_pages_fast()
	RDMA/qedr: Fix kernel panic when running fio over NFSoRDMA
	RDMA/qedr: Fix iWARP write and send with immediate
	IB/mlx4: Fix corruption of RoCEv2 IPv4 GIDs
	IB/mlx4: Include GID type when deleting GIDs from HW table under RoCE
	IB/mlx5: Fix an error code in __mlx5_ib_modify_qp()
	fbdev: Fixing arbitrary kernel leak in case FBIOGETCMAP_SPARC in sbusfb_ioctl_helper().
	fsl/fman: avoid sleeping in atomic context while adding an address
	net: qcom/emac: Use proper free methods during TX
	net: smsc911x: Fix unload crash when link is up
	IB/core: Fix possible crash to access NULL netdev
	xen: xenbus: use put_device() instead of kfree()
	arm64: Relax ARM_SMCCC_ARCH_WORKAROUND_1 discovery
	dmaengine: mv_xor_v2: Fix clock resource by adding a register clock
	netfilter: ebtables: fix erroneous reject of last rule
	bnxt_en: Check valid VNIC ID in bnxt_hwrm_vnic_set_tpa().
	workqueue: use put_device() instead of kfree()
	ipv4: lock mtu in fnhe when received PMTU < net.ipv4.route.min_pmtu
	sunvnet: does not support GSO for sctp
	drm/imx: move arming of the vblank event to atomic_flush
	net: Fix vlan untag for bridge and vlan_dev with reorder_hdr off
	batman-adv: fix header size check in batadv_dbg_arp()
	batman-adv: Fix skbuff rcsum on packet reroute
	vti4: Don't count header length twice on tunnel setup
	vti4: Don't override MTU passed on link creation via IFLA_MTU
	perf/cgroup: Fix child event counting bug
	brcmfmac: Fix check for ISO3166 code
	kbuild: make scripts/adjust_autoksyms.sh robust against timestamp races
	RDMA/ucma: Correct option size check using optlen
	RDMA/qedr: fix QP's ack timeout configuration
	RDMA/qedr: Fix rc initialization on CNQ allocation failure
	mm/mempolicy.c: avoid use uninitialized preferred_node
	mm, thp: do not cause memcg oom for thp
	selftests: ftrace: Add probe event argument syntax testcase
	selftests: ftrace: Add a testcase for string type with kprobe_event
	selftests: ftrace: Add a testcase for probepoint
	batman-adv: fix multicast-via-unicast transmission with AP isolation
	batman-adv: fix packet loss for broadcasted DHCP packets to a server
	ARM: 8748/1: mm: Define vdso_start, vdso_end as array
	net: qmi_wwan: add BroadMobi BM806U 2020:2033
	perf/x86/intel: Fix linear IP of PEBS real_ip on Haswell and later CPUs
	llc: properly handle dev_queue_xmit() return value
	builddeb: Fix header package regarding dtc source links
	mm/kmemleak.c: wait for scan completion before disabling free
	net: Fix untag for vlan packets without ethernet header
	net: mvneta: fix enable of all initialized RXQs
	sh: fix debug trap failure to process signals before return to user
	nvme: don't send keep-alives to the discovery controller
	x86/pgtable: Don't set huge PUD/PMD on non-leaf entries
	x86/mm: Do not forbid _PAGE_RW before init for __ro_after_init
	fs/proc/proc_sysctl.c: fix potential page fault while unregistering sysctl table
	swap: divide-by-zero when zero length swap file on ssd
	sr: get/drop reference to device in revalidate and check_events
	Force log to disk before reading the AGF during a fstrim
	cpufreq: CPPC: Initialize shared perf capabilities of CPUs
	dp83640: Ensure against premature access to PHY registers after reset
	mm/ksm: fix interaction with THP
	mm: fix races between address_space dereference and free in page_evicatable
	Btrfs: bail out on error during replay_dir_deletes
	Btrfs: fix NULL pointer dereference in log_dir_items
	btrfs: Fix possible softlock on single core machines
	ocfs2/dlm: don't handle migrate lockres if already in shutdown
	sched/rt: Fix rq->clock_update_flags < RQCF_ACT_SKIP warning
	KVM: VMX: raise internal error for exception during invalid protected mode state
	fscache: Fix hanging wait on page discarded by writeback
	sparc64: Make atomic_xchg() an inline function rather than a macro.
	net: bgmac: Fix endian access in bgmac_dma_tx_ring_free()
	btrfs: tests/qgroup: Fix wrong tree backref level
	Btrfs: fix copy_items() return value when logging an inode
	btrfs: fix lockdep splat in btrfs_alloc_subvolume_writers
	rxrpc: Fix Tx ring annotation after initial Tx failure
	rxrpc: Don't treat call aborts as conn aborts
	xen/acpi: off by one in read_acpi_id()
	drivers: macintosh: rack-meter: really fix bogus memsets
	ACPI: acpi_pad: Fix memory leak in power saving threads
	powerpc/mpic: Check if cpu_possible() in mpic_physmask()
	m68k: set dma and coherent masks for platform FEC ethernets
	parisc/pci: Switch LBA PCI bus from Hard Fail to Soft Fail mode
	hwmon: (nct6775) Fix writing pwmX_mode
	powerpc/perf: Prevent kernel address leak to userspace via BHRB buffer
	powerpc/perf: Fix kernel address leak via sampling registers
	tools/thermal: tmon: fix for segfault
	selftests: Print the test we're running to /dev/kmsg
	net/mlx5: Protect from command bit overflow
	ath10k: Fix kernel panic while using worker (ath10k_sta_rc_update_wk)
	cxgb4: Setup FW queues before registering netdev
	ima: Fallback to the builtin hash algorithm
	virtio-net: Fix operstate for virtio when no VIRTIO_NET_F_STATUS
	arm: dts: socfpga: fix GIC PPI warning
	cpufreq: cppc_cpufreq: Fix cppc_cpufreq_init() failure path
	zorro: Set up z->dev.dma_mask for the DMA API
	bcache: quit dc->writeback_thread when BCACHE_DEV_DETACHING is set
	ACPICA: Events: add a return on failure from acpi_hw_register_read
	ACPICA: acpi: acpica: fix acpi operand cache leak in nseval.c
	cxgb4: Fix queue free path of ULD drivers
	i2c: mv64xxx: Apply errata delay only in standard mode
	KVM: lapic: stop advertising DIRECTED_EOI when in-kernel IOAPIC is in use
	perf top: Fix top.call-graph config option reading
	perf stat: Fix core dump when flag T is used
	IB/core: Honor port_num while resolving GID for IB link layer
	regulator: gpio: Fix some error handling paths in 'gpio_regulator_probe()'
	spi: bcm-qspi: fIX some error handling paths
	MIPS: ath79: Fix AR724X_PLL_REG_PCIE_CONFIG offset
	PCI: Restore config space on runtime resume despite being unbound
	ipmi_ssif: Fix kernel panic at msg_done_handler
	powerpc: Add missing prototype for arch_irq_work_raise()
	f2fs: fix to check extent cache in f2fs_drop_extent_tree
	perf/core: Fix perf_output_read_group()
	drm/panel: simple: Fix the bus format for the Ontat panel
	hwmon: (pmbus/max8688) Accept negative page register values
	hwmon: (pmbus/adm1275) Accept negative page register values
	perf/x86/intel: Properly save/restore the PMU state in the NMI handler
	cdrom: do not call check_disk_change() inside cdrom_open()
	perf/x86/intel: Fix large period handling on Broadwell CPUs
	perf/x86/intel: Fix event update for auto-reload
	arm64: dts: qcom: Fix SPI5 config on MSM8996
	soc: qcom: wcnss_ctrl: Fix increment in NV upload
	gfs2: Fix fallocate chunk size
	x86/devicetree: Initialize device tree before using it
	x86/devicetree: Fix device IRQ settings in DT
	ALSA: vmaster: Propagate slave error
	dmaengine: pl330: fix a race condition in case of threaded irqs
	dmaengine: rcar-dmac: Check the done lists in rcar_dmac_chan_get_residue()
	enic: enable rq before updating rq descriptors
	hwrng: stm32 - add reset during probe
	dmaengine: qcom: bam_dma: get num-channels and num-ees from dt
	net: stmmac: ensure that the device has released ownership before reading data
	net: stmmac: ensure that the MSS desc is the last desc to set the own bit
	cpufreq: Reorder cpufreq_online() error code path
	PCI: Add function 1 DMA alias quirk for Marvell 88SE9220
	udf: Provide saner default for invalid uid / gid
	ARM: dts: bcm283x: Fix probing of bcm2835-i2s
	audit: return on memory error to avoid null pointer dereference
	rcu: Call touch_nmi_watchdog() while printing stall warnings
	pinctrl: sh-pfc: r8a7796: Fix MOD_SEL register pin assignment for SSI pins group
	MIPS: Octeon: Fix logging messages with spurious periods after newlines
	drm/rockchip: Respect page offset for PRIME mmap calls
	x86/apic: Set up through-local-APIC mode on the boot CPU if 'noapic' specified
	perf tests: Use arch__compare_symbol_names to compare symbols
	perf report: Fix memory corruption in --branch-history mode --branch-history
	selftests/net: fixes psock_fanout eBPF test case
	netlabel: If PF_INET6, check sk_buff ip header version
	regmap: Correct comparison in regmap_cached
	ARM: dts: imx7d: cl-som-imx7: fix pinctrl_enet
	ARM: dts: porter: Fix HDMI output routing
	regulator: of: Add a missing 'of_node_put()' in an error handling path of 'of_regulator_match()'
	pinctrl: msm: Use dynamic GPIO numbering
	kdb: make "mdr" command repeat
	Linux 4.9.104

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2018-05-30 13:19:56 +02:00
Danilo Krummrich
b621438301 fs/proc/proc_sysctl.c: fix potential page fault while unregistering sysctl table
[ Upstream commit a0b0d1c345 ]

proc_sys_link_fill_cache() does not take currently unregistering sysctl
tables into account, which might result into a page fault in
sysctl_follow_link() - add a check to fix it.

This bug has been present since v3.4.

Link: http://lkml.kernel.org/r/20180228013506.4915-1-danilokrummrich@dk-develop.de
Fixes: 0e47c99d7f ("sysctl: Replace root_list with links between sysctl_table_sets")
Signed-off-by: Danilo Krummrich <danilokrummrich@dk-develop.de>
Acked-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: "Luis R . Rodriguez" <mcgrof@kernel.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-30 07:50:40 +02:00
Jia Zhang
059befd4e0 vfs/proc/kcore, x86/mm/kcore: Fix SMAP fault when dumping vsyscall user page
[ Upstream commit 595dd46ebf ]

Commit:

  df04abfd18 ("fs/proc/kcore.c: Add bounce buffer for ktext data")

... introduced a bounce buffer to work around CONFIG_HARDENED_USERCOPY=y.
However, accessing the vsyscall user page will cause an SMAP fault.

Replace memcpy() with copy_from_user() to fix this bug works, but adding
a common way to handle this sort of user page may be useful for future.

Currently, only vsyscall page requires KCORE_USER.

Signed-off-by: Jia Zhang <zhang.jia@linux.alibaba.com>
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: jolsa@redhat.com
Link: http://lkml.kernel.org/r/1518446694-21124-2-git-send-email-zhang.jia@linux.alibaba.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-30 07:50:26 +02:00
Alexey Dobriyan
e0a1a0173a proc: fix /proc/*/map_files lookup
[ Upstream commit ac7f1061c2 ]

Current code does:

	if (sscanf(dentry->d_name.name, "%lx-%lx", start, end) != 2)

However sscanf() is broken garbage.

It silently accepts whitespace between format specifiers
(did you know that?).

It silently accepts valid strings which result in integer overflow.

Do not use sscanf() for any even remotely reliable parsing code.

	OK
	# readlink '/proc/1/map_files/55a23af39000-55a23b05b000'
	/lib/systemd/systemd

	broken
	# readlink '/proc/1/map_files/               55a23af39000-55a23b05b000'
	/lib/systemd/systemd

	broken
	# readlink '/proc/1/map_files/55a23af39000-55a23b05b000    '
	/lib/systemd/systemd

	very broken
	# readlink '/proc/1/map_files/1000000000000000055a23af39000-55a23b05b000'
	/lib/systemd/systemd

Andrei said:

: This patch breaks criu.  It was a bug in criu.  And this bug is on a minor
: path, which works when memfd_create() isn't available.  It is a reason why
: I ask to not backport this patch to stable kernels.
:
: In CRIU this bug can be triggered, only if this patch will be backported
: to a kernel which version is lower than v3.16.

Link: http://lkml.kernel.org/r/20171120212706.GA14325@avx2
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Pavel Emelyanov <xemul@openvz.org>
Cc: Andrei Vagin <avagin@virtuozzo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-30 07:50:25 +02:00
Connor O'Brien
2e35bed46b ANDROID: proc: fix undefined behavior in proc_uid_base_readdir
When uid_base_stuff has no entries, proc_uid_base_readdir tries to
compute an address before the start of the array. Revise this check to
use uid_base_stuff + nents instead, which makes the code valid
regardless of array size.

Bug: 80158484
Test: No more compiler warning with CONFIG_CPU_FREQ_TIMES=n
Change-Id: I6e55b27c3ba8210cee194f6d27bbd62c0b263796
Signed-off-by: Connor O'Brien <connoro@google.com>
2018-05-24 01:28:28 +00:00