This adds debugfs consumer for xHCI driver. The debugfs entries
read all host registers, device/endpoint contexts, command ring,
event ring and various endpoint rings. With these entries, users
can check the registers and memory spaces used by a host during
run time, or save all the information with a simple 'cp -r' for
post-mortem programs.
The file hierarchy looks like this.
[root of debugfs]
|__usb
|____[e,u,o]hci <---------[root for other HCIs]
|____xhci <---------------[root for xHCI]
|______0000:00:14.0 <--------------[xHCI host name]
|________reg-cap <--------[capability registers]
|________reg-op <-------[operational registers]
|________reg-runtime <-----------[runtime registers]
|________reg-ext-#cap_name <----[extended capability regs]
|________command-ring <-------[root for command ring]
|__________cycle <------------------[ring cycle]
|__________dequeue <--------[ring dequeue pointer]
|__________enqueue <--------[ring enqueue pointer]
|__________trbs <-------------------[ring trbs]
|________event-ring <---------[root for event ring]
|__________cycle <------------------[ring cycle]
|__________dequeue <--------[ring dequeue pointer]
|__________enqueue <--------[ring enqueue pointer]
|__________trbs <-------------------[ring trbs]
|________devices <------------[root for devices]
|__________#slot_id <-----------[root for a device]
|____________name <-----------------[device name]
|____________slot-context <----------------[slot context]
|____________ep-context <-----------[endpoint contexts]
|____________ep#ep_index <--------[root for an endpoint]
|______________cycle <------------------[ring cycle]
|______________dequeue <--------[ring dequeue pointer]
|______________enqueue <--------[ring enqueue pointer]
|______________trbs <-------------------[ring trbs]
Change-Id: I61b412883e547a257bffc1234ee42fd221b940a6
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: William Wu <william.wu@rock-chips.com>
(cherry picked from commit 02b6fdc2a1)
Add PORTSC Port status and control register decoder to
show human readable tracing of portsc register
Change-Id: I617493712e58546197748c01eea3f6b33b6dd377
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: William Wu <william.wu@rock-chips.com>
(cherry picked from commit 2e77a8253d)
Add definitions for all port link states defined in xhci
specification for PORTSC register.
Will be needed for human readable port status tracing
Change-Id: I1de382f7e45c0a6a5481b92eabb2cf3b3fc68419
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: William Wu <william.wu@rock-chips.com>
(cherry picked from commit 7344ee328c)
temp and temp1 variables are used for port status (portsc) and
command register. Give them more descriptive names
No functional changes
Change-Id: Ic278965bbf21910b36404c90da955c74df69fc95
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: William Wu <william.wu@rock-chips.com>
(cherry picked from commit 76a0f32b28)
xhci_decode_trb() treats a link trb in the same way as that for
an event trb. This patch fixes this by decoding the link trb
according to the spec.
Change-Id: Ia582395d828b0823c6279c45247fdcec89a19680
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: William Wu <william.wu@rock-chips.com>
(cherry picked from commit 96d9a6eb97)
XHCI context changes have already been traced by the trace
events. It's unnecessary to put the same message in kernel
log. This patch removes the use of xhci_dbg_ctx().
Change-Id: I0279d03a772ee06965b146b42937ca74f74ffb75
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: William Wu <william.wu@rock-chips.com>
(cherry picked from commit c8844f2ddb)
Every XHCI TRB has already been traced by the trb trace events.
It is unnecessary to put the same message in kernel log. This
patch removes xhci_debug_trb().
Change-Id: I0e5c452f2c133fb4cb8ec62bd5a9c3ef4163a1f2
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: William Wu <william.wu@rock-chips.com>
(cherry picked from commit 8c10152ec5)
XHCI ring changes have already been traced by the ring trace
events. It's unnecessary to put the same messages in kernel
log. This patch removes the debugging code for a ring.
Change-Id: Ie15a464c9e82e61adf2e18167c145e78fb1d1fe8
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: William Wu <william.wu@rock-chips.com>
(cherry picked from commit 121dcf1190)
This patch creates a new event class called xhci_log_ring, and
defines the events used for tracing the change of all kinds of
rings used by an xhci host. An xHCI ring is basically a memory
block shared between software and hardware. By tracing changes
of rings, it makes the life easier for debugging hardware or
software problems.
This info can be used, later, to print, in a human readable way,
the life cycle of an xHCI ring using the trace-cmd tool and the
appropriate plugin.
Change-Id: Ib5934eed68e044054305b1f15925751a870b93a8
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Felipe Balbi <felipe.balbi@linux.intel.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: William Wu <william.wu@rock-chips.com>
(cherry picked from commit b2d6edbb95)
There's one annoyance in how xhci prints debug messages, we often
get logs with messages but it's hard to say from which device and
endpoint the message originates. Add slot_id, ep_index messages
in handle_tx_event.
Change-Id: I0d4fd42e95d8f85a5ba7de0a7ad45d4a2858a78a
Signed-off-by: Zhengjun Xing <zhengjun.xing@linux.intel.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: William Wu <william.wu@rock-chips.com>
(cherry picked from commit b7f769ae1b)
With these, we can track what's happening with the HW while executing
each and every command. It will give us visibility into how the
different contexts are being modified by xHC which can bring insight
into problems while debugging.
Change-Id: I0d38de2b751416414fcd0feb85c163e0b24c3014
Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: William Wu <william.wu@rock-chips.com>
(cherry picked from commit 19a7d0d65c)
By extracting and exposing xhci_slot_state_string() in a header file, we
can re-use it to print Slot Context State from our tracepoints, which
can aid in tracking down problems related to command execution.
Change-Id: I65d812e5d1137a56da74bde0b846f9ad03ff2234
Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: William Wu <william.wu@rock-chips.com>
(cherry picked from commit 52407729fb)
Let's start tracing at least part of an xhci_virt_device lifetime. We
might want to extend this tracepoint class later, but for now it already
exposes quite a bit of valuable information.
Change-Id: I9cf4b89292d5062c64bc96045692691f90834fe2
Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: William Wu <william.wu@rock-chips.com>
(cherry picked from commit a711edeeb1)
If we add that newline, the output will look like the following:
kworker/2:1-42 [002] .... 169.811435: xhci_address_ctx:
ctx_64=0, ctx_type=2, ctx_dma=@153fbd000, ctx_va=@ffff880153fbd000
We would rather have that in a single line.
Change-Id: Ic01adda02391f919de3dcb8e9835f16acf41d7d0
Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: William Wu <william.wu@rock-chips.com>
(cherry picked from commit d4d93e6c55)
instead of having a tracer that can only trace command completions,
let's promote this tracer so it can trace and decode any TRB.
With that, it will be easier to extrapolate the lifetime of any TRB
which might help debugging certain issues.
Change-Id: I9cbb163b1198893c1028f3c10a694efb3748f86a
Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: William Wu <william.wu@rock-chips.com>
(cherry picked from commit a37c3f76e6)
If we just provide a helper to convert completion code to string, we can
combine all debugging messages into a single print.
[keep the old debug messages, for warn and grep -Mathias]
Change-Id: Ie147da78006f5d244bd5f65a8011bf38e748bd34
Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: William Wu <william.wu@rock-chips.com>
(cherry picked from commit ed6d643b14)
Cleanup only. This patch is a mechaninal rename to make sure our macros
for TRB completion codes match what the specification uses to refer to
such errors. The idea behind this is that it makes it far easier to grep
the specification and match it with implementation.
Change-Id: I3f04cc262f5ecc8a6b36463070243511b1e14129
Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: William Wu <william.wu@rock-chips.com>
(cherry picked from commit 0b7c105a04)
Split DT source files to separate out android firmware for Android Pie & Oreo
Change-Id: I202f2350a3d8dcf8ef6762dd39784472a4ce7281
Signed-off-by: Bian Jin chen <kenjc.bian@rock-chips.com>
To avoid PRSTN being drived when PCIe is working,switch to
PCIe_PRSTNm0 as a workaround.
Change-Id: I094be7a873d0bff301792edea6929e1199cc52a2
Signed-off-by: Simon Xue <xxm@rock-chips.com>
Add support for DW PCIe controller found on RK1808 SoC platform
Change-Id: Ic6d638782d1f55f965d663f73eee14bafa392740
Signed-off-by: Simon Xue <xxm@rock-chips.com>
commit 7f3ef5dedb upstream.
Leaving the DRM driver enabled on reboot or kexec has the annoying
effect of leaving the display generating transactions whilst the
IOMMU has been shut down.
In turn, the IOMMU driver (which shares its interrupt line with
the VOP) starts warning either on shutdown or when entering the
secondary kernel in the kexec case (nothing is expected on that
front).
A cheap way of ensuring that things are nicely shut down is to
register a shutdown callback in the platform driver.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Tested-by: Vicente Bergas <vicencb@gmail.com>
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20180805124807.18169-1-marc.zyngier@arm.com
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 017b1660df upstream.
The page migration code employs try_to_unmap() to try and unmap the source
page. This is accomplished by using rmap_walk to find all vmas where the
page is mapped. This search stops when page mapcount is zero. For shared
PMD huge pages, the page map count is always 1 no matter the number of
mappings. Shared mappings are tracked via the reference count of the PMD
page. Therefore, try_to_unmap stops prematurely and does not completely
unmap all mappings of the source page.
This problem can result is data corruption as writes to the original
source page can happen after contents of the page are copied to the target
page. Hence, data is lost.
This problem was originally seen as DB corruption of shared global areas
after a huge page was soft offlined due to ECC memory errors. DB
developers noticed they could reproduce the issue by (hotplug) offlining
memory used to back huge pages. A simple testcase can reproduce the
problem by creating a shared PMD mapping (note that this must be at least
PUD_SIZE in size and PUD_SIZE aligned (1GB on x86)), and using
migrate_pages() to migrate process pages between nodes while continually
writing to the huge pages being migrated.
To fix, have the try_to_unmap_one routine check for huge PMD sharing by
calling huge_pmd_unshare for hugetlbfs huge pages. If it is a shared
mapping it will be 'unshared' which removes the page table entry and drops
the reference on the PMD page. After this, flush caches and TLB.
mmu notifiers are called before locking page tables, but we can not be
sure of PMD sharing until page tables are locked. Therefore, check for
the possibility of PMD sharing before locking so that notifiers can
prepare for the worst possible case.
Link: http://lkml.kernel.org/r/20180823205917.16297-2-mike.kravetz@oracle.com
[mike.kravetz@oracle.com: make _range_in_vma() a static inline]
Link: http://lkml.kernel.org/r/6063f215-a5c8-2f0c-465a-2c515ddc952d@oracle.com
Fixes: 39dde65c99 ("shared page table for hugetlb page")
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Jérôme Glisse <jglisse@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5e41540c8a upstream.
This bug has been experienced several times by the Oracle DB team. The
BUG is in remove_inode_hugepages() as follows:
/*
* If page is mapped, it was faulted in after being
* unmapped in caller. Unmap (again) now after taking
* the fault mutex. The mutex will prevent faults
* until we finish removing the page.
*
* This race can only happen in the hole punch case.
* Getting here in a truncate operation is a bug.
*/
if (unlikely(page_mapped(page))) {
BUG_ON(truncate_op);
In this case, the elevated map count is not the result of a race.
Rather it was incorrectly incremented as the result of a bug in the huge
pmd sharing code. Consider the following:
- Process A maps a hugetlbfs file of sufficient size and alignment
(PUD_SIZE) that a pmd page could be shared.
- Process B maps the same hugetlbfs file with the same size and
alignment such that a pmd page is shared.
- Process B then calls mprotect() to change protections for the mapping
with the shared pmd. As a result, the pmd is 'unshared'.
- Process B then calls mprotect() again to chage protections for the
mapping back to their original value. pmd remains unshared.
- Process B then forks and process C is created. During the fork
process, we do dup_mm -> dup_mmap -> copy_page_range to copy page
tables. Copying page tables for hugetlb mappings is done in the
routine copy_hugetlb_page_range.
In copy_hugetlb_page_range(), the destination pte is obtained by:
dst_pte = huge_pte_alloc(dst, addr, sz);
If pmd sharing is possible, the returned pointer will be to a pte in an
existing page table. In the situation above, process C could share with
either process A or process B. Since process A is first in the list,
the returned pte is a pointer to a pte in process A's page table.
However, the check for pmd sharing in copy_hugetlb_page_range is:
/* If the pagetables are shared don't copy or take references */
if (dst_pte == src_pte)
continue;
Since process C is sharing with process A instead of process B, the
above test fails. The code in copy_hugetlb_page_range which follows
assumes dst_pte points to a huge_pte_none pte. It copies the pte entry
from src_pte to dst_pte and increments this map count of the associated
page. This is how we end up with an elevated map count.
To solve, check the dst_pte entry for huge_pte_none. If !none, this
implies PMD sharing so do not copy.
Link: http://lkml.kernel.org/r/20181105212315.14125-1-mike.kravetz@oracle.com
Fixes: c5c99429fa ("fix hugepages leak due to pagetable page sharing")
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Prakash Sangappa <prakash.sangappa@oracle.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1823342a1f upstream.
gcc 8.1.0 complains:
fs/configfs/symlink.c:67:3: warning:
'strncpy' output truncated before terminating nul copying as many
bytes from a string as its length
fs/configfs/symlink.c: In function 'configfs_get_link':
fs/configfs/symlink.c:63:13: note: length computed here
Using strncpy() is indeed less than perfect since the length of data to
be copied has already been determined with strlen(). Replace strncpy()
with memcpy() to address the warning and optimize the code a little.
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Nobuhiro Iwamatsu <nobuhiro.iwamatsu@cybertrust.co.jp>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7fabaf3034 upstream.
fuse_request_send_notify_reply() may fail if the connection was reset for
some reason (e.g. fs was unmounted). Don't leak request reference in this
case. Besides leaking memory, this resulted in fc->num_waiting not being
decremented and hence fuse_wait_aborted() left in a hanging and unkillable
state.
Fixes: 2d45ba381a ("fuse: add retrieve request")
Fixes: b8f95e5d13 ("fuse: umount should wait for all requests")
Reported-and-tested-by: syzbot+6339eda9cb4ebbc4c37b@syzkaller.appspotmail.com
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Cc: <stable@vger.kernel.org> #v2.6.36
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7ce9a992ff upstream.
Fix an issue with the 32-bit range error path in `rtc_hctosys' where no
error code is set and consequently the successful preceding call result
from `rtc_read_time' is propagated to `rtc_hctosys_ret'. This in turn
makes any subsequent call to `hctosys_show' incorrectly report in sysfs
that the system time has been set from this RTC while it has not.
Set the error to ERANGE then if we can't express the result due to an
overflow.
Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org>
Fixes: b3a5ac42ab ("rtc: hctosys: Ensure system time doesn't overflow time_t")
Cc: stable@vger.kernel.org # 4.17+
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5d7a5bcb67 upstream.
When truncating the encode buffer, the page_ptr is getting
advanced, causing the next page to be skipped while encoding.
The page is still included in the response, so the response
contains a page of bogus data.
We need to adjust the page_ptr backwards to ensure we encode
the next page into the correct place.
We saw this triggered when concurrent directory modifications caused
nfsd4_encode_direct_fattr() to return nfserr_noent, and the resulting
call to xdr_truncate_encode() corrupted the READDIR reply.
Signed-off-by: Frank Sorenson <sorenson@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9c8e0a1b68 upstream.
Timothy Baldwin <timbaldwin@fastmail.co.uk> wrote:
> As per mount_namespaces(7) unprivileged users should not be able to look under mount points:
>
> Mounts that come as a single unit from more privileged mount are locked
> together and may not be separated in a less privileged mount namespace.
>
> However they can:
>
> 1. Create a mount namespace.
> 2. In the mount namespace open a file descriptor to the parent of a mount point.
> 3. Destroy the mount namespace.
> 4. Use the file descriptor to look under the mount point.
>
> I have reproduced this with Linux 4.16.18 and Linux 4.18-rc8.
>
> The setup:
>
> $ sudo sysctl kernel.unprivileged_userns_clone=1
> kernel.unprivileged_userns_clone = 1
> $ mkdir -p A/B/Secret
> $ sudo mount -t tmpfs hide A/B
>
>
> "Secret" is indeed hidden as expected:
>
> $ ls -lR A
> A:
> total 0
> drwxrwxrwt 2 root root 40 Feb 12 21:08 B
>
> A/B:
> total 0
>
>
> The attack revealing "Secret":
>
> $ unshare -Umr sh -c "exec unshare -m ls -lR /proc/self/fd/4/ 4<A"
> /proc/self/fd/4/:
> total 0
> drwxr-xr-x 3 root root 60 Feb 12 21:08 B
>
> /proc/self/fd/4/B:
> total 0
> drwxr-xr-x 2 root root 40 Feb 12 21:08 Secret
>
> /proc/self/fd/4/B/Secret:
> total 0
I tracked this down to put_mnt_ns running passing UMOUNT_SYNC and
disconnecting all of the mounts in a mount namespace. Fix this by
factoring drop_mounts out of drop_collected_mounts and passing
0 instead of UMOUNT_SYNC.
There are two possible behavior differences that result from this.
- No longer setting UMOUNT_SYNC will no longer set MNT_SYNC_UMOUNT on
the vfsmounts being unmounted. This effects the lazy rcu walk by
kicking the walk out of rcu mode and forcing it to be a non-lazy
walk.
- No longer disconnecting locked mounts will keep some mounts around
longer as they stay because the are locked to other mounts.
There are only two users of drop_collected mounts: audit_tree.c and
put_mnt_ns.
In audit_tree.c the mounts are private and there are no rcu lazy walks
only calls to iterate_mounts. So the changes should have no effect
except for a small timing effect as the connected mounts are disconnected.
In put_mnt_ns there may be references from process outside the mount
namespace to the mounts. So the mounts remaining connected will
be the bug fix that is needed. That rcu walks are allowed to continue
appears not to be a problem especially as the rcu walk change was about
an implementation detail not about semantics.
Cc: stable@vger.kernel.org
Fixes: 5ff9d8a65c ("vfs: Lock in place mounts from more privileged users")
Reported-by: Timothy Baldwin <timbaldwin@fastmail.co.uk>
Tested-by: Timothy Baldwin <timbaldwin@fastmail.co.uk>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit df7342b240 upstream.
Jonathan Calmels from NVIDIA reported that he's able to bypass the
mount visibility security check in place in the Linux kernel by using
a combination of the unbindable property along with the private mount
propagation option to allow a unprivileged user to see a path which
was purposefully hidden by the root user.
Reproducer:
# Hide a path to all users using a tmpfs
root@castiana:~# mount -t tmpfs tmpfs /sys/devices/
root@castiana:~#
# As an unprivileged user, unshare user namespace and mount namespace
stgraber@castiana:~$ unshare -U -m -r
# Confirm the path is still not accessible
root@castiana:~# ls /sys/devices/
# Make /sys recursively unbindable and private
root@castiana:~# mount --make-runbindable /sys
root@castiana:~# mount --make-private /sys
# Recursively bind-mount the rest of /sys over to /mnnt
root@castiana:~# mount --rbind /sys/ /mnt
# Access our hidden /sys/device as an unprivileged user
root@castiana:~# ls /mnt/devices/
breakpoint cpu cstate_core cstate_pkg i915 intel_pt isa kprobe
LNXSYSTM:00 msr pci0000:00 platform pnp0 power software system
tracepoint uncore_arb uncore_cbox_0 uncore_cbox_1 uprobe virtual
Solve this by teaching copy_tree to fail if a mount turns out to be
both unbindable and locked.
Cc: stable@vger.kernel.org
Fixes: 5ff9d8a65c ("vfs: Lock in place mounts from more privileged users")
Reported-by: Jonathan Calmels <jcalmels@nvidia.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 25d202ed82 upstream.
It was recently pointed out that the one instance of testing MNT_LOCKED
outside of the namespace_sem is in ksys_umount.
Fix that by adding a test inside of do_umount with namespace_sem and
the mount_lock held. As it helps to fail fails the existing test is
maintained with an additional comment pointing out that it may be racy
because the locks are not held.
Cc: stable@vger.kernel.org
Reported-by: Al Viro <viro@ZenIV.linux.org.uk>
Fixes: 5ff9d8a65c ("vfs: Lock in place mounts from more privileged users")
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>