Commit Graph

985817 Commits

Author SHA1 Message Date
Quentin Perret
f172ce09be FROMGIT: KVM: arm64: pkvm: Refcount the pages shared with EL2
In order to simplify the page tracking infrastructure at EL2 in nVHE
protected mode, move the responsibility of refcounting pages that are
shared multiple times on the host. In order to do so, let's create a
red-black tree tracking all the PFNs that have been shared, along with
a refcount.

Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211215161232.1480836-8-qperret@google.com
(cherry picked from commit a83e2191b7
 git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm.git next)
Bug: 209599700
Change-Id: I58f39d139c84acb545676c99144547cd080cdda3
2022-01-06 22:15:37 +00:00
Quentin Perret
b1ba2ab991 BACKPORT: FROMGIT: KVM: arm64: Introduce kvm_share_hyp()
The create_hyp_mappings() function can currently be called at any point
in time. However, its behaviour in protected mode changes widely
depending on when it is being called. Prior to KVM init, it is used to
create the temporary page-table used to bring-up the hypervisor, and
later on it is transparently turned into a 'share' hypercall when the
kernel has lost control over the hypervisor stage-1. In order to prepare
the ground for also unsharing pages with the hypervisor during guest
teardown, introduce a kvm_share_hyp() function to make it clear in which
places a share hypercall should be expected, as we will soon need a
matching unshare hypercall in all those places.

[ qperret: BACKPORT due to resolution in kvm_arch_vcpu_run_map_fp() in
  merge commit 2d761dbf7f ("Merge branch kvm-arm64/fpsimd-tracking
  into kvmarm-master/next") ]

Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211215161232.1480836-7-qperret@google.com
(cherry picked from commit 3f868e142c
 git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm.git next)
Bug: 209599700
Change-Id: Id3ba033a51550e5aada862d6af06773d0be833fb
2022-01-06 22:15:37 +00:00
Will Deacon
af2a0836d6 FROMGIT: KVM: arm64: Implement kvm_pgtable_hyp_unmap() at EL2
Implement kvm_pgtable_hyp_unmap() which can be used to remove hypervisor
stage-1 mappings at EL2.

Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211215161232.1480836-6-qperret@google.com
(cherry picked from commit 82bb02445d
 git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm.git next)
Bug: 209599700
Change-Id: I7c8dababccaa31a7b6d90977d583c8c0f8b8d47a
2022-01-06 22:15:37 +00:00
Will Deacon
e22f84251d FROMGIT: KVM: arm64: Hook up ->page_count() for hypervisor stage-1 page-table
kvm_pgtable_hyp_unmap() relies on the ->page_count() function callback
being provided by the memory-management operations for the page-table.

Wire up this callback for the hypervisor stage-1 page-table.

Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211215161232.1480836-5-qperret@google.com
(cherry picked from commit 34ec7cbf1e
 git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm.git next)
Bug: 209599700
Change-Id: I481b1f722f4dddda72e35481aca725a937a52090
2022-01-06 22:15:36 +00:00
Quentin Perret
c3a5cad5ec BACKPORT: FROMGIT: KVM: arm64: Fixup hyp stage-1 refcount
In nVHE-protected mode, the hyp stage-1 page-table refcount is broken
due to the lack of refcount support in the early allocator. Fix-up the
refcount in the finalize walker, once the 'hyp_vmemmap' is up and running.

[ qperret: BACKPORT due to conflict with S2MPU code ]

Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211215161232.1480836-4-qperret@google.com
(cherry picked from commit d6b4bd3f48
 git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm.git next)
Bug: 209599700
Change-Id: I6c48e4e5a0ae969e1d1ac6be0ce601faf138cfd3
2022-01-06 22:15:36 +00:00
Quentin Perret
1e0922b341 FROMGIT: KVM: arm64: Refcount hyp stage-1 pgtable pages
To prepare the ground for allowing hyp stage-1 mappings to be removed at
run-time, update the KVM page-table code to maintain a correct refcount
using the ->{get,put}_page() function callbacks.

Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211215161232.1480836-3-qperret@google.com
(cherry picked from commit 2ea2ff91e8
 git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm.git next)
Bug: 209599700
Change-Id: Ic4ce9177a4381a01e1f2c3b00e9f376e14015718
2022-01-06 22:15:36 +00:00
Quentin Perret
10bcef57f3 FROMGIT: KVM: arm64: Provide {get,put}_page() stubs for early hyp allocator
In nVHE protected mode, the EL2 code uses a temporary allocator during
boot while re-creating its stage-1 page-table. Unfortunately, the
hyp_vmmemap is not ready to use at this stage, so refcounting pages
is not possible. That is not currently a problem because hyp stage-1
mappings are never removed, which implies refcounting of page-table
pages is unnecessary.

In preparation for allowing hypervisor stage-1 mappings to be removed,
provide stub implementations for {get,put}_page() in the early allocator.

Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211215161232.1480836-2-qperret@google.com
(cherry picked from commit 1fac3cfb9c
 git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm.git next)
Bug: 209599700
Change-Id: I9602cb4db9cef6f52e8beefa52e53cf71f2e0cd1
2022-01-06 22:15:36 +00:00
Quentin Perret
ecc14b0dae Revert "FROMLIST: KVM: arm64: Provide {get,put}_page() stubs for early hyp allocator"
This reverts commit b66c10e133.

This will be replaced by a FROMGIT patch shortly.

Signed-off-by: Quentin Perret <qperret@google.com>
Change-Id: I6e353b0243a86e5590ac0533e29f90dde9a418c4
2022-01-06 22:15:36 +00:00
Quentin Perret
d76103f791 Revert "FROMLIST: KVM: arm64: Refcount hyp stage-1 pgtable pages"
This reverts commit e96c599591.

This will be replaced by a FROMGIT patch shortly.

Signed-off-by: Quentin Perret <qperret@google.com>
Change-Id: I152dd0611638f7ceab361b586f3fa93172f17c9e
2022-01-06 22:15:36 +00:00
Quentin Perret
692851a12a Revert "BACKPORT: FROMLIST: KVM: arm64: Fixup hyp stage-1 refcount"
This reverts commit edffd3888c.

This will be replaced by a FROMGIT patch shortly.

Signed-off-by: Quentin Perret <qperret@google.com>
Change-Id: I86105b5a7baeb1223501457666f0628fa25b55b8
2022-01-06 22:15:36 +00:00
Quentin Perret
e5432b22e1 Revert "FROMLIST: KVM: arm64: Hook up ->page_count() for hypervisor stage-1 page-table"
This reverts commit 3239508319.

This will be replaced by a FROMGIT patch shortly.

Signed-off-by: Quentin Perret <qperret@google.com>
Change-Id: I1cb6f413cf09a3b56a6e4fd2b83d28bfdc59984f
2022-01-06 22:15:36 +00:00
Quentin Perret
b9efceca10 Revert "FROMLIST: KVM: arm64: Implement kvm_pgtable_hyp_unmap() at EL2"
This reverts commit 446ab9f9b4.

This will be replaced by a FROMGIT patch shortly.

Signed-off-by: Quentin Perret <qperret@google.com>
Change-Id: Icae6c1df119bb9ba757c8c53e07729e7e7f4f16f
2022-01-06 22:15:35 +00:00
Quentin Perret
53d2e70667 Revert "FROMLIST: KVM: arm64: Introduce kvm_share_hyp()"
This reverts commit 21fb63c709.

This will be replaced by a FROMGIT patch shortly.

Signed-off-by: Quentin Perret <qperret@google.com>
Change-Id: I042fbb20865571916ba90684ffbbefb980bfa0ef
2022-01-06 22:15:35 +00:00
Quentin Perret
2c2cdc0358 Revert "FROMLIST: KVM: arm64: pkvm: Refcount the pages shared with EL2"
This reverts commit 33fa24cc3b.

This will be replaced by a FROMGIT patch shortly.

Signed-off-by: Quentin Perret <qperret@google.com>
Change-Id: I66488e19919bc3714c54874ac1918e8a1fd119b4
2022-01-06 22:15:35 +00:00
Quentin Perret
883baeb55a Revert "FROMLIST: KVM: arm64: Extend pkvm_page_state enumeration to handle absent pages"
This reverts commit f80bdbb276.

This will be replaced by a FROMGIT patch shortly.

Signed-off-by: Quentin Perret <qperret@google.com>
Change-Id: I81ee72a29d8884474b17e944183e9d3372dfdab8
2022-01-06 22:15:35 +00:00
Quentin Perret
a5fa58c648 Revert "BACKPORT: FROMLIST: KVM: arm64: Introduce wrappers for host and hyp spin lock accessors"
This reverts commit fb29cc8de3.

This will be replaced by a FROMGIT patch shortly.

Signed-off-by: Quentin Perret <qperret@google.com>
Change-Id: Ia67b33b36bcaf49a77b654ff62fbd604eaa959fc
2022-01-06 22:15:35 +00:00
Quentin Perret
654cfb96f3 Revert "FROMLIST: KVM: arm64: Implement do_share() helper for sharing memory"
This reverts commit 455e17002b.

This will be replaced by a FROMGIT patch shortly.

Signed-off-by: Quentin Perret <qperret@google.com>
Change-Id: Ibbcf494525b1ee746e5e2019172db66edc7276e9
2022-01-06 22:15:35 +00:00
Quentin Perret
3531ef4fff Revert "BACKPORT: FROMLIST: KVM: arm64: Implement __pkvm_host_share_hyp() using do_share()"
This reverts commit 50e7557b36.

This will be replaced by a FROMGIT patch shortly.

Signed-off-by: Quentin Perret <qperret@google.com>
Change-Id: Iad125052e89012b0ad374a2874a83ef608e86141
2022-01-06 22:15:35 +00:00
Quentin Perret
60fd78d4ca Revert "FROMLIST: KVM: arm64: Implement do_unshare() helper for unsharing memory"
This reverts commit 0a4821ecc2.

This will be replaced by a FROMGIT patch shortly.

Signed-off-by: Quentin Perret <qperret@google.com>
Change-Id: I199989c20d63e61f167fa55f270147e10cc64016
2022-01-06 22:15:34 +00:00
Quentin Perret
870805293f Revert "FROMLIST: KVM: arm64: Expose unshare hypercall to the host"
This reverts commit fee11d0f41.

This will be replaced by a FROMGIT patch shortly.

Signed-off-by: Quentin Perret <qperret@google.com>
Change-Id: I4b61e149f82874557904f6e6b09421f294f996fe
2022-01-06 22:15:34 +00:00
Quentin Perret
5214f6c1c4 Revert "FROMLIST: KVM: arm64: pkvm: Unshare guest structs during teardown"
This reverts commit bcf3fd91be.

This will be replaced by a FROMGIT patch shortly.

Signed-off-by: Quentin Perret <qperret@google.com>
Change-Id: I51bd7eff34a9345275ca0957082456b5ec5c9d3f
2022-01-06 22:15:34 +00:00
Suren Baghdasaryan
7237abf9f8 ANDROID: mm/oom_kill: allow process_mrelease reclaim memory in parallel with exit_mmap
To allow process_mrelease to reap targeted mm in parallel with exit_mmap
mark the victim with MMF_OOM_VICTIM flag.

Bug: 189803002
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: I89cf5f8fbeeb18b93a340b9ebe7f200837ebe846
2022-01-06 17:49:03 +00:00
Suren Baghdasaryan
01e44cb8bd FROMLIST: mm/oom_kill: allow process_mrelease to run under mmap_lock protection
With exit_mmap holding mmap_write_lock during free_pgtables call,
process_mrelease does not need to elevate mm->mm_users in order to
prevent exit_mmap from destrying pagetables while __oom_reap_task_mm
is walking the VMA tree. The change prevents process_mrelease from
calling the last mmput, which can lead to waiting for IO completion
in exit_aio.

Fixes: 337546e83f ("mm/oom_kill.c: prevent a race between process_mrelease and exit_mmap")
Signed-off-by: Suren Baghdasaryan <surenb@google.com>

Link: https://lore.kernel.org/all/20211124235906.14437-2-surenb@google.com/

Bug: 130172058
Bug: 189803002
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: I1e2728e0c477af9cc20e9e0b715ee67dee760618
2022-01-06 17:48:49 +00:00
Suren Baghdasaryan
31ed633858 FROMLIST: mm: protect free_pgtables with mmap_lock write lock in exit_mmap
oom-reaper and process_mrelease system call should protect against
races with exit_mmap which can destroy page tables while they
walk the VMA tree. oom-reaper protects from that race by setting
MMF_OOM_VICTIM and by relying on exit_mmap to set MMF_OOM_SKIP
before taking and releasing mmap_write_lock. process_mrelease has
to elevate mm->mm_users to prevent such race. Both oom-reaper and
process_mrelease hold mmap_read_lock when walking the VMA tree.
The locking rules and mechanisms could be simpler if exit_mmap takes
mmap_write_lock while executing destructive operations such as
free_pgtables.
Change exit_mmap to hold the mmap_write_lock when calling
free_pgtables. Operations like unmap_vmas() and unlock_range() are not
destructive and could run under mmap_read_lock but for simplicity we
take one mmap_write_lock during almost the entire operation. Note
also that because oom-reaper checks VM_LOCKED flag, unlock_range()
should not be allowed to race with it.
In most cases this lock should be uncontended. Previously, Kirill
reported ~4% regression caused by a similar change [1]. We reran the
same test and although the individual results are quite noisy, the
percentiles show lower regression with 1.6% being the worst case [2].
The change allows oom-reaper and process_mrelease to execute safely
under mmap_read_lock without worries that exit_mmap might destroy page
tables from under them.

[1] https://lore.kernel.org/all/20170725141723.ivukwhddk2voyhuc@node.shutemov.name/
[2] https://lore.kernel.org/all/CAJuCfpGC9-c9P40x7oy=jy5SphMcd0o0G_6U1-+JAziGKG6dGA@mail.gmail.com/

Signed-off-by: Suren Baghdasaryan <surenb@google.com>

Link: https://lore.kernel.org/all/20211124235906.14437-1-surenb@google.com/

Bug: 130172058
Bug: 189803002
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: Ic87272d09a0b68a1b0e968e8f1a1510fd6fc776a
2022-01-06 17:48:40 +00:00
Suren Baghdasaryan
eb628f5264 UPSTREAM: mm/oom_kill.c: prevent a race between process_mrelease and exit_mmap
Race between process_mrelease and exit_mmap, where free_pgtables is
called while __oom_reap_task_mm is in progress, leads to kernel crash
during pte_offset_map_lock call.  oom-reaper avoids this race by setting
MMF_OOM_VICTIM flag and causing exit_mmap to take and release
mmap_write_lock, blocking it until oom-reaper releases mmap_read_lock.

Reusing MMF_OOM_VICTIM for process_mrelease would be the simplest way to
fix this race, however that would be considered a hack.  Fix this race
by elevating mm->mm_users and preventing exit_mmap from executing until
process_mrelease is finished.  Patch slightly refactors the code to
adapt for a possible mmget_not_zero failure.

This fix has considerable negative impact on process_mrelease
performance and will likely need later optimization.

Link: https://lkml.kernel.org/r/20211022014658.263508-1-surenb@google.com
Fixes: 884a7e5964 ("mm: introduce process_mrelease system call")
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Roman Gushchin <guro@fb.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Christian Brauner <christian@brauner.io>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Christian Brauner <christian.brauner@ubuntu.com>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: Jan Engelhardt <jengelh@inai.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

(cherry picked from commit 337546e83f)

Bug: 189803002
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: I7cf9c869faa7b746995a94ea93f6a617104385aa
2022-01-06 17:48:31 +00:00
Suren Baghdasaryan
11a2f368e4 UPSTREAM: mm: wire up syscall process_mrelease
Split off from prev patch in the series that implements the syscall.

Link: https://lkml.kernel.org/r/20210809185259.405936-2-surenb@google.com
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Christian Brauner <christian.brauner@ubuntu.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: Jan Engelhardt <jengelh@inai.de>
Cc: Jann Horn <jannh@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Roman Gushchin <guro@fb.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Tim Murray <timmurray@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

(cherry picked from commit dce4910396)

Bug: 189803002
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: I6f02c1ec136a7e102f133ee46a7070a151781345
2022-01-06 17:48:14 +00:00
Suren Baghdasaryan
7167628e4c UPSTREAM: mm: introduce process_mrelease system call
In modern systems it's not unusual to have a system component monitoring
memory conditions of the system and tasked with keeping system memory
pressure under control.  One way to accomplish that is to kill
non-essential processes to free up memory for more important ones.
Examples of this are Facebook's OOM killer daemon called oomd and
Android's low memory killer daemon called lmkd.

For such system component it's important to be able to free memory quickly
and efficiently.  Unfortunately the time process takes to free up its
memory after receiving a SIGKILL might vary based on the state of the
process (uninterruptible sleep), size and OPP level of the core the
process is running.  A mechanism to free resources of the target process
in a more predictable way would improve system's ability to control its
memory pressure.

Introduce process_mrelease system call that releases memory of a dying
process from the context of the caller.  This way the memory is freed in a
more controllable way with CPU affinity and priority of the caller.  The
workload of freeing the memory will also be charged to the caller.  The
operation is allowed only on a dying process.

After previous discussions [1, 2, 3] the decision was made [4] to
introduce a dedicated system call to cover this use case.

The API is as follows,

          int process_mrelease(int pidfd, unsigned int flags);

        DESCRIPTION
          The process_mrelease() system call is used to free the memory of
          an exiting process.

          The pidfd selects the process referred to by the PID file
          descriptor.
          (See pidfd_open(2) for further information)

          The flags argument is reserved for future use; currently, this
          argument must be specified as 0.

        RETURN VALUE
          On success, process_mrelease() returns 0. On error, -1 is
          returned and errno is set to indicate the error.

        ERRORS
          EBADF  pidfd is not a valid PID file descriptor.

          EAGAIN Failed to release part of the address space.

          EINTR  The call was interrupted by a signal; see signal(7).

          EINVAL flags is not 0.

          EINVAL The memory of the task cannot be released because the
                 process is not exiting, the address space is shared
                 with another live process or there is a core dump in
                 progress.

          ENOSYS This system call is not supported, for example, without
                 MMU support built into Linux.

          ESRCH  The target process does not exist (i.e., it has terminated
                 and been waited on).

[1] https://lore.kernel.org/lkml/20190411014353.113252-3-surenb@google.com/
[2] https://lore.kernel.org/linux-api/20201113173448.1863419-1-surenb@google.com/
[3] https://lore.kernel.org/linux-api/20201124053943.1684874-3-surenb@google.com/
[4] https://lore.kernel.org/linux-api/20201223075712.GA4719@lst.de/

Link: https://lkml.kernel.org/r/20210809185259.405936-1-surenb@google.com
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Roman Gushchin <guro@fb.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Christian Brauner <christian.brauner@ubuntu.com>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: Jan Engelhardt <jengelh@inai.de>
Cc: Tim Murray <timmurray@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

(cherry picked from commit 884a7e5964)

Bug: 189803002
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: I60d37051acaeff1b7eb7d10aeca23dfa1f2469a3
2022-01-06 17:48:02 +00:00
Peter Collingbourne
2f0e90e9ec Revert "FROMGIT: mm: improve mprotect(R|W) efficiency on pages referenced once"
This reverts commit b44e46bb04.

Reason for revert:

The patch has not yet landed upstream, following feedback from Linus:
https://lore.kernel.org/all/CAHk-=wj4KCujAH_oPh40Bkp48amM4MXr+8AcbZ=qd5LF4Q+TDg@mail.gmail.com/#t

Bug: 213339151
Signed-off-by: Peter Collingbourne <pcc@google.com>
Change-Id: I81c2cef4076487df1dd0ee75449dcb2371ac1dbc
2022-01-06 17:18:46 +00:00
Greg Kroah-Hartman
42c469a083 Merge 5.10.90 into android13-5.10
Changes in 5.10.90
	Input: i8042 - add deferred probe support
	Input: i8042 - enable deferred probe quirk for ASUS UM325UA
	tomoyo: Check exceeded quota early in tomoyo_domain_quota_is_ok().
	tomoyo: use hwight16() in tomoyo_domain_quota_is_ok()
	parisc: Clear stale IIR value on instruction access rights trap
	platform/x86: apple-gmux: use resource_size() with res
	memblock: fix memblock_phys_alloc() section mismatch error
	recordmcount.pl: fix typo in s390 mcount regex
	selinux: initialize proto variable in selinux_ip_postroute_compat()
	scsi: lpfc: Terminate string in lpfc_debugfs_nvmeio_trc_write()
	net/mlx5: DR, Fix NULL vs IS_ERR checking in dr_domain_init_resources
	net/mlx5e: Wrap the tx reporter dump callback to extract the sq
	net/mlx5e: Fix ICOSQ recovery flow for XSK
	udp: using datalen to cap ipv6 udp max gso segments
	selftests: Calculate udpgso segment count without header adjustment
	sctp: use call_rcu to free endpoint
	net/smc: fix using of uninitialized completions
	net: usb: pegasus: Do not drop long Ethernet frames
	net: ag71xx: Fix a potential double free in error handling paths
	net: lantiq_xrx200: fix statistics of received bytes
	NFC: st21nfca: Fix memory leak in device probe and remove
	net/smc: improved fix wait on already cleared link
	net/smc: don't send CDC/LLC message if link not ready
	net/smc: fix kernel panic caused by race of smc_sock
	igc: Fix TX timestamp support for non-MSI-X platforms
	ionic: Initialize the 'lif->dbid_inuse' bitmap
	net/mlx5e: Fix wrong features assignment in case of error
	selftests/net: udpgso_bench_tx: fix dst ip argument
	net/ncsi: check for error return from call to nla_put_u32
	fsl/fman: Fix missing put_device() call in fman_port_probe
	i2c: validate user data in compat ioctl
	nfc: uapi: use kernel size_t to fix user-space builds
	uapi: fix linux/nfc.h userspace compilation errors
	drm/amdgpu: When the VCN(1.0) block is suspended, powergating is explicitly enabled
	drm/amdgpu: add support for IP discovery gc_info table v2
	xhci: Fresco FL1100 controller should not have BROKEN_MSI quirk set.
	usb: gadget: f_fs: Clear ffs_eventfd in ffs_data_clear.
	usb: mtu3: add memory barrier before set GPD's HWO
	usb: mtu3: fix list_head check warning
	usb: mtu3: set interval of FS intr and isoc endpoint
	binder: fix async_free_space accounting for empty parcels
	scsi: vmw_pvscsi: Set residual data length conditionally
	Input: appletouch - initialize work before device registration
	Input: spaceball - fix parsing of movement data packets
	net: fix use-after-free in tw_timer_handler
	perf script: Fix CPU filtering of a script's switch events
	bpf: Add kconfig knob for disabling unpriv bpf by default
	Linux 5.10.90

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I277de1626e4275a3c3b1294a2a106e473a62de6c
2022-01-06 08:31:22 +01:00
Suren Baghdasaryan
8fded571e7 FROMGIT: mm/pagealloc: sysctl: change watermark_scale_factor max limit to 30%
For embedded systems with low total memory, having to run applications
with relatively large memory requirements, 10% max limitation for
watermark_scale_factor poses an issue of triggering direct reclaim every
time such application is started.  This results in slow application
startup times and bad end-user experience.

By increasing watermark_scale_factor max limit we allow vendors more
flexibility to choose the right level of kswapd aggressiveness for their
device and workload requirements.

Link: https://lkml.kernel.org/r/20211124193604.2758863-1-surenb@google.com
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Lukas Middendorf <kernel@tuxforce.de>
Cc: Antti Palosaari <crope@iki.fi>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Iurii Zaikin <yzaikin@google.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Zhang Yi <yi.zhang@huawei.com>
Cc: Fengfei Xi <xi.fengfei@h3c.com>
Cc: Mike Rapoport <rppt@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>

(cherry picked from commit 4e36dc369cc7581ac19a7523303e682a53e52e59
 git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master)

Bug: 194652782
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: I3e926c8b222933a10c79068d22a1407ff3181824
2022-01-05 18:30:26 +00:00
Suren Baghdasaryan
8ecc974abe Revert "ANDROID: add extra free kbytes tunable"
This reverts commit 92501cb670.
Removing out-of-tree patch.

Bug: 109664768
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: I31d831dad0716381b1e477f1f11c758135a5fde5
2022-01-05 18:30:19 +00:00
Suren Baghdasaryan
b219d099aa Revert "ANDROID: mm: fix up removal of vm_total_pages problem"
This reverts commit aa9752dec8.
Removing out-of-tree patch.

Bug: 109664768
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: I1f7d83697d08067e75a4b575a669b4dfe49eed59
2022-01-05 18:30:12 +00:00
Eric Biggers
04e49b41be ANDROID: fips140: add show_invalid_inputs command to fips140_lab_util
Add a new fips140_lab_util command 'show_invalid_inputs' which uses
AF_ALG to call some crypto algorithms with invalid parameters to show
that they fail.  This is needed to meet a new requirement we've received
from the lab.  This requirement is vague, but a representative sample of
algorithms and inputs appears to be acceptable.

For this to work, AF_ALG needs to be enabled in the kernel.  This makes
fips140_lab_util start depending on a custom kernel build, not just on a
custom fips140 module build as was the case before.  However, the lab
testing was going to need custom boot images anyway once fips140.ko is
included in the normal builds, since the production build of fips140.ko
won't have CONFIG_CRYPTO_FIPS140_MOD_EVAL_TESTING=y.  AF_ALG is also
needed to do the Jitter RNG entropy analysis properly, and the
AF_ALG-enabled kernel can also be reused for ACVP testing.

Bug: 188620248
Change-Id: I69054eab5005fc3ca0ea081760877f73ea229f5b
Signed-off-by: Eric Biggers <ebiggers@google.com>
2022-01-05 15:33:15 +00:00
Eric Biggers
6ed33b82ea ANDROID: fips140: refactor and rename fips140_lab_test
fips140_lab_test doesn't really do any tests per se, but rather is a
utility program that dumps some output.  The actual "test" is when the
lab checks the output; we aren't allowed to check it ourselves.

We also need to add some new functionality, which would work well as
sub-commands.  Also, the original idea was that this was just sample
code which the lab would modify, but that's not actually happening.

Therefore, rename fips140_lab_test to fips140_lab_util, and refactor its
functionality into sub-commands 'show_module_version' and
'show_service_indicators'.  This fits better with what is needed.

Bug: 188620248
Change-Id: I7da84a139283f185f79b8d866547151169f26415
Signed-off-by: Eric Biggers <ebiggers@google.com>
2022-01-05 15:33:06 +00:00
Nick Desaulniers
2728a95f3f ANDROID: remove stale variables from build.config files
There is no need for CROSS_COMPILE_COMPAT as of v5.16-rc1 via
commit 3e6f8d1fa1 ("arm64: vdso32: require CROSS_COMPILE_COMPAT for gcc+bfd")
when LLVM=1 is set. This was backported to v5.10.89.

LINUX_GCC_CROSS_COMPILE_PREBUILTS_BIN was a temporary dependency until
LLVM_IAS=1 was enabled for all architectures. With LLVM_IAS=1 (implied
by LLVM=1), nothing in this directory is used and the variable can be
removed.

It doesn't hurt to respecify any of the above, but they are no longer
necessary.

We may be able to further simplify the usage of LLVM_IAS=1 and
CROSS_COMPILE in android13-5.10 depending on whether the following get
backported in the future:

commit f12b034afe ("scripts/Makefile.clang: default to LLVM_IAS=1").
commit 231ad7f409 ("Makefile: infer --target from ARCH for CC=clang")

Bug: 209655537
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Change-Id: I361b05ea9f36da933e6712150650f476b093d0a7
2022-01-05 14:34:25 +00:00
Greg Kroah-Hartman
d3e491a20d Linux 5.10.90
Link: https://lore.kernel.org/r/20220103142053.466768714@linuxfoundation.org
Tested-by: Salvatore Bonaccorso <carnil@debian.org>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Tested-by: Linux Kernel Functional Testing <lkft@linaro.org>
Tested-by: Jon Hunter <jonathanh@nvidia.com>
Link: https://lore.kernel.org/r/20220104073841.681360658@linuxfoundation.org
Tested-by: Jon Hunter <jonathanh@nvidia.com>
Tested-by: Florian Fainelli <f.fainelli@gmail.com>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Tested-by: Shuah Khan <skhan@linuxfoundation.org>
Tested-by: Linux Kernel Functional Testing <lkft@linaro.org>
Tested-by: Sudip Mukherjee <sudip.mukherjee@codethink.co.uk>
Tested-by: Salvatore Bonaccorso <carnil@debian.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-01-05 12:40:34 +01:00
Daniel Borkmann
8c15bfb36a bpf: Add kconfig knob for disabling unpriv bpf by default
commit 08389d8882 upstream.

Add a kconfig knob which allows for unprivileged bpf to be disabled by default.
If set, the knob sets /proc/sys/kernel/unprivileged_bpf_disabled to value of 2.

This still allows a transition of 2 -> {0,1} through an admin. Similarly,
this also still keeps 1 -> {1} behavior intact, so that once set to permanently
disabled, it cannot be undone aside from a reboot.

We've also added extra2 with max of 2 for the procfs handler, so that an admin
still has a chance to toggle between 0 <-> 2.

Either way, as an additional alternative, applications can make use of CAP_BPF
that we added a while ago.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/74ec548079189e4e4dffaeb42b8987bb3c852eee.1620765074.git.daniel@iogearbox.net
Cc: Salvatore Bonaccorso <carnil@debian.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-01-05 12:40:34 +01:00
Adrian Hunter
d8a5b1377b perf script: Fix CPU filtering of a script's switch events
commit 5e0c325cdb upstream.

CPU filtering was not being applied to a script's switch events.

Fixes: 5bf83c29a0 ("perf script: Add scripting operation process_switch()")
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Riccardo Mancini <rickyman7@gmail.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20211215080636.149562-3-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-01-05 12:40:34 +01:00
Muchun Song
2386e81a1d net: fix use-after-free in tw_timer_handler
commit e22e45fc9e upstream.

A real world panic issue was found as follow in Linux 5.4.

    BUG: unable to handle page fault for address: ffffde49a863de28
    PGD 7e6fe62067 P4D 7e6fe62067 PUD 7e6fe63067 PMD f51e064067 PTE 0
    RIP: 0010:tw_timer_handler+0x20/0x40
    Call Trace:
     <IRQ>
     call_timer_fn+0x2b/0x120
     run_timer_softirq+0x1ef/0x450
     __do_softirq+0x10d/0x2b8
     irq_exit+0xc7/0xd0
     smp_apic_timer_interrupt+0x68/0x120
     apic_timer_interrupt+0xf/0x20

This issue was also reported since 2017 in the thread [1],
unfortunately, the issue was still can be reproduced after fixing
DCCP.

The ipv4_mib_exit_net is called before tcp_sk_exit_batch when a net
namespace is destroyed since tcp_sk_ops is registered befrore
ipv4_mib_ops, which means tcp_sk_ops is in the front of ipv4_mib_ops
in the list of pernet_list. There will be a use-after-free on
net->mib.net_statistics in tw_timer_handler after ipv4_mib_exit_net
if there are some inflight time-wait timers.

This bug is not introduced by commit f2bf415cfe ("mib: add net to
NET_ADD_STATS_BH") since the net_statistics is a global variable
instead of dynamic allocation and freeing. Actually, commit
61a7e26028 ("mib: put net statistics on struct net") introduces
the bug since it put net statistics on struct net and free it when
net namespace is destroyed.

Moving init_ipv4_mibs() to the front of tcp_init() to fix this bug
and replace pr_crit() with panic() since continuing is meaningless
when init_ipv4_mibs() fails.

[1] https://groups.google.com/g/syzkaller/c/p1tn-_Kc6l4/m/smuL_FMAAgAJ?pli=1

Fixes: 61a7e26028 ("mib: put net statistics on struct net")
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Cc: Cong Wang <cong.wang@bytedance.com>
Cc: Fam Zheng <fam.zheng@bytedance.com>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20211228104145.9426-1-songmuchun@bytedance.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-01-05 12:40:33 +01:00
Leo L. Schwab
34087cf960 Input: spaceball - fix parsing of movement data packets
commit bc7ec91718 upstream.

The spaceball.c module was not properly parsing the movement reports
coming from the device.  The code read axis data as signed 16-bit
little-endian values starting at offset 2.

In fact, axis data in Spaceball movement reports are signed 16-bit
big-endian values starting at offset 3.  This was determined first by
visually inspecting the data packets, and later verified by consulting:
http://spacemice.org/pdf/SpaceBall_2003-3003_Protocol.pdf

If this ever worked properly, it was in the time before Git...

Signed-off-by: Leo L. Schwab <ewhac@ewhac.org>
Link: https://lore.kernel.org/r/20211221101630.1146385-1-ewhac@ewhac.org
Cc: stable@vger.kernel.org
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-01-05 12:40:33 +01:00
Pavel Skripkin
9f329d0d6c Input: appletouch - initialize work before device registration
commit 9f3ccdc3f6 upstream.

Syzbot has reported warning in __flush_work(). This warning is caused by
work->func == NULL, which means missing work initialization.

This may happen, since input_dev->close() calls
cancel_work_sync(&dev->work), but dev->work initalization happens _after_
input_register_device() call.

So this patch moves dev->work initialization before registering input
device

Fixes: 5a6eb676d3 ("Input: appletouch - improve powersaving for Geyser3 devices")
Reported-and-tested-by: syzbot+b88c5eae27386b252bbd@syzkaller.appspotmail.com
Signed-off-by: Pavel Skripkin <paskripkin@gmail.com>
Link: https://lore.kernel.org/r/20211230141151.17300-1-paskripkin@gmail.com
Cc: stable@vger.kernel.org
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-01-05 12:40:33 +01:00
Alexey Makhalov
2a4f551dec scsi: vmw_pvscsi: Set residual data length conditionally
commit 142c779d05 upstream.

The PVSCSI implementation in the VMware hypervisor under specific
configuration ("SCSI Bus Sharing" set to "Physical") returns zero dataLen
in the completion descriptor for READ CAPACITY(16). As a result, the kernel
can not detect proper disk geometry. This can be recognized by the kernel
message:

  [ 0.776588] sd 1:0:0:0: [sdb] Sector size 0 reported, assuming 512.

The PVSCSI implementation in QEMU does not set dataLen at all, keeping it
zeroed. This leads to a boot hang as was reported by Shmulik Ladkani.

It is likely that the controller returns the garbage at the end of the
buffer. Residual length should be set by the driver in that case. The SCSI
layer will erase corresponding data. See commit bdb2b8cab4 ("[SCSI] erase
invalid data returned by device") for details.

Commit e662502b3a ("scsi: vmw_pvscsi: Set correct residual data length")
introduced the issue by setting residual length unconditionally, causing
the SCSI layer to erase the useful payload beyond dataLen when this value
is returned as 0.

As a result, considering existing issues in implementations of PVSCSI
controllers, we do not want to call scsi_set_resid() when dataLen ==
0. Calling scsi_set_resid() has no effect if dataLen equals buffer length.

Link: https://lore.kernel.org/lkml/20210824120028.30d9c071@blondie/
Link: https://lore.kernel.org/r/20211220190514.55935-1-amakhalov@vmware.com
Fixes: e662502b3a ("scsi: vmw_pvscsi: Set correct residual data length")
Cc: Matt Wang <wwentao@vmware.com>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: Vishal Bhakta <vbhakta@vmware.com>
Cc: VMware PV-Drivers <pv-drivers@vmware.com>
Cc: James E.J. Bottomley <jejb@linux.ibm.com>
Cc: linux-scsi@vger.kernel.org
Cc: stable@vger.kernel.org
Reported-and-suggested-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
Signed-off-by: Alexey Makhalov <amakhalov@vmware.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-01-05 12:40:33 +01:00
Todd Kjos
1cb8444f31 binder: fix async_free_space accounting for empty parcels
commit cfd0d84ba2 upstream.

In 4.13, commit 74310e06be ("android: binder: Move buffer out of area shared with user space")
fixed a kernel structure visibility issue. As part of that patch,
sizeof(void *) was used as the buffer size for 0-length data payloads so
the driver could detect abusive clients sending 0-length asynchronous
transactions to a server by enforcing limits on async_free_size.

Unfortunately, on the "free" side, the accounting of async_free_space
did not add the sizeof(void *) back. The result was that up to 8-bytes of
async_free_space were leaked on every async transaction of 8-bytes or
less.  These small transactions are uncommon, so this accounting issue
has gone undetected for several years.

The fix is to use "buffer_size" (the allocated buffer size) instead of
"size" (the logical buffer size) when updating the async_free_space
during the free operation. These are the same except for this
corner case of asynchronous transactions with payloads < 8 bytes.

Fixes: 74310e06be ("android: binder: Move buffer out of area shared with user space")
Signed-off-by: Todd Kjos <tkjos@google.com>
Cc: stable@vger.kernel.org # 4.14+
Link: https://lore.kernel.org/r/20211220190150.2107077-1-tkjos@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-01-05 12:40:33 +01:00
Chunfeng Yun
a6e26251dd usb: mtu3: set interval of FS intr and isoc endpoint
commit 43f3b8cbcf upstream.

Add support to set interval also for FS intr and isoc endpoint.

Fixes: 4d79e042ed ("usb: mtu3: add support for usb3.1 IP")
Cc: stable@vger.kernel.org
Signed-off-by: Chunfeng Yun <chunfeng.yun@mediatek.com>
Link: https://lore.kernel.org/r/20211218095749.6250-4-chunfeng.yun@mediatek.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-01-05 12:40:33 +01:00
Chunfeng Yun
3b6efe0b7b usb: mtu3: fix list_head check warning
commit 8c313e3bfd upstream.

This is caused by uninitialization of list_head.

BUG: KASAN: use-after-free in __list_del_entry_valid+0x34/0xe4

Call trace:
dump_backtrace+0x0/0x298
show_stack+0x24/0x34
dump_stack+0x130/0x1a8
print_address_description+0x88/0x56c
__kasan_report+0x1b8/0x2a0
kasan_report+0x14/0x20
__asan_load8+0x9c/0xa0
__list_del_entry_valid+0x34/0xe4
mtu3_req_complete+0x4c/0x300 [mtu3]
mtu3_gadget_stop+0x168/0x448 [mtu3]
usb_gadget_unregister_driver+0x204/0x3a0
unregister_gadget_item+0x44/0xa4

Fixes: 83374e035b ("usb: mtu3: add tracepoints to help debug")
Cc: stable@vger.kernel.org
Reported-by: Yuwen Ng <yuwen.ng@mediatek.com>
Signed-off-by: Chunfeng Yun <chunfeng.yun@mediatek.com>
Link: https://lore.kernel.org/r/20211218095749.6250-3-chunfeng.yun@mediatek.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-01-05 12:40:33 +01:00
Chunfeng Yun
f10b01c48f usb: mtu3: add memory barrier before set GPD's HWO
commit a7aae769ca upstream.

There is a seldom issue that the controller access invalid address
and trigger devapc or emimpu violation. That is due to memory access
is out of order and cause gpd data is not correct.
Add mb() to prohibit compiler or cpu from reordering to make sure GPD
is fully written before setting its HWO.

Fixes: 48e0d3735a ("usb: mtu3: supports new QMU format")
Cc: stable@vger.kernel.org
Reported-by: Eddie Hung <eddie.hung@mediatek.com>
Signed-off-by: Chunfeng Yun <chunfeng.yun@mediatek.com>
Link: https://lore.kernel.org/r/20211218095749.6250-2-chunfeng.yun@mediatek.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-01-05 12:40:33 +01:00
Vincent Pelletier
1c4ace3e6b usb: gadget: f_fs: Clear ffs_eventfd in ffs_data_clear.
commit b1e0887379 upstream.

ffs_data_clear is indirectly called from both ffs_fs_kill_sb and
ffs_ep0_release, so it ends up being called twice when userland closes ep0
and then unmounts f_fs.
If userland provided an eventfd along with function's USB descriptors, it
ends up calling eventfd_ctx_put as many times, causing a refcount
underflow.
NULL-ify ffs_eventfd to prevent these extraneous eventfd_ctx_put calls.

Also, set epfiles to NULL right after de-allocating it, for readability.

For completeness, ffs_data_clear actually ends up being called thrice, the
last call being before the whole ffs structure gets freed, so when this
specific sequence happens there is a second underflow happening (but not
being reported):

/sys/kernel/debug/tracing# modprobe usb_f_fs
/sys/kernel/debug/tracing# echo ffs_data_clear > set_ftrace_filter
/sys/kernel/debug/tracing# echo function > current_tracer
/sys/kernel/debug/tracing# echo 1 > tracing_on
(setup gadget, run and kill function userland process, teardown gadget)
/sys/kernel/debug/tracing# echo 0 > tracing_on
/sys/kernel/debug/tracing# cat trace
 smartcard-openp-436     [000] .....  1946.208786: ffs_data_clear <-ffs_data_closed
 smartcard-openp-431     [000] .....  1946.279147: ffs_data_clear <-ffs_data_closed
 smartcard-openp-431     [000] .n...  1946.905512: ffs_data_clear <-ffs_data_put

Warning output corresponding to above trace:
[ 1946.284139] WARNING: CPU: 0 PID: 431 at lib/refcount.c:28 refcount_warn_saturate+0x110/0x15c
[ 1946.293094] refcount_t: underflow; use-after-free.
[ 1946.298164] Modules linked in: usb_f_ncm(E) u_ether(E) usb_f_fs(E) hci_uart(E) btqca(E) btrtl(E) btbcm(E) btintel(E) bluetooth(E) nls_ascii(E) nls_cp437(E) vfat(E) fat(E) bcm2835_v4l2(CE) bcm2835_mmal_vchiq(CE) videobuf2_vmalloc(E) videobuf2_memops(E) sha512_generic(E) videobuf2_v4l2(E) sha512_arm(E) videobuf2_common(E) videodev(E) cpufreq_dt(E) snd_bcm2835(CE) brcmfmac(E) mc(E) vc4(E) ctr(E) brcmutil(E) snd_soc_core(E) snd_pcm_dmaengine(E) drbg(E) snd_pcm(E) snd_timer(E) snd(E) soundcore(E) drm_kms_helper(E) cec(E) ansi_cprng(E) rc_core(E) syscopyarea(E) raspberrypi_cpufreq(E) sysfillrect(E) sysimgblt(E) cfg80211(E) max17040_battery(OE) raspberrypi_hwmon(E) fb_sys_fops(E) regmap_i2c(E) ecdh_generic(E) rfkill(E) ecc(E) bcm2835_rng(E) rng_core(E) vchiq(CE) leds_gpio(E) libcomposite(E) fuse(E) configfs(E) ip_tables(E) x_tables(E) autofs4(E) ext4(E) crc16(E) mbcache(E) jbd2(E) crc32c_generic(E) sdhci_iproc(E) sdhci_pltfm(E) sdhci(E)
[ 1946.399633] CPU: 0 PID: 431 Comm: smartcard-openp Tainted: G         C OE     5.15.0-1-rpi #1  Debian 5.15.3-1
[ 1946.417950] Hardware name: BCM2835
[ 1946.425442] Backtrace:
[ 1946.432048] [<c08d60a0>] (dump_backtrace) from [<c08d62ec>] (show_stack+0x20/0x24)
[ 1946.448226]  r7:00000009 r6:0000001c r5:c04a948c r4:c0a64e2c
[ 1946.458412] [<c08d62cc>] (show_stack) from [<c08d9ae0>] (dump_stack+0x28/0x30)
[ 1946.470380] [<c08d9ab8>] (dump_stack) from [<c0123500>] (__warn+0xe8/0x154)
[ 1946.482067]  r5:c04a948c r4:c0a71dc8
[ 1946.490184] [<c0123418>] (__warn) from [<c08d6948>] (warn_slowpath_fmt+0xa0/0xe4)
[ 1946.506758]  r7:00000009 r6:0000001c r5:c0a71dc8 r4:c0a71e04
[ 1946.517070] [<c08d68ac>] (warn_slowpath_fmt) from [<c04a948c>] (refcount_warn_saturate+0x110/0x15c)
[ 1946.535309]  r8:c0100224 r7:c0dfcb84 r6:ffffffff r5:c3b84c00 r4:c24a17c0
[ 1946.546708] [<c04a937c>] (refcount_warn_saturate) from [<c0380134>] (eventfd_ctx_put+0x48/0x74)
[ 1946.564476] [<c03800ec>] (eventfd_ctx_put) from [<bf5464e8>] (ffs_data_clear+0xd0/0x118 [usb_f_fs])
[ 1946.582664]  r5:c3b84c00 r4:c2695b00
[ 1946.590668] [<bf546418>] (ffs_data_clear [usb_f_fs]) from [<bf547cc0>] (ffs_data_closed+0x9c/0x150 [usb_f_fs])
[ 1946.609608]  r5:bf54d014 r4:c2695b00
[ 1946.617522] [<bf547c24>] (ffs_data_closed [usb_f_fs]) from [<bf547da0>] (ffs_fs_kill_sb+0x2c/0x30 [usb_f_fs])
[ 1946.636217]  r7:c0dfcb84 r6:c3a12260 r5:bf54d014 r4:c229f000
[ 1946.646273] [<bf547d74>] (ffs_fs_kill_sb [usb_f_fs]) from [<c0326d50>] (deactivate_locked_super+0x54/0x9c)
[ 1946.664893]  r5:bf54d014 r4:c229f000
[ 1946.672921] [<c0326cfc>] (deactivate_locked_super) from [<c0326df8>] (deactivate_super+0x60/0x64)
[ 1946.690722]  r5:c2a09000 r4:c229f000
[ 1946.698706] [<c0326d98>] (deactivate_super) from [<c0349a28>] (cleanup_mnt+0xe4/0x14c)
[ 1946.715553]  r5:c2a09000 r4:00000000
[ 1946.723528] [<c0349944>] (cleanup_mnt) from [<c0349b08>] (__cleanup_mnt+0x1c/0x20)
[ 1946.739922]  r7:c0dfcb84 r6:c3a12260 r5:c3a126fc r4:00000000
[ 1946.750088] [<c0349aec>] (__cleanup_mnt) from [<c0143d10>] (task_work_run+0x84/0xb8)
[ 1946.766602] [<c0143c8c>] (task_work_run) from [<c010bdc8>] (do_work_pending+0x470/0x56c)
[ 1946.783540]  r7:5ac3c35a r6:c0d0424c r5:c200bfb0 r4:c200a000
[ 1946.793614] [<c010b958>] (do_work_pending) from [<c01000c0>] (slow_work_pending+0xc/0x20)
[ 1946.810553] Exception stack(0xc200bfb0 to 0xc200bff8)
[ 1946.820129] bfa0:                                     00000000 00000000 000000aa b5e21430
[ 1946.837104] bfc0: bef867a0 00000001 bef86840 00000034 bef86838 bef86790 bef86794 bef867a0
[ 1946.854125] bfe0: 00000000 bef86798 b67b7a1c b6d626a4 60000010 b5a23760
[ 1946.865335]  r10:00000000 r9:c200a000 r8:c0100224 r7:00000034 r6:bef86840 r5:00000001
[ 1946.881914]  r4:bef867a0
[ 1946.888793] ---[ end trace 7387f2a9725b28d0 ]---

Fixes: 5e33f6fdf7 ("usb: gadget: ffs: add eventfd notification about ffs events")
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Vincent Pelletier <plr.vincent@gmail.com>
Link: https://lore.kernel.org/r/f79eeea29f3f98de6782a064ec0f7351ad2f598f.1639793920.git.plr.vincent@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-01-05 12:40:32 +01:00
Mathias Nyman
1933fe8ce7 xhci: Fresco FL1100 controller should not have BROKEN_MSI quirk set.
commit e484409258 upstream.

The Fresco Logic FL1100 controller needs the TRUST_TX_LENGTH quirk like
other Fresco controllers, but should not have the BROKEN_MSI quirks set.

BROKEN_MSI quirk causes issues in detecting usb drives connected to docks
with this FL1100 controller.
The BROKEN_MSI flag was apparently accidentally set together with the
TRUST_TX_LENGTH quirk

Original patch went to stable so this should go there as well.

Fixes: ea0f69d821 ("xhci: Enable trust tx length quirk for Fresco FL11 USB controller")
Cc: stable@vger.kernel.org
cc: Nikolay Martynov <mar.kolya@gmail.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Link: https://lore.kernel.org/r/20211221112825.54690-2-mathias.nyman@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-01-05 12:40:32 +01:00
Alex Deucher
b8553330a0 drm/amdgpu: add support for IP discovery gc_info table v2
commit 5e713c6afa upstream.

Used on gfx9 based systems. Fixes incorrect CU counts reported
in the kernel log.

Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1833
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-01-05 12:40:32 +01:00
chen gong
28863ffe21 drm/amdgpu: When the VCN(1.0) block is suspended, powergating is explicitly enabled
commit b7865173cf upstream.

Play a video on the raven (or PCO, raven2) platform, and then do the S3
test. When resume, the following error will be reported:

amdgpu 0000:02:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring
vcn_dec test failed (-110)
[drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* resume of IP block
<vcn_v1_0> failed -110
amdgpu 0000:02:00.0: amdgpu: amdgpu_device_ip_resume failed (-110).
PM: dpm_run_callback(): pci_pm_resume+0x0/0x90 returns -110

[why]
When playing the video: The power state flag of the vcn block is set to
POWER_STATE_ON.

When doing suspend: There is no change to the power state flag of the
vcn block, it is still POWER_STATE_ON.

When doing resume: Need to open the power gate of the vcn block and set
the power state flag of the VCN block to POWER_STATE_ON.
But at this time, the power state flag of the vcn block is already
POWER_STATE_ON. The power status flag check in the "8f2cdef drm/amd/pm:
avoid duplicate powergate/ungate setting" patch will return the
amdgpu_dpm_set_powergating_by_smu function directly.
As a result, the gate of the power was not opened, causing the
subsequent ring test to fail.

[how]
In the suspend function of the vcn block, explicitly change the power
state flag of the vcn block to POWER_STATE_OFF.

BugLink: https://gitlab.freedesktop.org/drm/amd/-/issues/1828
Signed-off-by: chen gong <curry.gong@amd.com>
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-01-05 12:40:32 +01:00