Commit Graph

986458 Commits

Author SHA1 Message Date
SeongJae Park
fe62a24792 UPSTREAM: mm/damon/dbgfs: support multiple contexts
In some use cases, users would want to run multiple monitoring context.
For example, if a user wants a high precision monitoring and dedicating
multiple CPUs for the job is ok, because DAMON creates one monitoring
thread per one context, the user can split the monitoring target regions
into multiple small regions and create one context for each region.  Or,
someone might want to simultaneously monitor different address spaces,
e.g., both virtual address space and physical address space.

The DAMON's API allows such usage, but 'damon-dbgfs' does not.  Therefore,
only kernel space DAMON users can do multiple contexts monitoring.

This commit allows the user space DAMON users to use multiple contexts
monitoring by introducing two new 'damon-dbgfs' debugfs files,
'mk_context' and 'rm_context'.  Users can create a new monitoring context
by writing the desired name of the new context to 'mk_context'.  Then, a
new directory with the name and having the files for setting of the
context ('attrs', 'target_ids' and 'record') will be created under the
debugfs directory.  Writing the name of the context to remove to
'rm_context' will remove the related context and directory.

Link: https://lkml.kernel.org/r/20210716081449.22187-10-sj38.park@gmail.com
Signed-off-by: SeongJae Park <sjpark@amazon.de>
Reviewed-by: Fernand Sieber <sieberf@amazon.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Amit Shah <amit@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Brendan Higgins <brendanhiggins@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: David Woodhouse <dwmw@amazon.com>
Cc: Fan Du <fan.du@intel.com>
Cc: Greg Kroah-Hartman <greg@kroah.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Joe Perches <joe@perches.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Leonard Foerster <foersleo@amazon.de>
Cc: Marco Elver <elver@google.com>
Cc: Markus Boehme <markubo@amazon.de>
Cc: Maximilian Heyne <mheyne@amazon.de>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

(cherry picked from commit 75c1c2b53c)

Bug: 228223814
Signed-off-by: Hailong Tu <tuhailong@oppo.com>
Change-Id: Idba5509b6ef147cd6f8cd820d241718b284a34a3
2022-04-28 23:09:14 +08:00
SeongJae Park
562b676ce9 UPSTREAM: mm/damon/dbgfs: export kdamond pid to the user space
For CPU usage accounting, knowing pid of the monitoring thread could be
helpful.  For example, users could use cpuaccount cgroups with the pid.

This commit therefore exports the pid of currently running monitoring
thread to the user space via 'kdamond_pid' file in the debugfs directory.

Link: https://lkml.kernel.org/r/20210716081449.22187-9-sj38.park@gmail.com
Signed-off-by: SeongJae Park <sjpark@amazon.de>
Reviewed-by: Fernand Sieber <sieberf@amazon.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Amit Shah <amit@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Brendan Higgins <brendanhiggins@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: David Woodhouse <dwmw@amazon.com>
Cc: Fan Du <fan.du@intel.com>
Cc: Greg Kroah-Hartman <greg@kroah.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Joe Perches <joe@perches.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Leonard Foerster <foersleo@amazon.de>
Cc: Marco Elver <elver@google.com>
Cc: Markus Boehme <markubo@amazon.de>
Cc: Maximilian Heyne <mheyne@amazon.de>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

(cherry picked from commit 429538e854)

Bug: 228223814
Signed-off-by: Hailong Tu <tuhailong@oppo.com>
Change-Id: I32d41776a1850ed189b2759194061004ca8a83cf
2022-04-28 23:09:14 +08:00
SeongJae Park
c10dc7808e UPSTREAM: mm/damon: implement a debugfs-based user space interface
DAMON is designed to be used by kernel space code such as the memory
management subsystems, and therefore it provides only kernel space API.
That said, letting the user space control DAMON could provide some
benefits to them.  For example, it will allow user space to analyze their
specific workloads and make their own special optimizations.

For such cases, this commit implements a simple DAMON application kernel
module, namely 'damon-dbgfs', which merely wraps the DAMON api and exports
those to the user space via the debugfs.

'damon-dbgfs' exports three files, ``attrs``, ``target_ids``, and
``monitor_on`` under its debugfs directory, ``<debugfs>/damon/``.

Attributes
----------

Users can read and write the ``sampling interval``, ``aggregation
interval``, ``regions update interval``, and min/max number of monitoring
target regions by reading from and writing to the ``attrs`` file.  For
example, below commands set those values to 5 ms, 100 ms, 1,000 ms, 10,
1000 and check it again::

    # cd <debugfs>/damon
    # echo 5000 100000 1000000 10 1000 > attrs
    # cat attrs
    5000 100000 1000000 10 1000

Target IDs
----------

Some types of address spaces supports multiple monitoring target.  For
example, the virtual memory address spaces monitoring can have multiple
processes as the monitoring targets.  Users can set the targets by writing
relevant id values of the targets to, and get the ids of the current
targets by reading from the ``target_ids`` file.  In case of the virtual
address spaces monitoring, the values should be pids of the monitoring
target processes.  For example, below commands set processes having pids
42 and 4242 as the monitoring targets and check it again::

    # cd <debugfs>/damon
    # echo 42 4242 > target_ids
    # cat target_ids
    42 4242

Note that setting the target ids doesn't start the monitoring.

Turning On/Off
--------------

Setting the files as described above doesn't incur effect unless you
explicitly start the monitoring.  You can start, stop, and check the
current status of the monitoring by writing to and reading from the
``monitor_on`` file.  Writing ``on`` to the file starts the monitoring of
the targets with the attributes.  Writing ``off`` to the file stops those.
DAMON also stops if every targets are invalidated (in case of the virtual
memory monitoring, target processes are invalidated when terminated).
Below example commands turn on, off, and check the status of DAMON::

    # cd <debugfs>/damon
    # echo on > monitor_on
    # echo off > monitor_on
    # cat monitor_on
    off

Please note that you cannot write to the above-mentioned debugfs files
while the monitoring is turned on.  If you write to the files while DAMON
is running, an error code such as ``-EBUSY`` will be returned.

[akpm@linux-foundation.org: remove unneeded "alloc failed" printks]
[akpm@linux-foundation.org: replace macro with static inline]

Link: https://lkml.kernel.org/r/20210716081449.22187-8-sj38.park@gmail.com
Signed-off-by: SeongJae Park <sjpark@amazon.de>
Reviewed-by: Leonard Foerster <foersleo@amazon.de>
Reviewed-by: Fernand Sieber <sieberf@amazon.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Amit Shah <amit@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Brendan Higgins <brendanhiggins@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: David Woodhouse <dwmw@amazon.com>
Cc: Fan Du <fan.du@intel.com>
Cc: Greg Kroah-Hartman <greg@kroah.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Joe Perches <joe@perches.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Marco Elver <elver@google.com>
Cc: Markus Boehme <markubo@amazon.de>
Cc: Maximilian Heyne <mheyne@amazon.de>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

(cherry picked from commit 4bc05954d0)

Bug: 228223814
Signed-off-by: Hailong Tu <tuhailong@oppo.com>
Change-Id: I99664d4372dc5c510d7e16ffe384345e52579b0e
2022-04-28 23:09:14 +08:00
SeongJae Park
3ea808dca1 UPSTREAM: mm/damon: add a tracepoint
This commit adds a tracepoint for DAMON.  It traces the monitoring results
of each region for each aggregation interval.  Using this, DAMON can
easily integrated with tracepoints supporting tools such as perf.

Link: https://lkml.kernel.org/r/20210716081449.22187-7-sj38.park@gmail.com
Signed-off-by: SeongJae Park <sjpark@amazon.de>
Reviewed-by: Leonard Foerster <foersleo@amazon.de>
Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Reviewed-by: Fernand Sieber <sieberf@amazon.com>
Acked-by: Shakeel Butt <shakeelb@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Amit Shah <amit@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Brendan Higgins <brendanhiggins@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: David Woodhouse <dwmw@amazon.com>
Cc: Fan Du <fan.du@intel.com>
Cc: Greg Kroah-Hartman <greg@kroah.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Joe Perches <joe@perches.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Marco Elver <elver@google.com>
Cc: Markus Boehme <markubo@amazon.de>
Cc: Maximilian Heyne <mheyne@amazon.de>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

(cherry picked from commit 2fcb93629a)

Bug: 228223814
Signed-off-by: Hailong Tu <tuhailong@oppo.com>
Change-Id: I502a57938f7aa60c55b05525b45e50ddb1aeda65
2022-04-28 23:09:14 +08:00
SeongJae Park
2afdd88030 UPSTREAM: mm/damon: implement primitives for the virtual memory address spaces
This commit introduces a reference implementation of the address space
specific low level primitives for the virtual address space, so that users
of DAMON can easily monitor the data accesses on virtual address spaces of
specific processes by simply configuring the implementation to be used by
DAMON.

The low level primitives for the fundamental access monitoring are defined
in two parts:

1. Identification of the monitoring target address range for the address
   space.
2. Access check of specific address range in the target space.

The reference implementation for the virtual address space does the works
as below.

PTE Accessed-bit Based Access Check
-----------------------------------

The implementation uses PTE Accessed-bit for basic access checks.  That
is, it clears the bit for the next sampling target page and checks whether
it is set again after one sampling period.  This could disturb the reclaim
logic.  DAMON uses ``PG_idle`` and ``PG_young`` page flags to solve the
conflict, as Idle page tracking does.

VMA-based Target Address Range Construction
-------------------------------------------

Only small parts in the super-huge virtual address space of the processes
are mapped to physical memory and accessed.  Thus, tracking the unmapped
address regions is just wasteful.  However, because DAMON can deal with
some level of noise using the adaptive regions adjustment mechanism,
tracking every mapping is not strictly required but could even incur a
high overhead in some cases.  That said, too huge unmapped areas inside
the monitoring target should be removed to not take the time for the
adaptive mechanism.

For the reason, this implementation converts the complex mappings to three
distinct regions that cover every mapped area of the address space.  Also,
the two gaps between the three regions are the two biggest unmapped areas
in the given address space.  The two biggest unmapped areas would be the
gap between the heap and the uppermost mmap()-ed region, and the gap
between the lowermost mmap()-ed region and the stack in most of the cases.
Because these gaps are exceptionally huge in usual address spaces,
excluding these will be sufficient to make a reasonable trade-off.  Below
shows this in detail::

    <heap>
    <BIG UNMAPPED REGION 1>
    <uppermost mmap()-ed region>
    (small mmap()-ed regions and munmap()-ed regions)
    <lowermost mmap()-ed region>
    <BIG UNMAPPED REGION 2>
    <stack>

[akpm@linux-foundation.org: mm/damon/vaddr.c needs highmem.h for kunmap_atomic()]
[sjpark@amazon.de: remove unnecessary PAGE_EXTENSION setup]
  Link: https://lkml.kernel.org/r/20210806095153.6444-2-sj38.park@gmail.com
[sjpark@amazon.de: safely walk page table]
  Link: https://lkml.kernel.org/r/20210831161800.29419-1-sj38.park@gmail.com

Link: https://lkml.kernel.org/r/20210716081449.22187-6-sj38.park@gmail.com
Signed-off-by: SeongJae Park <sjpark@amazon.de>
Reviewed-by: Leonard Foerster <foersleo@amazon.de>
Reviewed-by: Fernand Sieber <sieberf@amazon.com>
Acked-by: Shakeel Butt <shakeelb@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Amit Shah <amit@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Brendan Higgins <brendanhiggins@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: David Woodhouse <dwmw@amazon.com>
Cc: Fan Du <fan.du@intel.com>
Cc: Greg Kroah-Hartman <greg@kroah.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Joe Perches <joe@perches.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Marco Elver <elver@google.com>
Cc: Markus Boehme <markubo@amazon.de>
Cc: Maximilian Heyne <mheyne@amazon.de>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

(cherry picked from commit 3f49584b26)

Bug: 228223814
Signed-off-by: Hailong Tu <tuhailong@oppo.com>
Change-Id: Ib7d134932b1dd20cd926968991654d71717c197d
2022-04-28 23:09:14 +08:00
SeongJae Park
75e13bad97 UPSTREAM: mm/idle_page_tracking: make PG_idle reusable
PG_idle and PG_young allow the two PTE Accessed bit users, Idle Page
Tracking and the reclaim logic concurrently work while not interfering
with each other.  That is, when they need to clear the Accessed bit, they
set PG_young to represent the previous state of the bit, respectively.
And when they need to read the bit, if the bit is cleared, they further
read the PG_young to know whether the other has cleared the bit meanwhile
or not.

For yet another user of the PTE Accessed bit, we could add another page
flag, or extend the mechanism to use the flags.  For the DAMON usecase,
however, we don't need to do that just yet.  IDLE_PAGE_TRACKING and DAMON
are mutually exclusive, so there's only ever going to be one user of the
current set of flags.

In this commit, we split out the CONFIG options to allow for the use of
PG_young and PG_idle outside of idle page tracking.

In the next commit, DAMON's reference implementation of the virtual memory
address space monitoring primitives will use it.

[sjpark@amazon.de: set PAGE_EXTENSION for non-64BIT]
  Link: https://lkml.kernel.org/r/20210806095153.6444-1-sj38.park@gmail.com
[akpm@linux-foundation.org: tweak Kconfig text]
[sjpark@amazon.de: hide PAGE_IDLE_FLAG from users]
  Link: https://lkml.kernel.org/r/20210813081238.34705-1-sj38.park@gmail.com

Link: https://lkml.kernel.org/r/20210716081449.22187-5-sj38.park@gmail.com
Signed-off-by: SeongJae Park <sjpark@amazon.de>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Reviewed-by: Fernand Sieber <sieberf@amazon.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Amit Shah <amit@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Brendan Higgins <brendanhiggins@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: David Woodhouse <dwmw@amazon.com>
Cc: Fan Du <fan.du@intel.com>
Cc: Greg Kroah-Hartman <greg@kroah.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Joe Perches <joe@perches.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Leonard Foerster <foersleo@amazon.de>
Cc: Marco Elver <elver@google.com>
Cc: Markus Boehme <markubo@amazon.de>
Cc: Maximilian Heyne <mheyne@amazon.de>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

(cherry picked from commit 1c676e0d9b)

Bug: 228223814
Signed-off-by: Hailong Tu <tuhailong@oppo.com>
Change-Id: I4b2c3087b069b0306ab8743402fabcd195d63e54
2022-04-28 23:09:14 +08:00
SeongJae Park
0f1bc2a61d UPSTREAM: mm/damon: adaptively adjust regions
Even somehow the initial monitoring target regions are well constructed to
fulfill the assumption (pages in same region have similar access
frequencies), the data access pattern can be dynamically changed.  This
will result in low monitoring quality.  To keep the assumption as much as
possible, DAMON adaptively merges and splits each region based on their
access frequency.

For each ``aggregation interval``, it compares the access frequencies of
adjacent regions and merges those if the frequency difference is small.
Then, after it reports and clears the aggregated access frequency of each
region, it splits each region into two or three regions if the total
number of regions will not exceed the user-specified maximum number of
regions after the split.

In this way, DAMON provides its best-effort quality and minimal overhead
while keeping the upper-bound overhead that users set.

Link: https://lkml.kernel.org/r/20210716081449.22187-4-sj38.park@gmail.com
Signed-off-by: SeongJae Park <sjpark@amazon.de>
Reviewed-by: Leonard Foerster <foersleo@amazon.de>
Reviewed-by: Fernand Sieber <sieberf@amazon.com>
Acked-by: Shakeel Butt <shakeelb@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Amit Shah <amit@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Brendan Higgins <brendanhiggins@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: David Woodhouse <dwmw@amazon.com>
Cc: Fan Du <fan.du@intel.com>
Cc: Greg Kroah-Hartman <greg@kroah.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Joe Perches <joe@perches.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Marco Elver <elver@google.com>
Cc: Markus Boehme <markubo@amazon.de>
Cc: Maximilian Heyne <mheyne@amazon.de>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

(cherry picked from commit b9a6ac4e4e)

Bug: 228223814
Signed-off-by: Hailong Tu <tuhailong@oppo.com>
Change-Id: I330416cf16f4e72fd8f26e01d3bd425738216c70
2022-04-28 23:09:14 +08:00
SeongJae Park
488e19fc91 UPSTREAM: mm/damon/core: implement region-based sampling
To avoid the unbounded increase of the overhead, DAMON groups adjacent
pages that are assumed to have the same access frequencies into a
region.  As long as the assumption (pages in a region have the same
access frequencies) is kept, only one page in the region is required to
be checked.  Thus, for each ``sampling interval``,

 1. the 'prepare_access_checks' primitive picks one page in each region,
 2. waits for one ``sampling interval``,
 3. checks whether the page is accessed meanwhile, and
 4. increases the access count of the region if so.

Therefore, the monitoring overhead is controllable by adjusting the
number of regions.  DAMON allows both the underlying primitives and user
callbacks to adjust regions for the trade-off.  In other words, this
commit makes DAMON to use not only time-based sampling but also
space-based sampling.

This scheme, however, cannot preserve the quality of the output if the
assumption is not guaranteed.  Next commit will address this problem.

Link: https://lkml.kernel.org/r/20210716081449.22187-3-sj38.park@gmail.com
Signed-off-by: SeongJae Park <sjpark@amazon.de>
Reviewed-by: Leonard Foerster <foersleo@amazon.de>
Reviewed-by: Fernand Sieber <sieberf@amazon.com>
Acked-by: Shakeel Butt <shakeelb@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Amit Shah <amit@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Brendan Higgins <brendanhiggins@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: David Woodhouse <dwmw@amazon.com>
Cc: Fan Du <fan.du@intel.com>
Cc: Greg Kroah-Hartman <greg@kroah.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Joe Perches <joe@perches.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Marco Elver <elver@google.com>
Cc: Markus Boehme <markubo@amazon.de>
Cc: Maximilian Heyne <mheyne@amazon.de>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

(cherry picked from commit f23b8eee18)

Bug: 228223814
Signed-off-by: Hailong Tu <tuhailong@oppo.com>
Change-Id: Iaecd1ea4d5123e7ffc21ca63de0fe3555bab079d
2022-04-28 23:09:14 +08:00
SeongJae Park
bc19dd9a51 UPSTREAM: mm: introduce Data Access MONitor (DAMON)
Patch series "Introduce Data Access MONitor (DAMON)", v34.

Introduction
============

DAMON is a data access monitoring framework for the Linux kernel.  The
core mechanisms of DAMON called 'region based sampling' and 'adaptive
regions adjustment' (refer to 'mechanisms.rst' in the 11th patch of this
patchset for the detail) make it

- accurate (The monitored information is useful for DRAM level memory
  management.  It might not appropriate for Cache-level accuracy,
  though.),

- light-weight (The monitoring overhead is low enough to be applied
  online while making no impact on the performance of the target
  workloads.), and

- scalable (the upper-bound of the instrumentation overhead is
  controllable regardless of the size of target workloads.).

Using this framework, therefore, several memory management mechanisms such
as reclamation and THP can be optimized to aware real data access
patterns.  Experimental access pattern aware memory management
optimization works that incurring high instrumentation overhead will be
able to have another try.

Though DAMON is for kernel subsystems, it can be easily exposed to the
user space by writing a DAMON-wrapper kernel subsystem.  Then, user space
users who have some special workloads will be able to write personalized
tools or applications for deeper understanding and specialized
optimizations of their systems.

DAMON is also merged in two public Amazon Linux kernel trees that based on
v5.4.y[1] and v5.10.y[2].

[1] https://github.com/amazonlinux/linux/tree/amazon-5.4.y/master/mm/damon
[2] https://github.com/amazonlinux/linux/tree/amazon-5.10.y/master/mm/damon

The userspace tool[1] is available, released under GPLv2, and actively
being maintained.  I am also planning to implement another basic user
interface in perf[2].  Also, the basic test suite for DAMON is available
under GPLv2[3].

[1] https://github.com/awslabs/damo
[2] https://lore.kernel.org/linux-mm/20210107120729.22328-1-sjpark@amazon.com/
[3] https://github.com/awslabs/damon-tests

Long-term Plan
--------------

DAMON is a part of a project called Data Access-aware Operating System
(DAOS).  As the name implies, I want to improve the performance and
efficiency of systems using fine-grained data access patterns.  The
optimizations are for both kernel and user spaces.  I will therefore
modify or create kernel subsystems, export some of those to user space and
implement user space library / tools.  Below shows the layers and
components for the project.

    ---------------------------------------------------------------------------
    Primitives:     PTE Accessed bit, PG_idle, rmap, (Intel CMT), ...
    Framework:      DAMON
    Features:       DAMOS, virtual addr, physical addr, ...
    Applications:   DAMON-debugfs, (DARC), ...
    ^^^^^^^^^^^^^^^^^^^^^^^    KERNEL SPACE    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

    Raw Interface:  debugfs, (sysfs), (damonfs), tracepoints, (sys_damon), ...

    vvvvvvvvvvvvvvvvvvvvvvv    USER SPACE      vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
    Library:        (libdamon), ...
    Tools:          DAMO, (perf), ...
    ---------------------------------------------------------------------------

The components in parentheses or marked as '...' are not implemented yet
but in the future plan.  IOW, those are the TODO tasks of DAOS project.
For more detail, please refer to the plans:
https://lore.kernel.org/linux-mm/20201202082731.24828-1-sjpark@amazon.com/

Evaluations
===========

We evaluated DAMON's overhead, monitoring quality and usefulness using 24
realistic workloads on my QEMU/KVM based virtual machine running a kernel
that v24 DAMON patchset is applied.

DAMON is lightweight.  It increases system memory usage by 0.39% and slows
target workloads down by 1.16%.

DAMON is accurate and useful for memory management optimizations.  An
experimental DAMON-based operation scheme for THP, namely 'ethp', removes
76.15% of THP memory overheads while preserving 51.25% of THP speedup.
Another experimental DAMON-based 'proactive reclamation' implementation,
'prcl', reduces 93.38% of residential sets and 23.63% of system memory
footprint while incurring only 1.22% runtime overhead in the best case
(parsec3/freqmine).

NOTE that the experimental THP optimization and proactive reclamation are
not for production but only for proof of concepts.

Please refer to the official document[1] or "Documentation/admin-guide/mm:
Add a document for DAMON" patch in this patchset for detailed evaluation
setup and results.

[1] https://damonitor.github.io/doc/html/latest-damon/admin-guide/mm/damon/eval.html

Real-world User Story
=====================

In summary, DAMON has used on production systems and proved its usefulness.

DAMON as a profiler
-------------------

We analyzed characteristics of a large scale production systems of our
customers using DAMON.  The systems utilize 70GB DRAM and 36 CPUs.  From
this, we were able to find interesting things below.

There were obviously different access pattern under idle workload and
active workload.  Under the idle workload, it accessed large memory
regions with low frequency, while the active workload accessed small
memory regions with high freuqnecy.

DAMON found a 7GB memory region that showing obviously high access
frequency under the active workload.  We believe this is the
performance-effective working set and need to be protected.

There was a 4KB memory region that showing highest access frequency under
not only active but also idle workloads.  We think this must be a hottest
code section like thing that should never be paged out.

For this analysis, DAMON used only 0.3-1% of single CPU time.  Because we
used recording-based analysis, it consumed about 3-12 MB of disk space per
20 minutes.  This is only small amount of disk space, but we can further
reduce the disk usage by using non-recording-based DAMON features.  I'd
like to argue that only DAMON can do such detailed analysis (finding 4KB
highest region in 70GB memory) with the light overhead.

DAMON as a system optimization tool
-----------------------------------

We also found below potential performance problems on the systems and made
DAMON-based solutions.

The system doesn't want to make the workload suffer from the page
reclamation and thus it utilizes enough DRAM but no swap device.  However,
we found the system is actively reclaiming file-backed pages, because the
system has intensive file IO.  The file IO turned out to be not
performance critical for the workload, but the customer wanted to ensure
performance critical file-backed pages like code section to not mistakenly
be evicted.

Using direct IO should or `mlock()` would be a straightforward solution,
but modifying the user space code is not easy for the customer.
Alternatively, we could use DAMON-based operation scheme[1].  By using it,
we can ask DAMON to track access frequency of each region and make
'process_madvise(MADV_WILLNEED)[2]' call for regions having specific size
and access frequency for a time interval.

We also found the system is having high number of TLB misses.  We tried
'always' THP enabled policy and it greatly reduced TLB misses, but the
page reclamation also been more frequent due to the THP internal
fragmentation caused memory bloat.  We could try another DAMON-based
operation scheme that applies 'MADV_HUGEPAGE' to memory regions having
>=2MB size and high access frequency, while applying 'MADV_NOHUGEPAGE' to
regions having <2MB size and low access frequency.

We do not own the systems so we only reported the analysis results and
possible optimization solutions to the customers.  The customers satisfied
about the analysis results and promised to try the optimization guides.

[1] https://lore.kernel.org/linux-mm/20201006123931.5847-1-sjpark@amazon.com/
[2] https://lore.kernel.org/linux-api/20200622192900.22757-4-minchan@kernel.org/

Comparison with Idle Page Tracking
==================================

Idle Page Tracking allows users to set and read idleness of pages using a
bitmap file which represents each page with each bit of the file.  One
recommended usage of it is working set size detection.  Users can do that
by

    1. find PFN of each page for workloads in interest,
    2. set all the pages as idle by doing writes to the bitmap file,
    3. wait until the workload accesses its working set, and
    4. read the idleness of the pages again and count pages became not idle.

NOTE: While Idle Page Tracking is for user space users, DAMON is primarily
designed for kernel subsystems though it can easily exposed to the user
space.  Hence, this section only assumes such user space use of DAMON.

For what use cases Idle Page Tracking would be better?
------------------------------------------------------

1. Flexible usecases other than hotness monitoring.

Because Idle Page Tracking allows users to control the primitive (Page
idleness) by themselves, Idle Page Tracking users can do anything they
want.  Meanwhile, DAMON is primarily designed to monitor the hotness of
each memory region.  For this, DAMON asks users to provide sampling
interval and aggregation interval.  For the reason, there could be some
use case that using Idle Page Tracking is simpler.

2. Physical memory monitoring.

Idle Page Tracking receives PFN range as input, so natively supports
physical memory monitoring.

DAMON is designed to be extensible for multiple address spaces and use
cases by implementing and using primitives for the given use case.
Therefore, by theory, DAMON has no limitation in the type of target
address space as long as primitives for the given address space exists.
However, the default primitives introduced by this patchset supports only
virtual address spaces.

Therefore, for physical memory monitoring, you should implement your own
primitives and use it, or simply use Idle Page Tracking.

Nonetheless, RFC patchsets[1] for the physical memory address space
primitives is already available.  It also supports user memory same to
Idle Page Tracking.

[1] https://lore.kernel.org/linux-mm/20200831104730.28970-1-sjpark@amazon.com/

For what use cases DAMON is better?
-----------------------------------

1. Hotness Monitoring.

Idle Page Tracking let users know only if a page frame is accessed or not.
For hotness check, the user should write more code and use more memory.
DAMON do that by itself.

2. Low Monitoring Overhead

DAMON receives user's monitoring request with one step and then provide
the results.  So, roughly speaking, DAMON require only O(1) user/kernel
context switches.

In case of Idle Page Tracking, however, because the interface receives
contiguous page frames, the number of user/kernel context switches
increases as the monitoring target becomes complex and huge.  As a result,
the context switch overhead could be not negligible.

Moreover, DAMON is born to handle with the monitoring overhead.  Because
the core mechanism is pure logical, Idle Page Tracking users might be able
to implement the mechanism on their own, but it would be time consuming
and the user/kernel context switching will still more frequent than that
of DAMON.  Also, the kernel subsystems cannot use the logic in this case.

3. Page granularity working set size detection.

Until v22 of this patchset, this was categorized as the thing Idle Page
Tracking could do better, because DAMON basically maintains additional
metadata for each of the monitoring target regions.  So, in the page
granularity working set size detection use case, DAMON would incur (number
of monitoring target pages * size of metadata) memory overhead.  Size of
the single metadata item is about 54 bytes, so assuming 4KB pages, about
1.3% of monitoring target pages will be additionally used.

All essential metadata for Idle Page Tracking are embedded in 'struct
page' and page table entries.  Therefore, in this use case, only one
counter variable for working set size accounting is required if Idle Page
Tracking is used.

There are more details to consider, but roughly speaking, this is true in
most cases.

However, the situation changed from v23.  Now DAMON supports arbitrary
types of monitoring targets, which don't use the metadata.  Using that,
DAMON can do the working set size detection with no additional space
overhead but less user-kernel context switch.  A first draft for the
implementation of monitoring primitives for this usage is available in a
DAMON development tree[1].  An RFC patchset for it based on this patchset
will also be available soon.

Since v24, the arbitrary type support is dropped from this patchset
because this patchset doesn't introduce real use of the type.  You can
still get it from the DAMON development tree[2], though.

[1] https://github.com/sjp38/linux/tree/damon/pgidle_hack
[2] https://github.com/sjp38/linux/tree/damon/master

4. More future usecases

While Idle Page Tracking has tight coupling with base primitives (PG_Idle
and page table Accessed bits), DAMON is designed to be extensible for many
use cases and address spaces.  If you need some special address type or
want to use special h/w access check primitives, you can write your own
primitives for that and configure DAMON to use those.  Therefore, if your
use case could be changed a lot in future, using DAMON could be better.

Can I use both Idle Page Tracking and DAMON?
--------------------------------------------

Yes, though using them concurrently for overlapping memory regions could
result in interference to each other.  Nevertheless, such use case would
be rare or makes no sense at all.  Even in the case, the noise would bot
be really significant.  So, you can choose whatever you want depending on
the characteristics of your use cases.

More Information
================

We prepared a showcase web site[1] that you can get more information.
There are

- the official documentations[2],
- the heatmap format dynamic access pattern of various realistic workloads for
  heap area[3], mmap()-ed area[4], and stack[5] area,
- the dynamic working set size distribution[6] and chronological working set
  size changes[7], and
- the latest performance test results[8].

[1] https://damonitor.github.io/_index
[2] https://damonitor.github.io/doc/html/latest-damon
[3] https://damonitor.github.io/test/result/visual/latest/rec.heatmap.0.png.html
[4] https://damonitor.github.io/test/result/visual/latest/rec.heatmap.1.png.html
[5] https://damonitor.github.io/test/result/visual/latest/rec.heatmap.2.png.html
[6] https://damonitor.github.io/test/result/visual/latest/rec.wss_sz.png.html
[7] https://damonitor.github.io/test/result/visual/latest/rec.wss_time.png.html
[8] https://damonitor.github.io/test/result/perf/latest/html/index.html

Baseline and Complete Git Trees
===============================

The patches are based on the latest -mm tree, specifically
v5.14-rc1-mmots-2021-07-15-18-47 of https://github.com/hnaz/linux-mm.  You can
also clone the complete git tree:

    $ git clone git://github.com/sjp38/linux -b damon/patches/v34

The web is also available:
https://github.com/sjp38/linux/releases/tag/damon/patches/v34

Development Trees
-----------------

There are a couple of trees for entire DAMON patchset series and features
for future release.

- For latest release: https://github.com/sjp38/linux/tree/damon/master
- For next release: https://github.com/sjp38/linux/tree/damon/next

Long-term Support Trees
-----------------------

For people who want to test DAMON but using LTS kernels, there are another
couple of trees based on two latest LTS kernels respectively and
containing the 'damon/master' backports.

- For v5.4.y: https://github.com/sjp38/linux/tree/damon/for-v5.4.y
- For v5.10.y: https://github.com/sjp38/linux/tree/damon/for-v5.10.y

Amazon Linux Kernel Trees
-------------------------

DAMON is also merged in two public Amazon Linux kernel trees that based on
v5.4.y[1] and v5.10.y[2].

[1] https://github.com/amazonlinux/linux/tree/amazon-5.4.y/master/mm/damon
[2] https://github.com/amazonlinux/linux/tree/amazon-5.10.y/master/mm/damon

Git Tree for Diff of Patches
============================

For easy review of diff between different versions of each patch, I
prepared a git tree containing all versions of the DAMON patchset series:
https://github.com/sjp38/damon-patches

You can clone it and use 'diff' for easy review of changes between
different versions of the patchset.  For example:

    $ git clone https://github.com/sjp38/damon-patches && cd damon-patches
    $ diff -u damon/v33 damon/v34

Sequence Of Patches
===================

First three patches implement the core logics of DAMON.  The 1st patch
introduces basic sampling based hotness monitoring for arbitrary types of
targets.  Following two patches implement the core mechanisms for control
of overhead and accuracy, namely regions based sampling (patch 2) and
adaptive regions adjustment (patch 3).

Now the essential parts of DAMON is complete, but it cannot work unless
someone provides monitoring primitives for a specific use case.  The
following two patches make it just work for virtual address spaces
monitoring.  The 4th patch makes 'PG_idle' can be used by DAMON and the
5th patch implements the virtual memory address space specific monitoring
primitives using page table Accessed bits and the 'PG_idle' page flag.

Now DAMON just works for virtual address space monitoring via the kernel
space api.  To let the user space users can use DAMON, following four
patches add interfaces for them.  The 6th patch adds a tracepoint for
monitoring results.  The 7th patch implements a DAMON application kernel
module, namely damon-dbgfs, that simply wraps DAMON and exposes DAMON
interface to the user space via the debugfs interface.  The 8th patch
further exports pid of monitoring thread (kdamond) to user space for
easier cpu usage accounting, and the 9th patch makes the debugfs interface
to support multiple contexts.

Three patches for maintainability follows.  The 10th patch adds
documentations for both the user space and the kernel space.  The 11th
patch provides unit tests (based on the kunit) while the 12th patch adds
user space tests (based on the kselftest).

Finally, the last patch (13th) updates the MAINTAINERS file.

This patch (of 13):

DAMON is a data access monitoring framework for the Linux kernel.  The
core mechanisms of DAMON make it

 - accurate (the monitoring output is useful enough for DRAM level
   performance-centric memory management; It might be inappropriate for
   CPU cache levels, though),
 - light-weight (the monitoring overhead is normally low enough to be
   applied online), and
 - scalable (the upper-bound of the overhead is in constant range
   regardless of the size of target workloads).

Using this framework, hence, we can easily write efficient kernel space
data access monitoring applications.  For example, the kernel's memory
management mechanisms can make advanced decisions using this.
Experimental data access aware optimization works that incurring high
access monitoring overhead could again be implemented on top of this.

Due to its simple and flexible interface, providing user space interface
would be also easy.  Then, user space users who have some special
workloads can write personalized applications for better understanding and
optimizations of their workloads and systems.

===

Nevertheless, this commit is defining and implementing only basic access
check part without the overhead-accuracy handling core logic.  The basic
access check is as below.

The output of DAMON says what memory regions are how frequently accessed
for a given duration.  The resolution of the access frequency is
controlled by setting ``sampling interval`` and ``aggregation interval``.
In detail, DAMON checks access to each page per ``sampling interval`` and
aggregates the results.  In other words, counts the number of the accesses
to each region.  After each ``aggregation interval`` passes, DAMON calls
callback functions that previously registered by users so that users can
read the aggregated results and then clears the results.  This can be
described in below simple pseudo-code::

    init()
    while monitoring_on:
        for page in monitoring_target:
            if accessed(page):
                nr_accesses[page] += 1
        if time() % aggregation_interval == 0:
            for callback in user_registered_callbacks:
                callback(monitoring_target, nr_accesses)
            for page in monitoring_target:
                nr_accesses[page] = 0
        if time() % update_interval == 0:
            update()
        sleep(sampling interval)

The target regions constructed at the beginning of the monitoring and
updated after each ``regions_update_interval``, because the target regions
could be dynamically changed (e.g., mmap() or memory hotplug).  The
monitoring overhead of this mechanism will arbitrarily increase as the
size of the target workload grows.

The basic monitoring primitives for actual access check and dynamic target
regions construction aren't in the core part of DAMON.  Instead, it allows
users to implement their own primitives that are optimized for their use
case and configure DAMON to use those.  In other words, users cannot use
current version of DAMON without some additional works.

Following commits will implement the core mechanisms for the
overhead-accuracy control and default primitives implementations.

Link: https://lkml.kernel.org/r/20210716081449.22187-1-sj38.park@gmail.com
Link: https://lkml.kernel.org/r/20210716081449.22187-2-sj38.park@gmail.com
Signed-off-by: SeongJae Park <sjpark@amazon.de>
Reviewed-by: Leonard Foerster <foersleo@amazon.de>
Reviewed-by: Fernand Sieber <sieberf@amazon.com>
Acked-by: Shakeel Butt <shakeelb@google.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Amit Shah <amit@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Woodhouse <dwmw@amazon.com>
Cc: Marco Elver <elver@google.com>
Cc: Fan Du <fan.du@intel.com>
Cc: Greg Kroah-Hartman <greg@kroah.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Joe Perches <joe@perches.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Maximilian Heyne <mheyne@amazon.de>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Brendan Higgins <brendanhiggins@google.com>
Cc: Markus Boehme <markubo@amazon.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

(cherry picked from commit 2224d84854)

Bug: 228223814
Signed-off-by: Hailong Tu <tuhailong@oppo.com>
Change-Id: I578b47551306e4c416e7b97c57335f810783615c
2022-04-28 23:09:14 +08:00
Eric Dumazet
35a697cab4 BACKPORT: net/packet: fix slab-out-of-bounds access in packet_recvmsg()
[ Upstream commit c700525fcc ]

syzbot found that when an AF_PACKET socket is using PACKET_COPY_THRESH
and mmap operations, tpacket_rcv() is queueing skbs with
garbage in skb->cb[], triggering a too big copy [1]

Presumably, users of af_packet using mmap() already gets correct
metadata from the mapped buffer, we can simply make sure
to clear 12 bytes that might be copied to user space later.

BUG: KASAN: stack-out-of-bounds in memcpy include/linux/fortify-string.h:225 [inline]
BUG: KASAN: stack-out-of-bounds in packet_recvmsg+0x56c/0x1150 net/packet/af_packet.c:3489
Write of size 165 at addr ffffc9000385fb78 by task syz-executor233/3631

CPU: 0 PID: 3631 Comm: syz-executor233 Not tainted 5.17.0-rc7-syzkaller-02396-g0b3660695e80 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 <TASK>
 __dump_stack lib/dump_stack.c:88 [inline]
 dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
 print_address_description.constprop.0.cold+0xf/0x336 mm/kasan/report.c:255
 __kasan_report mm/kasan/report.c:442 [inline]
 kasan_report.cold+0x83/0xdf mm/kasan/report.c:459
 check_region_inline mm/kasan/generic.c:183 [inline]
 kasan_check_range+0x13d/0x180 mm/kasan/generic.c:189
 memcpy+0x39/0x60 mm/kasan/shadow.c:66
 memcpy include/linux/fortify-string.h:225 [inline]
 packet_recvmsg+0x56c/0x1150 net/packet/af_packet.c:3489
 sock_recvmsg_nosec net/socket.c:948 [inline]
 sock_recvmsg net/socket.c:966 [inline]
 sock_recvmsg net/socket.c:962 [inline]
 ____sys_recvmsg+0x2c4/0x600 net/socket.c:2632
 ___sys_recvmsg+0x127/0x200 net/socket.c:2674
 __sys_recvmsg+0xe2/0x1a0 net/socket.c:2704
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7fdfd5954c29
Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 41 15 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 c0 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007ffcf8e71e48 EFLAGS: 00000246 ORIG_RAX: 000000000000002f
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007fdfd5954c29
RDX: 0000000000000000 RSI: 0000000020000500 RDI: 0000000000000005
RBP: 0000000000000000 R08: 000000000000000d R09: 000000000000000d
R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffcf8e71e60
R13: 00000000000f4240 R14: 000000000000c1ff R15: 00007ffcf8e71e54
 </TASK>

addr ffffc9000385fb78 is located in stack of task syz-executor233/3631 at offset 32 in frame:
 ____sys_recvmsg+0x0/0x600 include/linux/uio.h:246

this frame has 1 object:
 [32, 160) 'addr'

Memory state around the buggy address:
 ffffc9000385fa80: 00 04 f3 f3 f3 f3 f3 00 00 00 00 00 00 00 00 00
 ffffc9000385fb00: 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1 00
>ffffc9000385fb80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 f3
                                                                ^
 ffffc9000385fc00: f3 f3 f3 00 00 00 00 00 00 00 00 00 00 00 00 f1
 ffffc9000385fc80: f1 f1 f1 00 f2 f2 f2 00 f2 f2 f2 00 00 00 00 00
==================================================================

Bug: 224546354
Fixes: 0fb375fb9b ("[AF_PACKET]: Allow for > 8 byte hardware addresses.")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Link: https://lore.kernel.org/r/20220312232958.3535620-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Change-Id: I37e4a05a8d81b2645bc65db002e644b40d1a984d
2022-04-28 13:02:55 +00:00
Miklos Szeredi
24d464d38b BACKPORT: fuse: fix pipe buffer lifetime for direct_io
commit 0c4bcfdecb upstream.

In FOPEN_DIRECT_IO mode, fuse_file_write_iter() calls
fuse_direct_write_iter(), which normally calls fuse_direct_io(), which then
imports the write buffer with fuse_get_user_pages(), which uses
iov_iter_get_pages() to grab references to userspace pages instead of
actually copying memory.

On the filesystem device side, these pages can then either be read to
userspace (via fuse_dev_read()), or splice()d over into a pipe using
fuse_dev_splice_read() as pipe buffers with &nosteal_pipe_buf_ops.

This is wrong because after fuse_dev_do_read() unlocks the FUSE request,
the userspace filesystem can mark the request as completed, causing write()
to return. At that point, the userspace filesystem should no longer have
access to the pipe buffer.

Fix by copying pages coming from the user address space to new pipe
buffers.

Bug: 226679409
Reported-by: Jann Horn <jannh@google.com>
Fixes: c3021629a0 ("fuse: support splice() reading from fuse device")
Cc: <stable@vger.kernel.org>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Change-Id: I57a98e96e36bb97ce3e7b1ebf88917c6c8b0247d
2022-04-28 11:33:22 +00:00
Jiazi Li
2e3c211e7e BACKPORT: dm: fix NULL pointer issue when free bio
dm_io_dec_pending call end_io_acct first, will dec md in-flight
pending count. If a task is swapping table at same time.
task1                             task2
do_resume
 ->do_suspend
  ->dm_wait_for_completion
                                  bio_endio
				   ->clone_endio
				    ->dm_io_dec_pending
				     ->end_io_acct
				      ->wakeup task1
 ->dm_swap_table
  ->__bind
   ->__bind_mempools
    ->bioset_exit
     ->mempool_exit
                                     ->free_io
mempool->elements is NULL, and lead to following crash:
[ 67.330330] Unable to handle kernel NULL pointer dereference at virtual
address 0000000000000000
......
[ 67.330494] pstate: 80400085 (Nzcv daIf +PAN -UAO)
[ 67.330510] pc : mempool_free+0x70/0xa0
[ 67.330515] lr : mempool_free+0x4c/0xa0
[ 67.330520] sp : ffffff8008013b20
[ 67.330524] x29: ffffff8008013b20 x28: 0000000000000004
[ 67.330530] x27: ffffffa8c2ff40a0 x26: 00000000ffff1cc8
[ 67.330535] x25: 0000000000000000 x24: ffffffdada34c800
[ 67.330541] x23: 0000000000000000 x22: ffffffdada34c800
[ 67.330547] x21: 00000000ffff1cc8 x20: ffffffd9a1304d80
[ 67.330552] x19: ffffffdada34c970 x18: 000000b312625d9c
[ 67.330558] x17: 00000000002dcfbf x16: 00000000000006dd
[ 67.330563] x15: 000000000093b41e x14: 0000000000000010
[ 67.330569] x13: 0000000000007f7a x12: 0000000034155555
[ 67.330574] x11: 0000000000000001 x10: 0000000000000001
[ 67.330579] x9 : 0000000000000000 x8 : 0000000000000000
[ 67.330585] x7 : 0000000000000000 x6 : ffffff80148b5c1a
[ 67.330590] x5 : ffffff8008013ae0 x4 : 0000000000000001
[ 67.330596] x3 : ffffff80080139c8 x2 : ffffff801083bab8
[ 67.330601] x1 : 0000000000000000 x0 : ffffffdada34c970
[ 67.330609] Call trace:
[ 67.330616] mempool_free+0x70/0xa0
[ 67.330627] bio_put+0xf8/0x110
[ 67.330638] dec_pending+0x13c/0x230
[ 67.330644] clone_endio+0x90/0x180
[ 67.330649] bio_endio+0x198/0x1b8
[ 67.330655] dec_pending+0x190/0x230
[ 67.330660] clone_endio+0x90/0x180
[ 67.330665] bio_endio+0x198/0x1b8
[ 67.330673] blk_update_request+0x214/0x428
[ 67.330683] scsi_end_request+0x2c/0x300
[ 67.330688] scsi_io_completion+0xa0/0x710
[ 67.330695] scsi_finish_command+0xd8/0x110
[ 67.330700] scsi_softirq_done+0x114/0x148
[ 67.330708] blk_done_softirq+0x74/0xd0
[ 67.330716] __do_softirq+0x18c/0x374
[ 67.330724] irq_exit+0xb4/0xb8
[ 67.330732] __handle_domain_irq+0x84/0xc0
[ 67.330737] gic_handle_irq+0x148/0x1b0
[ 67.330744] el1_irq+0xe8/0x190
[ 67.330753] lpm_cpuidle_enter+0x4f8/0x538
[ 67.330759] cpuidle_enter_state+0x1fc/0x398
[ 67.330764] cpuidle_enter+0x18/0x20
[ 67.330772] do_idle+0x1b4/0x290
[ 67.330778] cpu_startup_entry+0x20/0x28
[ 67.330786] secondary_start_kernel+0x160/0x170

Move end_io_acct after free_io to fix this issue.

Bug: 228982905
Link: https://lore.kernel.org/dm-devel/1632916768-22379-1-git-send-email-lijiazi@xiaomi.com/T/#u
[Akilesh: Resolved merge conflict in drivers/md/dm.c]
Signed-off-by: Jiazi Li <lijiazi@xiaomi.com>
Signed-off-by: Akilesh Kailash <akailash@google.com>
(cherry picked from commit d208b89401)
Change-Id: I9f122cab2af3b961c472b8cf2087399c63c28de1
2022-04-27 22:40:43 +00:00
Marco Elver
aed2e27d51 UPSTREAM: kfence, x86: fix preemptible warning on KPTI-enabled systems
On systems with KPTI enabled, we can currently observe the following
warning:

  BUG: using smp_processor_id() in preemptible
  caller is invalidate_user_asid+0x13/0x50
  CPU: 6 PID: 1075 Comm: dmesg Not tainted 5.12.0-rc4-gda4a2b1a5479-kfence_1+ #1
  Hardware name: Hewlett-Packard HP Pro 3500 Series/2ABF, BIOS 8.11 10/24/2012
  Call Trace:
   dump_stack+0x7f/0xad
   check_preemption_disabled+0xc8/0xd0
   invalidate_user_asid+0x13/0x50
   flush_tlb_one_kernel+0x5/0x20
   kfence_protect+0x56/0x80
   ...

While it normally makes sense to require preemption to be off, so that
the expected CPU's TLB is flushed and not another, in our case it really
is best-effort (see comments in kfence_protect_page()).

Avoid the warning by disabling preemption around flush_tlb_one_kernel().

Link: https://lore.kernel.org/lkml/YGIDBAboELGgMgXy@elver.google.com/
Link: https://lkml.kernel.org/r/20210330065737.652669-1-elver@google.com
Signed-off-by: Marco Elver <elver@google.com>
Reported-by: Tomi Sarvela <tomi.p.sarvela@intel.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Andrey Konovalov <andreyknvl@google.com>
Cc: Jann Horn <jannh@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

(cherry picked from commit 6a77d38efc)

Bug: 229863099
Signed-off-by: Colin Downs-Razouk <colindr@google.com>
Change-Id: Ia917b052ffbb267254f281f55141c34ad193c78e
2022-04-27 11:12:09 -07:00
Ryun Park
e0513ed978 ANDROID: ABI: Update allowed list for galaxy
Leaf changes summary: 7 artifacts changed
Changed leaf types summary: 0 leaf type changed
Removed/Changed/Added functions summary: 0 Removed, 0 Changed, 7 Added functions
Removed/Changed/Added variables summary: 0 Removed, 0 Changed, 0 Added variable

7 Added functions:

  [A] 'function int gpiochip_irqchip_add_key(gpio_chip*, irq_chip*, unsigned int, irq_flow_handler_t, unsigned int, bool, lock_class_key*, lock_class_key*)'
  [A] 'function void gpiochip_set_nested_irqchip(gpio_chip*, irq_chip*, unsigned int)'
  [A] 'function usb_request* gs_alloc_req(usb_ep*, unsigned int, gfp_t)'
  [A] 'function void gs_free_req(usb_ep*, usb_request*)'
  [A] 'function void gserial_free_line(unsigned char)'
  [A] 'function void gserial_resume(gserial*)'
  [A] 'function void gserial_suspend(gserial*)'

Bug: 230572486
Change-Id: Ie85627ba247c5bc7f5e9da90934064f9ce5d9ba9
Signed-off-by: Ryun Park <ryun.park@samsung.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2022-04-27 12:54:35 +02:00
Greg Kroah-Hartman
6ed058a9bf ANDROID: abi_gki_aarch64.xml: update based on proper LTO=full setting
Commit 3628acf6b8 ("ANDROID: GKI: Update symbols to
abi_gki_aarch64_oplus") updated the .xml file, but did not do so with
LTO=full so future changes to the symbol list will generate huge churn
in the .xml file masking out any new symbols from the diffstat.

Regenerate the .xml file in order to help clean this up.

Bug: 220957464
Cc: wuzhe <wuzhe@oppo.com>
Fixes: 3628acf6b8 ("ANDROID: GKI: Update symbols to abi_gki_aarch64_oplus")
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I6784f0414df7af0e6d00d454e47c44c931d529f4
2022-04-27 12:54:28 +02:00
Xie Yongji
46f414b1c2 BACKPORT: virtio-blk: Use blk_validate_block_size() to validate block size
The block layer can't support a block size larger than
page size yet. And a block size that's too small or
not a power of two won't work either. If a misconfigured
device presents an invalid block size in configuration space,
it will result in the kernel crash something like below:

[  506.154324] BUG: kernel NULL pointer dereference, address: 0000000000000008
[  506.160416] RIP: 0010:create_empty_buffers+0x24/0x100
[  506.174302] Call Trace:
[  506.174651]  create_page_buffers+0x4d/0x60
[  506.175207]  block_read_full_page+0x50/0x380
[  506.175798]  ? __mod_lruvec_page_state+0x60/0xa0
[  506.176412]  ? __add_to_page_cache_locked+0x1b2/0x390
[  506.177085]  ? blkdev_direct_IO+0x4a0/0x4a0
[  506.177644]  ? scan_shadow_nodes+0x30/0x30
[  506.178206]  ? lru_cache_add+0x42/0x60
[  506.178716]  do_read_cache_page+0x695/0x740
[  506.179278]  ? read_part_sector+0xe0/0xe0
[  506.179821]  read_part_sector+0x36/0xe0
[  506.180337]  adfspart_check_ICS+0x32/0x320
[  506.180890]  ? snprintf+0x45/0x70
[  506.181350]  ? read_part_sector+0xe0/0xe0
[  506.181906]  bdev_disk_changed+0x229/0x5c0
[  506.182483]  blkdev_get_whole+0x6d/0x90
[  506.183013]  blkdev_get_by_dev+0x122/0x2d0
[  506.183562]  device_add_disk+0x39e/0x3c0
[  506.184472]  virtblk_probe+0x3f8/0x79b [virtio_blk]
[  506.185461]  virtio_dev_probe+0x15e/0x1d0 [virtio]

So let's use a block layer helper to validate the block size.

Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Link: https://lore.kernel.org/r/20211026144015.188-5-xieyongji@bytedance.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
(cherry picked from commit 57a13a5b81)
[keirf@: Implement missing error path]
Bug: 226679849
Signed-off-by: Keir Fraser <keirf@google.com>
Change-Id: I78cde1101baf8da2f68d0b9f942a0f1ec89fb30e
(cherry picked from commit 588affc843)
2022-04-26 09:42:49 +00:00
liang zhang
f06daa5a0b ANDROID: add for tuning readahead size
Tune ReadAhead size for better memory usage and performance.
accordding to Read-Ahead Efficiency on Mobile Devices: Observation,
Characterization, and Optimization form IEEE

Bug: 229839032
Change-Id: I91656bde5e616e181fd7557554d55e7ce1858136
Signed-off-by: liang zhang <liang.zhang@transsion.com>
2022-04-24 10:53:17 +00:00
Chen-Yu Tsai
6def3a5ed8 BACKPORT: media: v4l2-mem2mem: Apply DST_QUEUE_OFF_BASE on MMAP buffers across ioctls
[ Upstream commit 8310ca9407 ]

DST_QUEUE_OFF_BASE is applied to offset/mem_offset on MMAP capture buffers
only for the VIDIOC_QUERYBUF ioctl, while the userspace fields (including
offset/mem_offset) are filled in for VIDIOC_{QUERY,PREPARE,Q,DQ}BUF
ioctls. This leads to differences in the values presented to userspace.
If userspace attempts to mmap the capture buffer directly using values
from DQBUF, it will fail.

Move the code that applies the magic offset into a helper, and call
that helper from all four ioctl entry points.

[hverkuil: drop unnecessary '= 0' in v4l2_m2m_querybuf() for ret]

Bug: 223375145
Fixes: 7f98639def ("V4L/DVB: add memory-to-memory device helper framework for videobuf")
Fixes: 908a0d7c58 ("[media] v4l: mem2mem: port to videobuf2")
Signed-off-by: Chen-Yu Tsai <wenst@chromium.org>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Change-Id: Ifab9933f85f4ba7e0fadddf52debaf837830a06d
2022-04-22 07:38:57 +00:00
Johannes Berg
31beefbf14 BACKPORT: nl80211: correctly check NL80211_ATTR_REG_ALPHA2 size
commit 6624bb34b4 upstream.

We need this to be at least two bytes, so we can access
alpha2[0] and alpha2[1]. It may be three in case some
userspace used NUL-termination since it was NLA_STRING
(and we also push it out with NUL-termination).

Cc: stable@vger.kernel.org
Reported-by: Lee Jones <lee.jones@linaro.org>
Link: https://lore.kernel.org/r/20220411114201.fd4a31f06541.Ie7ff4be2cf348d8cc28ed0d626fc54becf7ea799@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Change-Id: Ib76876c2aa89aacf4c31d95b751f8b2d27788559
2022-04-21 13:49:49 +00:00
Theodore Ts'o
b07625698e BACKPORT: ext4: don't BUG if someone dirty pages without asking ext4 first
[ Upstream commit cc5095747e ]

[un]pin_user_pages_remote is dirtying pages without properly warning
the file system in advance.  A related race was noted by Jan Kara in
2018[1]; however, more recently instead of it being a very hard-to-hit
race, it could be reliably triggered by process_vm_writev(2) which was
discovered by Syzbot[2].

This is technically a bug in mm/gup.c, but arguably ext4 is fragile in
that if some other kernel subsystem dirty pages without properly
notifying the file system using page_mkwrite(), ext4 will BUG, while
other file systems will not BUG (although data will still be lost).

So instead of crashing with a BUG, issue a warning (since there may be
potential data loss) and just mark the page as clean to avoid
unprivileged denial of service attacks until the problem can be
properly fixed.  More discussion and background can be found in the
thread starting at [2].

[1] https://lore.kernel.org/linux-mm/20180103100430.GE4911@quack2.suse.cz
[2] https://lore.kernel.org/r/Yg0m6IjcNmfaSokM@google.com

Reported-by: syzbot+d59332e2db681cf18f0318a06e994ebbb529a8db@syzkaller.appspotmail.com
Reported-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Link: https://lore.kernel.org/r/YiDS9wVfq4mM2jGK@mit.edu
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Change-Id: I974915cfe58a2f773b025db8864f4b7927de2153
2022-04-21 07:53:59 +00:00
wuzhe
3628acf6b8 ANDROID: GKI: Update symbols to abi_gki_aarch64_oplus
Leaf changes summary: 119 artifacts changed
Changed leaf types summary: 0 leaf type changed
Removed/Changed/Added functions summary: 0 Removed, 0 Changed, 119 Added functions
Removed/Changed/Added variables summary: 0 Removed, 0 Changed, 0 Added variable

119 Added functions:

  [A] 'function int __dquot_alloc_space(inode*, qsize_t, int)'
  [A] 'function void __dquot_free_space(inode*, qsize_t, int)'
  [A] 'function int __dquot_transfer(inode*, dquot**)'
  [A] 'function int __fscrypt_encrypt_symlink(inode*, const char*, unsigned int, fscrypt_str*)'
  [A] 'function bool __fscrypt_inode_uses_inline_crypto(const inode*)'
  [A] 'function int __fscrypt_prepare_link(inode*, inode*, dentry*)'
  [A] 'function int __fscrypt_prepare_lookup(inode*, dentry*, fscrypt_name*)'
  [A] 'function int __fscrypt_prepare_readdir(inode*)'
  [A] 'function int __fscrypt_prepare_rename(inode*, dentry*, inode*, dentry*, unsigned int)'
  [A] 'function int __fscrypt_prepare_setattr(dentry*, iattr*)'
  [A] 'function ssize_t __generic_file_write_iter(kiocb*, iov_iter*)'
  [A] 'function void bio_associate_blkg_from_css(bio*, cgroup_subsys_state*)'
  [A] 'function const char* blk_op_str(unsigned int)'
  [A] 'function int blkdev_issue_zeroout(block_device*, long long unsigned int, long long unsigned int, unsigned int, unsigned int)'
  [A] 'function void clear_nlink(inode*)'
  [A] 'function dentry* d_find_alias(inode*)'
  [A] 'function void d_instantiate_new(dentry*, inode*)'
  [A] 'function void d_invalidate(dentry*)'
  [A] 'function void d_tmpfile(dentry*, inode*)'
  [A] 'function char* dentry_path_raw(dentry*, char*, int)'
  [A] 'function dquot* dqget(super_block*, kqid)'
  [A] 'function void dqput(dquot*)'
  [A] 'function int dquot_acquire(dquot*)'
  [A] 'function dquot* dquot_alloc(super_block*, int)'
  [A] 'function int dquot_alloc_inode(inode*)'
  [A] 'function int dquot_claim_space_nodirty(inode*, qsize_t)'
  [A] 'function int dquot_commit(dquot*)'
  [A] 'function int dquot_commit_info(super_block*, int)'
  [A] 'function void dquot_destroy(dquot*)'
  [A] 'function int dquot_disable(super_block*, int, unsigned int)'
  [A] 'function void dquot_drop(inode*)'
  [A] 'function int dquot_file_open(inode*, file*)'
  [A] 'function void dquot_free_inode(inode*)'
  [A] 'function int dquot_get_dqblk(super_block*, kqid, qc_dqblk*)'
  [A] 'function int dquot_get_next_dqblk(super_block*, kqid*, qc_dqblk*)'
  [A] 'function int dquot_get_next_id(super_block*, kqid*)'
  [A] 'function int dquot_get_state(super_block*, qc_state*)'
  [A] 'function int dquot_initialize(inode*)'
  [A] 'function bool dquot_initialize_needed(inode*)'
  [A] 'function int dquot_load_quota_inode(inode*, int, int, unsigned int)'
  [A] 'function int dquot_mark_dquot_dirty(dquot*)'
  [A] 'function int dquot_quota_off(super_block*, int)'
  [A] 'function int dquot_quota_on(super_block*, int, int, const path*)'
  [A] 'function int dquot_quota_on_mount(super_block*, char*, int, int)'
  [A] 'function int dquot_release(dquot*)'
  [A] 'function int dquot_resume(super_block*, int)'
  [A] 'function int dquot_set_dqblk(super_block*, kqid, qc_dqblk*)'
  [A] 'function int dquot_set_dqinfo(super_block*, int, qc_info*)'
  [A] 'function int dquot_transfer(inode*, iattr*)'
  [A] 'function int dquot_writeback_dquots(super_block*, int)'
  [A] 'function void evict_inodes(super_block*)'
  [A] 'function inode* find_inode_nowait(super_block*, unsigned long int, int (inode*, unsigned long int, void*)*, void*)'
  [A] 'function int freeze_bdev(block_device*)'
  [A] 'function int freeze_super(super_block*)'
  [A] 'function void fscrypt_decrypt_bio(bio*)'
  [A] 'function bool fscrypt_dio_supported(kiocb*, iov_iter*)'
  [A] 'function int fscrypt_drop_inode(inode*)'
  [A] 'function page* fscrypt_encrypt_pagecache_blocks(page*, unsigned int, unsigned int, gfp_t)'
  [A] 'function int fscrypt_file_open(inode*, file*)'
  [A] 'function int fscrypt_fname_alloc_buffer(u32, fscrypt_str*)'
  [A] 'function int fscrypt_fname_disk_to_usr(const inode*, u32, u32, const fscrypt_str*, fscrypt_str*)'
  [A] 'function void fscrypt_fname_free_buffer(fscrypt_str*)'
  [A] 'function u64 fscrypt_fname_siphash(const inode*, const qstr*)'
  [A] 'function void fscrypt_free_bounce_page(page*)'
  [A] 'function void fscrypt_free_inode(inode*)'
  [A] 'function const char* fscrypt_get_symlink(inode*, void*, unsigned int, delayed_call*)'
  [A] 'function int fscrypt_has_permitted_context(inode*, inode*)'
  [A] 'function int fscrypt_ioctl_add_key(file*, void*)'
  [A] 'function int fscrypt_ioctl_get_key_status(file*, void*)'
  [A] 'function int fscrypt_ioctl_get_nonce(file*, void*)'
  [A] 'function int fscrypt_ioctl_get_policy(file*, void*)'
  [A] 'function int fscrypt_ioctl_get_policy_ex(file*, void*)'
  [A] 'function int fscrypt_ioctl_remove_key(file*, void*)'
  [A] 'function int fscrypt_ioctl_remove_key_all_users(file*, void*)'
  [A] 'function int fscrypt_ioctl_set_policy(file*, void*)'
  [A] 'function bool fscrypt_match_name(const fscrypt_name*, const u8*, u32)'
  [A] 'function bool fscrypt_mergeable_bio(bio*, const inode*, long long unsigned int)'
  [A] 'function int fscrypt_prepare_new_inode(inode*, inode*, bool*)'
  [A] 'function int fscrypt_prepare_symlink(inode*, const char*, unsigned int, unsigned int, fscrypt_str*)'
  [A] 'function void fscrypt_put_encryption_info(inode*)'
  [A] 'function void fscrypt_set_bio_crypt_ctx(bio*, const inode*, long long unsigned int, unsigned int)'
  [A] 'function int fscrypt_set_context(inode*, void*)'
  [A] 'function int fscrypt_set_test_dummy_encryption(super_block*, const char*, fscrypt_dummy_policy*)'
  [A] 'function int fscrypt_setup_filename(inode*, const qstr*, int, fscrypt_name*)'
  [A] 'function void fscrypt_show_test_dummy_encryption(seq_file*, char, super_block*)'
  [A] 'function int fscrypt_symlink_getattr(const path*, kstat*)'
  [A] 'function int fscrypt_zeroout_range(const inode*, unsigned long int, sector_t, unsigned int)'
  [A] 'function void fsverity_cleanup_inode(inode*)'
  [A] 'function void fsverity_enqueue_verify_work(work_struct*)'
  [A] 'function int fsverity_file_open(inode*, file*)'
  [A] 'function int fsverity_ioctl_enable(file*, void*)'
  [A] 'function int fsverity_ioctl_measure(file*, void*)'
  [A] 'function int fsverity_ioctl_read_metadata(file*, void*)'
  [A] 'function int fsverity_prepare_setattr(dentry*, iattr*)'
  [A] 'function void fsverity_verify_bio(bio*)'
  [A] 'function bool fsverity_verify_page(page*)'
  [A] 'function dentry* generic_fh_to_dentry(super_block*, fid*, int, int, inode* (super_block*, typedef u64, typedef u32)*)'
  [A] 'function dentry* generic_fh_to_parent(super_block*, fid*, int, int, inode* (super_block*, typedef u64, typedef u32)*)'
  [A] 'function long long int generic_file_llseek_size(file*, long long int, int, long long int, long long int)'
  [A] 'function void generic_set_encrypted_ci_d_ops(dentry*)'
  [A] 'function void iget_failed(inode*)'
  [A] 'function inode* iget_locked(super_block*, unsigned long int)'
  [A] 'function inode* ilookup(super_block*, unsigned long int)'
  [A] 'function void inode_nohighmem(inode*)'
  [A] 'function int insert_inode_locked(inode*)'
  [A] 'function posix_acl* posix_acl_alloc(int, gfp_t)'
  [A] 'function int posix_acl_chmod(inode*, unsigned short int)'
  [A] 'function int posix_acl_equiv_mode(const posix_acl*, umode_t*)'
  [A] 'function void seq_escape(seq_file*, const char*, const char*)'
  [A] 'function void set_cached_acl(inode*, int, posix_acl*)'
  [A] 'function int set_task_ioprio(task_struct*, int)'
  [A] 'function void shrink_dcache_sb(super_block*)'
  [A] 'function void sync_inodes_sb(super_block*)'
  [A] 'function int thaw_bdev(block_device*)'
  [A] 'function int thaw_super(super_block*)'
  [A] 'function int vfs_ioc_fssetxattr_check(inode*, const fsxattr*, fsxattr*)'
  [A] 'function int vfs_ioc_setflags_prepare(inode*, unsigned int, unsigned int)'
  [A] 'function long long int vfs_setpos(file*, long long int, long long int)'
  [A] 'function void wbc_account_cgroup_owner(writeback_control*, page*, size_t)'

Bug: 220957464
Signed-off-by: wuzhe <wuzhe@oppo.com>
Change-Id: I85c3d05d169e20a451ba9d808c785c4965e026d1
2022-04-20 16:56:09 +00:00
Lu Baolu
b8bb3b43a4 BACKPORT: iommu: Extend mutex lock scope in iommu_probe_device()
Extend the scope of holding group->mutex so that it can cover the default
domain check/attachment and direct mappings of reserved regions.

Cc: Ashish Mhetre <amhetre@nvidia.com>
Fixes: 211ff31b3d ("iommu: Fix race condition during default domain allocation")
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20211108061349.1985579-1-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>

(cherry picked from commit 556f99ac88)

Bug: 229173748
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: I59bdee605a029bb28e7b023b90ab85983619be97
2022-04-19 19:55:09 -07:00
Ashish Mhetre
00f2b55cc4 BACKPORT: iommu: Fix race condition during default domain allocation
When two devices with same SID are getting probed concurrently through
iommu_probe_device(), the iommu_domain sometimes is getting allocated more
than once as call to iommu_alloc_default_domain() is not protected for
concurrency. Furthermore, it leads to each device holding a different
iommu_domain pointer, separate IOVA space and only one of the devices'
domain is used for translations from IOMMU. This causes accesses from other
device to fault or see incorrect translations.
Fix this by protecting iommu_alloc_default_domain() call with group->mutex
and let all devices with same SID share same iommu_domain.

Signed-off-by: Ashish Mhetre <amhetre@nvidia.com>
Link: https://lore.kernel.org/r/1628570641-9127-2-git-send-email-amhetre@nvidia.com
Signed-off-by: Will Deacon <will@kernel.org>

Bug: 229173748
(cherry picked from commit 211ff31b3d)
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: Iafd361871c0f06f75f71f7dbd7575b74d9c4253f
2022-04-19 19:54:57 -07:00
Wei Liu
0dcfc2c036 ANDROID: GKI: Update symbols to symbol list
Update symbols to symbol list from oppo

Leaf changes summary: 3 artifacts changed
Changed leaf types summary: 0 leaf type changed
Removed/Changed/Added functions summary: 0 Removed, 0 Changed, 2 Added functions
Removed/Changed/Added variables summary: 0 Removed, 0 Changed, 1 Added variable

2 Added functions:

  [A] 'function sock* __inet6_lookup_established(net*, inet_hashinfo*, const in6_addr*, const __be16, const in6_addr*, const u16, const int, const int)'
  [A] 'function sock* __inet_lookup_established(net*, inet_hashinfo*, const __be32, const __be16, const __be32, const u16, const int, const int)'

1 Added variable:

  [A] 'inet_hashinfo tcp_hashinfo'

Bug: 193384408

Signed-off-by: Wei Liu <liuwei.a@oppo.com>
Change-Id: I807fb047fdfb08d17fe19f26f753bd118c1f19ee
2022-04-18 17:13:55 +08:00
Mukesh Ojha
9608dc38a0 FROMLIST: remoteproc: Use unbounded workqueue for recovery work
There could be a scenario where there is too much load on a core
(n number of tasks which is affined) or in a case when multiple
rproc subsystem is going for a recovery and they queued recovery
work to one core so even though subsystem are independent there
recovery will be delayed if one of the subsystem recovery work
is taking more time in completing.

If we make this queue unbounded, the recovery work could be picked
on any cpu. This patch try to address this.

Signed-off-by: Mukesh Ojha <quic_mojha@quicinc.com>
Bug: 228429683
Change-Id: If18b39db6c0861989a6a3b36d9efde5f488b9b73
Link: https://lore.kernel.org/lkml/1649313998-1086-1-git-send-email-quic_mojha@quicinc.com/
Signed-off-by: Mukesh Ojha <quic_mojha@quicinc.com>
2022-04-14 08:01:30 +00:00
Lina Wang
b5bcf0d667 UPSTREAM: xfrm: fix tunnel model fragmentation behavior
in tunnel mode, if outer interface(ipv4) is less, it is easily to let
inner IPV6 mtu be less than 1280. If so, a Packet Too Big ICMPV6 message
is received. When send again, packets are fragmentized with 1280, they
are still rejected with ICMPV6(Packet Too Big) by xfrmi_xmit2().

According to RFC4213 Section3.2.2:
if (IPv4 path MTU - 20) is less than 1280
	if packet is larger than 1280 bytes
		Send ICMPv6 "packet too big" with MTU=1280
                Drop packet
        else
		Encapsulate but do not set the Don't Fragment
                flag in the IPv4 header.  The resulting IPv4
                packet might be fragmented by the IPv4 layer
                on the encapsulator or by some router along
                the IPv4 path.
	endif
else
	if packet is larger than (IPv4 path MTU - 20)
        	Send ICMPv6 "packet too big" with
                MTU = (IPv4 path MTU - 20).
                Drop packet.
        else
                Encapsulate and set the Don't Fragment flag
                in the IPv4 header.
        endif
endif
Packets should be fragmentized with ipv4 outer interface, so change it.

After it is fragemtized with ipv4, there will be double fragmenation.
No.48 & No.51 are ipv6 fragment packets, No.48 is double fragmentized,
then tunneled with IPv4(No.49& No.50), which obey spec. And received peer
cannot decrypt it rightly.

48              2002::10        2002::11 1296(length) IPv6 fragment (off=0 more=y ident=0xa20da5bc nxt=50)
49   0x0000 (0) 2002::10        2002::11 1304         IPv6 fragment (off=0 more=y ident=0x7448042c nxt=44)
50   0x0000 (0) 2002::10        2002::11 200          ESP (SPI=0x00035000)
51              2002::10        2002::11 180          Echo (ping) request
52   0x56dc     2002::10        2002::11 248          IPv6 fragment (off=1232 more=n ident=0xa20da5bc nxt=50)

xfrm6_noneed_fragment has fixed above issues. Finally, it acted like below:
1   0x6206 192.168.1.138   192.168.1.1 1316 Fragmented IP protocol (proto=Encap Security Payload 50, off=0, ID=6206) [Reassembled in #2]
2   0x6206 2002::10        2002::11    88   IPv6 fragment (off=0 more=y ident=0x1f440778 nxt=50)
3   0x0000 2002::10        2002::11    248  ICMPv6    Echo (ping) request

Signed-off-by: Lina Wang <lina.wang@mediatek.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Bug: 226699354
Change-Id: Ideec82bea6a1efa26352680cb3113f7c36b945ef
Signed-off-by: Lina Wang <lina.wang@mediatek.com>
2022-04-13 05:42:12 +00:00
Subash Abhinov Kasiviswanathan
20e11d7969 ANDROID: GKI: Enable CRYPTO_DES
This algorithm is still used in wifi calling scenarios.

Bug: 228528489
Change-Id: I06041ec023d023cfee175bd02b6db8e6c9656518
Signed-off-by: Subash Abhinov Kasiviswanathan <quic_subashab@quicinc.com>
2022-04-08 15:29:18 +00:00
Greg Kroah-Hartman
eff1ffbf0c ANDROID: GKI: set more vfs-only exports into their own namespace
There are more vfs-only symbols that OEMs want to use, so place them in
the proper vfs-only namespace.

Bug: 157965270
Bug: 210074446
Bug: 227656251
Cc: Matthias Maennich <maennich@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I99b9facc8da45fb329f6627d204180d1f89bcf97
2022-04-07 20:52:29 +02:00
Jaskaran Singh
3c06a5ce5e ANDROID: Split ANDROID_STRUCT_PADDING into separate configs
Not all non-GKI platforms support disabling ANDROID_STRUCT_PADDING,
as some modules may require Android vendor data. However, it would be
beneficial to have the option to disable some of the struct paddings,
such as ANDROID_KABI_RESERVE, for memory savings given a situation
where the ANDROID_STRUCT_PADDING config cannot be disabled.

Split the ANDROID_STRUCT_PADDING config into two configs, one to control
ANDROID_VENDOR_DATA and ANDROID_OEM_DATA, and another to control
ANDROID_KABI_RESERVE.

Bug: 206561931
Change-Id: Iea4b962dff386a17c9bef20ae048be4e17bf43ab
Signed-off-by: Jaskaran Singh <quic_jasksing@quicinc.com>
2022-04-07 05:44:50 +00:00
Tadeusz Struk
5b1bb43708 ANDROID: selftests: incfs: skip large_file_test test is not enough free space
Make the large_file_test check if there is at least 3GB of free disk
space and skip the test if there is not. This is to make the tests pass
on a VM with limited disk size, now all functional tests are passing.

TAP version 13
1..26
ok 1 basic_file_ops_test
ok 2 cant_touch_index_test
ok 3 dynamic_files_and_data_test
ok 4 concurrent_reads_and_writes_test
ok 5 attribute_test
ok 6 work_after_remount_test
ok 7 child_procs_waiting_for_data_test
ok 8 multiple_providers_test
ok 9 hash_tree_test
ok 10 read_log_test
ok 11 get_blocks_test
ok 12 get_hash_blocks_test
ok 13 large_file_test
ok 14 mapped_file_test
ok 15 compatibility_test
ok 16 data_block_count_test
ok 17 hash_block_count_test
ok 18 per_uid_read_timeouts_test
ok 19 inotify_test
ok 20 verity_test
ok 21 enable_verity_test
ok 22 mmap_test
ok 23 truncate_test
ok 24 stat_test
ok 25 sysfs_test
Error mounting fs.: File exists
Error mounting fs.: File exists
ok 26 sysfs_rename_test

Bug: 211066171

Signed-off-by: Tadeusz Struk <tadeusz.struk@linaro.org>
Change-Id: I2260e2b314429251070d0163c70173f237f86476
2022-04-06 17:26:26 +00:00
Tadeusz Struk
3b25a439ce ANDROID: selftests: incfs: Add -fno-omit-frame-pointer
Without it incfs/incfs_perf runtime fails in format_signature:

malloc(): invalid size (unsorted)
Aborted

When compiled with gcc version 11.2.0.

Also add check for NULL after the malloc, and remove unneeded
space for uint32_t in signing_section.

Bug: 211066171

Signed-off-by: Tadeusz Struk <tadeusz.struk@linaro.org>
Change-Id: I62b775140e4b89f75335cbd65665cf6a3e0fe964
2022-04-06 17:25:34 +00:00
Tadeusz Struk
3e45af8a72 ANDROID: incremental-fs: limit mount stack depth
Syzbot recently found a number of issues related to incremental-fs
(see bug numbers below). All have to do with the fact that incr-fs
allows mounts of the same source and target multiple times.
This is a design decision and the user space component "Data Loader"
expects this to work for app re-install use case.
The mounting depth needs to be controlled, however, and only allowed
to be two levels deep. In case of more than two mount attempts the
driver needs to return an error.
In case of the issues listed below the common pattern is that the
reproducer calls:

mount("./file0", "./file0", "incremental-fs", 0, NULL)

many times and then invokes a file operation like chmod, setxattr,
or open on the ./file0. This causes a recursive call for all the
mounted instances, which eventually causes a stack overflow and
a kernel crash:

BUG: stack guard page was hit at ffffc90000c0fff8
kernel stack overflow (double-fault): 0000 [#1] PREEMPT SMP KASAN

This change also cleans up the mount error path to properly clean
allocated resources and call deactivate_locked_super(), which
causes the incfs_kill_sb() to be called, where the sb is freed.

Bug: 211066171
Bug: 213140206
Bug: 213215835
Bug: 211914587
Bug: 211213635
Bug: 213137376
Bug: 211161296

Signed-off-by: Tadeusz Struk <tadeusz.struk@linaro.org>
Change-Id: I08d9b545a2715423296bf4beb67bdbbed78d1be1
2022-04-06 17:24:59 +00:00
wuzhe
d8fade2b40 ANDROID: GKI: Update symbols to abi_gki_aarch64_oplus
Leaf changes summary: 42 artifacts changed
Changed leaf types summary: 0 leaf type changed
Removed/Changed/Added functions summary: 0 Removed, 0 Changed, 42 Added functions
Removed/Changed/Added variables summary: 0 Removed, 0 Changed, 0 Added variable

42 Added functions:

  [A] 'function int __cleancache_get_page(page*)'
  [A] 'function unsigned long int __page_file_index(page*)'
  [A] 'function address_space* __page_file_mapping(page*)'
  [A] 'function int __percpu_counter_init(percpu_counter*, s64, gfp_t, lock_class_key*)'
  [A] 'function s64 __percpu_counter_sum(percpu_counter*)'
  [A] 'function void __xa_clear_mark(xarray*, unsigned long int, xa_mark_t)'
  [A] 'function int add_swap_extent(swap_info_struct*, unsigned long int, unsigned long int, long long unsigned int)'
  [A] 'function int add_to_page_cache_lru(page*, address_space*, unsigned long int, gfp_t)'
  [A] 'function long int congestion_wait(int, long int)'
  [A] 'function bool filemap_allow_speculation()'
  [A] 'function int filemap_check_errors(address_space*)'
  [A] 'function vm_fault_t filemap_map_pages(vm_fault*, unsigned long int, unsigned long int)'
  [A] 'function void generate_random_uuid(unsigned char*)'
  [A] 'function int kset_register(kset*)'
  [A] 'function char* match_strdup(const __anonymous_struct__*)'
  [A] 'function void migrate_page_copy(page*, page*)'
  [A] 'function int migrate_page_move_mapping(address_space*, page*, page*, int)'
  [A] 'function void migrate_page_states(page*, page*)'
  [A] 'function void page_cache_ra_unbounded(readahead_control*, unsigned long int, unsigned long int)'
  [A] 'function void page_cache_sync_ra(readahead_control*, file_ra_state*, unsigned long int)'
  [A] 'function const char* page_get_link(dentry*, inode*, delayed_call*)'
  [A] 'function int page_symlink(inode*, const char*, int)'
  [A] 'function int pagecache_write_begin(file*, address_space*, loff_t, unsigned int, unsigned int, page**, void**)'
  [A] 'function int pagecache_write_end(file*, address_space*, loff_t, unsigned int, unsigned int, page*, void*)'
  [A] 'function void percpu_counter_add_batch(percpu_counter*, long long int, int)'
  [A] 'function void percpu_counter_destroy(percpu_counter*)'
  [A] 'function void percpu_counter_set(percpu_counter*, long long int)'
  [A] 'function unsigned int radix_tree_gang_lookup(const xarray*, void**, unsigned long int, unsigned int)'
  [A] 'function int radix_tree_preload(unsigned int)'
  [A] 'function page* read_cache_page_gfp(address_space*, unsigned long int, gfp_t)'
  [A] 'function void truncate_inode_pages_range(address_space*, loff_t, loff_t)'
  [A] 'function void truncate_pagecache_range(inode*, loff_t, loff_t)'
  [A] 'function int utf8_casefold(const unicode_map*, const qstr*, unsigned char*, unsigned long int)'
  [A] 'function unicode_map* utf8_load(const char*)'
  [A] 'function int utf8_strncasecmp_folded(const unicode_map*, const qstr*, const qstr*)'
  [A] 'function void utf8_unload(unicode_map*)'
  [A] 'function int utf8s_to_utf16s(const unsigned char*, int, utf16_endian, unsigned short int*, int)'
  [A] 'function void vm_unmap_aliases()'
  [A] 'function void wait_for_completion_io(completion*)'
  [A] 'function void wait_for_stable_page(page*)'
  [A] 'function void wait_on_page_writeback(page*)'
  [A] 'function bool xa_get_mark(xarray*, unsigned long int, unsigned int)'

Bug: 227860734
Signed-off-by: wuzhe <wuzhe@oppo.com>
Change-Id: Ied8975ea5ca3026f2177249604a6721b8739a6df
2022-04-06 15:06:03 +08:00
Saravana Kannan
ceb6918d1d ANDROID: vendor_hooks: Reduce pointless modversions CRC churn
When vendor hooks are added to a file that previously didn't have any
vendor hooks, we end up indirectly including linux/tracepoint.h.  This
causes some data types that used to be opaque (forward declared) to the
code to become visible to the code.

Modversions correctly catches this change in visibility, but we don't
really care about the data types made visible when linux/tracepoint.h is
included. So, hide this from modversions in the central vendor_hooks.h
file instead of having to fix this on a case by case basis.

Since this is a KMI frozen branch, existing vendor hook headers are left
as is to avoid KMI breakage due to CRC churn.

To avoid future pointless CRC churn, new vendor hook header files that
include vendor_hooks.h should not include linux/tracepoint.h directly.

Bug: 227513263
Bug: 226140073
Signed-off-by: Saravana Kannan <saravanak@google.com>
Change-Id: Ia88e6af11dd94fe475c464eb30a6e5e1e24c938b
2022-04-01 18:13:28 +00:00
Waiman Long
002528dfb5 UPSTREAM: locking/lockdep: Avoid potential access of invalid memory in lock_class
commit 61cc4534b6 upstream.

It was found that reading /proc/lockdep after a lockdep splat may
potentially cause an access to freed memory if lockdep_unregister_key()
is called after the splat but before access to /proc/lockdep [1]. This
is due to the fact that graph_lock() call in lockdep_unregister_key()
fails after the clearing of debug_locks by the splat process.

After lockdep_unregister_key() is called, the lock_name may be freed
but the corresponding lock_class structure still have a reference to
it. That invalid memory pointer will then be accessed when /proc/lockdep
is read by a user and a use-after-free (UAF) error will be reported if
KASAN is enabled.

To fix this problem, lockdep_unregister_key() is now modified to always
search for a matching key irrespective of the debug_locks state and
zap the corresponding lock class if a matching one is found.

[1] https://lore.kernel.org/lkml/77f05c15-81b6-bddd-9650-80d5f23fe330@i-love.sakura.ne.jp/

Bug: 225086211
Fixes: 8b39adbee8 ("locking/lockdep: Make lockdep_unregister_key() honor 'debug_locks' again")
Reported-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lkml.kernel.org/r/20220103023558.1377055-1-longman@redhat.com
Signed-off-by: Cheng Jui Wang <cheng-jui.wang@mediatek.com>
Change-Id: I1b03e363142d2b9905f8f263b02d3c7cfdbc515a
2022-04-01 17:14:44 +00:00
Suren Baghdasaryan
404df4751a ANDROID: mm: Fix implicit declaration of function 'isolate_lru_page'
When compiled with CONFIG_SHMEM=n, shmem.c does not include internal.h
and isolate_lru_page function declaration can't be found.
Fix this by making isolate_lru_page usage conditional upon CONFIG_SHMEM
inside reclaim_shmem_address_space.

Fixes: daeabfe7fa ("ANDROID: mm: add reclaim_shmem_address_space() for faster reclaims")
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: Ia46a57681d26ac103e84ef7caa61a22dbd45cf04
2022-03-31 19:58:24 +00:00
Peifeng Li
e2c0e8502e ANDROID: GKI: Update symbols to symbol list
Leaf changes summary: 2 artifacts changed
Changed leaf types summary: 0 leaf type changed
Removed/Changed/Added functions summary: 0 Removed, 0 Changed, 1 Added function
Removed/Changed/Added variables summary: 0 Removed, 0 Changed, 1 Added variable

1 Added function:

  [A] 'function int __traceiter_android_vh_pcplist_add_cma_pages_bypass(void*, int, bool*)'

1 Added variable:

  [A] 'tracepoint __tracepoint_android_vh_pcplist_add_cma_pages_bypass'

Bug: 193384408

Signed-off-by: Peifeng Li <lipeifeng@oppo.com>
Change-Id: Ic8eeb43ed58120379c7fc16642545d70d220384a
2022-03-31 19:47:56 +00:00
Peifeng Li
51513a1775 ANDROID: GKI: Update symbols to symbol list
Leaf changes summary: 2 artifacts changed
Changed leaf types summary: 0 leaf type changed
Removed/Changed/Added functions summary: 0 Removed, 0 Changed, 1 Added function
Removed/Changed/Added variables summary: 0 Removed, 0 Changed, 1 Added variable

1 Added function:

  [A] 'function int __traceiter_android_vh_cma_drain_all_pages_bypass(void*, unsigned int, bool*)'

1 Added variable:

  [A] 'tracepoint __tracepoint_android_vh_cma_drain_all_pages_bypass'

Bug: 193384408

Signed-off-by: Peifeng Li <lipeifeng@oppo.com>
Change-Id: Ib9f95f702e2776a6caded709a88f1e9f2ee154b1
2022-03-31 19:47:47 +00:00
Liujie Xie
7b7125914c ANDROID: GKI: Add hook symbol to symbol list
Add shrink_node_memcgs() tracepoint symbols to symbol list.

Leaf changes summary: 2 artifacts changed
Changed leaf types summary: 0 leaf type changed
Removed/Changed/Added functions summary: 0 Removed, 0 Changed, 1 Added function
Removed/Changed/Added variables summary: 0 Removed, 0 Changed, 1 Added variable

1 Added function:

  [A] 'function int __traceiter_android_vh_shrink_node_memcgs(void*, mem_cgroup*, bool*)'

1 Added variable:

  [A] 'tracepoint __tracepoint_android_vh_shrink_node_memcgs'

Bug: 226482420
Signed-off-by: Liujie Xie <xieliujie@oppo.com>
Change-Id: Iff8a2b95a4e891bce814e9a70520368e28b4b5f0
2022-03-31 17:29:11 +00:00
Lee Jones
7a7eadac58 Revert "ANDROID: dm-bow: Protect Ranges fetched and erased from the RB tree"
This reverts commit f3ca80cced.

Reason for revert: Needs rework - causes unforeseen deadlock.

Bug: 227141277
Change-Id: I46f2b3d34c6a6a0a96d0fcc10086aa0d1687b127
Signed-off-by: Lee Jones <joneslee@google.com>
2022-03-31 11:20:55 +00:00
Peifeng Li
cb7c1a4c78 ANDROID: vendor_hooks: Add hooks to for free_unref_page_commit
Provide a vendor hook to skip cma-pages to add in pcplist when
free_unref_page_commit.

The patch is revelant to skip drain_all_pages in alloc_contig_range,
the revelant hooks is android_vh_cma_drain_all_pages_bypass
which is to avoid to delay in drain pcppages when drain_all_pages.

In most case, pcp->high is small so that free-pages with other mt_types
can also fill with pcplist full.

Bug: 224732340
Signed-off-by: Peifeng Li <lipeifeng@oppo.com>
Change-Id: Ifdeeed9f8934d87671ec3fa6787a02675b993082
2022-03-30 15:56:32 +00:00
Peifeng Li
a2485b8abd ANDROID: vendor_hooks: Add hooks to for alloc_contig_range
Provide a vendor hook to allow drain_all_pages to be skipped
during alloc_contig_range in some cases to avoid delays caused by
it in cases when the benefits of draining pcp lists are known
to be small.

Bug: 224732340
Signed-off-by: Peifeng Li <lipeifeng@oppo.com>
Change-Id: I0a82f668cf985ad5344d666c0c6372a7e61c3798
2022-03-30 15:56:25 +00:00
Peifeng Li
bc159fee3d ANDROID: GKI: Update symbols to symbol list
Update symbols to symbol list externed by oem modules.

Leaf changes summary: 1 artifact changed
Changed leaf types summary: 0 leaf type changed
Removed/Changed/Added functions summary: 0 Removed, 0 Changed, 1 Added function
Removed/Changed/Added variables summary: 0 Removed, 0 Changed, 0 Added variable

1 Added function:

[A] 'function unsigned long int shrink_slab(gfp_t, int, mem_cgroup*, int)'

Bug: 193384408

Signed-off-by: Peifeng Li <lipeifeng@oppo.com>
Change-Id: I60dc617ba6c33c5ef52c89488acb8cc62ed22a61
2022-03-30 02:55:32 +00:00
Liujie Xie
95380146ce ANDROID: vendor_hooks: Add hook in shrink_node_memcgs
Add vendor hook in shrink_node_memcgs to adjust whether
to skip memory reclamation of memcg.

Bug: 226482420
Signed-off-by: Liujie Xie <xieliujie@oppo.com>
Change-Id: I925856353e63c5a821027de4f8476c833e21b982
2022-03-29 19:43:45 +00:00
Liujie Xie
a3f112353c ANDROID: GKI: Add symbols to symbol list
Add walk_page_range() and swp_swap_info() to symbol list.

Leaf changes summary: 2 artifacts changed
Changed leaf types summary: 0 leaf type changed
Removed/Changed/Added functions summary: 0 Removed, 0 Changed, 2 Added functions
Removed/Changed/Added variables summary: 0 Removed, 0 Changed, 0 Added variable

2 Added functions:

  [A] 'function swap_info_struct* swp_swap_info(swp_entry_t)'
  [A] 'function int walk_page_range(mm_struct*, unsigned long int, unsigned long int, const mm_walk_ops*, void*)'

Bug: 225273514
Signed-off-by: Liujie Xie <xieliujie@oppo.com>
Change-Id: Ia3e8ba0d0a8fe4ed3953af9baf9028d5f27e76e2
2022-03-29 19:32:03 +00:00
Yunfei Wang
ec48b1892e FROMGIT: iommu/iova: Improve 32-bit free space estimate
For various reasons based on the allocator behaviour and typical
use-cases at the time, when the max32_alloc_size optimisation was
introduced it seemed reasonable to couple the reset of the tracked
size to the update of cached32_node upon freeing a relevant IOVA.
However, since subsequent optimisations focused on helping genuine
32-bit devices make best use of even more limited address spaces, it
is now a lot more likely for cached32_node to be anywhere in a "full"
32-bit address space, and as such more likely for space to become
available from IOVAs below that node being freed.

At this point, the short-cut in __cached_rbnode_delete_update() really
doesn't hold up any more, and we need to fix the logic to reliably
provide the expected behaviour. We still want cached32_node to only move
upwards, but we should reset the allocation size if *any* 32-bit space
has become available.

Reported-by: Yunfei Wang <yf.wang@mediatek.com>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Reviewed-by: Miles Chen <miles.chen@mediatek.com>
Link: https://lore.kernel.org/r/033815732d83ca73b13c11485ac39336f15c3b40.1646318408.git.robin.murphy@arm.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>

Bug: 223712131
(cherry picked from commit 5b61343b50
 https://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu.git core)
Signed-off-by: Yunfei Wang <yf.wang@mediatek.com>
Change-Id: I5026411dd022c6ddea5c0e4da6e69c7b14162c3f
2022-03-29 17:45:17 +00:00
Liujie Xie
d9845e9e5c ANDROID: export walk_page_range and swp_swap_info
Export walk_page_range and swp_swap_info for reading swap
from backing device to zram.

Bug: 225273514
Signed-off-by: Liujie Xie <xieliujie@oppo.com>
Change-Id: If888cfc2823d8003b62bdb177740643696cf6f7e
2022-03-29 17:40:45 +00:00
Peifeng Li
71d560e017 ANDROID: vendor_hooks: export shrink_slab
Export shrink_slab to module for do shrink-memory action.

Bug: 221768451
Signed-off-by: Peifeng Li <lipeifeng@oppo.com>
Change-Id: I5abe9ad419d64999b714d879c228625a243e90d1
2022-03-28 21:26:10 +00:00
Aran Dalton
74720dae8b ANDROID: usb: gadget: f_accessory: add compat_ioctl support
On Android 32-bit system, the following Cts Verifier testcase failed:

manualTests#com.android.cts.verifier.usb.accessory.UsbAccessoryTestActivity

The reason is that compat_ioctl() needs to be called.
So let's add compat_ioctl() for 32-bit applications to solve this issue.

Bug: 223101878
Change-Id: I6e1f797d919494d293184411041955c33ad08aef
Signed-off-by: Aran Dalton <arda@allwinnertech.com>
(cherry picked from commit 77bf53b486)
2022-03-25 08:16:04 +00:00
Oliver Neukum
e7f39d0aa2 UPSTREAM: sr9700: sanity check for packet length
[ Upstream commit e9da0b56fe ]

A malicious device can leak heap data to user space
providing bogus frame lengths. Introduce a sanity check.

Bug: 225469258
Signed-off-by: Oliver Neukum <oneukum@suse.com>
Reviewed-by: Grant Grundler <grundler@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Change-Id: I2f17bd57b5c4228d0420f716527316a878a34efd
2022-03-24 10:49:22 +00:00