linux

mirror of https://github.com/hardkernel/linux.git synced 2026-06-06 19:08:57 +09:00

Author	SHA1	Message	Date
SeongJae Park	e24d4d7d21	UPSTREAM: mm/damon/core-test: fix wrong expectations for 'damon_split_regions_of()' Kunit test cases for 'damon_split_regions_of()' expects the number of regions after calling the function will be same to their request ('nr_sub'). However, the requested number is just an upper-limit, because the function randomly decides the size of each sub-region. This fixes the wrong expectation. Link: https://lkml.kernel.org/r/20211028090628.14948-1-sj@kernel.org Fixes: `17ccae8bb5` ("mm/damon: add kunit tests") Signed-off-by: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit `2e014660b3`) Bug: 228223814 Signed-off-by: Hailong Tu <tuhailong@oppo.com> Change-Id: I3237d82880ca4308dd4161de9dd957925384a333	2022-04-22 15:19:31 -07:00
Adam Borowski	729698e1ab	UPSTREAM: mm/damon: don't use strnlen() with known-bogus source length gcc knows the true length too, and rightfully complains. Link: https://lkml.kernel.org/r/20210912204447.10427-1-kilobyte@angband.pl Signed-off-by: Adam Borowski <kilobyte@angband.pl> Cc: SeongJae Park <sj38.park@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit `892ab4bbd0`) Bug: 228223814 Signed-off-by: Hailong Tu <tuhailong@oppo.com> Change-Id: Ia3ad91fd9541a2e57156a999c5fe3e70efe740fb	2022-04-22 15:19:30 -07:00
SeongJae Park	789928c5b6	UPSTREAM: mm/damon: add kunit tests This commit adds kunit based unit tests for the core and the virtual address spaces monitoring primitives of DAMON. Link: https://lkml.kernel.org/r/20210716081449.22187-12-sj38.park@gmail.com Signed-off-by: SeongJae Park <sjpark@amazon.de> Reviewed-by: Brendan Higgins <brendanhiggins@google.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Amit Shah <amit@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: David Hildenbrand <david@redhat.com> Cc: David Rientjes <rientjes@google.com> Cc: David Woodhouse <dwmw@amazon.com> Cc: Fan Du <fan.du@intel.com> Cc: Fernand Sieber <sieberf@amazon.com> Cc: Greg Kroah-Hartman <greg@kroah.com> Cc: Greg Thelen <gthelen@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Joe Perches <joe@perches.com> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Leonard Foerster <foersleo@amazon.de> Cc: Marco Elver <elver@google.com> Cc: Markus Boehme <markubo@amazon.de> Cc: Maximilian Heyne <mheyne@amazon.de> Cc: Mel Gorman <mgorman@suse.de> Cc: Minchan Kim <minchan@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@surriel.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit `17ccae8bb5`) Bug: 228223814 Signed-off-by: Hailong Tu <tuhailong@oppo.com> Change-Id: I3f660b6064a2db350b6b2a422889d189c7a72f62	2022-04-22 15:19:30 -07:00
SeongJae Park	d3cff19d31	UPSTREAM: mm/damon: add user space selftests This commit adds a simple user space tests for DAMON. The tests are using kselftest framework. Link: https://lkml.kernel.org/r/20210716081449.22187-13-sj38.park@gmail.com Signed-off-by: SeongJae Park <sjpark@amazon.de> Reviewed-by: Markus Boehme <markubo@amazon.de> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Amit Shah <amit@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Brendan Higgins <brendanhiggins@google.com> Cc: David Hildenbrand <david@redhat.com> Cc: David Rientjes <rientjes@google.com> Cc: David Woodhouse <dwmw@amazon.com> Cc: Fan Du <fan.du@intel.com> Cc: Fernand Sieber <sieberf@amazon.com> Cc: Greg Kroah-Hartman <greg@kroah.com> Cc: Greg Thelen <gthelen@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Joe Perches <joe@perches.com> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Leonard Foerster <foersleo@amazon.de> Cc: Marco Elver <elver@google.com> Cc: Maximilian Heyne <mheyne@amazon.de> Cc: Mel Gorman <mgorman@suse.de> Cc: Minchan Kim <minchan@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@surriel.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit `b348eb7abd`) Bug: 228223814 Signed-off-by: Hailong Tu <tuhailong@oppo.com> Change-Id: I4406106b9d7a0b3774de638fc014357762c30f8e	2022-04-22 15:19:30 -07:00
SeongJae Park	ac418a7965	UPSTREAM: mm/damon/dbgfs: support multiple contexts In some use cases, users would want to run multiple monitoring context. For example, if a user wants a high precision monitoring and dedicating multiple CPUs for the job is ok, because DAMON creates one monitoring thread per one context, the user can split the monitoring target regions into multiple small regions and create one context for each region. Or, someone might want to simultaneously monitor different address spaces, e.g., both virtual address space and physical address space. The DAMON's API allows such usage, but 'damon-dbgfs' does not. Therefore, only kernel space DAMON users can do multiple contexts monitoring. This commit allows the user space DAMON users to use multiple contexts monitoring by introducing two new 'damon-dbgfs' debugfs files, 'mk_context' and 'rm_context'. Users can create a new monitoring context by writing the desired name of the new context to 'mk_context'. Then, a new directory with the name and having the files for setting of the context ('attrs', 'target_ids' and 'record') will be created under the debugfs directory. Writing the name of the context to remove to 'rm_context' will remove the related context and directory. Link: https://lkml.kernel.org/r/20210716081449.22187-10-sj38.park@gmail.com Signed-off-by: SeongJae Park <sjpark@amazon.de> Reviewed-by: Fernand Sieber <sieberf@amazon.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Amit Shah <amit@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Brendan Higgins <brendanhiggins@google.com> Cc: David Hildenbrand <david@redhat.com> Cc: David Rientjes <rientjes@google.com> Cc: David Woodhouse <dwmw@amazon.com> Cc: Fan Du <fan.du@intel.com> Cc: Greg Kroah-Hartman <greg@kroah.com> Cc: Greg Thelen <gthelen@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Joe Perches <joe@perches.com> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Leonard Foerster <foersleo@amazon.de> Cc: Marco Elver <elver@google.com> Cc: Markus Boehme <markubo@amazon.de> Cc: Maximilian Heyne <mheyne@amazon.de> Cc: Mel Gorman <mgorman@suse.de> Cc: Minchan Kim <minchan@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@surriel.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit `75c1c2b53c`) Bug: 228223814 Signed-off-by: Hailong Tu <tuhailong@oppo.com> Change-Id: Idba5509b6ef147cd6f8cd820d241718b284a34a3	2022-04-22 15:19:30 -07:00
SeongJae Park	9fda42d2d6	UPSTREAM: mm/damon/dbgfs: export kdamond pid to the user space For CPU usage accounting, knowing pid of the monitoring thread could be helpful. For example, users could use cpuaccount cgroups with the pid. This commit therefore exports the pid of currently running monitoring thread to the user space via 'kdamond_pid' file in the debugfs directory. Link: https://lkml.kernel.org/r/20210716081449.22187-9-sj38.park@gmail.com Signed-off-by: SeongJae Park <sjpark@amazon.de> Reviewed-by: Fernand Sieber <sieberf@amazon.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Amit Shah <amit@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Brendan Higgins <brendanhiggins@google.com> Cc: David Hildenbrand <david@redhat.com> Cc: David Rientjes <rientjes@google.com> Cc: David Woodhouse <dwmw@amazon.com> Cc: Fan Du <fan.du@intel.com> Cc: Greg Kroah-Hartman <greg@kroah.com> Cc: Greg Thelen <gthelen@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Joe Perches <joe@perches.com> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Leonard Foerster <foersleo@amazon.de> Cc: Marco Elver <elver@google.com> Cc: Markus Boehme <markubo@amazon.de> Cc: Maximilian Heyne <mheyne@amazon.de> Cc: Mel Gorman <mgorman@suse.de> Cc: Minchan Kim <minchan@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@surriel.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit `429538e854`) Bug: 228223814 Signed-off-by: Hailong Tu <tuhailong@oppo.com> Change-Id: I32d41776a1850ed189b2759194061004ca8a83cf	2022-04-22 15:19:30 -07:00
SeongJae Park	c8ecb4f7a1	UPSTREAM: mm/damon: implement a debugfs-based user space interface DAMON is designed to be used by kernel space code such as the memory management subsystems, and therefore it provides only kernel space API. That said, letting the user space control DAMON could provide some benefits to them. For example, it will allow user space to analyze their specific workloads and make their own special optimizations. For such cases, this commit implements a simple DAMON application kernel module, namely 'damon-dbgfs', which merely wraps the DAMON api and exports those to the user space via the debugfs. 'damon-dbgfs' exports three files, ``attrs``, ``target_ids``, and ``monitor_on`` under its debugfs directory, ``<debugfs>/damon/``. Attributes ---------- Users can read and write the ``sampling interval``, ``aggregation interval``, ``regions update interval``, and min/max number of monitoring target regions by reading from and writing to the ``attrs`` file. For example, below commands set those values to 5 ms, 100 ms, 1,000 ms, 10, 1000 and check it again:: # cd <debugfs>/damon # echo 5000 100000 1000000 10 1000 > attrs # cat attrs 5000 100000 1000000 10 1000 Target IDs ---------- Some types of address spaces supports multiple monitoring target. For example, the virtual memory address spaces monitoring can have multiple processes as the monitoring targets. Users can set the targets by writing relevant id values of the targets to, and get the ids of the current targets by reading from the ``target_ids`` file. In case of the virtual address spaces monitoring, the values should be pids of the monitoring target processes. For example, below commands set processes having pids 42 and 4242 as the monitoring targets and check it again:: # cd <debugfs>/damon # echo 42 4242 > target_ids # cat target_ids 42 4242 Note that setting the target ids doesn't start the monitoring. Turning On/Off -------------- Setting the files as described above doesn't incur effect unless you explicitly start the monitoring. You can start, stop, and check the current status of the monitoring by writing to and reading from the ``monitor_on`` file. Writing ``on`` to the file starts the monitoring of the targets with the attributes. Writing ``off`` to the file stops those. DAMON also stops if every targets are invalidated (in case of the virtual memory monitoring, target processes are invalidated when terminated). Below example commands turn on, off, and check the status of DAMON:: # cd <debugfs>/damon # echo on > monitor_on # echo off > monitor_on # cat monitor_on off Please note that you cannot write to the above-mentioned debugfs files while the monitoring is turned on. If you write to the files while DAMON is running, an error code such as ``-EBUSY`` will be returned. [akpm@linux-foundation.org: remove unneeded "alloc failed" printks] [akpm@linux-foundation.org: replace macro with static inline] Link: https://lkml.kernel.org/r/20210716081449.22187-8-sj38.park@gmail.com Signed-off-by: SeongJae Park <sjpark@amazon.de> Reviewed-by: Leonard Foerster <foersleo@amazon.de> Reviewed-by: Fernand Sieber <sieberf@amazon.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Amit Shah <amit@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Brendan Higgins <brendanhiggins@google.com> Cc: David Hildenbrand <david@redhat.com> Cc: David Rientjes <rientjes@google.com> Cc: David Woodhouse <dwmw@amazon.com> Cc: Fan Du <fan.du@intel.com> Cc: Greg Kroah-Hartman <greg@kroah.com> Cc: Greg Thelen <gthelen@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Joe Perches <joe@perches.com> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Marco Elver <elver@google.com> Cc: Markus Boehme <markubo@amazon.de> Cc: Maximilian Heyne <mheyne@amazon.de> Cc: Mel Gorman <mgorman@suse.de> Cc: Minchan Kim <minchan@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@surriel.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit `4bc05954d0`) Bug: 228223814 Signed-off-by: Hailong Tu <tuhailong@oppo.com> Change-Id: I99664d4372dc5c510d7e16ffe384345e52579b0e	2022-04-22 15:19:30 -07:00
SeongJae Park	e415cf98cb	UPSTREAM: mm/damon: add a tracepoint This commit adds a tracepoint for DAMON. It traces the monitoring results of each region for each aggregation interval. Using this, DAMON can easily integrated with tracepoints supporting tools such as perf. Link: https://lkml.kernel.org/r/20210716081449.22187-7-sj38.park@gmail.com Signed-off-by: SeongJae Park <sjpark@amazon.de> Reviewed-by: Leonard Foerster <foersleo@amazon.de> Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Reviewed-by: Fernand Sieber <sieberf@amazon.com> Acked-by: Shakeel Butt <shakeelb@google.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Amit Shah <amit@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Brendan Higgins <brendanhiggins@google.com> Cc: David Hildenbrand <david@redhat.com> Cc: David Rientjes <rientjes@google.com> Cc: David Woodhouse <dwmw@amazon.com> Cc: Fan Du <fan.du@intel.com> Cc: Greg Kroah-Hartman <greg@kroah.com> Cc: Greg Thelen <gthelen@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Joe Perches <joe@perches.com> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Marco Elver <elver@google.com> Cc: Markus Boehme <markubo@amazon.de> Cc: Maximilian Heyne <mheyne@amazon.de> Cc: Mel Gorman <mgorman@suse.de> Cc: Minchan Kim <minchan@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@surriel.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit `2fcb93629a`) Bug: 228223814 Signed-off-by: Hailong Tu <tuhailong@oppo.com> Change-Id: I502a57938f7aa60c55b05525b45e50ddb1aeda65	2022-04-22 15:19:30 -07:00
SeongJae Park	75f4f6ebe9	UPSTREAM: mm/damon: implement primitives for the virtual memory address spaces This commit introduces a reference implementation of the address space specific low level primitives for the virtual address space, so that users of DAMON can easily monitor the data accesses on virtual address spaces of specific processes by simply configuring the implementation to be used by DAMON. The low level primitives for the fundamental access monitoring are defined in two parts: 1. Identification of the monitoring target address range for the address space. 2. Access check of specific address range in the target space. The reference implementation for the virtual address space does the works as below. PTE Accessed-bit Based Access Check ----------------------------------- The implementation uses PTE Accessed-bit for basic access checks. That is, it clears the bit for the next sampling target page and checks whether it is set again after one sampling period. This could disturb the reclaim logic. DAMON uses ``PG_idle`` and ``PG_young`` page flags to solve the conflict, as Idle page tracking does. VMA-based Target Address Range Construction ------------------------------------------- Only small parts in the super-huge virtual address space of the processes are mapped to physical memory and accessed. Thus, tracking the unmapped address regions is just wasteful. However, because DAMON can deal with some level of noise using the adaptive regions adjustment mechanism, tracking every mapping is not strictly required but could even incur a high overhead in some cases. That said, too huge unmapped areas inside the monitoring target should be removed to not take the time for the adaptive mechanism. For the reason, this implementation converts the complex mappings to three distinct regions that cover every mapped area of the address space. Also, the two gaps between the three regions are the two biggest unmapped areas in the given address space. The two biggest unmapped areas would be the gap between the heap and the uppermost mmap()-ed region, and the gap between the lowermost mmap()-ed region and the stack in most of the cases. Because these gaps are exceptionally huge in usual address spaces, excluding these will be sufficient to make a reasonable trade-off. Below shows this in detail:: <heap> <BIG UNMAPPED REGION 1> <uppermost mmap()-ed region> (small mmap()-ed regions and munmap()-ed regions) <lowermost mmap()-ed region> <BIG UNMAPPED REGION 2> <stack> [akpm@linux-foundation.org: mm/damon/vaddr.c needs highmem.h for kunmap_atomic()] [sjpark@amazon.de: remove unnecessary PAGE_EXTENSION setup] Link: https://lkml.kernel.org/r/20210806095153.6444-2-sj38.park@gmail.com [sjpark@amazon.de: safely walk page table] Link: https://lkml.kernel.org/r/20210831161800.29419-1-sj38.park@gmail.com Link: https://lkml.kernel.org/r/20210716081449.22187-6-sj38.park@gmail.com Signed-off-by: SeongJae Park <sjpark@amazon.de> Reviewed-by: Leonard Foerster <foersleo@amazon.de> Reviewed-by: Fernand Sieber <sieberf@amazon.com> Acked-by: Shakeel Butt <shakeelb@google.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Amit Shah <amit@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Brendan Higgins <brendanhiggins@google.com> Cc: David Hildenbrand <david@redhat.com> Cc: David Rientjes <rientjes@google.com> Cc: David Woodhouse <dwmw@amazon.com> Cc: Fan Du <fan.du@intel.com> Cc: Greg Kroah-Hartman <greg@kroah.com> Cc: Greg Thelen <gthelen@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Joe Perches <joe@perches.com> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Marco Elver <elver@google.com> Cc: Markus Boehme <markubo@amazon.de> Cc: Maximilian Heyne <mheyne@amazon.de> Cc: Mel Gorman <mgorman@suse.de> Cc: Minchan Kim <minchan@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@surriel.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit `3f49584b26`) Bug: 228223814 Signed-off-by: Hailong Tu <tuhailong@oppo.com> Change-Id: Ib7d134932b1dd20cd926968991654d71717c197d	2022-04-22 15:19:29 -07:00
SeongJae Park	ad6156f833	UPSTREAM: mm/idle_page_tracking: make PG_idle reusable PG_idle and PG_young allow the two PTE Accessed bit users, Idle Page Tracking and the reclaim logic concurrently work while not interfering with each other. That is, when they need to clear the Accessed bit, they set PG_young to represent the previous state of the bit, respectively. And when they need to read the bit, if the bit is cleared, they further read the PG_young to know whether the other has cleared the bit meanwhile or not. For yet another user of the PTE Accessed bit, we could add another page flag, or extend the mechanism to use the flags. For the DAMON usecase, however, we don't need to do that just yet. IDLE_PAGE_TRACKING and DAMON are mutually exclusive, so there's only ever going to be one user of the current set of flags. In this commit, we split out the CONFIG options to allow for the use of PG_young and PG_idle outside of idle page tracking. In the next commit, DAMON's reference implementation of the virtual memory address space monitoring primitives will use it. [sjpark@amazon.de: set PAGE_EXTENSION for non-64BIT] Link: https://lkml.kernel.org/r/20210806095153.6444-1-sj38.park@gmail.com [akpm@linux-foundation.org: tweak Kconfig text] [sjpark@amazon.de: hide PAGE_IDLE_FLAG from users] Link: https://lkml.kernel.org/r/20210813081238.34705-1-sj38.park@gmail.com Link: https://lkml.kernel.org/r/20210716081449.22187-5-sj38.park@gmail.com Signed-off-by: SeongJae Park <sjpark@amazon.de> Reviewed-by: Shakeel Butt <shakeelb@google.com> Reviewed-by: Fernand Sieber <sieberf@amazon.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Amit Shah <amit@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Brendan Higgins <brendanhiggins@google.com> Cc: David Hildenbrand <david@redhat.com> Cc: David Rientjes <rientjes@google.com> Cc: David Woodhouse <dwmw@amazon.com> Cc: Fan Du <fan.du@intel.com> Cc: Greg Kroah-Hartman <greg@kroah.com> Cc: Greg Thelen <gthelen@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Joe Perches <joe@perches.com> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Leonard Foerster <foersleo@amazon.de> Cc: Marco Elver <elver@google.com> Cc: Markus Boehme <markubo@amazon.de> Cc: Maximilian Heyne <mheyne@amazon.de> Cc: Mel Gorman <mgorman@suse.de> Cc: Minchan Kim <minchan@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@surriel.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit `1c676e0d9b`) Bug: 228223814 Signed-off-by: Hailong Tu <tuhailong@oppo.com> Change-Id: I4b2c3087b069b0306ab8743402fabcd195d63e54	2022-04-22 15:19:29 -07:00
SeongJae Park	f78eee74b4	UPSTREAM: mm/damon: adaptively adjust regions Even somehow the initial monitoring target regions are well constructed to fulfill the assumption (pages in same region have similar access frequencies), the data access pattern can be dynamically changed. This will result in low monitoring quality. To keep the assumption as much as possible, DAMON adaptively merges and splits each region based on their access frequency. For each ``aggregation interval``, it compares the access frequencies of adjacent regions and merges those if the frequency difference is small. Then, after it reports and clears the aggregated access frequency of each region, it splits each region into two or three regions if the total number of regions will not exceed the user-specified maximum number of regions after the split. In this way, DAMON provides its best-effort quality and minimal overhead while keeping the upper-bound overhead that users set. Link: https://lkml.kernel.org/r/20210716081449.22187-4-sj38.park@gmail.com Signed-off-by: SeongJae Park <sjpark@amazon.de> Reviewed-by: Leonard Foerster <foersleo@amazon.de> Reviewed-by: Fernand Sieber <sieberf@amazon.com> Acked-by: Shakeel Butt <shakeelb@google.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Amit Shah <amit@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Brendan Higgins <brendanhiggins@google.com> Cc: David Hildenbrand <david@redhat.com> Cc: David Rientjes <rientjes@google.com> Cc: David Woodhouse <dwmw@amazon.com> Cc: Fan Du <fan.du@intel.com> Cc: Greg Kroah-Hartman <greg@kroah.com> Cc: Greg Thelen <gthelen@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Joe Perches <joe@perches.com> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Marco Elver <elver@google.com> Cc: Markus Boehme <markubo@amazon.de> Cc: Maximilian Heyne <mheyne@amazon.de> Cc: Mel Gorman <mgorman@suse.de> Cc: Minchan Kim <minchan@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@surriel.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit `b9a6ac4e4e`) Bug: 228223814 Signed-off-by: Hailong Tu <tuhailong@oppo.com> Change-Id: I330416cf16f4e72fd8f26e01d3bd425738216c70	2022-04-22 15:19:29 -07:00
SeongJae Park	40064a1877	UPSTREAM: mm/damon/core: implement region-based sampling To avoid the unbounded increase of the overhead, DAMON groups adjacent pages that are assumed to have the same access frequencies into a region. As long as the assumption (pages in a region have the same access frequencies) is kept, only one page in the region is required to be checked. Thus, for each ``sampling interval``, 1. the 'prepare_access_checks' primitive picks one page in each region, 2. waits for one ``sampling interval``, 3. checks whether the page is accessed meanwhile, and 4. increases the access count of the region if so. Therefore, the monitoring overhead is controllable by adjusting the number of regions. DAMON allows both the underlying primitives and user callbacks to adjust regions for the trade-off. In other words, this commit makes DAMON to use not only time-based sampling but also space-based sampling. This scheme, however, cannot preserve the quality of the output if the assumption is not guaranteed. Next commit will address this problem. Link: https://lkml.kernel.org/r/20210716081449.22187-3-sj38.park@gmail.com Signed-off-by: SeongJae Park <sjpark@amazon.de> Reviewed-by: Leonard Foerster <foersleo@amazon.de> Reviewed-by: Fernand Sieber <sieberf@amazon.com> Acked-by: Shakeel Butt <shakeelb@google.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Amit Shah <amit@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Brendan Higgins <brendanhiggins@google.com> Cc: David Hildenbrand <david@redhat.com> Cc: David Rientjes <rientjes@google.com> Cc: David Woodhouse <dwmw@amazon.com> Cc: Fan Du <fan.du@intel.com> Cc: Greg Kroah-Hartman <greg@kroah.com> Cc: Greg Thelen <gthelen@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Joe Perches <joe@perches.com> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Marco Elver <elver@google.com> Cc: Markus Boehme <markubo@amazon.de> Cc: Maximilian Heyne <mheyne@amazon.de> Cc: Mel Gorman <mgorman@suse.de> Cc: Minchan Kim <minchan@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@surriel.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit `f23b8eee18`) Bug: 228223814 Signed-off-by: Hailong Tu <tuhailong@oppo.com> Change-Id: Iaecd1ea4d5123e7ffc21ca63de0fe3555bab079d	2022-04-22 15:19:29 -07:00
SeongJae Park	d1e43a5be8	UPSTREAM: mm: introduce Data Access MONitor (DAMON) Patch series "Introduce Data Access MONitor (DAMON)", v34. Introduction ============ DAMON is a data access monitoring framework for the Linux kernel. The core mechanisms of DAMON called 'region based sampling' and 'adaptive regions adjustment' (refer to 'mechanisms.rst' in the 11th patch of this patchset for the detail) make it - accurate (The monitored information is useful for DRAM level memory management. It might not appropriate for Cache-level accuracy, though.), - light-weight (The monitoring overhead is low enough to be applied online while making no impact on the performance of the target workloads.), and - scalable (the upper-bound of the instrumentation overhead is controllable regardless of the size of target workloads.). Using this framework, therefore, several memory management mechanisms such as reclamation and THP can be optimized to aware real data access patterns. Experimental access pattern aware memory management optimization works that incurring high instrumentation overhead will be able to have another try. Though DAMON is for kernel subsystems, it can be easily exposed to the user space by writing a DAMON-wrapper kernel subsystem. Then, user space users who have some special workloads will be able to write personalized tools or applications for deeper understanding and specialized optimizations of their systems. DAMON is also merged in two public Amazon Linux kernel trees that based on v5.4.y[1] and v5.10.y[2]. [1] https://github.com/amazonlinux/linux/tree/amazon-5.4.y/master/mm/damon [2] https://github.com/amazonlinux/linux/tree/amazon-5.10.y/master/mm/damon The userspace tool[1] is available, released under GPLv2, and actively being maintained. I am also planning to implement another basic user interface in perf[2]. Also, the basic test suite for DAMON is available under GPLv2[3]. [1] https://github.com/awslabs/damo [2] https://lore.kernel.org/linux-mm/20210107120729.22328-1-sjpark@amazon.com/ [3] https://github.com/awslabs/damon-tests Long-term Plan -------------- DAMON is a part of a project called Data Access-aware Operating System (DAOS). As the name implies, I want to improve the performance and efficiency of systems using fine-grained data access patterns. The optimizations are for both kernel and user spaces. I will therefore modify or create kernel subsystems, export some of those to user space and implement user space library / tools. Below shows the layers and components for the project. --------------------------------------------------------------------------- Primitives: PTE Accessed bit, PG_idle, rmap, (Intel CMT), ... Framework: DAMON Features: DAMOS, virtual addr, physical addr, ... Applications: DAMON-debugfs, (DARC), ... ^^^^^^^^^^^^^^^^^^^^^^^ KERNEL SPACE ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Raw Interface: debugfs, (sysfs), (damonfs), tracepoints, (sys_damon), ... vvvvvvvvvvvvvvvvvvvvvvv USER SPACE vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv Library: (libdamon), ... Tools: DAMO, (perf), ... --------------------------------------------------------------------------- The components in parentheses or marked as '...' are not implemented yet but in the future plan. IOW, those are the TODO tasks of DAOS project. For more detail, please refer to the plans: https://lore.kernel.org/linux-mm/20201202082731.24828-1-sjpark@amazon.com/ Evaluations =========== We evaluated DAMON's overhead, monitoring quality and usefulness using 24 realistic workloads on my QEMU/KVM based virtual machine running a kernel that v24 DAMON patchset is applied. DAMON is lightweight. It increases system memory usage by 0.39% and slows target workloads down by 1.16%. DAMON is accurate and useful for memory management optimizations. An experimental DAMON-based operation scheme for THP, namely 'ethp', removes 76.15% of THP memory overheads while preserving 51.25% of THP speedup. Another experimental DAMON-based 'proactive reclamation' implementation, 'prcl', reduces 93.38% of residential sets and 23.63% of system memory footprint while incurring only 1.22% runtime overhead in the best case (parsec3/freqmine). NOTE that the experimental THP optimization and proactive reclamation are not for production but only for proof of concepts. Please refer to the official document[1] or "Documentation/admin-guide/mm: Add a document for DAMON" patch in this patchset for detailed evaluation setup and results. [1] https://damonitor.github.io/doc/html/latest-damon/admin-guide/mm/damon/eval.html Real-world User Story ===================== In summary, DAMON has used on production systems and proved its usefulness. DAMON as a profiler ------------------- We analyzed characteristics of a large scale production systems of our customers using DAMON. The systems utilize 70GB DRAM and 36 CPUs. From this, we were able to find interesting things below. There were obviously different access pattern under idle workload and active workload. Under the idle workload, it accessed large memory regions with low frequency, while the active workload accessed small memory regions with high freuqnecy. DAMON found a 7GB memory region that showing obviously high access frequency under the active workload. We believe this is the performance-effective working set and need to be protected. There was a 4KB memory region that showing highest access frequency under not only active but also idle workloads. We think this must be a hottest code section like thing that should never be paged out. For this analysis, DAMON used only 0.3-1% of single CPU time. Because we used recording-based analysis, it consumed about 3-12 MB of disk space per 20 minutes. This is only small amount of disk space, but we can further reduce the disk usage by using non-recording-based DAMON features. I'd like to argue that only DAMON can do such detailed analysis (finding 4KB highest region in 70GB memory) with the light overhead. DAMON as a system optimization tool ----------------------------------- We also found below potential performance problems on the systems and made DAMON-based solutions. The system doesn't want to make the workload suffer from the page reclamation and thus it utilizes enough DRAM but no swap device. However, we found the system is actively reclaiming file-backed pages, because the system has intensive file IO. The file IO turned out to be not performance critical for the workload, but the customer wanted to ensure performance critical file-backed pages like code section to not mistakenly be evicted. Using direct IO should or `mlock()` would be a straightforward solution, but modifying the user space code is not easy for the customer. Alternatively, we could use DAMON-based operation scheme[1]. By using it, we can ask DAMON to track access frequency of each region and make 'process_madvise(MADV_WILLNEED)[2]' call for regions having specific size and access frequency for a time interval. We also found the system is having high number of TLB misses. We tried 'always' THP enabled policy and it greatly reduced TLB misses, but the page reclamation also been more frequent due to the THP internal fragmentation caused memory bloat. We could try another DAMON-based operation scheme that applies 'MADV_HUGEPAGE' to memory regions having >=2MB size and high access frequency, while applying 'MADV_NOHUGEPAGE' to regions having <2MB size and low access frequency. We do not own the systems so we only reported the analysis results and possible optimization solutions to the customers. The customers satisfied about the analysis results and promised to try the optimization guides. [1] https://lore.kernel.org/linux-mm/20201006123931.5847-1-sjpark@amazon.com/ [2] https://lore.kernel.org/linux-api/20200622192900.22757-4-minchan@kernel.org/ Comparison with Idle Page Tracking ================================== Idle Page Tracking allows users to set and read idleness of pages using a bitmap file which represents each page with each bit of the file. One recommended usage of it is working set size detection. Users can do that by 1. find PFN of each page for workloads in interest, 2. set all the pages as idle by doing writes to the bitmap file, 3. wait until the workload accesses its working set, and 4. read the idleness of the pages again and count pages became not idle. NOTE: While Idle Page Tracking is for user space users, DAMON is primarily designed for kernel subsystems though it can easily exposed to the user space. Hence, this section only assumes such user space use of DAMON. For what use cases Idle Page Tracking would be better? ------------------------------------------------------ 1. Flexible usecases other than hotness monitoring. Because Idle Page Tracking allows users to control the primitive (Page idleness) by themselves, Idle Page Tracking users can do anything they want. Meanwhile, DAMON is primarily designed to monitor the hotness of each memory region. For this, DAMON asks users to provide sampling interval and aggregation interval. For the reason, there could be some use case that using Idle Page Tracking is simpler. 2. Physical memory monitoring. Idle Page Tracking receives PFN range as input, so natively supports physical memory monitoring. DAMON is designed to be extensible for multiple address spaces and use cases by implementing and using primitives for the given use case. Therefore, by theory, DAMON has no limitation in the type of target address space as long as primitives for the given address space exists. However, the default primitives introduced by this patchset supports only virtual address spaces. Therefore, for physical memory monitoring, you should implement your own primitives and use it, or simply use Idle Page Tracking. Nonetheless, RFC patchsets[1] for the physical memory address space primitives is already available. It also supports user memory same to Idle Page Tracking. [1] https://lore.kernel.org/linux-mm/20200831104730.28970-1-sjpark@amazon.com/ For what use cases DAMON is better? ----------------------------------- 1. Hotness Monitoring. Idle Page Tracking let users know only if a page frame is accessed or not. For hotness check, the user should write more code and use more memory. DAMON do that by itself. 2. Low Monitoring Overhead DAMON receives user's monitoring request with one step and then provide the results. So, roughly speaking, DAMON require only O(1) user/kernel context switches. In case of Idle Page Tracking, however, because the interface receives contiguous page frames, the number of user/kernel context switches increases as the monitoring target becomes complex and huge. As a result, the context switch overhead could be not negligible. Moreover, DAMON is born to handle with the monitoring overhead. Because the core mechanism is pure logical, Idle Page Tracking users might be able to implement the mechanism on their own, but it would be time consuming and the user/kernel context switching will still more frequent than that of DAMON. Also, the kernel subsystems cannot use the logic in this case. 3. Page granularity working set size detection. Until v22 of this patchset, this was categorized as the thing Idle Page Tracking could do better, because DAMON basically maintains additional metadata for each of the monitoring target regions. So, in the page granularity working set size detection use case, DAMON would incur (number of monitoring target pages * size of metadata) memory overhead. Size of the single metadata item is about 54 bytes, so assuming 4KB pages, about 1.3% of monitoring target pages will be additionally used. All essential metadata for Idle Page Tracking are embedded in 'struct page' and page table entries. Therefore, in this use case, only one counter variable for working set size accounting is required if Idle Page Tracking is used. There are more details to consider, but roughly speaking, this is true in most cases. However, the situation changed from v23. Now DAMON supports arbitrary types of monitoring targets, which don't use the metadata. Using that, DAMON can do the working set size detection with no additional space overhead but less user-kernel context switch. A first draft for the implementation of monitoring primitives for this usage is available in a DAMON development tree[1]. An RFC patchset for it based on this patchset will also be available soon. Since v24, the arbitrary type support is dropped from this patchset because this patchset doesn't introduce real use of the type. You can still get it from the DAMON development tree[2], though. [1] https://github.com/sjp38/linux/tree/damon/pgidle_hack [2] https://github.com/sjp38/linux/tree/damon/master 4. More future usecases While Idle Page Tracking has tight coupling with base primitives (PG_Idle and page table Accessed bits), DAMON is designed to be extensible for many use cases and address spaces. If you need some special address type or want to use special h/w access check primitives, you can write your own primitives for that and configure DAMON to use those. Therefore, if your use case could be changed a lot in future, using DAMON could be better. Can I use both Idle Page Tracking and DAMON? -------------------------------------------- Yes, though using them concurrently for overlapping memory regions could result in interference to each other. Nevertheless, such use case would be rare or makes no sense at all. Even in the case, the noise would bot be really significant. So, you can choose whatever you want depending on the characteristics of your use cases. More Information ================ We prepared a showcase web site[1] that you can get more information. There are - the official documentations[2], - the heatmap format dynamic access pattern of various realistic workloads for heap area[3], mmap()-ed area[4], and stack[5] area, - the dynamic working set size distribution[6] and chronological working set size changes[7], and - the latest performance test results[8]. [1] https://damonitor.github.io/_index [2] https://damonitor.github.io/doc/html/latest-damon [3] https://damonitor.github.io/test/result/visual/latest/rec.heatmap.0.png.html [4] https://damonitor.github.io/test/result/visual/latest/rec.heatmap.1.png.html [5] https://damonitor.github.io/test/result/visual/latest/rec.heatmap.2.png.html [6] https://damonitor.github.io/test/result/visual/latest/rec.wss_sz.png.html [7] https://damonitor.github.io/test/result/visual/latest/rec.wss_time.png.html [8] https://damonitor.github.io/test/result/perf/latest/html/index.html Baseline and Complete Git Trees =============================== The patches are based on the latest -mm tree, specifically v5.14-rc1-mmots-2021-07-15-18-47 of https://github.com/hnaz/linux-mm. You can also clone the complete git tree: $ git clone git://github.com/sjp38/linux -b damon/patches/v34 The web is also available: https://github.com/sjp38/linux/releases/tag/damon/patches/v34 Development Trees ----------------- There are a couple of trees for entire DAMON patchset series and features for future release. - For latest release: https://github.com/sjp38/linux/tree/damon/master - For next release: https://github.com/sjp38/linux/tree/damon/next Long-term Support Trees ----------------------- For people who want to test DAMON but using LTS kernels, there are another couple of trees based on two latest LTS kernels respectively and containing the 'damon/master' backports. - For v5.4.y: https://github.com/sjp38/linux/tree/damon/for-v5.4.y - For v5.10.y: https://github.com/sjp38/linux/tree/damon/for-v5.10.y Amazon Linux Kernel Trees ------------------------- DAMON is also merged in two public Amazon Linux kernel trees that based on v5.4.y[1] and v5.10.y[2]. [1] https://github.com/amazonlinux/linux/tree/amazon-5.4.y/master/mm/damon [2] https://github.com/amazonlinux/linux/tree/amazon-5.10.y/master/mm/damon Git Tree for Diff of Patches ============================ For easy review of diff between different versions of each patch, I prepared a git tree containing all versions of the DAMON patchset series: https://github.com/sjp38/damon-patches You can clone it and use 'diff' for easy review of changes between different versions of the patchset. For example: $ git clone https://github.com/sjp38/damon-patches && cd damon-patches $ diff -u damon/v33 damon/v34 Sequence Of Patches =================== First three patches implement the core logics of DAMON. The 1st patch introduces basic sampling based hotness monitoring for arbitrary types of targets. Following two patches implement the core mechanisms for control of overhead and accuracy, namely regions based sampling (patch 2) and adaptive regions adjustment (patch 3). Now the essential parts of DAMON is complete, but it cannot work unless someone provides monitoring primitives for a specific use case. The following two patches make it just work for virtual address spaces monitoring. The 4th patch makes 'PG_idle' can be used by DAMON and the 5th patch implements the virtual memory address space specific monitoring primitives using page table Accessed bits and the 'PG_idle' page flag. Now DAMON just works for virtual address space monitoring via the kernel space api. To let the user space users can use DAMON, following four patches add interfaces for them. The 6th patch adds a tracepoint for monitoring results. The 7th patch implements a DAMON application kernel module, namely damon-dbgfs, that simply wraps DAMON and exposes DAMON interface to the user space via the debugfs interface. The 8th patch further exports pid of monitoring thread (kdamond) to user space for easier cpu usage accounting, and the 9th patch makes the debugfs interface to support multiple contexts. Three patches for maintainability follows. The 10th patch adds documentations for both the user space and the kernel space. The 11th patch provides unit tests (based on the kunit) while the 12th patch adds user space tests (based on the kselftest). Finally, the last patch (13th) updates the MAINTAINERS file. This patch (of 13): DAMON is a data access monitoring framework for the Linux kernel. The core mechanisms of DAMON make it - accurate (the monitoring output is useful enough for DRAM level performance-centric memory management; It might be inappropriate for CPU cache levels, though), - light-weight (the monitoring overhead is normally low enough to be applied online), and - scalable (the upper-bound of the overhead is in constant range regardless of the size of target workloads). Using this framework, hence, we can easily write efficient kernel space data access monitoring applications. For example, the kernel's memory management mechanisms can make advanced decisions using this. Experimental data access aware optimization works that incurring high access monitoring overhead could again be implemented on top of this. Due to its simple and flexible interface, providing user space interface would be also easy. Then, user space users who have some special workloads can write personalized applications for better understanding and optimizations of their workloads and systems. === Nevertheless, this commit is defining and implementing only basic access check part without the overhead-accuracy handling core logic. The basic access check is as below. The output of DAMON says what memory regions are how frequently accessed for a given duration. The resolution of the access frequency is controlled by setting ``sampling interval`` and ``aggregation interval``. In detail, DAMON checks access to each page per ``sampling interval`` and aggregates the results. In other words, counts the number of the accesses to each region. After each ``aggregation interval`` passes, DAMON calls callback functions that previously registered by users so that users can read the aggregated results and then clears the results. This can be described in below simple pseudo-code:: init() while monitoring_on: for page in monitoring_target: if accessed(page): nr_accesses[page] += 1 if time() % aggregation_interval == 0: for callback in user_registered_callbacks: callback(monitoring_target, nr_accesses) for page in monitoring_target: nr_accesses[page] = 0 if time() % update_interval == 0: update() sleep(sampling interval) The target regions constructed at the beginning of the monitoring and updated after each ``regions_update_interval``, because the target regions could be dynamically changed (e.g., mmap() or memory hotplug). The monitoring overhead of this mechanism will arbitrarily increase as the size of the target workload grows. The basic monitoring primitives for actual access check and dynamic target regions construction aren't in the core part of DAMON. Instead, it allows users to implement their own primitives that are optimized for their use case and configure DAMON to use those. In other words, users cannot use current version of DAMON without some additional works. Following commits will implement the core mechanisms for the overhead-accuracy control and default primitives implementations. Link: https://lkml.kernel.org/r/20210716081449.22187-1-sj38.park@gmail.com Link: https://lkml.kernel.org/r/20210716081449.22187-2-sj38.park@gmail.com Signed-off-by: SeongJae Park <sjpark@amazon.de> Reviewed-by: Leonard Foerster <foersleo@amazon.de> Reviewed-by: Fernand Sieber <sieberf@amazon.com> Acked-by: Shakeel Butt <shakeelb@google.com> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Amit Shah <amit@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: David Hildenbrand <david@redhat.com> Cc: David Woodhouse <dwmw@amazon.com> Cc: Marco Elver <elver@google.com> Cc: Fan Du <fan.du@intel.com> Cc: Greg Kroah-Hartman <greg@kroah.com> Cc: Greg Thelen <gthelen@google.com> Cc: Joe Perches <joe@perches.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Maximilian Heyne <mheyne@amazon.de> Cc: Minchan Kim <minchan@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@surriel.com> Cc: David Rientjes <rientjes@google.com> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Shuah Khan <shuah@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Brendan Higgins <brendanhiggins@google.com> Cc: Markus Boehme <markubo@amazon.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit `2224d84854`) Bug: 228223814 Signed-off-by: Hailong Tu <tuhailong@oppo.com> Change-Id: I578b47551306e4c416e7b97c57335f810783615c	2022-04-22 15:19:29 -07:00
Kalesh Singh	88e4dbaf59	ANDROID: Make MGLRU aware of speculative faults Bug: 228525049 Bug: 227651406 Change-Id: Ib3d45f74c5c23cbb1af7aaf95c90826baf406c7a Signed-off-by: Kalesh Singh <kaleshsingh@google.com>	2022-04-22 16:03:44 +00:00
Marc Zyngier	e7c680add6	ANDROID: KVM: arm64: Prevent HVC calls outside of the core kernel text Modules can easily wreak havoc in the hypervisor by calling into it randomly, making it very hard to understand what is going on. Given that limiting hypercalls to the core kernel is actually pretty easy (a simple comparaison with _text and _etext), let's implement that. This is made extra-complicated due to KASLR and the disjointed VA spaces (you can't just refer to _text, as this results in a relative reference...). Bug: 210011561 Signed-off-by: Marc Zyngier <maz@kernel.org> Change-Id: I2f21871d7fe0fb22fd3660dbc1317ec8968d5b61 Signed-off-by: Sebastian Ene <sebastianene@google.com>	2022-04-22 10:37:55 +00:00
Paul Lawrence	32169780e8	ANDROID: fuse-bpf: Fix misuse of args.out_args Test: fuse_test Bug: 202785178 Signed-off-by: Paul Lawrence <paullawrence@google.com> Change-Id: I332d196329bba257a577d3ddc140136aa03bfdf1	2022-04-21 21:35:11 +00:00
Petri Gynther	df2083258d	ANDROID: Update the ABI representation Leaf changes summary: 1 artifact changed Changed leaf types summary: 0 leaf type changed Removed/Changed/Added functions summary: 0 Removed, 0 Changed, 1 Added function Removed/Changed/Added variables summary: 0 Removed, 0 Changed, 0 Added variable 1 Added function: [A] 'function int __trace_bputs(unsigned long int, const char*)' Bug: 229909445 Signed-off-by: Petri Gynther <pgynther@google.com> Change-Id: I0049c23b3948a47fc7984083a80339aced101693	2022-04-21 10:10:39 -07:00
Petri Gynther	d7b1683f78	ANDROID: add __trace_bputs() to aarch64 ABI Add __trace_bputs() to ABI, so that vendor modules can use trace_printk() for development debugging. android12-5.10 already has this, so replicating for android13-5.10. Bug: 229909445 Signed-off-by: Petri Gynther <pgynther@google.com> Change-Id: Ida8266c92bd03aade3e93ff75393e07e6100e5ce	2022-04-21 10:09:00 -07:00
Yifan Hong	f6c964af25	ANDROID: Suppress build.sh deprecation warnings. build.sh will continued to be supported in android13-* branches. To avoid confusion, suppress the deprecation warnings when executing build.sh on android13-*. This change also avoids the time delay for inferring the equivalent Bazel command. It is still encouraged to migrate build.sh to Bazel. Test: manually execute build.sh, no deprecation warnings Bug: 222074706 Change-Id: I8b62a442cb154f43375a9dae6593340c79ba556c Signed-off-by: Yifan Hong <elsk@google.com>	2022-04-21 15:37:47 +00:00
David Brazdil	5d6831add7	ANDROID: KVM: arm64: s2mpu: Allow r/o access to control regs To ease debugging, allow the host to read the state of S2MPU's control registers. These values do not need to be kept secret from the host. Bug: 190463801 Signed-off-by: David Brazdil <dbrazdil@google.com> Change-Id: Ib9e5be443f38a0ae8fb0d4f5820017d728adf64b	2022-04-21 16:33:28 +01:00
David Brazdil	d5c0f0f937	ANDROID: KVM: arm64: s2mpu: Allow reading MPTC entries The state of the S2MPU does not need to be kept secret from the host as it merely reflects the permissions that the host has and knows about. To make debugging DMA issues easier, allow the host to query entries from the MPTC cache. This involves writing the set and way IDs of the query to the READ_MPTC register and then reading the MPTC entry information from READ_MPTC_TAG_PPN/TAG_OTHERS/DATA. Modify the S2MPU DABT handler to allow this register access pattern. Bug: 190463801 Bug: 229793579 Signed-off-by: David Brazdil <dbrazdil@google.com> Change-Id: I6bbcafa6b21c541774932c3b197d2888fd50202c	2022-04-21 16:33:28 +01:00
David Brazdil	e56d9603a6	ANDROID: KVM: arm64: s2mpu: Allow L1ENTRY_* r/o access Allow read-only access to L1ENTRY_ATTR and L1ENTRY_L2TABLE S2MPU registers. This allows the host to dump the register state for debugging purposes. It is safe because the state of the S2MPU is known to the host anyway. Bug: 190463801 Signed-off-by: David Brazdil <dbrazdil@google.com> Change-Id: I4fbcc3f7fac3f51ed47ba85ee4eb408fbf154e2d	2022-04-21 16:33:28 +01:00
David Brazdil	96767ad7be	ANDROID: KVM: arm64: s2mpu: Refactor DABT handler In preparation for adding more entries to the list of S2MPU registers accessible to the host, refactor the code to use a switch instead of a series of ifs. No functional change intended. Bug: 190463801 Signed-off-by: David Brazdil <dbrazdil@google.com> Change-Id: I70afa8f755d6d96916cdc1f813e6506e97e761c0	2022-04-21 16:33:28 +01:00
David Brazdil	c43dfe89fe	ANDROID: KVM: arm64: s2mpu: Extract L1ENTRY_* consts Extract the L1ENTRY_ATTR_{PRON,GRAN}_MASK constants out of macros that create the corresponding constants. This will allow EL1 users to use the masks to get the fields out of register values. Also extract L1ENTRY_L2TABLE_ADDR_SHIFT for adjusting the L2 table address. Bug: 190463801 Signed-off-by: David Brazdil <dbrazdil@google.com> Change-Id: I45578857694ca39266fe45b3c00dbea33738167f	2022-04-21 16:33:28 +01:00
Theodore Ts'o	7a9a532432	BACKPORT: ext4: don't BUG if someone dirty pages without asking ext4 first [ Upstream commit `cc5095747e` ] [un]pin_user_pages_remote is dirtying pages without properly warning the file system in advance. A related race was noted by Jan Kara in 2018[1]; however, more recently instead of it being a very hard-to-hit race, it could be reliably triggered by process_vm_writev(2) which was discovered by Syzbot[2]. This is technically a bug in mm/gup.c, but arguably ext4 is fragile in that if some other kernel subsystem dirty pages without properly notifying the file system using page_mkwrite(), ext4 will BUG, while other file systems will not BUG (although data will still be lost). So instead of crashing with a BUG, issue a warning (since there may be potential data loss) and just mark the page as clean to avoid unprivileged denial of service attacks until the problem can be properly fixed. More discussion and background can be found in the thread starting at [2]. [1] https://lore.kernel.org/linux-mm/20180103100430.GE4911@quack2.suse.cz [2] https://lore.kernel.org/r/Yg0m6IjcNmfaSokM@google.com Reported-by: syzbot+d59332e2db681cf18f0318a06e994ebbb529a8db@syzkaller.appspotmail.com Reported-by: Lee Jones <lee.jones@linaro.org> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Link: https://lore.kernel.org/r/YiDS9wVfq4mM2jGK@mit.edu Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Lee Jones <lee.jones@linaro.org> Change-Id: Ifa528c386540d70eafb7ec55b238add0c6ba7387	2022-04-21 07:54:02 +00:00
Zhang Qilong	c383610d0f	UPSTREAM: binder: change error code from postive to negative in binder_transaction Depending on the context, the error return value here (extra_buffers_size < added_size) should be negative. Acked-by: Martijn Coenen <maco@android.com> Acked-by: Christian Brauner <christian.brauner@ubuntu.com> Signed-off-by: Zhang Qilong <zhangqilong3@huawei.com> Link: https://lore.kernel.org/r/20201026110314.135481-1-zhangqilong3@huawei.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> (cherry picked from commit `88f6c77927`) Signed-off-by: Carlos Llamas <cmllamas@google.com> Change-Id: If90760a8469f33bb121f0f71098b9415f6d4d783	2022-04-21 00:37:07 +00:00
Daniel Rosenberg	d4d78c7278	ANDROID: fuse-bpf: Fix non-fusebpf build Added #ifdefs around fuse-bpf init/cleanup code Bug: 202785178 Test: builds with and without CONFIG_FUSE_BPF Signed-off-by: Daniel Rosenberg <drosen@google.com> Change-Id: Ie15bb04e439b496e4842303437b3f55c3da14f2c	2022-04-20 14:05:59 -07:00
Daniel Rosenberg	9a5023967b	ANDROID: fuse-bpf: Use fuse_bpf_args in uapi fuse_args is not suitable for use in the uapi - it is not stable, and contains internal pointers. Replace with stable equivalent. The end_offset values are currently unused and unset, but will be used in a follow up patch by the verifier. Test: fuse_test, atest ScopedStorageDeviceTest pass Bug: 202785178 Signed-off-by: Daniel Rosenberg <drosen@google.com> Change-Id: Ic1c12f9706aeae233cc30a0d68ed2533030e485b	2022-04-20 20:57:26 +00:00
Johannes Berg	92c8c21ad0	BACKPORT: nl80211: correctly check NL80211_ATTR_REG_ALPHA2 size commit `6624bb34b4` upstream. We need this to be at least two bytes, so we can access alpha2[0] and alpha2[1]. It may be three in case some userspace used NUL-termination since it was NLA_STRING (and we also push it out with NUL-termination). Cc: stable@vger.kernel.org Reported-by: Lee Jones <lee.jones@linaro.org> Link: https://lore.kernel.org/r/20220411114201.fd4a31f06541.Ie7ff4be2cf348d8cc28ed0d626fc54becf7ea799@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Lee Jones <lee.jones@linaro.org> Change-Id: I6f2338c409076f0960bb7305cee4959a22aa9280	2022-04-20 11:21:04 +01:00
Midas Chien	65533e0212	ANDROID: Update the ABI representation Leaf changes summary: 1 artifact changed Changed leaf types summary: 0 leaf type changed Removed/Changed/Added functions summary: 0 Removed, 0 Changed, 0 Added function Removed/Changed/Added variables summary: 0 Removed, 0 Changed, 1 Added variable 1 Added variable: [A] 'int console_set_on_cmdline' Bug: 202781851 Signed-off-by: Midas Chien <midaschieh@google.com> Change-Id: I302b16e6eeec60b070721c54980d989fb1d31c26	2022-04-20 03:53:28 +00:00
Andrey Konovalov	a1013fd19b	FROMLIST: kasan: mark KASAN_VMALLOC flags as kasan_vmalloc_flags_t Fix sparse warning: mm/kasan/shadow.c:496:15: warning: restricted kasan_vmalloc_flags_t degrades to integer Link: https://lkml.kernel.org/r/52d8fccdd3a48d4bdfd0ff522553bac2a13f1579.1649351254.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Reported-by: kernel test robot <lkp@intel.com> Cc: Andrey Konovalov <andreyknvl@gmail.com> Cc: Marco Elver <elver@google.com> Cc: Alexander Potapenko <glider@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Link: https://lore.kernel.org/all/52d8fccdd3a48d4bdfd0ff522553bac2a13f1579.1649351254.git.andreyknvl@google.com/T/#u Bug: 217222520 Change-Id: I04133e8e9610b81fd0c856ece4f566110094bcb1 Signed-off-by: Andrey Konovalov <andreyknvl@google.com>	2022-04-20 00:58:47 +00:00
Vincenzo Frascino	c098614509	FROMLIST: kasan: fix hw tags enablement when KUNIT tests are disabled Kasan enables hw tags via kasan_enable_tagging() which based on the mode passed via kernel command line selects the correct hw backend. kasan_enable_tagging() is meant to be invoked indirectly via the cpu features framework of the architectures that support these backends. Currently the invocation of this function is guarded by CONFIG_KASAN_KUNIT_TEST which allows the enablement of the correct backend only when KUNIT tests are enabled in the kernel. This inconsistency was introduced in commit: `ed6d74446c` ("kasan: test: support async (again) and asymm modes for HW_TAGS") ... and prevents to enable MTE on arm64 when KUNIT tests for kasan hw_tags are disabled. Fix the issue making sure that the CONFIG_KASAN_KUNIT_TEST guard does not prevent the correct invocation of kasan_enable_tagging(). Link: https://lkml.kernel.org/r/20220408124323.10028-1-vincenzo.frascino@arm.com Fixes: `ed6d74446c` ("kasan: test: support async (again) and asymm modes for HW_TAGS") Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com> Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Alexander Potapenko <glider@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Link: https://lore.kernel.org/all/20220408124323.10028-1-vincenzo.frascino@arm.com/T/#u Bug: 217222520 Change-Id: Ib4f05d74e091db57d2a8d5000d67137105d59a4c Signed-off-by: Andrey Konovalov <andreyknvl@google.com>	2022-04-20 00:58:37 +00:00
Fabio Aiuto	f60a0b3285	UPSTREAM: usb: dwc3: leave default DMA for PCI devices in case of a PCI dwc3 controller, leave the default DMA mask. Calling of a 64 bit DMA mask breaks the driver on cherrytrail based tablets like Cyberbook T116. Fixes: `45d39448b4` ("usb: dwc3: support 64 bit DMA in platform driver") Cc: stable <stable@vger.kernel.org> Reported-by: Hans De Goede <hdegoede@redhat.com> Tested-by: Fabio Aiuto <fabioaiuto83@gmail.com> Tested-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Fabio Aiuto <fabioaiuto83@gmail.com> Link: https://lore.kernel.org/r/20211113142959.27191-1-fabioaiuto83@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> (cherry picked from commit `47ce45906c`) Bug: 228575378 Signed-off-by: Albert Wang <albertccwang@google.com> Change-Id: Id4e73116d4f8f603b8282e7bd7717cf28b5a3bf2	2022-04-20 00:40:38 +00:00
Sven Peter	3b508e8fe4	UPSTREAM: usb: dwc3: support 64 bit DMA in platform driver Currently, the dwc3 platform driver does not explicitly ask for a DMA mask. This makes it fall back to the default 32-bit mask which breaks the driver on systems that only have RAM starting above the first 4G like the Apple M1 SoC. Fix this by calling dma_set_mask_and_coherent with a 64bit mask. Reviewed-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sven Peter <sven@svenpeter.dev> Link: https://lore.kernel.org/r/20210607061751.89752-1-sven@svenpeter.dev Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> (cherry picked from commit `45d39448b4`) Bug: 228575378 Signed-off-by: Albert Wang <albertccwang@google.com> Change-Id: Icd994fde819fdd6fb0896d1bec1777fb73f454ee	2022-04-20 00:40:28 +00:00
Rick Yiu	03f40d5252	ANDROID: Update the ABI representation Leaf changes summary: 4 artifacts changed Changed leaf types summary: 0 leaf type changed Removed/Changed/Added functions summary: 0 Removed, 0 Changed, 2 Added functions Removed/Changed/Added variables summary: 0 Removed, 0 Changed, 2 Added variables 2 Added functions: [A] 'function int __traceiter_android_rvh_set_task_cpu(void, task_struct, unsigned int)' [A] 'function int __traceiter_android_rvh_update_rt_rq_load_avg(void, u64, rq, task_struct*, int)' 2 Added variables: [A] 'tracepoint __tracepoint_android_rvh_set_task_cpu' [A] 'tracepoint __tracepoint_android_rvh_update_rt_rq_load_avg' Bug: 201261299 Signed-off-by: Rick Yiu <rickyiu@google.com> Change-Id: Ie1265a9d638e7826b6185bfde0ab8f900b51c6b0	2022-04-19 23:56:45 +00:00
Kalesh Singh	6db38c5bbc	FROMGIT: EXP rcu: Move expedited grace period (GP) work to RT kthread_worker Enabling CONFIG_RCU_BOOST did not reduce RCU expedited grace-period latency because its workqueues run at SCHED_OTHER, and thus can be delayed by normal processes. This commit avoids these delays by moving the expedited GP work items to a real-time-priority kthread_worker. This option is controlled by CONFIG_RCU_EXP_KTHREAD and disabled by default on PREEMPT_RT=y kernels which disable expedited grace periods after boot by unconditionally setting rcupdate.rcu_normal_after_boot=1. The results were evaluated on arm64 Android devices (6GB ram) running 5.10 kernel, and capturing trace data in critical user-level code. The table below shows the resulting order-of-magnitude improvements in synchronize_rcu_expedited() latency: ------------------------------------------------------------------------ \| \| workqueues \| kthread_worker \| Diff \| ------------------------------------------------------------------------ \| Count \| 725 \| 688 \| \| ------------------------------------------------------------------------ \| Min Duration (ns) \| 326 \| 447 \| 37.12% \| ------------------------------------------------------------------------ \| Q1 (ns) \| 39,428 \| 38,971 \| -1.16% \| ------------------------------------------------------------------------ \| Q2 - Median (ns) \| 98,225 \| 69,743 \| -29.00% \| ------------------------------------------------------------------------ \| Q3 (ns) \| 342,122 \| 126,638 \| -62.98% \| ------------------------------------------------------------------------ \| Max Duration (ns) \| 372,766,967 \| 2,329,671 \| -99.38% \| ------------------------------------------------------------------------ \| Avg Duration (ns) \| 2,746,353 \| 151,242 \| -94.49% \| ------------------------------------------------------------------------ \| Standard Deviation (ns) \| 19,327,765 \| 294,408 \| \| ------------------------------------------------------------------------ The below table show the range of maximums/minimums for synchronize_rcu_expedited() latency from all experiments: ------------------------------------------------------------------------ \| \| workqueues \| kthread_worker \| Diff \| ------------------------------------------------------------------------ \| Total No. of Experiments \| 25 \| 23 \| \| ------------------------------------------------------------------------ \| Largest Maximum (ns) \| 372,766,967 \| 2,329,671 \| -99.38% \| ------------------------------------------------------------------------ \| Smallest Maximum (ns) \| 38,819 \| 86,954 \| 124.00% \| ------------------------------------------------------------------------ \| Range of Maximums (ns) \| 372,728,148 \| 2,242,717 \| \| ------------------------------------------------------------------------ \| Largest Minimum (ns) \| 88,623 \| 27,588 \| -68.87% \| ------------------------------------------------------------------------ \| Smallest Minimum (ns) \| 326 \| 447 \| 37.12% \| ------------------------------------------------------------------------ \| Range of Minimums (ns) \| 88,297 \| 27,141 \| \| ------------------------------------------------------------------------ Cc: "Paul E. McKenney" <paulmck@kernel.org> Cc: Tejun Heo <tj@kernel.org> Reported-by: Tim Murray <timmurray@google.com> Reported-by: Wei Wang <wvw@google.com> Tested-by: Kyle Lin <kylelin@google.com> Tested-by: Chunwei Lu <chunweilu@google.com> Tested-by: Lulu Wang <luluw@google.com> Signed-off-by: Kalesh Singh <kaleshsingh@google.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Link: https://lore.kernel.org/r/20220409003527.1587028-1-kaleshsingh@google.com/ (cherry picked from commit 3902dd17a29bd3ed1ead364a331a1761edb7162b git: //git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git fastexp.2022.04.11a) Bug: 224791892 Change-Id: I4cc5d28f9ae99e44e92afc43ff4db4b71c415d6c	2022-04-19 23:20:24 +00:00
Sajid Dalvi	68c87a277c	ANDROID: Update the ABI representation Leaf changes summary: 2 artifacts changed Changed leaf types summary: 0 leaf type changed Removed/Changed/Added functions summary: 0 Removed, 0 Changed, 1 Added function Removed/Changed/Added variables summary: 0 Removed, 0 Changed, 1 Added variable 1 Added function: [A] 'function int __traceiter_android_rvh_pci_d3_sleep(void, pci_dev, unsigned int*)' 1 Added variable: [A] 'tracepoint __tracepoint_android_rvh_pci_d3_sleep' Bug: 229125931 Signed-off-by: Sajid Dalvi <sdalvi@google.com> Change-Id: I7db067bd3d468b11826e5e59ee9c96706fbad760	2022-04-19 12:45:34 -05:00
Jens Axboe	699e6e3211	UPSTREAM: block: fix async_depth sysfs interface for mq-deadline A previous commit added this feature, but it inadvertently used the wrong variable to show/store the setting from/to, victimized by copy/paste. Fix it up so that the async_depth sysfs interface reads and writes from the right setting. Fixes: `07757588e5` ("block/mq-deadline: Reserve 25% of scheduler tags for synchronous requests") Link: https://bugzilla.kernel.org/show_bug.cgi?id=215485 Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Jens Axboe <axboe@kernel.dk> Change-Id: I28273a92ac7cbebc830df8b80ad461948402bcd2 (cherry picked from commit `46cdc45acb`) Signed-off-by: Bart Van Assche <bvanassche@google.com>	2022-04-19 15:34:11 +00:00
Sajid Dalvi	53ff5efb2c	ANDROID: PCI/PM: Use usleep_range for d3hot_delay This patch implements a vendor hook that changes d3hot_delay to use usleep_range() instead of msleep() to reduce the resume time from 20ms to 10ms. The call sequence is as follows: pci_pm_resume_noirq() pci_pm_default_resume_early() pci_power_up() pci_raw_set_power_state() --> msleep(10) The default d3hot_delay is 10ms. Using msleep for delays less than 20ms could result in delays up to 20ms. Reference: Documentation/timers/timers-howto.rst Using usleep_range() results in the delay being closer to 10ms and this reduces the resume time. Bug: 194231641 Change-Id: If3e4dcfb99edad302371273933fa6784854cf892 Signed-off-by: Sajid Dalvi <sdalvi@google.com>	2022-04-19 07:17:39 +00:00
Minchan Kim	609fa1be7a	ANDROID: mm: page_pinner: fix elapsed time Put the elapsed time instead of zero all the time. Bug: 218731671 Signed-off-by: Minchan Kim <minchan@google.com> Change-Id: Ibb319e7dfce2d47481e2462bfb8423fbd2ddad66	2022-04-18 23:50:40 +00:00
Minchan Kim	d5d9a23576	ANDROID: mm: retry GUP with orignal gup_flags on failure If GUP fails due to modified flags by vendor hook, try one more time with original flag to keep the API semantic. Bug: 229391920 Signed-off-by: Minchan Kim <minchan@google.com> Change-Id: If2c20fe3e1752c9cc0d7d350ad84b38d8325f9ae	2022-04-18 23:50:32 +00:00
Todd Kjos	6acb261444	ANDROID: GKI: 4/15/2022 KMI freeze Set KMI_GENERATION=4 for 4/15 KMI freeze Leaf changes summary: 2734 artifacts changed Changed leaf types summary: 8 leaf types changed Removed/Changed/Added functions summary: 0 Removed, 2677 Changed, 0 Added function Removed/Changed/Added variables summary: 0 Removed, 49 Changed, 0 Added variable 2677 functions with some sub-type change: [C] 'function void* PDE_DATA(const inode)' at generic.c:799:1 has some sub-type changes: CRC (modversions) changed from 0x830fd868 to 0xb70c2d59 [C] 'function void __ClearPageMovable(page)' at compaction.c:138:1 has some sub-type changes: CRC (modversions) changed from 0x274b4312 to 0x1e91976d [C] 'function void __SetPageMovable(page, address_space)' at compaction.c:130:1 has some sub-type changes: CRC (modversions) changed from 0xaf251a50 to 0xb1221a47 ... 2674 omitted; 2677 symbols have only CRC changes 49 Changed variables: [C] 'pglist_data contig_page_data' was changed at memblock.c:96:1: size of symbol changed from 5696 to 6976 CRC (modversions) changed from 0xa8156534 to 0x7007215 type of variable changed: type size changed from 45568 to 55808 (in bits) 1 data member insertion: 'lru_gen_mm_walk mm_walk', at offset 51520 (in bits) at mmzone.h:1039:1 there are data member changes: type 'struct lruvec' of 'pglist_data::__lruvec' changed: type size changed from 1088 to 9664 (in bits) 3 data member insertions: 'lru_gen_struct lrugen', at offset 1024 (in bits) at mmzone.h:497:1 'lru_gen_mm_state mm_state', at offset 8576 (in bits) at mmzone.h:499:1 'u64 android_vendor_data1', at offset 9600 (in bits) at mmzone.h:504:1 there are data member changes: 'pglist_data* pgdat' offset changed (by +8512 bits) 2977 impacted interfaces 'unsigned long int flags' offset changed (by +8576 bits) 3 ('zone_padding _pad2_' .. 'atomic_long_t vm_stat[38]') offsets changed (by +10240 bits) 2977 impacted interfaces [C] 'task_struct init_task' was changed at init_task.c:64:1: CRC (modversions) changed from 0x124472e1 to 0xf5fdc492 type of variable changed: type size hasn't changed 1 data member insertion: 'unsigned int in_lru_fault', at offset 11332 (in bits) at sched.h:840:1 there are data member changes: 4 ('unsigned int no_cgroup_migration' .. 'unsigned int in_memstall') offsets changed (by +1 bits) 2977 impacted interfaces [C] 'bus_type amba_bustype' was changed at bus.c:215:1: CRC (modversions) changed from 0x55933f58 to 0xefd95b38 [C] 'const clk_ops clk_fixed_factor_ops' was changed at clk-fixed-factor.c:60:1: CRC (modversions) changed from 0x38f07e1d to 0xb94d81d6 [C] 'const clk_ops clk_fixed_rate_ops' was changed at clk-fixed-rate.c:46:1: CRC (modversions) changed from 0x47fbebbe to 0x5299a868 ... 44 omitted; 47 symbols have only CRC changes 'struct lruvec at mmzone.h:280:1' changed: details were reported earlier 'struct mem_cgroup at memcontrol.h:211:1' changed: type size hasn't changed 1 data member insertion: 'lru_gen_mm_list mm_list', at offset 23168 (in bits) at memcontrol.h:337:1 there are data member changes: 2 ('u64 android_oem_data1' .. 'mem_cgroup_per_node* nodeinfo[]') offsets changed (by +192 bits) 2977 impacted interfaces 'struct mem_cgroup_per_node at memcontrol.h:107:1' changed: type size changed from 5184 to 13760 (in bits) there are data member changes: type 'struct lruvec' of 'mem_cgroup_per_node::lruvec' changed, as reported earlier 10 ('lruvec_stat* lruvec_stat_local' .. 'mem_cgroup* memcg') offsets changed (by +8576 bits) 2977 impacted interfaces 'struct mm_struct at mm_types.h:419:1' changed: type size changed from 7680 to 7936 (in bits) there are data member changes: anonymous data member at offset 0 (in bits) changed from: struct {vm_area_struct* mmap; rb_root mm_rb; u64 vmacache_seqnum; rwlock_t mm_rb_lock; unsigned long int (file, unsigned long int, unsigned long int, unsigned long int, unsigned long int) get_unmapped_area; unsigned long int mmap_base; unsigned long int mmap_legacy_base; unsigned long int task_size; unsigned long int highest_vm_end; pgd_t* pgd; atomic_t membarrier_state; atomic_t mm_users; atomic_t mm_count; atomic_t has_pinned; atomic_long_t pgtables_bytes; int map_count; spinlock_t page_table_lock; rw_semaphore mmap_lock; list_head mmlist; unsigned long int hiwater_rss; unsigned long int hiwater_vm; unsigned long int total_vm; unsigned long int locked_vm; atomic64_t pinned_vm; unsigned long int data_vm; unsigned long int exec_vm; unsigned long int stack_vm; unsigned long int def_flags; seqcount_t write_protect_seq; spinlock_t arg_lock; unsigned long int start_code; unsigned long int end_code; unsigned long int start_data; unsigned long int end_data; unsigned long int start_brk; unsigned long int brk; unsigned long int start_stack; unsigned long int arg_start; unsigned long int arg_end; unsigned long int env_start; unsigned long int env_end; unsigned long int saved_auxv[46]; mm_rss_stat rss_stat; linux_binfmt* binfmt; mm_context_t context; unsigned long int flags; core_state* core_state; spinlock_t ioctx_lock; kioctx_table* ioctx_table; task_struct* owner; user_namespace* user_ns; file* exe_file; mmu_notifier_subscriptions* notifier_subscriptions; percpu_rw_semaphore* mmu_notifier_lock; atomic_t tlb_flush_pending; uprobes_state uprobes_state; work_struct async_put_work; u32 pasid; u64 android_kabi_reserved1;} to: struct {vm_area_struct* mmap; rb_root mm_rb; u64 vmacache_seqnum; rwlock_t mm_rb_lock; unsigned long int (file, unsigned long int, unsigned long int, unsigned long int, unsigned long int) get_unmapped_area; unsigned long int mmap_base; unsigned long int mmap_legacy_base; unsigned long int task_size; unsigned long int highest_vm_end; pgd_t* pgd; atomic_t membarrier_state; atomic_t mm_users; atomic_t mm_count; atomic_t has_pinned; atomic_long_t pgtables_bytes; int map_count; spinlock_t page_table_lock; rw_semaphore mmap_lock; list_head mmlist; unsigned long int hiwater_rss; unsigned long int hiwater_vm; unsigned long int total_vm; unsigned long int locked_vm; atomic64_t pinned_vm; unsigned long int data_vm; unsigned long int exec_vm; unsigned long int stack_vm; unsigned long int def_flags; seqcount_t write_protect_seq; spinlock_t arg_lock; unsigned long int start_code; unsigned long int end_code; unsigned long int start_data; unsigned long int end_data; unsigned long int start_brk; unsigned long int brk; unsigned long int start_stack; unsigned long int arg_start; unsigned long int arg_end; unsigned long int env_start; unsigned long int env_end; unsigned long int saved_auxv[46]; mm_rss_stat rss_stat; linux_binfmt* binfmt; mm_context_t context; unsigned long int flags; core_state* core_state; spinlock_t ioctx_lock; kioctx_table* ioctx_table; task_struct* owner; user_namespace* user_ns; file* exe_file; mmu_notifier_subscriptions* notifier_subscriptions; percpu_rw_semaphore* mmu_notifier_lock; atomic_t tlb_flush_pending; uprobes_state uprobes_state; work_struct async_put_work; u32 pasid; struct {list_head list; mem_cgroup* memcg; nodemask_t nodes;} lru_gen; u64 android_kabi_reserved1;} and size changed from 7680 to 7936 (in bits) (by +256 bits) 'unsigned long int cpu_bitmap[]' offset changed (by +256 bits) 2977 impacted interfaces 'struct pglist_data at mmzone.h:729:1' changed: details were reported earlier 'struct reclaim_state at swap.h:131:1' changed: type size changed from 64 to 128 (in bits) 1 data member insertion: 'lru_gen_mm_walk* mm_walk', at offset 64 (in bits) at swap.h:135:1 2977 impacted interfaces 'struct scsi_device at scsi_device.h:102:1' changed: type size hasn't changed 1 data member insertion: 'unsigned int silence_suspend', at offset 2448 (in bits) at scsi_device.h:209:1 there are data member changes: 'bool offline_already' offset changed (by +8 bits) 45 impacted interfaces 'struct task_struct at sched.h:660:1' changed: details were reported earlier Bug: 229630433 Signed-off-by: Todd Kjos <tkjos@google.com> Change-Id: Iacd70a1553401ead91351db0b5b8ec6dfee6e6ec	2022-04-18 11:58:28 -07:00
Bing Han	a034320a68	ANDROID: add vendor fields to swap_slots_cache to support multiple swap devices struct swap_slots_cache :: ANDROID_VENDOR_DATA(1) 1) Multiple swap devices can be supported; 2) There are different kinds of data; 3) During data reclamation, different types of data are exchanged to different swap devices; 4) Each swap device has corresponding arrays of slots and slots_ret; 5) Each swap device has corresponding indexes of nr, cur and n_ret; 6) This field is a pointer, it points to a struct which contains all the other arrays and indexes; Bug: 225795494 Change-Id: Icf116135926be98449a2d96fc458e58e5ad3b7e9 Signed-off-by: Bing Han <bing.han@transsion.com>	2022-04-18 10:20:37 -07:00
Bing Han	1b14ae01b0	ANDROID: add vendor fields to lruvec to record refault stats struct lruvec :: ANDROID_VENDOR_DATA(1) It is pointer to a struct to record the following message: 1）the account of workingset_restore pages of cached anonymous and file pages This is used to adjust the strategy and amount of reclaiming data. Bug: 225795494 Change-Id: I34e57ee23b6c97ac91effa5b72513d238335a996 Signed-off-by: Bing Han <bing.han@transsion.com>	2022-04-18 10:20:23 -07:00
Bing Han	af4eb0e377	ANDROID: add vendor fields to swap_info_struct to record swap stats struct swap_info_struct :: ANDROID_VENDOR_DATA(1) It is pointer to a struct to record the following message: 1) total swapin pages; 2) total swapout pages; 3) total number of cold pages swapin; 4) total number of swapout pages, specified by userspace; 5) total number of swapout pages, specified by kernel; 6) the maxmium number of swapout pages; 7) the maxmium number of swapout pages allowed by kernel; 8) the maxmium number of swapout pages allowed by framework; Bug: 225795494 Change-Id: I779145a83d87e339db86ec81c7f962be99946afb Signed-off-by: Bing Han <bing.han@transsion.com>	2022-04-18 10:20:06 -07:00
Bart Van Assche	fae5207ecc	ANDROID: scsi: ufs: Add suspend/resume SCSI command processing support This functionality is needed by UFS drivers to e.g. suspend SCSI command processing while reprogramming encryption keys if the hardware does not support concurrent I/O and key reprogramming. Bug: 227177294 Change-Id: I10f11e67da81fae7063674838760903d2c178baf Signed-off-by: Bart Van Assche <bvanassche@google.com>	2022-04-18 10:18:44 -07:00
Bart Van Assche	64293a57f1	ANDROID: scsi: ufs: Pass the clock scaling timeout as an argument Prepare for adding an additional ufshcd_clock_scaling_prepare() call with a different timeout. Bug: 227177294 Change-Id: I67a569b074c292a3c37f20a1b1e36f95b682c5e8 Signed-off-by: Bart Van Assche <bvanassche@google.com>	2022-04-18 10:18:30 -07:00
Bart Van Assche	69014b2b36	ANDROID: scsi: ufs: Move a clock scaling check Move a check related to clock scaling into ufshcd_devfreq_scale(). This patch prepares for adding a second ufshcd_clock_scaling_prepare() caller. Bug: 227177294 Change-Id: I928d4cbe64823960a6112ba7f98c18da6244a77c Signed-off-by: Bart Van Assche <bvanassche@google.com>	2022-04-18 10:18:15 -07:00
Bart Van Assche	aca52cabdb	ANDROID: scsi: ufs: Reduce the clock scaling latency Wait at most 20 ms before rechecking the doorbells instead of waiting for a potentially long time between doorbell checks. Bug: 227177294 Change-Id: I8a4dd0e93ca02435264961851a095a9c83c68240 Signed-off-by: Bart Van Assche <bvanassche@google.com>	2022-04-18 10:18:03 -07:00
Peter Wang	00ed95fe93	FROMGIT: scsi: ufs: core: scsi_get_lba() error fix When ufs initializes without scmd->device->sector_size set, scsi_get_lba() will get a wrong shift number and trigger an ubsan error. The shift exponent 4294967286 is too large for the 64-bit type 'sector_t' (aka 'unsigned long long'). Call scsi_get_lba() only when opcode is READ_10/WRITE_10/UNMAP. Link: https://lore.kernel.org/r/20220307111752.10465-1-peter.wang@mediatek.com Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Peter Wang <peter.wang@mediatek.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> (cherry picked from commit `2bd3b6b759` git://git.kernel.org/pub/scm/linux/kernel/git/mkp/scsi.git for-next) Bug: 204438323 Change-Id: I3457fcc88d7c4164c55010e440d9f274c169553e Signed-off-by: Bart Van Assche <bvanassche@google.com>	2022-04-18 10:17:48 -07:00

1 2 3 4 5 ...

988207 Commits