Previously, the block size always equaled to PAGE_SIZE, therefore
`lclusterbits` couldn't be less than 12.
Since sub-page compressed blocks are now considered, `lobits` for
a lcluster in each pack cannot always be `lclusterbits` as before.
Otherwise, there is no enough room for the special value
`Z_EROFS_VLE_DI_D0_CBLKCNT`.
To support smaller block sizes, `lobits` for each compacted lcluster is
now calculated as:
lobits = max(lclusterbits, ilog2(Z_EROFS_VLE_DI_D0_CBLKCNT) + 1)
Reviewed-by: Yue Hu <huyue2@coolpad.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Link: https://lore.kernel.org/r/20231206091057.87027-4-hsiangkao@linux.alibaba.com
Bug: 318378021
Change-Id: Iacd89e2b33ddf39ea40b90e88a2bf99bb5a83b31
(cherry picked from commit 8d2517aaeea3ab8651bb517bca8f3c8664d318ea
https: //git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs.git dev)
[dhavale: resolved conflicts in zmap.c due to older naming of constants
and updated commit message also to use the older names]
Signed-off-by: Sandeep Dhavale <dhavale@google.com>
Currently, compressed sizes are recorded in pages using `pclusterpages`,
However, for tailpacking pclusters, `tailpacking_size` is used instead.
This approach doesn't work when dealing with sub-page blocks. To address
this, let's switch them to the unified `pclustersize` in bytes.
Reviewed-by: Yue Hu <huyue2@coolpad.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Link: https://lore.kernel.org/r/20231206091057.87027-3-hsiangkao@linux.alibaba.com
Bug: 318378021
Change-Id: Ia8c50a7b4adcf6cd161b1d6f8bfc5a7fd3371079
(cherry picked from commit 54ed3fdd66055d073cb1cd2c6c65bbc0683c40cf
https: //git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs.git dev)
Signed-off-by: Sandeep Dhavale <dhavale@google.com>
Add a basic I/O submission path first to support sub-page blocks:
- Temporary short-lived pages will be used entirely;
- In-place I/O pages can be used partially, but compressed pages need
to be able to be mapped in contiguous virtual memory.
As a start, currently cache decompression is explicitly disabled for
sub-page blocks, which will be supported in the future.
Reviewed-by: Yue Hu <huyue2@coolpad.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Link: https://lore.kernel.org/r/20231206091057.87027-2-hsiangkao@linux.alibaba.com
Bug: 318378021
Change-Id: Ib2cb6120805ab479a450580fc8774af131271791
(cherry picked from commit 192351616a9dde686492bcb9d1e4895a1411a527
https: //git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs.git dev)
Signed-off-by: Sandeep Dhavale <dhavale@google.com>
Currently EROFS can map another compressed buffer for inplace
decompression, that was used to handle the cases that some pages of
compressed data are actually not in-place I/O.
However, like most simple LZ77 algorithms, LZ4 expects the compressed
data is arranged at the end of the decompressed buffer and it
explicitly uses memmove() to handle overlapping:
__________________________________________________________
|_ direction of decompression --> ____ |_ compressed data _|
Although EROFS arranges compressed data like this, it typically maps two
individual virtual buffers so the relative order is uncertain.
Previously, it was hardly observed since LZ4 only uses memmove() for
short overlapped literals and x86/arm64 memmove implementations seem to
completely cover it up and they don't have this issue. Juhyung reported
that EROFS data corruption can be found on a new Intel x86 processor.
After some analysis, it seems that recent x86 processors with the new
FSRM feature expose this issue with "rep movsb".
Let's strictly use the decompressed buffer for lz4 inplace
decompression for now. Later, as an useful improvement, we could try
to tie up these two buffers together in the correct order.
Reported-and-tested-by: Juhyung Park <qkrwngud825@gmail.com>
Closes: https://lore.kernel.org/r/CAD14+f2AVKf8Fa2OO1aAUdDNTDsVzzR6ctU_oJSmTyd6zSYR2Q@mail.gmail.com
Fixes: 0ffd71bcc3 ("staging: erofs: introduce LZ4 decompression inplace")
Fixes: 598162d050 ("erofs: support decompress big pcluster for lz4 backend")
Cc: stable <stable@vger.kernel.org> # 5.4+
Tested-by: Yifan Zhao <zhaoyifan@sjtu.edu.cn>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Link: https://lore.kernel.org/r/20231206045534.3920847-1-hsiangkao@linux.alibaba.com
Bug: 318378021
Change-Id: Ifd2981320f9f79b27bc7484d8906501a2fa05359
(cherry picked from commit 3c12466b6b7bf1e56f9b32c366a3d83d87afb4de
https://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs.git dev)
Signed-off-by: Sandeep Dhavale <dhavale@google.com>
{collector,collection} were once reserved in order to indicate different
runtime logical extent instance of multi-reference pclusters.
However, de-duplicated decompression has been landed in a more flexable
way, thus `struct z_erofs_collection` was formally removed in commit
87ca34a706 ("erofs: get rid of `struct z_erofs_collection'").
Let's handle the remaining leftovers, for example:
`z_erofs_collector_begin` => `z_erofs_pcluster_begin`
`z_erofs_collector_end` => `z_erofs_pcluster_end`
as well as some comments. No logic changes.
Reviewed-by: Yue Hu <huyue2@coolpad.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Link: https://lore.kernel.org/r/20230817082813.81180-2-hsiangkao@linux.alibaba.com
Bug: 318378021
Change-Id: I61b812b5ae3dd564e52012d082415b1fc198383d
(cherry picked from commit dcba1b232e)
[dhavale: fixed minor conflict zdata.c in z_erofs_do_read_page()]
Signed-off-by: Sandeep Dhavale <dhavale@google.com>
If non-bootstrap bvecs cannot be kept in place (very rarely), an extra
short-lived page is allocated.
Let's just allocate it immediately rather than do unnecessary -EAGAIN
return first and retry as a cleanup. Also it's unnecessary to use
__GFP_NOFAIL here since we could gracefully fail out this case instead.
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Reviewed-by: Yue Hu <huyue2@coolpad.com>
Link: https://lore.kernel.org/r/20230526201459.128169-2-hsiangkao@linux.alibaba.com
Bug: 318378021
(cherry picked from commit 05b63d2beb)
Change-Id: I2ac45a943060406bcbb741c5f7aa1094f783f906
Signed-off-by: Sandeep Dhavale <dhavale@google.com>
`end` parameter is no needed since it's pointless for !backmost, we can
handle it with backmost internally. And we only expand the trailing
edge, so the newstart can be replaced with ->headoffset.
Also, remove linux/prefetch.h inclusion since that is not used anymore
after commit 386292919c ("erofs: introduce readmore decompression
strategy").
Signed-off-by: Yue Hu <huyue2@coolpad.com>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Link: https://lore.kernel.org/r/20230525072605.17857-1-zbestahu@gmail.com
[ Gao Xiang: update commit description. ]
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Bug: 318378021
(cherry picked from commit 796e9149a2)
Change-Id: I9412c4111800077c876a43c4256ce9760a7d902e
Signed-off-by: Sandeep Dhavale <dhavale@google.com>
Enable large folios for iomap mode. Then the readahead routine will
pass down large folios containing multiple pages.
Let's enable this for non-compressed format for now, until the
compression part supports large folios later.
When large folios supported, the iomap routine will allocate iomap_page
for each large folio and thus we need iomap_release_folio() and
iomap_invalidate_folio() to free iomap_page when these folios get
reclaimed or invalidated.
Signed-off-by: Jingbo Xu <jefflexu@linux.alibaba.com>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Link: https://lore.kernel.org/r/20221130060455.44532-1-jefflexu@linux.alibaba.com
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Bug: 318378021
Change-Id: Iedbb9a2daf132399b7a1b5ea6905977ba123ba3c
(cherry picked from commit ce529cc25b)
Signed-off-by: Sandeep Dhavale <dhavale@google.com>
The KOBJ_CHANGE uevent is sent before gadget unbind is actually
executed, resulting in inaccurate uevent emitted at incorrect timing
(the uevent would have USB_UDC_DRIVER variable set while it would
soon be removed).
Move the KOBJ_CHANGE uevent to the end of the unbind function so that
uevent is sent only after the change has been made.
Fixes: 2ccea03a8f ("usb: gadget: introduce UDC Class")
Cc: stable@vger.kernel.org
Signed-off-by: Roy Luo <royluo@google.com>
Link: https://lore.kernel.org/r/20231128221756.2591158-1-royluo@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Bug: 312543856
Change-Id: Ida7fa7e1cfae3d1b3f3348512a67fe91065f25af
(cherry picked from commit 73ea73affe8622bdf292de898da869d441da6a9d)
Signed-off-by: Roy Luo <royluo@google.com>
Add hooks at rt_mutex_steal function so that oems can decide
whether tasks with the same priority steal the rt_mutex or
not. We did experiments and found that rt_mutex throughput
can benefit a lot when threads with the same priority can
steal the rt_mutex lock.
Bug: 317670024
Change-Id: Id60a7a41c6c77a67808982d3667946cabe4acc8f
Signed-off-by: xeiliujie <xieliujie@oppo.com>
We found an issue under Android OTA scenario that many BIOs have to do
FEC where the data under dm-verity is 100% complete and no corruption.
Android OTA has many dm-block layers, from upper to lower:
dm-verity
dm-snapshot
dm-origin & dm-cow
dm-linear
ufs
DM tables have to change 2 times during Android OTA merging process.
When doing table change, the dm-snapshot will be suspended for a while.
During this interval, many readahead IOs are submitted to dm_verity
from filesystem. Then the kverity works are busy doing FEC process
which cost too much time to finish dm-verity IO. This causes needless
delay which feels like system is hung.
After adding debugging it was found that each readahead IO needed
around 10s to finish when this situation occurred. This is due to IO
amplification:
dm-snapshot suspend
erofs_readahead // 300+ io is submitted
dm_submit_bio (dm_verity)
dm_submit_bio (dm_snapshot)
bio return EIO
bio got nothing, it's empty
verity_end_io
verity_verify_io
forloop range(0, io->n_blocks) // each io->nblocks ~= 20
verity_fec_decode
fec_decode_rsb
fec_read_bufs
forloop range(0, v->fec->rsn) // v->fec->rsn = 253
new_read
submit_bio (dm_snapshot)
end loop
end loop
dm-snapshot resume
Readahead BIOs get nothing while dm-snapshot is suspended, so all of
them will cause verity's FEC.
Each readahead BIO needs to verify ~20 (io->nblocks) blocks.
Each block needs to do FEC, and every block needs to do 253
(v->fec->rsn) reads.
So during the suspend interval(~200ms), 300 readahead BIOs trigger
~1518000 (300*20*253) IOs to dm-snapshot.
As readahead IO is not required by userspace, and to fix this issue,
it is best to pass readahead errors to upper layer to handle it.
Cc: stable@vger.kernel.org
Fixes: a739ff3f54 ("dm verity: add support for forward error correction")
Bug: 316972624
Link: https://lore.kernel.org/dm-devel/b84fb49-bf63-3442-8c99-d565e134f2@redhat.com
Signed-off-by: Wu Bo <bo.wu@vivo.com>
Reviewed-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Akilesh Kailash <akailash@google.com>
(cherry picked from commit 0193e3966ceeeef69e235975918b287ab093082b)
Change-Id: I73560e5660cebdc1997e1f9926cbb8888789eb46
commit 317eb9685095678f2c9f5a8189de698c5354316a upstream.
Otherwise set elements can be deactivated twice which will cause a crash.
Bug: 316310313
Reported-by: Xingyuan Mo <hdthky0@gmail.com>
Fixes: 3c4287f620 ("nf_tables: Add set type for arbitrary concatenation of ranges")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 189c2a8293)
Signed-off-by: Lee Jones <joneslee@google.com>
Change-Id: I27fb6ee806642e23ca02700763a387341dd463e6
Bug: 292925770
Test: fuse_test run. The following steps on Android also now pass:
Create /data/123 and /data/media/0/Android/data/45 directories
Mount /data/123 directory to /data/media/0/Android/data/45 directory
Create 1.txt under the /data/123 directory
File 1.txt should appear in /storage/emulated/0/Android/data/45
Change-Id: I1fe27d743ca2981e624a9aa87d9ab6deb313aadc
Signed-off-by: Paul Lawrence <paullawrence@google.com>
Sync to android13-5.10. This vendor hook is declared already.
Bug: 245675204
Change-Id: Ib081b52542380d22317f225a50b553cda5f2634c
Signed-off-by: Rick Yiu <rickyiu@google.com>
(cherry picked from commit f9688670ca)
DAMON sysfs interface's before_damos_apply callback
(damon_sysfs_before_damos_apply()), which creates the DAMOS tried regions
for each DAMOS action applied region, is not handling the allocation
failure for the sysfs directory data. As a result, NULL pointer
derefeence is possible. Fix it by handling the case.
Link: https://lkml.kernel.org/r/20231106233408.51159-4-sj@kernel.org
Fixes: f1d13cacab ("mm/damon/sysfs: implement DAMOS tried regions update command")
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: <stable@vger.kernel.org> [6.2+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit ae636ae2bbfd9279f5681dbf320d1da817e52b68)
Bug: 300502883
Change-Id: I98568f4b0cee9fea82f4fe6d3e7a505370c3c304
Signed-off-by: cui yangpei <cuiyangpei@xiaomi.com>
DAMOS tried regions sysfs directory allocation function
(damon_sysfs_scheme_regions_alloc()) is not handling the memory allocation
failure. In the case, the code will dereference NULL pointer. Handle the
failure to avoid such invalid access.
Link: https://lkml.kernel.org/r/20231106233408.51159-3-sj@kernel.org
Fixes: 9277d0367b ("mm/damon/sysfs-schemes: implement scheme region directory")
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: <stable@vger.kernel.org> [6.2+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 84055688b6bc075c92a88e2d6c3ad26ab93919f9)
Bug: 300502883
Change-Id: I86ecb2f3cf1604199b5567576b1fa583914f7f36
Signed-off-by: cui yangpei <cuiyangpei@xiaomi.com>
Patch series "mm/damon/sysfs: fix unhandled return values".
Some of DAMON sysfs interface code is not handling return values from some
functions. As a result, confusing user input handling or NULL-dereference
is possible. Check those properly.
This patch (of 3):
damon_sysfs_update_target() returns error code for failures, but its
caller, damon_sysfs_set_targets() is ignoring that. The update function
seems making no critical change in case of such failures, but the behavior
will look like DAMON sysfs is silently ignoring or only partially
accepting the user input. Fix it.
Link: https://lkml.kernel.org/r/20231106233408.51159-1-sj@kernel.org
Link: https://lkml.kernel.org/r/20231106233408.51159-2-sj@kernel.org
Fixes: 19467a950b49 ("mm/damon/sysfs: remove requested targets when online-commit inputs")
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: <stable@vger.kernel.org> [5.19+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit b4936b544b08ed44949055b92bd25f77759ebafc)
Bug: 300502883
Change-Id: I9bfea66f76ad094ed73defee5ff3fdb3794e8162
Signed-off-by: cui yangpei <cuiyangpei@xiaomi.com>
When user input is committed online, DAMON sysfs interface is ignoring the
user input for the monitoring target regions. Such request is valid and
useful for fixed monitoring target regions-based monitoring ops like
'paddr' or 'fvaddr'.
Update the region boundaries as user specified, too. Note that the
monitoring results of the regions that overlap between the latest
monitoring target regions and the new target regions are preserved.
Treat empty monitoring target regions user request as a request to just
make no change to the monitoring target regions. Otherwise, users should
set the monitoring target regions same to current one for every online
input commit, and it could be challenging for dynamic monitoring target
regions update DAMON ops like 'vaddr'. If the user really need to remove
all monitoring target regions, they can simply remove the target and then
create the target again with empty target regions.
Link: https://lkml.kernel.org/r/20231031170131.46972-1-sj@kernel.org
Fixes: da87878010 ("mm/damon/sysfs: support online inputs update")
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: <stable@vger.kernel.org> [5.19+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 9732336006764e2ee61225387e3c70eae9139035)
Bug: 300502883
Change-Id: I6857482470951382c9be36f2099da76e9b71d502
Signed-off-by: cui yangpei <cuiyangpei@xiaomi.com>
damon_sysfs_set_targets(), which updates the targets of the context for
online commitment, do not remove targets that removed from the
corresponding sysfs files. As a result, more than intended targets of the
context can exist and hence consume memory and monitoring CPU resource
more than expected.
Fix it by removing all targets of the context and fill up again using the
user input. This could cause unnecessary memory dealloc and realloc
operations, but this is not a hot code path. Also, note that damon_target
is stateless, and hence no data is lost.
[sj@kernel.org: fix unnecessary monitoring results removal]
Link: https://lkml.kernel.org/r/20231028213353.45397-1-sj@kernel.org
Link: https://lkml.kernel.org/r/20231022210735.46409-2-sj@kernel.org
Fixes: da87878010 ("mm/damon/sysfs: support online inputs update")
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Brendan Higgins <brendanhiggins@google.com>
Cc: <stable@vger.kernel.org> [5.19.x]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 19467a950b49432a84bf6dbadbbb17bdf89418b7)
Bug: 300502883
Change-Id: Icf094f138e6810182d23d2c412fbabe3ecd960fe
Signed-off-by: cui yangpei <cuiyangpei@xiaomi.com>
DAMON_SYSFS assumes all schemes will be applied for at least one DAMON
monitoring results snapshot within one aggregation interval, or makes no
sense to wait for it while DAMON is deactivated by the watermarks. That
for deactivated status still makes sense, but the aggregation interval
based assumption is invalid now because each scheme can has its own apply
interval. For schemes having larger than the aggregation or watermarks
check interval, DAMOS tried regions update request can be finished without
the update. Avoid the case by explicitly checking the status of the
schemes tried regions update and watermarks based DAMON deactivation.
Link: https://lkml.kernel.org/r/20231012192256.33556-3-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 76126332c7606ba25a4ae5db37145fd526985b45)
Bug: 300502883
Change-Id: I8283709a023123d7a89fd37a1d4a834888c15c7e
Signed-off-by: cui yangpei <cuiyangpei@xiaomi.com>
Patch series "mm/damon/sysfs-schemes: Do DAMOS tried regions update for
only one apply interval".
DAMOS tried regions update feature of DAMON sysfs interface is doing the
update for one aggregation interval after the request is made. Since the
per-scheme apply interval is supported, that behavior makes no much sense.
That is, the tried regions directory will have regions from multiple
DAMON monitoring results snapshots, or no region for apply intervals that
much shorter than, or longer than the aggregation interval, respectively.
Update the behavior to update the regions for each scheme for only its
apply interval, and update the document.
Since DAMOS apply interval is the aggregation by default, this change
makes no visible behavioral difference to old users who don't explicitly
set the apply intervals.
Patches Sequence
----------------
The first two patches makes schemes of apply intervals that much shorter
or longer than the aggregation interval to keep the maximum and minimum
times for continuing the update. After the two patches, the update aligns
with the each scheme's apply interval.
Finally, the third patch updates the document to reflect the behavior.
This patch (of 3):
DAMON_SYSFS exposes every DAMON-found region that eligible for applying
the scheme action for one aggregation interval. However, each DAMON-based
operation scheme has its own apply interval. Hence, for a scheme that
having its apply interval much smaller than the aggregation interval,
DAMON_SYSFS will expose the scheme regions that applied to more than one
DAMON monitoring results snapshots. Since the purpose of DAMON tried
regions is exposing single snapshot, this makes no much sense. Track
progress of each scheme's tried regions update and avoid the case.
Link: https://lkml.kernel.org/r/20231012192256.33556-1-sj@kernel.org
Link: https://lkml.kernel.org/r/20231012192256.33556-2-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 4d4e41b682990b1dc5bba2bc313800340bf5c2d4)
Bug: 300502883
Change-Id: I78602a6810a9b4d8d131c3ace69f255ac1349d13
Signed-off-by: cui yangpei <cuiyangpei@xiaomi.com>
DAMON_SYSFS can receive DAMOS tried regions update request while kdamond
is already out of the main loop and before_terminate callback
(damon_sysfs_before_terminate() in this case) is not yet called. And
damon_sysfs_handle_cmd() can further be finished before the callback is
invoked. Then, damon_sysfs_before_terminate() unlocks damon_sysfs_lock,
which is not locked by anyone. This happens because the callback function
assumes damon_sysfs_cmd_request_callback() should be called before it.
Check if the assumption was true before doing the unlock, to avoid this
problem.
Link: https://lkml.kernel.org/r/20231007200432.3110-1-sj@kernel.org
Fixes: f1d13cacab ("mm/damon/sysfs: implement DAMOS tried regions update command")
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: <stable@vger.kernel.org> [6.2.x]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 76b7069bcc)
Bug: 300502883
Change-Id: I7cd5e00c0d0226dc8d7856d103f88a26307cafce
Signed-off-by: cui yangpei <cuiyangpei@xiaomi.com>
Using tried_regions/total_bytes file, users can efficiently retrieve the
total size of memory regions having specific access pattern. However,
DAMON sysfs interface in kernel still populates all the infomration on the
tried_regions subdirectories. That means the kernel part overhead for the
construction of tried regions directories still exists. To remove the
overhead, implement yet another command input for 'state' DAMON sysfs
file. Writing the input to the file makes DAMON sysfs interface to update
only the total_bytes file.
Link: https://lkml.kernel.org/r/20230802213222.109841-3-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 6ad243b83b)
Bug: 300502883
Change-Id: Id0bdf13858d6a92de0eeef22f59a65ee884e9d20
Signed-off-by: cui yangpei <cuiyangpei@xiaomi.com>
Patch series "mm/damon/sysfs-schemes: implement DAMOS tried total bytes
file".
The tried_regions directory of DAMON sysfs interface is useful for
retrieving monitoring results snapshot or DAMOS debugging. However, for
common use case that need to monitor only the total size of the scheme
tried regions (e.g., monitoring working set size), the kernel overhead for
directory construction and user overhead for reading the content could be
high if the number of monitoring region is not small. This patchset
implements DAMON sysfs files for efficient support of the use case.
The first patch implements the sysfs file to reduce the user space
overhead, and the second patch implements a command for reducing the
kernel space overhead.
The third patch adds a selftest for the new file, and following two
patches update documents.
[1] https://lore.kernel.org/damon/20230728201817.70602-1-sj@kernel.org/
This patch (of 5):
The tried_regions directory can be used for retrieving the monitoring
results snapshot for regions of specific access pattern, by setting the
scheme's action as 'stat' and the access pattern as required. While the
interface provides every detail of the monitoring results, some use cases
including working set size monitoring requires only the total size of the
regions. For such cases, users should read all the information and
calculate the total size of the regions. However, it could incur high
overhead if the number of regions is high. Add a file for retrieving only
the information, namely 'total_bytes' file. It allows users to get the
total size by reading only the file.
Link: https://lkml.kernel.org/r/20230802213222.109841-1-sj@kernel.org
Link: https://lkml.kernel.org/r/20230802213222.109841-2-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit b69f92a741)
Bug: 300502883
Change-Id: I49c225d15ba09a9b896341da14cc9f2b45578da7
Signed-off-by: cui yangpei <cuiyangpei@xiaomi.com>
Sometimes there is no scheme in damon's context, for example just use damo
record to monitor workload's data access pattern.
If current damon context doesn't have any scheme in the list, kdamond has
no need to iterate over list of all targets and regions but do nothing.
So, skip apply schemes when ctx->schemes is empty.
Link: https://lkml.kernel.org/r/20230116062347.1148553-1-huaisheng.ye@intel.com
Signed-off-by: Huaisheng Ye <huaisheng.ye@intel.com>
Reviewed-by: SeongJae Park <sj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 64517d6e12)
Bug: 300502883
Change-Id: Ic76ca90c85dbb24205b17dd914f91a8dd4cf7345
Signed-off-by: cui yangpei <cuiyangpei@xiaomi.com>
When there are huge number of DAMON regions that specific scheme actions
are tried to be applied, directories and files under 'tried_regions'
scheme directory could waste some memory. Add another special input
keyword ('clear_schemes_tried_regions') for 'state' file of each kdamond
sysfs directory that can be used for cleanup of the 'tried_regions'
sub-directories.
[sj@kernel.org: skip regions clearing if the scheme directory was removed]
Link: https://lkml.kernel.org/r/20221114182954.4745-3-sj@kernel.org
Link: https://lkml.kernel.org/r/20221101220328.95765-6-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 772c15e5ad)
Bug: 300502883
Change-Id: I969e05ce1fa4599bae50454633b61b5320eaa67d
Signed-off-by: cui yangpei <cuiyangpei@xiaomi.com>
Implement the code for filling the data of 'tried_regions' DAMON sysfs
directory. With this commit, DAMON sysfs interface users can write a
special keyword, 'update_schemes_tried_regions' to the corresponding
'state' file of the kdamond. Then, DAMON sysfs interface will collect the
tried regions information using the 'before_damos_apply()' callback for
one aggregation interval and populate scheme region directories with the
values.
[sj@kernel.org: skip tried regions update if the scheme directory was removed]
Link: https://lkml.kernel.org/r/20221114182954.4745-2-sj@kernel.org
Link: https://lkml.kernel.org/r/20221101220328.95765-5-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit f1d13cacab)
Bug: 300502883
Change-Id: I6749b8dc75023a9a3f3dc64902196b07fa523267
Signed-off-by: cui yangpei <cuiyangpei@xiaomi.com>
Implement region directories under 'tried_regions' directory of each
scheme DAMON sysfs directory. This directory will provide the address
range, the monitored access frequency ('nr_accesses'), and the age of each
DAMON region that corresponding DAMON-based operation scheme has tried to
be applied. Note that this commit doesn't implement the code for filling
the data but only the sysfs directory.
Link: https://lkml.kernel.org/r/20221101220328.95765-4-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 9277d0367b)
Bug: 300502883
Change-Id: I69c9010a8fce2fa61a1d27f2964ac7bc7b85dd44
Signed-off-by: cui yangpei <cuiyangpei@xiaomi.com>
For efficient and simple query-like DAMON monitoring results readings and
deep level investigations of DAMOS, DAMON kernel API
(include/linux/damon.h) users can use 'before_damos_apply' DAMON callback.
However, DAMON sysfs interface users don't have such option.
Add a directory, namely 'tried_regions', under each scheme directory to
use it as the interface for the purpose. Note that this commit is
implementing only the directory but the data filling.
After the data filling change is made, users will be able to signal DAMON
to fill the directory with the regions that corresponding scheme has tried
to be applied. By setting the access pattern of the scheme, users could
do the efficient query-like monitoring.
Link: https://lkml.kernel.org/r/20221101220328.95765-3-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 5181b75f43)
Bug: 300502883
Change-Id: Idc7a1fca201b90f8fea62899f1e6b500bb8e14e1
Signed-off-by: cui yangpei <cuiyangpei@xiaomi.com>
Patch series "efficiently expose damos action tried regions information".
DAMON users can retrieve the monitoring results via 'after_aggregation'
callbacks if the user is using the kernel API, or 'damon_aggregated'
tracepoint if the user is in the user space. Those are useful if full
monitoring results are necessary. However, if the user has interest in
only a snapshot of the results for some regions having specific access
pattern, the interfaces could be inefficient. For example, some users
only want to know which memory regions are not accessed for more than a
specific time at the moment.
Also, some DAMOS users would want to know exactly to what memory regions
the schemes' actions tried to be applied, for a debugging or a tuning. As
DAMOS has its internal mechanism for quota and regions prioritization, the
users would need to simulate DAMOS' mechanism against the monitoring
results. That's unnecessarily complex.
This patchset implements DAMON kernel API callbacks and sysfs directory
for efficient exposure of the information for the use cases. The new
callback will be called for each region when a DAMOS action is gonna tried
to be applied to it. The sysfs directory will be called 'tried_regions'
and placed under each scheme sysfs directory. Users can write a special
keyworkd, 'update_schemes_regions', to the 'state' file of a kdamond sysfs
directory. Then, DAMON sysfs interface will fill the directory with the
information of regions that corresponding scheme action was tried to be
applied for next one aggregation interval.
Patches Sequence
----------------
The first one (patch 1) implements the callback for the kernel space
users. Following two patches (patches 2 and 3) implements sysfs
directories for the information and its sub directories. Two patches
(patches 4 and 5) for implementing the special keywords for filling the
data to and cleaning up the directories follow. Patch 6 adds a selftest
for the new sysfs directory. Finally, two patches (patches 7 and 8)
document the new feature in the administrator guide and the ABI document.
This patch (of 8):
Getting DAMON monitoring results of only specific access pattern (e.g.,
getting address ranges of memory that not accessed at all for two minutes)
can be useful for efficient monitoring of the system. The information can
also be helpful for deep level investigation of DAMON-based operation
schemes.
For that, users need to record (in case of the user space users) or
iterate (in case of the kernel space users) full monitoring results and
filter it out for the specific access pattern. In case of the DAMOS
investigation, users will even need to simulate DAMOS' quota and
prioritization mechanisms. It's inefficient and complex.
Add a new DAMON callback that will be called before each scheme is applied
to each region. DAMON kernel API users will be able to do the query-like
monitoring results collection, or DAMOS investigation in an efficient and
simple way using it.
Commits for providing the capability to the user space users will follow.
Link: https://lkml.kernel.org/r/20221101220328.95765-1-sj@kernel.org
Link: https://lkml.kernel.org/r/20221101220328.95765-2-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 44467bbb7e)
Bug: 300502883
Change-Id: I21ff3c9cf6c30e113f78883e5063bcb898506b41
Signed-off-by: cui yangpei <cuiyangpei@xiaomi.com>