diff --git a/Documentation/vm/index.rst b/Documentation/vm/index.rst index 9f80cc8a1cd9..eff5fbd492d0 100644 --- a/Documentation/vm/index.rst +++ b/Documentation/vm/index.rst @@ -41,7 +41,6 @@ descriptions of data structures and algorithms. ksm memory-model mmu_notifier - multigen_lru numa overcommit-accounting page_migration diff --git a/Documentation/vm/multigen_lru.rst b/Documentation/vm/multigen_lru.rst deleted file mode 100644 index 8f6498002576..000000000000 --- a/Documentation/vm/multigen_lru.rst +++ /dev/null @@ -1,160 +0,0 @@ -.. SPDX-License-Identifier: GPL-2.0 - -============= -Multi-Gen LRU -============= -The multi-gen LRU is an alternative LRU implementation that optimizes -page reclaim and improves performance under memory pressure. Page -reclaim decides the kernel's caching policy and ability to overcommit -memory. It directly impacts the kswapd CPU usage and RAM efficiency. - -Design overview -=============== -Objectives ----------- -The design objectives are: - -* Good representation of access recency -* Try to profit from spatial locality -* Fast paths to make obvious choices -* Simple self-correcting heuristics - -The representation of access recency is at the core of all LRU -implementations. In the multi-gen LRU, each generation represents a -group of pages with similar access recency. Generations establish a -common frame of reference and therefore help make better choices, -e.g., between different memcgs on a computer or different computers in -a data center (for job scheduling). - -Exploiting spatial locality improves efficiency when gathering the -accessed bit. A rmap walk targets a single page and does not try to -profit from discovering a young PTE. A page table walk can sweep all -the young PTEs in an address space, but the address space can be too -large to make a profit. The key is to optimize both methods and use -them in combination. - -Fast paths reduce code complexity and runtime overhead. Unmapped pages -do not require TLB flushes; clean pages do not require writeback. -These facts are only helpful when other conditions, e.g., access -recency, are similar. With generations as a common frame of reference, -additional factors stand out. But obvious choices might not be good -choices; thus self-correction is required. - -The benefits of simple self-correcting heuristics are self-evident. -Again, with generations as a common frame of reference, this becomes -attainable. Specifically, pages in the same generation can be -categorized based on additional factors, and a feedback loop can -statistically compare the refault percentages across those categories -and infer which of them are better choices. - -Assumptions ------------ -The protection of hot pages and the selection of cold pages are based -on page access channels and patterns. There are two access channels: - -* Accesses through page tables -* Accesses through file descriptors - -The protection of the former channel is by design stronger because: - -1. The uncertainty in determining the access patterns of the former - channel is higher due to the approximation of the accessed bit. -2. The cost of evicting the former channel is higher due to the TLB - flushes required and the likelihood of encountering the dirty bit. -3. The penalty of underprotecting the former channel is higher because - applications usually do not prepare themselves for major page - faults like they do for blocked I/O. E.g., GUI applications - commonly use dedicated I/O threads to avoid blocking the rendering - threads. - -There are also two access patterns: - -* Accesses exhibiting temporal locality -* Accesses not exhibiting temporal locality - -For the reasons listed above, the former channel is assumed to follow -the former pattern unless ``VM_SEQ_READ`` or ``VM_RAND_READ`` is -present, and the latter channel is assumed to follow the latter -pattern unless outlying refaults have been observed. - -Workflow overview -================= -Evictable pages are divided into multiple generations for each -``lruvec``. The youngest generation number is stored in -``lrugen->max_seq`` for both anon and file types as they are aged on -an equal footing. The oldest generation numbers are stored in -``lrugen->min_seq[]`` separately for anon and file types as clean file -pages can be evicted regardless of swap constraints. These three -variables are monotonically increasing. - -Generation numbers are truncated into ``order_base_2(MAX_NR_GENS+1)`` -bits in order to fit into the gen counter in ``page->flags``. Each -truncated generation number is an index to ``lrugen->lists[]``. The -sliding window technique is used to track at least ``MIN_NR_GENS`` and -at most ``MAX_NR_GENS`` generations. The gen counter stores a value -within ``[1, MAX_NR_GENS]`` while a page is on one of -``lrugen->lists[]``; otherwise it stores zero. - -Each generation is divided into multiple tiers. Tiers represent -different ranges of numbers of accesses through file descriptors. A -page accessed ``N`` times through file descriptors is in tier -``order_base_2(N)``. In contrast to moving across generations, which -requires the LRU lock, moving across tiers only requires operations on -``page->flags`` and therefore has a negligible cost. A feedback loop -modeled after the PID controller monitors refaults over all the tiers -from anon and file types and decides which tiers from which types to -evict or protect. - -There are two conceptually independent procedures: the aging and the -eviction. They form a closed-loop system, i.e., the page reclaim. - -Aging ------ -The aging produces young generations. Given an ``lruvec``, it -increments ``max_seq`` when ``max_seq-min_seq+1`` approaches -``MIN_NR_GENS``. The aging promotes hot pages to the youngest -generation when it finds them accessed through page tables; the -demotion of cold pages happens consequently when it increments -``max_seq``. The aging uses page table walks and rmap walks to find -young PTEs. For the former, it iterates ``lruvec_memcg()->mm_list`` -and calls ``walk_page_range()`` with each ``mm_struct`` on this list -to scan PTEs. On finding a young PTE, it clears the accessed bit and -updates the gen counter of the page mapped by this PTE to -``(max_seq%MAX_NR_GENS)+1``. After each iteration of this list, it -increments ``max_seq``. For the latter, when the eviction walks the -rmap and finds a young PTE, the aging scans the adjacent PTEs and -follows the same steps just described. - -Eviction --------- -The eviction consumes old generations. Given an ``lruvec``, it -increments ``min_seq`` when ``lrugen->lists[]`` indexed by -``min_seq%MAX_NR_GENS`` becomes empty. To select a type and a tier to -evict from, it first compares ``min_seq[]`` to select the older type. -If both types are equally old, it selects the one whose first tier has -a lower refault percentage. The first tier contains single-use -unmapped clean pages, which are the best bet. The eviction sorts a -page according to the gen counter if the aging has found this page -accessed through page tables and updated the gen counter. It also -moves a page to the next generation, i.e., ``min_seq+1``, if this page -was accessed multiple times through file descriptors and the feedback -loop has detected outlying refaults from the tier this page is in. To -do this, the feedback loop uses the first tier as the baseline, for -the reason stated earlier. - -Summary -------- -The multi-gen LRU can be disassembled into the following parts: - -* Generations -* Page table walks -* Rmap walks -* Bloom filters -* The PID controller - -The aging and the eviction is a producer-consumer model; specifically, -the latter drives the former by the sliding window over generations. -Within the aging, rmap walks drive page table walks by inserting hot -densely populated page tables to the Bloom filters. Within the -eviction, the PID controller uses refaults as the feedback to select -types to evict and tiers to protect.