diff --git a/Documentation/admin-guide/mm/index.rst b/Documentation/admin-guide/mm/index.rst
index 2a2b5bd8299a..cd727cfc1b04 100644
--- a/Documentation/admin-guide/mm/index.rst
+++ b/Documentation/admin-guide/mm/index.rst
@@ -31,7 +31,6 @@ the Linux memory management.
    idle_page_tracking
    ksm
    memory-hotplug
-   multigen_lru
    nommu-mmap
    numa_memory_policy
    numaperf
diff --git a/Documentation/admin-guide/mm/multigen_lru.rst b/Documentation/admin-guide/mm/multigen_lru.rst
deleted file mode 100644
index 3d9a6ef84229..000000000000
--- a/Documentation/admin-guide/mm/multigen_lru.rst
+++ /dev/null
@@ -1,152 +0,0 @@
-.. SPDX-License-Identifier: GPL-2.0
-
-=============
-Multi-Gen LRU
-=============
-The multi-gen LRU is an alternative LRU implementation that optimizes
-page reclaim and improves performance under memory pressure. Page
-reclaim decides the kernel's caching policy and ability to overcommit
-memory. It directly impacts the kswapd CPU usage and RAM efficiency.
-
-Quick start
-===========
-Build the kernel with the following configurations.
-
-* ``CONFIG_LRU_GEN=y``
-* ``CONFIG_LRU_GEN_ENABLED=y``
-
-All set!
-
-Runtime options
-===============
-``/sys/kernel/mm/lru_gen/`` contains stable ABIs described in the
-following subsections.
-
-Kill switch
------------
-``enable`` accepts different values to enable or disable the following
-components. Its default value depends on ``CONFIG_LRU_GEN_ENABLED``.
-All the components should be enabled unless some of them have
-unforeseen side effects. Writing to ``enable`` has no effect when a
-component is not supported by the hardware, and valid values will be
-accepted even when the main switch is off.
-
-====== ===============================================================
-Values Components
-====== ===============================================================
-0x0001 The main switch for the multi-gen LRU.
-0x0002 Clearing the accessed bit in leaf page table entries in large
-       batches, when MMU sets it (e.g., on x86). This behavior can
-       theoretically worsen lock contention (mmap_lock). If it is
-       disabled, the multi-gen LRU will suffer a minor performance
-       degradation.
-0x0004 Clearing the accessed bit in non-leaf page table entries as
-       well, when MMU sets it (e.g., on x86). This behavior was not
-       verified on x86 varieties other than Intel and AMD. If it is
-       disabled, the multi-gen LRU will suffer a negligible
-       performance degradation.
-[yYnN] Apply to all the components above.
-====== ===============================================================
-
-E.g.,
-::
-
-    echo y >/sys/kernel/mm/lru_gen/enabled
-    cat /sys/kernel/mm/lru_gen/enabled
-    0x0007
-    echo 5 >/sys/kernel/mm/lru_gen/enabled
-    cat /sys/kernel/mm/lru_gen/enabled
-    0x0005
-
-Thrashing prevention
---------------------
-Personal computers are more sensitive to thrashing because it can
-cause janks (lags when rendering UI) and negatively impact user
-experience. The multi-gen LRU offers thrashing prevention to the
-majority of laptop and desktop users who do not have ``oomd``.
-
-Users can write ``N`` to ``min_ttl_ms`` to prevent the working set of
-``N`` milliseconds from getting evicted. The OOM killer is triggered
-if this working set cannot be kept in memory. In other words, this
-option works as an adjustable pressure relief valve, and when open, it
-terminates applications that are hopefully not being used.
-
-Based on the average human detectable lag (~100ms), ``N=1000`` usually
-eliminates intolerable janks due to thrashing. Larger values like
-``N=3000`` make janks less noticeable at the risk of premature OOM
-kills.
-
-The default value ``0`` means disabled.
-
-Experimental features
-=====================
-``/sys/kernel/debug/lru_gen`` accepts commands described in the
-following subsections. Multiple command lines are supported, so does
-concatenation with delimiters ``,`` and ``;``.
-
-``/sys/kernel/debug/lru_gen_full`` provides additional stats for
-debugging. ``CONFIG_LRU_GEN_STATS=y`` keeps historical stats from
-evicted generations in this file.
-
-Working set estimation
-----------------------
-Working set estimation measures how much memory an application
-requires in a given time interval, and it is usually done with little
-impact on the performance of the application. E.g., data centers want
-to optimize job scheduling (bin packing) to improve memory
-utilizations. When a new job comes in, the job scheduler needs to find
-out whether each server it manages can allocate a certain amount of
-memory for this new job before it can pick a candidate. To do so, this
-job scheduler needs to estimate the working sets of the existing jobs.
-
-When it is read, ``lru_gen`` returns a histogram of numbers of pages
-accessed over different time intervals for each memcg and node.
-``MAX_NR_GENS`` decides the number of bins for each histogram.
-::
-
-    memcg  memcg_id  memcg_path
-       node  node_id
-           min_gen_nr  age_in_ms  nr_anon_pages  nr_file_pages
-           ...
-           max_gen_nr  age_in_ms  nr_anon_pages  nr_file_pages
-
-Each generation contains an estimated number of pages that have been
-accessed within ``age_in_ms`` non-cumulatively. E.g., ``min_gen_nr``
-contains the coldest pages and ``max_gen_nr`` contains the hottest
-pages, since ``age_in_ms`` of the former is the largest and that of
-the latter is the smallest.
-
-Users can write ``+ memcg_id node_id max_gen_nr
-[can_swap[full_scan]]`` to ``lru_gen`` to create a new generation
-``max_gen_nr+1``. ``can_swap`` defaults to the swap setting and, if it
-is set to ``1``, it forces the scan of anon pages when swap is off.
-``full_scan`` defaults to ``1`` and, if it is set to ``0``, it reduces
-the overhead as well as the coverage when scanning page tables.
-
-A typical use case is that a job scheduler writes to ``lru_gen`` at a
-certain time interval to create new generations, and it ranks the
-servers it manages based on the sizes of their cold memory defined by
-this time interval.
-
-Proactive reclaim
------------------
-Proactive reclaim induces memory reclaim when there is no memory
-pressure and usually targets cold memory only. E.g., when a new job
-comes in, the job scheduler wants to proactively reclaim memory on the
-server it has selected to improve the chance of successfully landing
-this new job.
-
-Users can write ``- memcg_id node_id min_gen_nr [swappiness
-[nr_to_reclaim]]`` to ``lru_gen`` to evict generations less than or
-equal to ``min_gen_nr``. Note that ``min_gen_nr`` should be less than
-``max_gen_nr-1`` as ``max_gen_nr`` and ``max_gen_nr-1`` are not fully
-aged and therefore cannot be evicted. ``swappiness`` overrides the
-default value in ``/proc/sys/vm/swappiness``. ``nr_to_reclaim`` limits
-the number of pages to evict.
-
-A typical use case is that a job scheduler writes to ``lru_gen``
-before it tries to land a new job on a server, and if it fails to
-materialize the cold memory without impacting the existing jobs on
-this server, it retries on the next server according to the ranking
-result obtained from the working set estimation step described
-earlier.
diff --git a/mm/Kconfig b/mm/Kconfig
index 68456c420da5..a9494cbd5db0 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -922,8 +922,7 @@ config LRU_GEN
 	# the following options can use up the spare bits in page flags
 	depends on !MAXSMP && (64BIT || !SPARSEMEM || SPARSEMEM_VMEMMAP)
 	help
-	  A high performance LRU implementation to overcommit memory. See
-	  Documentation/admin-guide/mm/multigen_lru.rst for details.
+	  A high performance LRU implementation to overcommit memory.
 
 config LRU_GEN_ENABLED
 	bool "Enable by default"