From 34f1eb99850e7df61a91c58d2afd2a18a5cf91ad Mon Sep 17 00:00:00 2001
From: Wei Xu
Date: Mon, 14 Oct 2024 22:12:11 +0000
Subject: [PATCH] UPSTREAM: mm/mglru: only clear kswapd_failures if reclaimable

lru_gen_shrink_node() unconditionally clears kswapd_failures, which can
prevent kswapd from sleeping and cause 100% kswapd CPU usage even when
kswapd repeatedly fails to make progress in reclaim.

Only clear kswapd_failures in lru_gen_shrink_node() if reclaim makes some
progress, similar to shrink_node().

I happened to run into this problem in one of my tests recently. It
requires a combination of several conditions: the allocator needs to
allocate just the right number of pages so that it can wake up kswapd
without itself being OOM-killed; there is no memory for kswapd to
reclaim (my test disables swap and cleans the page cache first); and no
other process frees enough memory at the same time.

Bug: 254441685
Link: https://lkml.kernel.org/r/20241014221211.832591-1-weixugc@google.com
Fixes: e4dde56cd208 ("mm: multi-gen LRU: per-node lru_gen_folio lists")
Signed-off-by: Wei Xu
Cc: Axel Rasmussen
Cc: Brian Geffon
Cc: Jan Alexander Steffens
Cc: Suleiman Souhlal
Cc: Yu Zhao
Cc:
Signed-off-by: Andrew Morton
(cherry picked from commit b130ba4a6259f6b64d8af15e9e7ab1e912bcb7ad)
Signed-off-by: Lee Jones
Change-Id: Ia2b4a0d71096d1e6cd0ee6054df3544724d4b665
---
 mm/vmscan.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index c14a16044515..08e98c9f0a90 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -5654,8 +5654,8 @@ static void lru_gen_shrink_node(struct pglist_data *pgdat, struct scan_control *
 
 	blk_finish_plug(&plug);
 done:
-	/* kswapd should never fail */
-	pgdat->kswapd_failures = 0;
+	if (sc->nr_reclaimed > reclaimed)
+		pgdat->kswapd_failures = 0;
 }
 
 /******************************************************************************