From ad7902a401f68e107b74c3543650798f454740b2 Mon Sep 17 00:00:00 2001 From: Barry Song Date: Fri, 9 May 2025 10:09:12 +1200 Subject: [PATCH] BACKPORT: mm: userfaultfd: correct dirty flags set for both present and swap pte MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit As David pointed out, what truly matters for mremap and userfaultfd move operations is the soft dirty bit. The current comment and implementation—which always sets the dirty bit for present PTEs and fails to set the soft dirty bit for swap PTEs—are incorrect. This could break features like Checkpoint-Restore in Userspace (CRIU). This patch updates the behavior to correctly set the soft dirty bit for both present and swap PTEs in accordance with mremap. Link: https://lkml.kernel.org/r/20250508220912.7275-1-21cnbao@gmail.com Fixes: adef440691ba ("userfaultfd: UFFDIO_MOVE uABI") Signed-off-by: Barry Song Reported-by: David Hildenbrand Closes: https://lore.kernel.org/linux-mm/02f14ee1-923f-47e3-a994-4950afb9afcc@redhat.com/ Acked-by: Peter Xu Reviewed-by: Suren Baghdasaryan Cc: Lokesh Gidra Cc: Andrea Arcangeli Cc: Signed-off-by: Andrew Morton (cherry picked from commit 75cb1cca2c880179a11c7dd9380b6f14e41a06a4) Merge Conflicts: 1. pte_mkwrite() doesn't take vma as second argument, so removed it. Change-Id: I5fc25f9028ad7972ea1b6d873f072fd15f9c7214 Signed-off-by: Lokesh Gidra --- mm/userfaultfd.c | 12 ++++++++++-- 1 file changed, 10 insertions(+), 2 deletions(-) diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c index b45edacc7436..468747538b41 100644 --- a/mm/userfaultfd.c +++ b/mm/userfaultfd.c @@ -966,8 +966,13 @@ static int move_present_pte(struct mm_struct *mm, WRITE_ONCE(src_folio->index, linear_page_index(dst_vma, dst_addr)); orig_dst_pte = mk_pte(&src_folio->page, dst_vma->vm_page_prot); - /* Follow mremap() behavior and treat the entry dirty after the move */ - orig_dst_pte = pte_mkwrite(pte_mkdirty(orig_dst_pte)); + /* Set soft dirty bit so userspace can notice the pte was moved */ +#ifdef CONFIG_MEM_SOFT_DIRTY + orig_dst_pte = pte_mksoft_dirty(orig_dst_pte); +#endif + if (pte_dirty(orig_src_pte)) + orig_dst_pte = pte_mkdirty(orig_dst_pte); + orig_dst_pte = pte_mkwrite(orig_dst_pte); set_pte_at(mm, dst_addr, dst_pte, orig_dst_pte); out: @@ -1001,6 +1006,9 @@ static int move_swap_pte(struct mm_struct *mm, struct vm_area_struct *dst_vma, } orig_src_pte = ptep_get_and_clear(mm, src_addr, src_pte); +#ifdef CONFIG_MEM_SOFT_DIRTY + orig_src_pte = pte_swp_mksoft_dirty(orig_src_pte); +#endif set_pte_at(mm, dst_addr, dst_pte, orig_src_pte); double_pt_unlock(dst_ptl, src_ptl);