From 11ce66c7a04b10ee42ccdd4e2af72a3773df09f7 Mon Sep 17 00:00:00 2001 From: "Mike Rapoport (Microsoft)" Date: Tue, 25 Nov 2025 15:29:08 +0200 Subject: [PATCH 01/27] MAINTAINERS: add Mike Rapoport as maintainer for userfaultfd Link: https://lkml.kernel.org/r/20251125132908.847055-1-rppt@kernel.org Signed-off-by: Mike Rapoport (Microsoft) Acked-by: David Hildenbrand (Red Hat) Acked-by: Vlastimil Babka Acked-by: Peter Xu Acked-by: Lorenzo Stoakes Cc: Liam Howlett Cc: Michal Hocko Cc: Mike Rapoport Cc: Suren Baghdasaryan Signed-off-by: Andrew Morton --- MAINTAINERS | 1 + 1 file changed, 1 insertion(+) diff --git a/MAINTAINERS b/MAINTAINERS index dc731d37c8fe..b12f1e623971 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -16751,6 +16751,7 @@ F: tools/testing/selftests/mm/transhuge-stress.c MEMORY MANAGEMENT - USERFAULTFD M: Andrew Morton +M: Mike Rapoport R: Peter Xu L: linux-mm@kvack.org S: Maintained From 5393802c94e0ab1295c04c94c57bcb00222d4674 Mon Sep 17 00:00:00 2001 From: Andrew Morton Date: Thu, 27 Nov 2025 10:39:24 -0800 Subject: [PATCH 02/27] genalloc.h: fix htmldocs warning WARNING: include/linux/genalloc.h:52 function parameter 'start_addr' not described in 'genpool_algo_t' Fixes: 52fbf1134d47 ("lib/genalloc.c: fix allocation of aligned buffer from non-aligned chunk") Reported-by: Stephen Rothwell Closes: https://lkml.kernel.org/r/20251127130624.563597e3@canb.auug.org.au Acked-by: Randy Dunlap Tested-by: Randy Dunlap Cc: Alexey Skidanov Signed-off-by: Andrew Morton --- include/linux/genalloc.h | 1 + 1 file changed, 1 insertion(+) diff --git a/include/linux/genalloc.h b/include/linux/genalloc.h index 0bd581003cd5..60de63e46b33 100644 --- a/include/linux/genalloc.h +++ b/include/linux/genalloc.h @@ -44,6 +44,7 @@ struct gen_pool; * @nr: The number of zeroed bits we're looking for * @data: optional additional data used by the callback * @pool: the pool being allocated from + * @start_addr: start address of memory chunk */ typedef unsigned long (*genpool_algo_t)(unsigned long *map, unsigned long size, From 87726567d83df9c006d506a201c3c78c3cda76ed Mon Sep 17 00:00:00 2001 From: Bartosz Golaszewski Date: Fri, 28 Nov 2025 14:33:18 +0100 Subject: [PATCH 03/27] mailmap: update entry for Bartosz Golaszewski My linaro address will stop working tonight. Update my mailmap entry. Link: https://lkml.kernel.org/r/20251128133318.44912-1-brgl@bgdev.pl Signed-off-by: Bartosz Golaszewski Cc: Hans Verkuil Signed-off-by: Andrew Morton --- .mailmap | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/.mailmap b/.mailmap index 84309a39d329..e6431290e849 100644 --- a/.mailmap +++ b/.mailmap @@ -127,7 +127,8 @@ Barry Song Barry Song Bart Van Assche Bart Van Assche -Bartosz Golaszewski +Bartosz Golaszewski +Bartosz Golaszewski Ben Dooks Ben Dooks Ben Gardner From c6e8e595a0798ad67da0f7bebaf69c31ef70dfff Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Fri, 28 Nov 2025 16:18:32 +0000 Subject: [PATCH 04/27] idr: fix idr_alloc() returning an ID out of range MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit If you use an IDR with a non-zero base, and specify a range that lies entirely below the base, 'max - base' becomes very large and idr_get_free() can return an ID that lies outside of the requested range. 
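For illustration, a minimal sketch of the failing case (it mirrors the
selftest added below; 'ptr' stands for any non-NULL pointer):

	struct idr idr = IDR_INIT_BASE(idr, 1);	/* IDs start at base 1 */

	/*
	 * The requested range [0, 1) lies entirely below the base, so
	 * 'max - base' underflows to a huge value; before this fix,
	 * idr_alloc() could hand back an out-of-range ID here instead
	 * of the expected -ENOSPC.
	 */
	id = idr_alloc(&idr, ptr, 0, 1, GFP_KERNEL);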
Link: https://lkml.kernel.org/r/20251128161853.3200058-1-willy@infradead.org Fixes: 6ce711f27500 ("idr: Make 1-based IDRs more efficient") Signed-off-by: Matthew Wilcox (Oracle) Reported-by: Jan Sokolowski Reported-by: Koen Koning Reported-by: Peter Senna Tschudin Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/6449 Reviewed-by: Christian König Cc: Signed-off-by: Andrew Morton --- lib/idr.c | 2 ++ tools/testing/radix-tree/idr-test.c | 21 +++++++++++++++++++++ 2 files changed, 23 insertions(+) diff --git a/lib/idr.c b/lib/idr.c index e2adc457abb4..457430cff8c5 100644 --- a/lib/idr.c +++ b/lib/idr.c @@ -40,6 +40,8 @@ int idr_alloc_u32(struct idr *idr, void *ptr, u32 *nextid, if (WARN_ON_ONCE(!(idr->idr_rt.xa_flags & ROOT_IS_IDR))) idr->idr_rt.xa_flags |= IDR_RT_MARKER; + if (max < base) + return -ENOSPC; id = (id < base) ? 0 : id - base; radix_tree_iter_init(&iter, id); diff --git a/tools/testing/radix-tree/idr-test.c b/tools/testing/radix-tree/idr-test.c index 2f830ff8396c..945144e98507 100644 --- a/tools/testing/radix-tree/idr-test.c +++ b/tools/testing/radix-tree/idr-test.c @@ -57,6 +57,26 @@ void idr_alloc_test(void) idr_destroy(&idr); } +void idr_alloc2_test(void) +{ + int id; + struct idr idr = IDR_INIT_BASE(idr, 1); + + id = idr_alloc(&idr, idr_alloc2_test, 0, 1, GFP_KERNEL); + assert(id == -ENOSPC); + + id = idr_alloc(&idr, idr_alloc2_test, 1, 2, GFP_KERNEL); + assert(id == 1); + + id = idr_alloc(&idr, idr_alloc2_test, 0, 1, GFP_KERNEL); + assert(id == -ENOSPC); + + id = idr_alloc(&idr, idr_alloc2_test, 0, 2, GFP_KERNEL); + assert(id == -ENOSPC); + + idr_destroy(&idr); +} + void idr_replace_test(void) { DEFINE_IDR(idr); @@ -409,6 +429,7 @@ void idr_checks(void) idr_replace_test(); idr_alloc_test(); + idr_alloc2_test(); idr_null_test(); idr_nowait_test(); idr_get_next_test(0); From 007f5da43b3d0ecff972e2616062b8da1f862f5e Mon Sep 17 00:00:00 2001 From: Jiayuan Chen Date: Thu, 4 Dec 2025 18:59:55 +0000 Subject: [PATCH 05/27] mm/kasan: fix incorrect unpoisoning in vrealloc for KASAN Patch series "kasan: vmalloc: Fixes for the percpu allocator and vrealloc", v3. Patches fix two issues related to KASAN and vmalloc. The first one, a KASAN tag mismatch, possibly resulting in a kernel panic, can be observed on systems with a tag-based KASAN enabled and with multiple NUMA nodes. Initially it was only noticed on x86 [1] but later a similar issue was also reported on arm64 [2]. Specifically the problem is related to how vm_structs interact with pcpu_chunks - both when they are allocated, assigned and when pcpu_chunk addresses are derived. When vm_structs are allocated they are unpoisoned, each with a different random tag, if vmalloc support is enabled along the KASAN mode. Later when first pcpu chunk is allocated it gets its 'base_addr' field set to the first allocated vm_struct. With that it inherits that vm_struct's tag. When pcpu_chunk addresses are later derived (by pcpu_chunk_addr(), for example in pcpu_alloc_noprof()) the base_addr field is used and offsets are added to it. If the initial conditions are satisfied then some of the offsets will point into memory allocated with a different vm_struct. So while the lower bits will get accurately derived the tag bits in the top of the pointer won't match the shadow memory contents. The solution (proposed at v2 of the x86 KASAN series [3]) is to unpoison the vm_structs with the same tag when allocating them for the per cpu allocator (in pcpu_get_vm_areas()). 
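Schematically, the address derivation at the heart of the first issue
looks like this (a simplified sketch, not the exact mm/percpu.c code):

	/*
	 * base_addr is the address of the *first* vm_struct and thus
	 * carries that vm_struct's random KASAN tag.
	 */
	addr = chunk->base_addr + unit_offset(cpu) + off;

	/*
	 * Once unit_offset(cpu) + off crosses into memory backed by a
	 * different vm_struct, the low bits of 'addr' are correct but
	 * its tag bits no longer match that area's shadow memory.
	 */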
The second one reported by syzkaller [4] is related to vrealloc and happens because of random tag generation when unpoisoning memory without allocating new pages. This breaks shadow memory tracking and needs to reuse the existing tag instead of generating a new one. At the same time an inconsistency in used flags is corrected. This patch (of 3): Syzkaller reported a memory out-of-bounds bug [4]. This patch fixes two issues: 1. In vrealloc the KASAN_VMALLOC_VM_ALLOC flag is missing when unpoisoning the extended region. This flag is required to correctly associate the allocation with KASAN's vmalloc tracking. Note: In contrast, vzalloc (via __vmalloc_node_range_noprof) explicitly sets KASAN_VMALLOC_VM_ALLOC and calls kasan_unpoison_vmalloc() with it. vrealloc must behave consistently -- especially when reusing existing vmalloc regions -- to ensure KASAN can track allocations correctly. 2. When vrealloc reuses an existing vmalloc region (without allocating new pages) KASAN generates a new tag, which breaks tag-based memory access tracking. Introduce KASAN_VMALLOC_KEEP_TAG, a new KASAN flag that allows reusing the tag already attached to the pointer, ensuring consistent tag behavior during reallocation. Pass KASAN_VMALLOC_KEEP_TAG and KASAN_VMALLOC_VM_ALLOC to the kasan_unpoison_vmalloc inside vrealloc_node_align_noprof(). Link: https://lkml.kernel.org/r/cover.1765978969.git.m.wieczorretman@pm.me Link: https://lkml.kernel.org/r/38dece0a4074c43e48150d1e242f8242c73bf1a5.1764874575.git.m.wieczorretman@pm.me Link: https://lore.kernel.org/all/e7e04692866d02e6d3b32bb43b998e5d17092ba4.1738686764.git.maciej.wieczor-retman@intel.com/ [1] Link: https://lore.kernel.org/all/aMUrW1Znp1GEj7St@MiWiFi-R3L-srv/ [2] Link: https://lore.kernel.org/all/CAPAsAGxDRv_uFeMYu9TwhBVWHCCtkSxoWY4xmFB_vowMbi8raw@mail.gmail.com/ [3] Link: https://syzkaller.appspot.com/bug?extid=997752115a851cb0cf36 [4] Fixes: a0309faf1cb0 ("mm: vmalloc: support more granular vrealloc() sizing") Signed-off-by: Jiayuan Chen Co-developed-by: Maciej Wieczor-Retman Signed-off-by: Maciej Wieczor-Retman Reported-by: syzbot+997752115a851cb0cf36@syzkaller.appspotmail.com Closes: https://lore.kernel.org/all/68e243a2.050a0220.1696c6.007d.GAE@google.com/T/ Reviewed-by: Andrey Konovalov Cc: Alexander Potapenko Cc: Andrey Ryabinin Cc: Danilo Krummrich Cc: Dmitriy Vyukov Cc: Kees Cook Cc: Marco Elver Cc: "Uladzislau Rezki (Sony)" Cc: Vincenzo Frascino Cc: Signed-off-by: Andrew Morton --- include/linux/kasan.h | 1 + mm/kasan/hw_tags.c | 2 +- mm/kasan/shadow.c | 4 +++- mm/vmalloc.c | 4 +++- 4 files changed, 8 insertions(+), 3 deletions(-) diff --git a/include/linux/kasan.h b/include/linux/kasan.h index f335c1d7b61d..df3d8567dde9 100644 --- a/include/linux/kasan.h +++ b/include/linux/kasan.h @@ -28,6 +28,7 @@ typedef unsigned int __bitwise kasan_vmalloc_flags_t; #define KASAN_VMALLOC_INIT ((__force kasan_vmalloc_flags_t)0x01u) #define KASAN_VMALLOC_VM_ALLOC ((__force kasan_vmalloc_flags_t)0x02u) #define KASAN_VMALLOC_PROT_NORMAL ((__force kasan_vmalloc_flags_t)0x04u) +#define KASAN_VMALLOC_KEEP_TAG ((__force kasan_vmalloc_flags_t)0x08u) #define KASAN_VMALLOC_PAGE_RANGE 0x1 /* Apply exsiting page range */ #define KASAN_VMALLOC_TLB_FLUSH 0x2 /* TLB flush */ diff --git a/mm/kasan/hw_tags.c b/mm/kasan/hw_tags.c index 1c373cc4b3fa..cbef5e450954 100644 --- a/mm/kasan/hw_tags.c +++ b/mm/kasan/hw_tags.c @@ -361,7 +361,7 @@ void *__kasan_unpoison_vmalloc(const void *start, unsigned long size, return (void *)start; } - tag = kasan_random_tag(); + tag = (flags & 
KASAN_VMALLOC_KEEP_TAG) ? get_tag(start) : kasan_random_tag(); start = set_tag(start, tag); /* Unpoison and initialize memory up to size. */ diff --git a/mm/kasan/shadow.c b/mm/kasan/shadow.c index 29a751a8a08d..32fbdf759ea2 100644 --- a/mm/kasan/shadow.c +++ b/mm/kasan/shadow.c @@ -631,7 +631,9 @@ void *__kasan_unpoison_vmalloc(const void *start, unsigned long size, !(flags & KASAN_VMALLOC_PROT_NORMAL)) return (void *)start; - start = set_tag(start, kasan_random_tag()); + if (unlikely(!(flags & KASAN_VMALLOC_KEEP_TAG))) + start = set_tag(start, kasan_random_tag()); + kasan_unpoison(start, size, false); return (void *)start; } diff --git a/mm/vmalloc.c b/mm/vmalloc.c index ecbac900c35f..94c0a9262a46 100644 --- a/mm/vmalloc.c +++ b/mm/vmalloc.c @@ -4331,7 +4331,9 @@ void *vrealloc_node_align_noprof(const void *p, size_t size, unsigned long align */ if (size <= alloced_size) { kasan_unpoison_vmalloc(p + old_size, size - old_size, - KASAN_VMALLOC_PROT_NORMAL); + KASAN_VMALLOC_PROT_NORMAL | + KASAN_VMALLOC_VM_ALLOC | + KASAN_VMALLOC_KEEP_TAG); /* * No need to zero memory here, as unused memory will have * already been zeroed at initial allocation time or during From 6f13db031e27e88213381039032a9cc061578ea6 Mon Sep 17 00:00:00 2001 From: Maciej Wieczor-Retman Date: Thu, 4 Dec 2025 19:00:04 +0000 Subject: [PATCH 06/27] kasan: refactor pcpu kasan vmalloc unpoison A KASAN tag mismatch, possibly causing a kernel panic, can be observed on systems with a tag-based KASAN enabled and with multiple NUMA nodes. It was reported on arm64 and reproduced on x86. It can be explained in the following points: 1. There can be more than one virtual memory chunk. 2. Chunk's base address has a tag. 3. The base address points at the first chunk and thus inherits the tag of the first chunk. 4. The subsequent chunks will be accessed with the tag from the first chunk. 5. Thus, the subsequent chunks need to have their tag set to match that of the first chunk. Refactor code by reusing __kasan_unpoison_vmalloc in a new helper in preparation for the actual fix. 
Link: https://lkml.kernel.org/r/eb61d93b907e262eefcaa130261a08bcb6c5ce51.1764874575.git.m.wieczorretman@pm.me Fixes: 1d96320f8d53 ("kasan, vmalloc: add vmalloc tagging for SW_TAGS") Signed-off-by: Maciej Wieczor-Retman Reviewed-by: Andrey Konovalov Cc: Alexander Potapenko Cc: Andrey Ryabinin Cc: Danilo Krummrich Cc: Dmitriy Vyukov Cc: Jiayuan Chen Cc: Kees Cook Cc: Marco Elver Cc: "Uladzislau Rezki (Sony)" Cc: Vincenzo Frascino Cc: [6.1+] Signed-off-by: Andrew Morton --- include/linux/kasan.h | 15 +++++++++++++++ mm/kasan/common.c | 17 +++++++++++++++++ mm/vmalloc.c | 4 +--- 3 files changed, 33 insertions(+), 3 deletions(-) diff --git a/include/linux/kasan.h b/include/linux/kasan.h index df3d8567dde9..9c6ac4b62eb9 100644 --- a/include/linux/kasan.h +++ b/include/linux/kasan.h @@ -631,6 +631,16 @@ static __always_inline void kasan_poison_vmalloc(const void *start, __kasan_poison_vmalloc(start, size); } +void __kasan_unpoison_vmap_areas(struct vm_struct **vms, int nr_vms, + kasan_vmalloc_flags_t flags); +static __always_inline void +kasan_unpoison_vmap_areas(struct vm_struct **vms, int nr_vms, + kasan_vmalloc_flags_t flags) +{ + if (kasan_enabled()) + __kasan_unpoison_vmap_areas(vms, nr_vms, flags); +} + #else /* CONFIG_KASAN_VMALLOC */ static inline void kasan_populate_early_vm_area_shadow(void *start, @@ -655,6 +665,11 @@ static inline void *kasan_unpoison_vmalloc(const void *start, static inline void kasan_poison_vmalloc(const void *start, unsigned long size) { } +static __always_inline void +kasan_unpoison_vmap_areas(struct vm_struct **vms, int nr_vms, + kasan_vmalloc_flags_t flags) +{ } + #endif /* CONFIG_KASAN_VMALLOC */ #if (defined(CONFIG_KASAN_GENERIC) || defined(CONFIG_KASAN_SW_TAGS)) && \ diff --git a/mm/kasan/common.c b/mm/kasan/common.c index 1d27f1bd260b..b2b40c59ce18 100644 --- a/mm/kasan/common.c +++ b/mm/kasan/common.c @@ -28,6 +28,7 @@ #include #include #include +#include #include "kasan.h" #include "../slab.h" @@ -575,3 +576,19 @@ bool __kasan_check_byte(const void *address, unsigned long ip) } return true; } + +#ifdef CONFIG_KASAN_VMALLOC +void __kasan_unpoison_vmap_areas(struct vm_struct **vms, int nr_vms, + kasan_vmalloc_flags_t flags) +{ + unsigned long size; + void *addr; + int area; + + for (area = 0 ; area < nr_vms ; area++) { + size = vms[area]->size; + addr = vms[area]->addr; + vms[area]->addr = __kasan_unpoison_vmalloc(addr, size, flags); + } +} +#endif diff --git a/mm/vmalloc.c b/mm/vmalloc.c index 94c0a9262a46..41dd01e8430c 100644 --- a/mm/vmalloc.c +++ b/mm/vmalloc.c @@ -5027,9 +5027,7 @@ retry: * With hardware tag-based KASAN, marking is skipped for * non-VM_ALLOC mappings, see __kasan_unpoison_vmalloc(). */ - for (area = 0; area < nr_vms; area++) - vms[area]->addr = kasan_unpoison_vmalloc(vms[area]->addr, - vms[area]->size, KASAN_VMALLOC_PROT_NORMAL); + kasan_unpoison_vmap_areas(vms, nr_vms, KASAN_VMALLOC_PROT_NORMAL); kfree(vas); return vms; From 6a0e5b333842cf65d6f4e4f0a2a4386504802515 Mon Sep 17 00:00:00 2001 From: Maciej Wieczor-Retman Date: Thu, 4 Dec 2025 19:00:11 +0000 Subject: [PATCH 07/27] kasan: unpoison vms[area] addresses with a common tag A KASAN tag mismatch, possibly causing a kernel panic, can be observed on systems with a tag-based KASAN enabled and with multiple NUMA nodes. It was reported on arm64 and reproduced on x86. It can be explained in the following points: 1. There can be more than one virtual memory chunk. 2. Chunk's base address has a tag. 3. The base address points at the first chunk and thus inherits the tag of the first chunk. 
4. The subsequent chunks will be accessed with the tag from the first chunk. 5. Thus, the subsequent chunks need to have their tag set to match that of the first chunk. Use the new vmalloc flag that disables random tag assignment in __kasan_unpoison_vmalloc() - pass the same random tag to all the vm_structs by tagging the pointers before they go inside __kasan_unpoison_vmalloc(). Assigning a common tag resolves the pcpu chunk address mismatch. [akpm@linux-foundation.org: use WARN_ON_ONCE(), per Andrey] Link: https://lkml.kernel.org/r/CA+fCnZeuGdKSEm11oGT6FS71_vGq1vjq-xY36kxVdFvwmag2ZQ@mail.gmail.com [maciej.wieczor-retman@intel.com: remove unneeded pr_warn()] Link: https://lkml.kernel.org/r/919897daaaa3c982a27762a2ee038769ad033991.1764945396.git.m.wieczorretman@pm.me Link: https://lkml.kernel.org/r/873821114a9f722ffb5d6702b94782e902883fdf.1764874575.git.m.wieczorretman@pm.me Fixes: 1d96320f8d53 ("kasan, vmalloc: add vmalloc tagging for SW_TAGS") Signed-off-by: Maciej Wieczor-Retman Reviewed-by: Andrey Konovalov Cc: Alexander Potapenko Cc: Andrey Ryabinin Cc: Danilo Krummrich Cc: Dmitriy Vyukov Cc: Jiayuan Chen Cc: Kees Cook Cc: Marco Elver Cc: "Uladzislau Rezki (Sony)" Cc: Vincenzo Frascino Cc: [6.1+] Signed-off-by: Andrew Morton --- mm/kasan/common.c | 21 ++++++++++++++++++--- 1 file changed, 18 insertions(+), 3 deletions(-) diff --git a/mm/kasan/common.c b/mm/kasan/common.c index b2b40c59ce18..ed489a14dddf 100644 --- a/mm/kasan/common.c +++ b/mm/kasan/common.c @@ -584,11 +584,26 @@ void __kasan_unpoison_vmap_areas(struct vm_struct **vms, int nr_vms, unsigned long size; void *addr; int area; + u8 tag; - for (area = 0 ; area < nr_vms ; area++) { + /* + * If KASAN_VMALLOC_KEEP_TAG was set at this point, all vms[] pointers + * would be unpoisoned with the KASAN_TAG_KERNEL which would disable + * KASAN checks down the line. + */ + if (WARN_ON_ONCE(flags & KASAN_VMALLOC_KEEP_TAG)) + return; + + size = vms[0]->size; + addr = vms[0]->addr; + vms[0]->addr = __kasan_unpoison_vmalloc(addr, size, flags); + tag = get_tag(vms[0]->addr); + + for (area = 1 ; area < nr_vms ; area++) { size = vms[area]->size; - addr = vms[area]->addr; - vms[area]->addr = __kasan_unpoison_vmalloc(addr, size, flags); + addr = set_tag(vms[area]->addr, tag); + vms[area]->addr = + __kasan_unpoison_vmalloc(addr, size, flags | KASAN_VMALLOC_KEEP_TAG); } } #endif From 6ba776b533ca902631fa106b8a90811b3f40b08d Mon Sep 17 00:00:00 2001 From: Randy Dunlap Date: Sun, 14 Dec 2025 12:15:17 -0800 Subject: [PATCH 08/27] mm: leafops.h: correct kernel-doc function param. names Modify the kernel-doc function parameter names to prevent kernel-doc warnings: Warning: include/linux/leafops.h:135 function parameter 'entry' not described in 'leafent_type' Warning: include/linux/leafops.h:540 function parameter 'pte' not described in 'pte_is_uffd_marker' Link: https://lkml.kernel.org/r/20251214201517.2187051-1-rdunlap@infradead.org Signed-off-by: Randy Dunlap Reviewed-by: Lorenzo Stoakes Cc: Liam Howlett Cc: Michal Hocko Cc: Mike Rapoport Cc: Suren Baghdasaryan Cc: Vlastimil Babka Signed-off-by: Andrew Morton --- include/linux/leafops.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/include/linux/leafops.h b/include/linux/leafops.h index cfafe7a5e7b1..a9ff94b744f2 100644 --- a/include/linux/leafops.h +++ b/include/linux/leafops.h @@ -133,7 +133,7 @@ static inline bool softleaf_is_none(softleaf_t entry) /** * softleaf_type() - Identify the type of leaf entry. - * @enntry: Leaf entry. + * @entry: Leaf entry. 
* * Returns: the leaf entry type associated with @entry. */ @@ -534,7 +534,7 @@ static inline bool pte_is_uffd_wp_marker(pte_t pte) /** * pte_is_uffd_marker() - Does this PTE entry encode a userfault-specific marker * leaf entry? - * @entry: Leaf entry. + * @pte: PTE entry. * * It's useful to be able to determine which leaf entries encode UFFD-specific * markers so we can handle these correctly. From 7838a4eb8a1d23160bd3f588ea7f2b8f7c00c55b Mon Sep 17 00:00:00 2001 From: Alexander Gordeev Date: Fri, 12 Dec 2025 16:14:57 +0100 Subject: [PATCH 09/27] mm/page_alloc: change all pageblocks migrate type on coalescing When a page is freed it coalesces with a buddy into a higher order page while possible. When the buddy page migrate type differs, it is expected to be updated to match the one of the page being freed. However, only the first pageblock of the buddy page is updated, while the rest of the pageblocks are left unchanged. That causes warnings in later expand() and other code paths (like below), since an inconsistency between migration type of the list containing the page and the page-owned pageblocks migration types is introduced. [ 308.986589] ------------[ cut here ]------------ [ 308.987227] page type is 0, passed migratetype is 1 (nr=256) [ 308.987275] WARNING: CPU: 1 PID: 5224 at mm/page_alloc.c:812 expand+0x23c/0x270 [ 308.987293] Modules linked in: algif_hash(E) af_alg(E) nft_fib_inet(E) nft_fib_ipv4(E) nft_fib_ipv6(E) nft_fib(E) nft_reject_inet(E) nf_reject_ipv4(E) nf_reject_ipv6(E) nft_reject(E) nft_ct(E) nft_chain_nat(E) nf_nat(E) nf_conntrack(E) nf_defrag_ipv6(E) nf_defrag_ipv4(E) nf_tables(E) s390_trng(E) vfio_ccw(E) mdev(E) vfio_iommu_type1(E) vfio(E) sch_fq_codel(E) drm(E) i2c_core(E) drm_panel_orientation_quirks(E) loop(E) nfnetlink(E) vsock_loopback(E) vmw_vsock_virtio_transport_common(E) vsock(E) ctcm(E) fsm(E) diag288_wdt(E) watchdog(E) zfcp(E) scsi_transport_fc(E) ghash_s390(E) prng(E) aes_s390(E) des_generic(E) des_s390(E) libdes(E) sha3_512_s390(E) sha3_256_s390(E) sha_common(E) paes_s390(E) crypto_engine(E) pkey_cca(E) pkey_ep11(E) zcrypt(E) rng_core(E) pkey_pckmo(E) pkey(E) autofs4(E) [ 308.987439] Unloaded tainted modules: hmac_s390(E):2 [ 308.987650] CPU: 1 UID: 0 PID: 5224 Comm: mempig_verify Kdump: loaded Tainted: G E 6.18.0-gcc-bpf-debug #431 PREEMPT [ 308.987657] Tainted: [E]=UNSIGNED_MODULE [ 308.987661] Hardware name: IBM 3906 M04 704 (z/VM 7.3.0) [ 308.987666] Krnl PSW : 0404f00180000000 00000349976fa600 (expand+0x240/0x270) [ 308.987676] R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:3 PM:0 RI:0 EA:3 [ 308.987682] Krnl GPRS: 0000034980000004 0000000000000005 0000000000000030 000003499a0e6d88 [ 308.987688] 0000000000000005 0000034980000005 000002be803ac000 0000023efe6c8300 [ 308.987692] 0000000000000008 0000034998d57290 000002be00000100 0000023e00000008 [ 308.987696] 0000000000000000 0000000000000000 00000349976fa5fc 000002c99b1eb6f0 [ 308.987708] Krnl Code: 00000349976fa5f0: c020008a02f2 larl %r2,000003499883abd4 00000349976fa5f6: c0e5ffe3f4b5 brasl %r14,0000034997378f60 #00000349976fa5fc: af000000 mc 0,0 >00000349976fa600: a7f4ff4c brc 15,00000349976fa498 00000349976fa604: b9040026 lgr %r2,%r6 00000349976fa608: c0300088317f larl %r3,0000034998800906 00000349976fa60e: c0e5fffdb6e1 brasl %r14,00000349976b13d0 00000349976fa614: af000000 mc 0,0 [ 308.987734] Call Trace: [ 308.987738] [<00000349976fa600>] expand+0x240/0x270 [ 308.987744] ([<00000349976fa5fc>] expand+0x23c/0x270) [ 308.987749] [<00000349976ff95e>] rmqueue_bulk+0x71e/0x940 [ 308.987754] 
[<00000349976ffd7e>] __rmqueue_pcplist+0x1fe/0x2a0 [ 308.987759] [<0000034997700966>] rmqueue.isra.0+0xb46/0xf40 [ 308.987763] [<0000034997703ec8>] get_page_from_freelist+0x198/0x8d0 [ 308.987768] [<0000034997706fa8>] __alloc_frozen_pages_noprof+0x198/0x400 [ 308.987774] [<00000349977536f8>] alloc_pages_mpol+0xb8/0x220 [ 308.987781] [<0000034997753bf6>] folio_alloc_mpol_noprof+0x26/0xc0 [ 308.987786] [<0000034997753e4c>] vma_alloc_folio_noprof+0x6c/0xa0 [ 308.987791] [<0000034997775b22>] vma_alloc_anon_folio_pmd+0x42/0x240 [ 308.987799] [<000003499777bfea>] __do_huge_pmd_anonymous_page+0x3a/0x210 [ 308.987804] [<00000349976cb08e>] __handle_mm_fault+0x4de/0x500 [ 308.987809] [<00000349976cb14c>] handle_mm_fault+0x9c/0x3a0 [ 308.987813] [<000003499734d70e>] do_exception+0x1de/0x540 [ 308.987822] [<0000034998387390>] __do_pgm_check+0x130/0x220 [ 308.987830] [<000003499839a934>] pgm_check_handler+0x114/0x160 [ 308.987838] 3 locks held by mempig_verify/5224: [ 308.987842] #0: 0000023ea44c1e08 (vm_lock){++++}-{0:0}, at: lock_vma_under_rcu+0xb2/0x2a0 [ 308.987859] #1: 0000023ee4d41b18 (&pcp->lock){+.+.}-{2:2}, at: rmqueue.isra.0+0xad6/0xf40 [ 308.987871] #2: 0000023efe6c8998 (&zone->lock){..-.}-{2:2}, at: rmqueue_bulk+0x5a/0x940 [ 308.987886] Last Breaking-Event-Address: [ 308.987890] [<0000034997379096>] __warn_printk+0x136/0x140 [ 308.987897] irq event stamp: 52330356 [ 308.987901] hardirqs last enabled at (52330355): [<000003499838742e>] __do_pgm_check+0x1ce/0x220 [ 308.987907] hardirqs last disabled at (52330356): [<000003499839932e>] _raw_spin_lock_irqsave+0x9e/0xe0 [ 308.987913] softirqs last enabled at (52329882): [<0000034997383786>] handle_softirqs+0x2c6/0x530 [ 308.987922] softirqs last disabled at (52329859): [<0000034997382f86>] __irq_exit_rcu+0x126/0x140 [ 308.987929] ---[ end trace 0000000000000000 ]--- [ 308.987936] ------------[ cut here ]------------ [ 308.987940] page type is 0, passed migratetype is 1 (nr=256) [ 308.987951] WARNING: CPU: 1 PID: 5224 at mm/page_alloc.c:860 __del_page_from_free_list+0x1be/0x1e0 [ 308.987960] Modules linked in: algif_hash(E) af_alg(E) nft_fib_inet(E) nft_fib_ipv4(E) nft_fib_ipv6(E) nft_fib(E) nft_reject_inet(E) nf_reject_ipv4(E) nf_reject_ipv6(E) nft_reject(E) nft_ct(E) nft_chain_nat(E) nf_nat(E) nf_conntrack(E) nf_defrag_ipv6(E) nf_defrag_ipv4(E) nf_tables(E) s390_trng(E) vfio_ccw(E) mdev(E) vfio_iommu_type1(E) vfio(E) sch_fq_codel(E) drm(E) i2c_core(E) drm_panel_orientation_quirks(E) loop(E) nfnetlink(E) vsock_loopback(E) vmw_vsock_virtio_transport_common(E) vsock(E) ctcm(E) fsm(E) diag288_wdt(E) watchdog(E) zfcp(E) scsi_transport_fc(E) ghash_s390(E) prng(E) aes_s390(E) des_generic(E) des_s390(E) libdes(E) sha3_512_s390(E) sha3_256_s390(E) sha_common(E) paes_s390(E) crypto_engine(E) pkey_cca(E) pkey_ep11(E) zcrypt(E) rng_core(E) pkey_pckmo(E) pkey(E) autofs4(E) [ 308.988070] Unloaded tainted modules: hmac_s390(E):2 [ 308.988087] CPU: 1 UID: 0 PID: 5224 Comm: mempig_verify Kdump: loaded Tainted: G W E 6.18.0-gcc-bpf-debug #431 PREEMPT [ 308.988095] Tainted: [W]=WARN, [E]=UNSIGNED_MODULE [ 308.988100] Hardware name: IBM 3906 M04 704 (z/VM 7.3.0) [ 308.988105] Krnl PSW : 0404f00180000000 00000349976f9e32 (__del_page_from_free_list+0x1c2/0x1e0) [ 308.988118] R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:3 PM:0 RI:0 EA:3 [ 308.988127] Krnl GPRS: 0000034980000004 0000000000000005 0000000000000030 000003499a0e6d88 [ 308.988133] 0000000000000005 0000034980000005 0000034998d57290 0000023efe6c8300 [ 308.988139] 0000000000000001 0000000000000008 
000002be00000100 000002be803ac000 [ 308.988144] 0000000000000000 0000000000000001 00000349976f9e2e 000002c99b1eb728 [ 308.988153] Krnl Code: 00000349976f9e22: c020008a06d9 larl %r2,000003499883abd4 00000349976f9e28: c0e5ffe3f89c brasl %r14,0000034997378f60 #00000349976f9e2e: af000000 mc 0,0 >00000349976f9e32: a7f4ff4e brc 15,00000349976f9cce 00000349976f9e36: b904002b lgr %r2,%r11 00000349976f9e3a: c030008a06e7 larl %r3,000003499883ac08 00000349976f9e40: c0e5fffdbac8 brasl %r14,00000349976b13d0 00000349976f9e46: af000000 mc 0,0 [ 308.988184] Call Trace: [ 308.988188] [<00000349976f9e32>] __del_page_from_free_list+0x1c2/0x1e0 [ 308.988195] ([<00000349976f9e2e>] __del_page_from_free_list+0x1be/0x1e0) [ 308.988202] [<00000349976ff946>] rmqueue_bulk+0x706/0x940 [ 308.988208] [<00000349976ffd7e>] __rmqueue_pcplist+0x1fe/0x2a0 [ 308.988214] [<0000034997700966>] rmqueue.isra.0+0xb46/0xf40 [ 308.988221] [<0000034997703ec8>] get_page_from_freelist+0x198/0x8d0 [ 308.988227] [<0000034997706fa8>] __alloc_frozen_pages_noprof+0x198/0x400 [ 308.988233] [<00000349977536f8>] alloc_pages_mpol+0xb8/0x220 [ 308.988240] [<0000034997753bf6>] folio_alloc_mpol_noprof+0x26/0xc0 [ 308.988247] [<0000034997753e4c>] vma_alloc_folio_noprof+0x6c/0xa0 [ 308.988253] [<0000034997775b22>] vma_alloc_anon_folio_pmd+0x42/0x240 [ 308.988260] [<000003499777bfea>] __do_huge_pmd_anonymous_page+0x3a/0x210 [ 308.988267] [<00000349976cb08e>] __handle_mm_fault+0x4de/0x500 [ 308.988273] [<00000349976cb14c>] handle_mm_fault+0x9c/0x3a0 [ 308.988279] [<000003499734d70e>] do_exception+0x1de/0x540 [ 308.988286] [<0000034998387390>] __do_pgm_check+0x130/0x220 [ 308.988293] [<000003499839a934>] pgm_check_handler+0x114/0x160 [ 308.988300] 3 locks held by mempig_verify/5224: [ 308.988305] #0: 0000023ea44c1e08 (vm_lock){++++}-{0:0}, at: lock_vma_under_rcu+0xb2/0x2a0 [ 308.988322] #1: 0000023ee4d41b18 (&pcp->lock){+.+.}-{2:2}, at: rmqueue.isra.0+0xad6/0xf40 [ 308.988334] #2: 0000023efe6c8998 (&zone->lock){..-.}-{2:2}, at: rmqueue_bulk+0x5a/0x940 [ 308.988346] Last Breaking-Event-Address: [ 308.988350] [<0000034997379096>] __warn_printk+0x136/0x140 [ 308.988356] irq event stamp: 52330356 [ 308.988360] hardirqs last enabled at (52330355): [<000003499838742e>] __do_pgm_check+0x1ce/0x220 [ 308.988366] hardirqs last disabled at (52330356): [<000003499839932e>] _raw_spin_lock_irqsave+0x9e/0xe0 [ 308.988373] softirqs last enabled at (52329882): [<0000034997383786>] handle_softirqs+0x2c6/0x530 [ 308.988380] softirqs last disabled at (52329859): [<0000034997382f86>] __irq_exit_rcu+0x126/0x140 [ 308.988388] ---[ end trace 0000000000000000 ]--- Link: https://lkml.kernel.org/r/20251215081002.3353900A9c-agordeev@linux.ibm.com Link: https://lkml.kernel.org/r/20251212151457.3898073Add-agordeev@linux.ibm.com Fixes: e6cf9e1c4cde ("mm: page_alloc: fix up block types when merging compatible blocks") Signed-off-by: Alexander Gordeev Reported-by: Marc Hartmayer Closes: https://lore.kernel.org/linux-mm/87wmalyktd.fsf@linux.ibm.com/ Acked-by: Vlastimil Babka Acked-by: Johannes Weiner Reviewed-by: Wei Yang Cc: Marc Hartmayer Cc: Signed-off-by: Andrew Morton --- mm/page_alloc.c | 24 ++++++++++++------------ 1 file changed, 12 insertions(+), 12 deletions(-) diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 822e05f1a964..f6586f165b89 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -914,6 +914,17 @@ buddy_merge_likely(unsigned long pfn, unsigned long buddy_pfn, NULL) != NULL; } +static void change_pageblock_range(struct page *pageblock_page, + int start_order, int 
migratetype) +{ + int nr_pageblocks = 1 << (start_order - pageblock_order); + + while (nr_pageblocks--) { + set_pageblock_migratetype(pageblock_page, migratetype); + pageblock_page += pageblock_nr_pages; + } +} + /* * Freeing function for a buddy system allocator. * @@ -1000,7 +1011,7 @@ static inline void __free_one_page(struct page *page, * expand() down the line puts the sub-blocks * on the right freelists. */ - set_pageblock_migratetype(buddy, migratetype); + change_pageblock_range(buddy, order, migratetype); } combined_pfn = buddy_pfn & pfn; @@ -2147,17 +2158,6 @@ bool pageblock_unisolate_and_move_free_pages(struct zone *zone, struct page *pag #endif /* CONFIG_MEMORY_ISOLATION */ -static void change_pageblock_range(struct page *pageblock_page, - int start_order, int migratetype) -{ - int nr_pageblocks = 1 << (start_order - pageblock_order); - - while (nr_pageblocks--) { - set_pageblock_migratetype(pageblock_page, migratetype); - pageblock_page += pageblock_nr_pages; - } -} - static inline bool boost_watermark(struct zone *zone) { unsigned long max_boost; From 612b595e08caffc1276e7b0680a0c95951eba185 Mon Sep 17 00:00:00 2001 From: Bartosz Golaszewski Date: Thu, 4 Dec 2025 11:45:31 +0100 Subject: [PATCH 10/27] MAINTAINERS: update one straggling entry for Bartosz Golaszewski The entry for the Qualcomm bluetooth driver only now got sent upstream and still has my old address. Update it to use the kernel.org one. Link: https://lkml.kernel.org/r/20251204104531.22045-1-bartosz.golaszewski@oss.qualcomm.com Signed-off-by: Bartosz Golaszewski Cc: Hans Verkuil Signed-off-by: Andrew Morton --- MAINTAINERS | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/MAINTAINERS b/MAINTAINERS index b12f1e623971..a914ee5564ac 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -21346,7 +21346,7 @@ F: Documentation/devicetree/bindings/net/qcom,bam-dmux.yaml F: drivers/net/wwan/qcom_bam_dmux.c QUALCOMM BLUETOOTH DRIVER -M: Bartosz Golaszewski +M: Bartosz Golaszewski L: linux-arm-msm@vger.kernel.org S: Maintained F: drivers/bluetooth/btqca.[ch] From 02129e623c18ad77ebb85210340f72125ae8a7a1 Mon Sep 17 00:00:00 2001 From: Akinobu Mita Date: Wed, 10 Dec 2025 00:10:34 +0900 Subject: [PATCH 11/27] mm/damon/vaddr: fix missing pte_unmap_unlock in damos_va_migrate_pmd_entry() If the PTE page table lock is acquired by pte_offset_map_lock(), the lock must be released via pte_unmap_unlock(). However, in damos_va_migrate_pmd_entry(), if damos_va_filter_out() returns true, it immediately returns without releasing the lock. This fixes the issue by not stopping page table traversal when damos_va_filter_out() returns true and ensuring that the lock is released. 
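The required pairing looks schematically like this (a simplified
sketch, not the exact damos_va_migrate_pmd_entry() code):

	start_pte = pte = pte_offset_map_lock(mm, pmd, addr, &ptl);
	for (; addr < end; pte++, addr += PAGE_SIZE) {
		...
		if (damos_va_filter_out(...))
			continue;	/* keep walking; ptl is still held */
		...
	}
	pte_unmap_unlock(start_pte, ptl);	/* now always reached */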
Link: https://lkml.kernel.org/r/20251209151034.77221-1-akinobu.mita@gmail.com Fixes: 09efc56a3b1c ("mm/damon/vaddr: consistently use only pmd_entry for damos_migrate") Signed-off-by: Akinobu Mita Reviewed-by: SeongJae Park Signed-off-by: Andrew Morton --- mm/damon/vaddr.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/mm/damon/vaddr.c b/mm/damon/vaddr.c index 2750c88e7225..23ed738a0bd6 100644 --- a/mm/damon/vaddr.c +++ b/mm/damon/vaddr.c @@ -743,7 +743,7 @@ huge_out: if (!folio) continue; if (damos_va_filter_out(s, folio, walk->vma, addr, pte, NULL)) - return 0; + continue; damos_va_migrate_dests_add(folio, walk->vma, addr, dests, migration_lists); nr = folio_nr_pages(folio); From fdee5216851c2e0f88690c4038eaede3bcd128bc Mon Sep 17 00:00:00 2001 From: WangYuli Date: Mon, 8 Dec 2025 10:57:30 +0800 Subject: [PATCH 12/27] .mailmap: remove one of the entries for WangYuli MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Since commit 01ef0296d2eb (".mailmap: add entry for WangYuli") was merged into mainline, I've received feedback from former colleagues: They believe the change to .mailmap affects git log based statistics, which in turn reduces the reported “contributions from uniontech” in the Linux commit tree, and they think it's difficult to explain to everyone that future statistics must be generated with the --no-use-mailmap option. I don't have a strong opinion either way, but since my commit has caused them trouble, I'm now requesting that this line be removed to bring a little more LOVE AND PEACE to the world :-) Link: https://lkml.kernel.org/r/20251208025730.33881-1-wangyuli@aosc.io Signed-off-by: WangYuli Cc: Carlos Bilbao Cc: Hans Verkuil Cc: Martin Kepplinger Cc: Shannon Nelson Signed-off-by: Andrew Morton --- .mailmap | 1 - 1 file changed, 1 deletion(-) diff --git a/.mailmap b/.mailmap index e6431290e849..7a6110d0e46d 100644 --- a/.mailmap +++ b/.mailmap @@ -858,7 +858,6 @@ Vladimir Davydov Vladimir Davydov WangYuli WangYuli -WangYuli Weiwen Hu WeiXiong Liao Wen Gong From 8de524774b9e79562452730d66e88f525cdd8149 Mon Sep 17 00:00:00 2001 From: Pratyush Yadav Date: Fri, 12 Dec 2025 16:12:02 +0900 Subject: [PATCH 13/27] MAINTAINERS: add ABI headers to KHO and LIVE UPDATE include/linux/kho is supposed to hold KHO headers. Add it to KHO's MAINTAINERS entry so the right people can get patches to it. include/linux/kho/abi contains the live update ABI headers for LUO core and memfd. It will also hold ABI headers for other upcoming file types as well. Add it to live update entry so live update maintainers can get changes for it (currently they happen to be the same people). 
Link: https://lkml.kernel.org/r/20251212071204.398788-1-pratyush@kernel.org Signed-off-by: Pratyush Yadav Reviewed-by: Pasha Tatashin Cc: Alexander Graf Cc: Mike Rapoport Signed-off-by: Andrew Morton --- MAINTAINERS | 2 ++ 1 file changed, 2 insertions(+) diff --git a/MAINTAINERS b/MAINTAINERS index a914ee5564ac..2fa30b32411d 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -13959,6 +13959,7 @@ S: Maintained F: Documentation/admin-guide/mm/kho.rst F: Documentation/core-api/kho/* F: include/linux/kexec_handover.h +F: include/linux/kho/ F: kernel/liveupdate/kexec_handover* F: lib/test_kho.c F: tools/testing/selftests/kho/ @@ -14637,6 +14638,7 @@ S: Maintained F: Documentation/core-api/liveupdate.rst F: Documentation/mm/memfd_preservation.rst F: Documentation/userspace-api/liveupdate.rst +F: include/linux/kho/abi/ F: include/linux/liveupdate.h F: include/linux/liveupdate/ F: include/uapi/linux/liveupdate.h From fe55ea85939efcbf0e6baa234f0d70acb79e7b58 Mon Sep 17 00:00:00 2001 From: Pingfan Liu Date: Tue, 16 Dec 2025 09:48:51 +0800 Subject: [PATCH 14/27] kernel/kexec: change the prototype of kimage_map_segment() The kexec segment index will be required to extract the corresponding information for that segment in kimage_map_segment(). Additionally, kexec_segment already holds the kexec relocation destination address and size. Therefore, the prototype of kimage_map_segment() can be changed. Link: https://lkml.kernel.org/r/20251216014852.8737-1-piliu@redhat.com Fixes: 07d24902977e ("kexec: enable CMA based contiguous allocation") Signed-off-by: Pingfan Liu Acked-by: Baoquan He Cc: Mimi Zohar Cc: Roberto Sassu Cc: Alexander Graf Cc: Steven Chen Cc: Signed-off-by: Andrew Morton --- include/linux/kexec.h | 4 ++-- kernel/kexec_core.c | 9 ++++++--- security/integrity/ima/ima_kexec.c | 4 +--- 3 files changed, 9 insertions(+), 8 deletions(-) diff --git a/include/linux/kexec.h b/include/linux/kexec.h index ff7e231b0485..8a22bc9b8c6c 100644 --- a/include/linux/kexec.h +++ b/include/linux/kexec.h @@ -530,7 +530,7 @@ extern bool kexec_file_dbg_print; #define kexec_dprintk(fmt, arg...) 
\ do { if (kexec_file_dbg_print) pr_info(fmt, ##arg); } while (0) -extern void *kimage_map_segment(struct kimage *image, unsigned long addr, unsigned long size); +extern void *kimage_map_segment(struct kimage *image, int idx); extern void kimage_unmap_segment(void *buffer); #else /* !CONFIG_KEXEC_CORE */ struct pt_regs; @@ -540,7 +540,7 @@ static inline void __crash_kexec(struct pt_regs *regs) { } static inline void crash_kexec(struct pt_regs *regs) { } static inline int kexec_should_crash(struct task_struct *p) { return 0; } static inline int kexec_crash_loaded(void) { return 0; } -static inline void *kimage_map_segment(struct kimage *image, unsigned long addr, unsigned long size) +static inline void *kimage_map_segment(struct kimage *image, int idx) { return NULL; } static inline void kimage_unmap_segment(void *buffer) { } #define kexec_in_progress false diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c index 0f92acdd354d..1a79c5b18d8f 100644 --- a/kernel/kexec_core.c +++ b/kernel/kexec_core.c @@ -953,17 +953,20 @@ int kimage_load_segment(struct kimage *image, int idx) return result; } -void *kimage_map_segment(struct kimage *image, - unsigned long addr, unsigned long size) +void *kimage_map_segment(struct kimage *image, int idx) { + unsigned long addr, size, eaddr; unsigned long src_page_addr, dest_page_addr = 0; - unsigned long eaddr = addr + size; kimage_entry_t *ptr, entry; struct page **src_pages; unsigned int npages; void *vaddr = NULL; int i; + addr = image->segment[idx].mem; + size = image->segment[idx].memsz; + eaddr = addr + size; + /* * Collect the source pages and map them in a contiguous VA range. */ diff --git a/security/integrity/ima/ima_kexec.c b/security/integrity/ima/ima_kexec.c index 7362f68f2d8b..5beb69edd12f 100644 --- a/security/integrity/ima/ima_kexec.c +++ b/security/integrity/ima/ima_kexec.c @@ -250,9 +250,7 @@ void ima_kexec_post_load(struct kimage *image) if (!image->ima_buffer_addr) return; - ima_kexec_buffer = kimage_map_segment(image, - image->ima_buffer_addr, - image->ima_buffer_size); + ima_kexec_buffer = kimage_map_segment(image, image->ima_segment_index); if (!ima_kexec_buffer) { pr_err("Could not map measurements buffer.\n"); return; From a3785ae5d334bb71d47a593d54c686a03fb9d136 Mon Sep 17 00:00:00 2001 From: Pingfan Liu Date: Tue, 16 Dec 2025 09:48:52 +0800 Subject: [PATCH 15/27] kernel/kexec: fix IMA when allocation happens in CMA area *** Bug description *** When I tested kexec with the latest kernel, I ran into the following warning: [ 40.712410] ------------[ cut here ]------------ [ 40.712576] WARNING: CPU: 2 PID: 1562 at kernel/kexec_core.c:1001 kimage_map_segment+0x144/0x198 [...] [ 40.816047] Call trace: [ 40.818498] kimage_map_segment+0x144/0x198 (P) [ 40.823221] ima_kexec_post_load+0x58/0xc0 [ 40.827246] __do_sys_kexec_file_load+0x29c/0x368 [...] [ 40.855423] ---[ end trace 0000000000000000 ]--- *** How to reproduce *** This bug is only triggered when the kexec target address is allocated in the CMA area. If no CMA area is reserved in the kernel, use the "cma=" option in the kernel command line to reserve one. *** Root cause *** The commit 07d24902977e ("kexec: enable CMA based contiguous allocation") allocates the kexec target address directly on the CMA area to avoid copying during the jump. In this case, there is no IND_SOURCE for the kexec segment. But the current implementation of kimage_map_segment() assumes that IND_SOURCE pages exist and map them into a contiguous virtual address by vmap(). 
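In other words, the broken assumption looks schematically like this (a
simplified sketch, not the exact kimage_map_segment() code):

	/* walk the IND_SOURCE entries and collect the source pages */
	for_each_kimage_entry(image, ptr, entry) {
		...
		src_pages[i++] = ...;
	}
	vaddr = vmap(src_pages, npages, VM_MAP, PAGE_KERNEL);

	/*
	 * A CMA-allocated segment has no IND_SOURCE entries at all, so
	 * there is nothing to collect here and the warning quoted
	 * above fires.
	 */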
*** Solution *** If IMA segment is allocated in the CMA area, use its page_address() directly. Link: https://lkml.kernel.org/r/20251216014852.8737-2-piliu@redhat.com Fixes: 07d24902977e ("kexec: enable CMA based contiguous allocation") Signed-off-by: Pingfan Liu Acked-by: Baoquan He Cc: Alexander Graf Cc: Steven Chen Cc: Mimi Zohar Cc: Roberto Sassu Cc: Signed-off-by: Andrew Morton --- kernel/kexec_core.c | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c index 1a79c5b18d8f..95c585c6ddc3 100644 --- a/kernel/kexec_core.c +++ b/kernel/kexec_core.c @@ -960,13 +960,17 @@ void *kimage_map_segment(struct kimage *image, int idx) kimage_entry_t *ptr, entry; struct page **src_pages; unsigned int npages; + struct page *cma; void *vaddr = NULL; int i; + cma = image->segment_cma[idx]; + if (cma) + return page_address(cma); + addr = image->segment[idx].mem; size = image->segment[idx].memsz; eaddr = addr + size; - /* * Collect the source pages and map them in a contiguous VA range. */ @@ -1007,7 +1011,8 @@ void *kimage_map_segment(struct kimage *image, int idx) void kimage_unmap_segment(void *segment_buffer) { - vunmap(segment_buffer); + if (is_vmalloc_addr(segment_buffer)) + vunmap(segment_buffer); } struct kexec_load_limit { From 632b874d59a36caf829ab5790dafb90f9b350fd6 Mon Sep 17 00:00:00 2001 From: Wake Liu Date: Wed, 10 Dec 2025 17:14:08 +0800 Subject: [PATCH 16/27] selftests/mm: fix thread state check in uffd-unit-tests In the thread_state_get() function, the logic to find the thread's state character was using `sizeof(header) - 1` to calculate the offset from the "State:\t" string. The `header` variable is a `const char *` pointer. `sizeof()` on a pointer returns the size of the pointer itself, not the length of the string literal it points to. This makes the code's behavior dependent on the architecture's pointer size. This bug was identified on a 32-bit ARM build (`gsi_tv_arm`) for Android, running on an ARMv8-based device, compiled with Clang 19.0.1. On this 32-bit architecture, `sizeof(char *)` is 4. The expression `sizeof(header) - 1` resulted in an incorrect offset of 3, causing the test to read the wrong character from `/proc/[tid]/status` and fail. On 64-bit architectures, `sizeof(char *)` is 8, so the expression coincidentally evaluates to 7, which matches the length of "State:\t". This is why the bug likely remained hidden on 64-bit builds. To fix this and make the code portable and correct across all architectures, this patch replaces `sizeof(header) - 1` with `strlen(header)`. The `strlen()` function correctly calculates the string's length, ensuring the correct offset is always used. 
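The distinction, in a nutshell:

	const char *header = "State:\t";

	sizeof(header);	/* size of the pointer: 4 on 32-bit, 8 on 64-bit */
	strlen(header);	/* length of the string: always 7 */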
Link: https://lkml.kernel.org/r/20251210091408.3781445-1-wakel@google.com
Fixes: f60b6634cd88 ("mm/selftests: add a test to verify mmap_changing race with -EAGAIN")
Signed-off-by: Wake Liu
Acked-by: Peter Xu
Reviewed-by: Mike Rapoport (Microsoft)
Cc: Bill Wendling
Cc: Justin Stitt
Cc: Liam Howlett
Cc: Lorenzo Stoakes
Cc: Michal Hocko
Cc: Nathan Chancellor
Cc: Shuah Khan
Cc: Suren Baghdasaryan
Cc: Vlastimil Babka
Cc:
Signed-off-by: Andrew Morton
---
 tools/testing/selftests/mm/uffd-unit-tests.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/testing/selftests/mm/uffd-unit-tests.c b/tools/testing/selftests/mm/uffd-unit-tests.c
index f4807242c5b2..6f5e404a446c 100644
--- a/tools/testing/selftests/mm/uffd-unit-tests.c
+++ b/tools/testing/selftests/mm/uffd-unit-tests.c
@@ -1317,7 +1317,7 @@ static thread_state thread_state_get(pid_t tid)
 	p = strstr(tmp, header);
 	if (p) {
 		/* For example, "State:\tD (disk sleep)" */
-		c = *(p + sizeof(header) - 1);
+		c = *(p + strlen(header));
 		return c == 'D' ? THR_STATE_UNINTERRUPTIBLE : THR_STATE_UNKNOWN;
 	}

From 7013803444dd3bbbe28fd3360c084cec3057c554 Mon Sep 17 00:00:00 2001
From: Kaushlendra Kumar
Date: Tue, 9 Dec 2025 10:15:52 +0530
Subject: [PATCH 17/27] tools/mm/page_owner_sort: fix timestamp comparison for
 stable sorting

The ternary operator in compare_ts() returns 1 when timestamps are
equal, causing unstable sorting behavior. Replace with explicit
three-way comparison that returns 0 for equal timestamps, ensuring
stable qsort ordering and consistent output.

Link: https://lkml.kernel.org/r/20251209044552.3396468-1-kaushlendra.kumar@intel.com
Fixes: 8f9c447e2e2b ("tools/vm/page_owner_sort.c: support sorting pid and time")
Signed-off-by: Kaushlendra Kumar
Cc: Chongxi Zhao
Cc:
Signed-off-by: Andrew Morton
---
 tools/mm/page_owner_sort.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/tools/mm/page_owner_sort.c b/tools/mm/page_owner_sort.c
index 14c67e9e84c4..e6954909401c 100644
--- a/tools/mm/page_owner_sort.c
+++ b/tools/mm/page_owner_sort.c
@@ -181,7 +181,11 @@ static int compare_ts(const void *p1, const void *p2)
 {
 	const struct block_list *l1 = p1, *l2 = p2;

-	return l1->ts_nsec < l2->ts_nsec ? -1 : 1;
+	if (l1->ts_nsec < l2->ts_nsec)
+		return -1;
+	if (l1->ts_nsec > l2->ts_nsec)
+		return 1;
+	return 0;
 }

 static int compare_cull_condition(const void *p1, const void *p2)

From e6dbcb7c0e7b508d443a9aa6f77f63a2f83b1ae4 Mon Sep 17 00:00:00 2001
From: Ankit Agrawal
Date: Thu, 11 Dec 2025 07:06:01 +0000
Subject: [PATCH 18/27] mm: fixup pfnmap memory failure handling to use pgoff

The memory failure handling implementation for the PFNMAP memory with
no struct pages is faulty. The VA of the mapping is determined based on
the PFN. It should instead be based on the file mapping offset.

At the occurrence of poison, memory_failure_pfn() is triggered on the
poisoned PFN. Introduce a callback function that allows mm to translate
the PFN to the corresponding file page offset. The kernel module using
the registration API must implement the callback function and provide
the translation. The translated value is then used to determine the VA
information and send the SIGBUS to the usermode process mapped to the
poisoned PFN.

The callback is also useful for the driver to be notified of the
poisoned PFN, which may then track it.
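A driver using the registration API would wire the callback up roughly
as follows (a hypothetical sketch; the mydrv naming and the base_pfn
bookkeeping are invented for illustration):

	static int mydrv_pfn_to_vma_pgoff(struct vm_area_struct *vma,
					  unsigned long pfn, pgoff_t *pgoff)
	{
		struct mydrv *d = vma->vm_private_data;

		/* PFN not owned by this mapping */
		if (pfn < d->base_pfn || pfn >= d->base_pfn + d->nr_pfns)
			return -EFAULT;

		*pgoff = pfn - d->base_pfn;	/* PFN -> file page offset */
		return 0;			/* 0 means translation succeeded */
	}

	/*
	 * At registration time, with 'mapping' being the backing
	 * file's address_space:
	 */
	mydrv_pfn_space.mapping = mapping;
	mydrv_pfn_space.pfn_to_vma_pgoff = mydrv_pfn_to_vma_pgoff;
	ret = register_pfn_address_space(&mydrv_pfn_space);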
Link: https://lkml.kernel.org/r/20251211070603.338701-2-ankita@nvidia.com Fixes: 2ec41967189c ("mm: handle poisoning of pfn without struct pages") Signed-off-by: Ankit Agrawal Suggested-by: Jason Gunthorpe Cc: Kevin Tian Cc: Matthew R. Ochs Cc: Miaohe Lin Cc: Naoya Horiguchi Cc: Neo Jia Cc: Vikram Sethi Cc: Yishai Hadas Cc: Zhi Wang Signed-off-by: Andrew Morton --- include/linux/memory-failure.h | 2 ++ mm/memory-failure.c | 29 ++++++++++++++++++----------- 2 files changed, 20 insertions(+), 11 deletions(-) diff --git a/include/linux/memory-failure.h b/include/linux/memory-failure.h index bc326503d2d2..7b5e11cf905f 100644 --- a/include/linux/memory-failure.h +++ b/include/linux/memory-failure.h @@ -9,6 +9,8 @@ struct pfn_address_space; struct pfn_address_space { struct interval_tree_node node; struct address_space *mapping; + int (*pfn_to_vma_pgoff)(struct vm_area_struct *vma, + unsigned long pfn, pgoff_t *pgoff); }; int register_pfn_address_space(struct pfn_address_space *pfn_space); diff --git a/mm/memory-failure.c b/mm/memory-failure.c index fbc5a01260c8..c80c2907da33 100644 --- a/mm/memory-failure.c +++ b/mm/memory-failure.c @@ -2161,6 +2161,9 @@ int register_pfn_address_space(struct pfn_address_space *pfn_space) { guard(mutex)(&pfn_space_lock); + if (!pfn_space->pfn_to_vma_pgoff) + return -EINVAL; + if (interval_tree_iter_first(&pfn_space_itree, pfn_space->node.start, pfn_space->node.last)) @@ -2183,10 +2186,10 @@ void unregister_pfn_address_space(struct pfn_address_space *pfn_space) } EXPORT_SYMBOL_GPL(unregister_pfn_address_space); -static void add_to_kill_pfn(struct task_struct *tsk, - struct vm_area_struct *vma, - struct list_head *to_kill, - unsigned long pfn) +static void add_to_kill_pgoff(struct task_struct *tsk, + struct vm_area_struct *vma, + struct list_head *to_kill, + pgoff_t pgoff) { struct to_kill *tk; @@ -2197,12 +2200,12 @@ static void add_to_kill_pfn(struct task_struct *tsk, } /* Check for pgoff not backed by struct page */ - tk->addr = vma_address(vma, pfn, 1); + tk->addr = vma_address(vma, pgoff, 1); tk->size_shift = PAGE_SHIFT; if (tk->addr == -EFAULT) pr_info("Unable to find address %lx in %s\n", - pfn, tsk->comm); + pgoff, tsk->comm); get_task_struct(tsk); tk->tsk = tsk; @@ -2212,11 +2215,12 @@ static void add_to_kill_pfn(struct task_struct *tsk, /* * Collect processes when the error hit a PFN not backed by struct page. 
*/ -static void collect_procs_pfn(struct address_space *mapping, +static void collect_procs_pfn(struct pfn_address_space *pfn_space, unsigned long pfn, struct list_head *to_kill) { struct vm_area_struct *vma; struct task_struct *tsk; + struct address_space *mapping = pfn_space->mapping; i_mmap_lock_read(mapping); rcu_read_lock(); @@ -2226,9 +2230,12 @@ static void collect_procs_pfn(struct address_space *mapping, t = task_early_kill(tsk, true); if (!t) continue; - vma_interval_tree_foreach(vma, &mapping->i_mmap, pfn, pfn) { - if (vma->vm_mm == t->mm) - add_to_kill_pfn(t, vma, to_kill, pfn); + vma_interval_tree_foreach(vma, &mapping->i_mmap, 0, ULONG_MAX) { + pgoff_t pgoff; + + if (vma->vm_mm == t->mm && + !pfn_space->pfn_to_vma_pgoff(vma, pfn, &pgoff)) + add_to_kill_pgoff(t, vma, to_kill, pgoff); } } rcu_read_unlock(); @@ -2264,7 +2271,7 @@ static int memory_failure_pfn(unsigned long pfn, int flags) struct pfn_address_space *pfn_space = container_of(node, struct pfn_address_space, node); - collect_procs_pfn(pfn_space->mapping, pfn, &tokill); + collect_procs_pfn(pfn_space, pfn, &tokill); mf_handled = true; } From 6db12d5c474d77016ca9130eb32490c9771fb157 Mon Sep 17 00:00:00 2001 From: Shakeel Butt Date: Tue, 16 Dec 2025 13:20:54 -0800 Subject: [PATCH 19/27] mm: memcg: fix unit conversion for K() macro in OOM log The commit bc8e51c05ad5 ("mm: memcg: dump memcg protection info on oom or alloc failures") added functionality to dump memcg protections on OOM or allocation failures. It uses K() macro to dump the information and passes bytes to the macro. However the macro take number of pages instead of bytes. It is defined as: #define K(x) ((x) << (PAGE_SHIFT-10)) Let's fix this. Link: https://lkml.kernel.org/r/20251216212054.484079-1-shakeel.butt@linux.dev Fixes: bc8e51c05ad5 ("mm: memcg: dump memcg protection info on oom or alloc failures") Signed-off-by: Shakeel Butt Reported-by: Chris Mason Acked-by: Michal Hocko Acked-by: Vlastimil Babka Reviewed-by: Muchun Song Cc: Johannes Weiner Cc: Roman Gushchin Signed-off-by: Andrew Morton --- mm/memcontrol.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/mm/memcontrol.c b/mm/memcontrol.c index be810c1fbfc3..86f43b7e5f71 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -5638,6 +5638,6 @@ void mem_cgroup_show_protected_memory(struct mem_cgroup *memcg) memcg = root_mem_cgroup; pr_warn("Memory cgroup min protection %lukB -- low protection %lukB", - K(atomic_long_read(&memcg->memory.children_min_usage)*PAGE_SIZE), - K(atomic_long_read(&memcg->memory.children_low_usage)*PAGE_SIZE)); + K(atomic_long_read(&memcg->memory.children_min_usage)), + K(atomic_long_read(&memcg->memory.children_low_usage))); } From 6558749ef3405c143711cbdc67ec88cbc1582d91 Mon Sep 17 00:00:00 2001 From: Alice Ryhl Date: Wed, 17 Dec 2025 13:10:37 +0000 Subject: [PATCH 20/27] rust: maple_tree: rcu_read_lock() in destructor to silence lockdep MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit When running the Rust maple tree kunit tests with lockdep, you may trigger a warning that looks like this: lib/maple_tree.c:780 suspicious rcu_dereference_check() usage! other info that might help us debug this: rcu_scheduler_active = 2, debug_locks = 1 no locks held by kunit_try_catch/344. 
stack backtrace: CPU: 3 UID: 0 PID: 344 Comm: kunit_try_catch Tainted: G N 6.19.0-rc1+ #2 NONE Tainted: [N]=TEST Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.17.0-0-gb52ca86e094d-prebuilt.qemu.org 04/01/2014 Call Trace: dump_stack_lvl+0x71/0x90 lockdep_rcu_suspicious+0x150/0x190 mas_start+0x104/0x150 mas_find+0x179/0x240 _RINvNtCs5QSdWC790r4_4core3ptr13drop_in_placeINtNtCs1cdwasc6FUb_6kernel10maple_tree9MapleTreeINtNtNtBL_5alloc4kbox3BoxlNtNtB1x_9allocator7KmallocEEECsgxAQYCfdR72_25doctests_kernel_generated+0xaf/0x130 rust_doctest_kernel_maple_tree_rs_0+0x600/0x6b0 ? lock_release+0xeb/0x2a0 ? kunit_try_catch_run+0x210/0x210 kunit_try_run_case+0x74/0x160 ? kunit_try_catch_run+0x210/0x210 kunit_generic_run_threadfn_adapter+0x12/0x30 kthread+0x21c/0x230 ? __do_trace_sched_kthread_stop_ret+0x40/0x40 ret_from_fork+0x16c/0x270 ? __do_trace_sched_kthread_stop_ret+0x40/0x40 ret_from_fork_asm+0x11/0x20 This is because the destructor of maple tree calls mas_find() without taking rcu_read_lock() or the spinlock. Doing that is actually ok in this case since the destructor has exclusive access to the entire maple tree, but it triggers a lockdep warning. To fix that, take the rcu read lock. In the future, it's possible that memory reclaim could gain a feature where it reallocates entries in maple trees even if no user-code is touching it. If that feature is added, then this use of rcu read lock would become load-bearing, so I did not make it conditional on lockdep. We have to repeatedly take and release rcu because the destructor of T might perform operations that sleep. Link: https://lkml.kernel.org/r/20251217-maple-drop-rcu-v1-1-702af063573f@google.com Fixes: da939ef4c494 ("rust: maple_tree: add MapleTree") Signed-off-by: Alice Ryhl Reported-by: Andreas Hindborg Closes: https://rust-for-linux.zulipchat.com/#narrow/channel/x/topic/x/near/564215108 Reviewed-by: Gary Guo Reviewed-by: Daniel Almeida Cc: Andrew Ballance Cc: Björn Roy Baron Cc: Boqun Feng Cc: Danilo Krummrich Cc: Liam Howlett Cc: Matthew Wilcox (Oracle) Cc: Miguel Ojeda Cc: Trevor Gross Cc: Signed-off-by: Andrew Morton --- rust/kernel/maple_tree.rs | 11 ++++++++++- 1 file changed, 10 insertions(+), 1 deletion(-) diff --git a/rust/kernel/maple_tree.rs b/rust/kernel/maple_tree.rs index e72eec56bf57..265d6396a78a 100644 --- a/rust/kernel/maple_tree.rs +++ b/rust/kernel/maple_tree.rs @@ -265,7 +265,16 @@ impl MapleTree { loop { // This uses the raw accessor because we're destroying pointers without removing them // from the maple tree, which is only valid because this is the destructor. - let ptr = ma_state.mas_find_raw(usize::MAX); + // + // Take the rcu lock because mas_find_raw() requires that you hold either the spinlock + // or the rcu read lock. This is only really required if memory reclaim might + // reallocate entries in the tree, as we otherwise have exclusive access. That feature + // doesn't exist yet, so for now, taking the rcu lock only serves the purpose of + // silencing lockdep. + let ptr = { + let _rcu = kernel::sync::rcu::Guard::new(); + ma_state.mas_find_raw(usize::MAX) + }; if ptr.is_null() { break; } From f183663901f21fe0fba8bd31ae894bc529709ee0 Mon Sep 17 00:00:00 2001 From: Bijan Tabatabai Date: Tue, 16 Dec 2025 14:07:27 -0600 Subject: [PATCH 21/27] mm: consider non-anon swap cache folios in folio_expected_ref_count() Currently, folio_expected_ref_count() only adds references for the swap cache if the folio is anonymous. 
However, according to the comment above the definition of PG_swapcache
in enum pageflags, shmem folios can also have PG_swapcache set. This
patch makes sure references for the swap cache are added if
folio_test_swapcache(folio) is true.

This issue was found when trying to hot-unplug memory in a QEMU/KVM
virtual machine. When initiating hot-unplug while most of the guest
memory is allocated, hot-unplug hangs partway through removal due to
migration failures. The following message would be printed several
times, and would be printed again about every five seconds:

[ 49.641309] migrating pfn b12f25 failed ret:7
[ 49.641310] page: refcount:2 mapcount:0 mapping:0000000033bd8fe2 index:0x7f404d925 pfn:0xb12f25
[ 49.641311] aops:swap_aops
[ 49.641313] flags: 0x300000000030508(uptodate|active|owner_priv_1|reclaim|swapbacked|node=0|zone=3)
[ 49.641314] raw: 0300000000030508 ffffed312c4bc908 ffffed312c4bc9c8 0000000000000000
[ 49.641315] raw: 00000007f404d925 00000000000c823b 00000002ffffffff 0000000000000000
[ 49.641315] page dumped because: migration failure

When debugging this, I found that these migration failures were due to
__migrate_folio() returning -EAGAIN for a small set of folios because
the expected reference count it calculates via
folio_expected_ref_count() is one less than the actual reference count
of the folios. Furthermore, all of the affected folios were not
anonymous, but had the PG_swapcache flag set, inspiring this patch.
After applying this patch, the memory hot-unplug behaves as expected.

I tested this on a machine running Ubuntu 24.04 with kernel version
6.8.0-90-generic and 64GB of memory. The guest VM is managed by libvirt
and runs Ubuntu 24.04 with kernel version 6.18 (though the head of the
mm-unstable branch as of Dec 16, 2025 was also tested and behaves the
same) and 48GB of memory. The libvirt XML definition for the VM can be
found at [1]. CONFIG_MHP_DEFAULT_ONLINE_TYPE_ONLINE_MOVABLE is set in
the guest kernel so the hot-pluggable memory is automatically onlined.
Below are the steps to reproduce this behavior:

1) Define and start the virtual machine
host$ virsh -c qemu:///system define ./test_vm.xml # test_vm.xml from [1]
host$ virsh -c qemu:///system start test_vm

2) Setup swap in the guest
guest$ sudo fallocate -l 32G /swapfile
guest$ sudo chmod 0600 /swapfile
guest$ sudo mkswap /swapfile
guest$ sudo swapon /swapfile

3) Use alloc_data [2] to allocate most of the remaining guest memory
guest$ ./alloc_data 45

4) In a separate guest terminal, monitor the amount of used memory
guest$ watch -n1 free -h

5) When alloc_data has finished allocating, initiate the memory
hot-unplug using the provided xml file [3]
host$ virsh -c qemu:///system detach-device test_vm ./remove.xml --live

After initiating the memory hot-unplug, you should see the amount of
available memory in the guest decrease, and the amount of used swap
data increase. If everything works as expected, when all of the memory
is unplugged, there should be around 8.5-9GB of data in swap. If the
unplugging is unsuccessful, the amount of used swap data will settle
below that. If that happens, you should be able to see log messages in
dmesg similar to the one posted above.
Link: https://lkml.kernel.org/r/20251216200727.2360228-1-bijan311@gmail.com
Link: https://github.com/BijanT/linux_patch_files/blob/main/test_vm.xml [1]
Link: https://github.com/BijanT/linux_patch_files/blob/main/alloc_data.c [2]
Link: https://github.com/BijanT/linux_patch_files/blob/main/remove.xml [3]
Fixes: 86ebd50224c0 ("mm: add folio_expected_ref_count() for reference count calculation")
Signed-off-by: Bijan Tabatabai
Acked-by: David Hildenbrand (Red Hat)
Acked-by: Zi Yan
Reviewed-by: Baolin Wang
Cc: Liam Howlett
Cc: Lorenzo Stoakes
Cc: Michal Hocko
Cc: Mike Rapoport
Cc: Shivank Garg
Cc: Suren Baghdasaryan
Cc: Vlastimil Babka
Cc: Kairui Song
Cc:
Signed-off-by: Andrew Morton
---
 include/linux/mm.h | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 15076261d0c2..6f959d8ca4b4 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2459,10 +2459,10 @@ static inline int folio_expected_ref_count(const struct folio *folio)
 	if (WARN_ON_ONCE(page_has_type(&folio->page) && !folio_test_hugetlb(folio)))
 		return 0;
 
-	if (folio_test_anon(folio)) {
-		/* One reference per page from the swapcache. */
-		ref_count += folio_test_swapcache(folio) << order;
-	} else {
+	/* One reference per page from the swapcache. */
+	ref_count += folio_test_swapcache(folio) << order;
+
+	if (!folio_test_anon(folio)) {
 		/* One reference per page from the pagecache. */
 		ref_count += !!folio->mapping << order;
 		/* One reference from PG_private. */

From 0c75714095e06692f7a0e00a3dfd829c0d3c0ada Mon Sep 17 00:00:00 2001
From: Joshua Hahn
Date: Thu, 18 Dec 2025 00:31:59 -0800
Subject: [PATCH 22/27] mm/page_alloc: report 1 as zone_batchsize for
 !CONFIG_MMU

Commit 2783088ef24e ("mm/page_alloc: prevent reporting pcp->batch = 0")
moved the error handling (0-handling) of zone_batchsize from its callers
to inside the function. However, the commit left out the error handling
for the NOMMU case, leading to deadlocks on NOMMU systems.

For NOMMU systems, return 1 instead of 0 for zone_batchsize, which
restores the previous deadlock-free behavior.

Compared to the behavior before commit 2783088ef24e, this patch is not
expected to introduce any functional difference, other than the pr_debug
in zone_pcp_init now printing 1 instead of 0 for zones on NOMMU systems.
Not only is this just a pr_debug, the difference is purely semantic
anyway.

Link: https://lkml.kernel.org/r/20251218083200.2435789-1-joshua.hahnjy@gmail.com
Fixes: 2783088ef24e ("mm/page_alloc: prevent reporting pcp->batch = 0")
Signed-off-by: Joshua Hahn
Reported-by: Daniel Palmer
Closes: https://lore.kernel.org/linux-mm/CAFr9PX=_HaM3_xPtTiBn5Gw5-0xcRpawpJ02NStfdr0khF2k7g@mail.gmail.com/
Reported-by: Guenter Roeck
Closes: https://lore.kernel.org/all/42143500-c380-41fe-815c-696c17241506@roeck-us.net/
Reviewed-by: Vlastimil Babka
Tested-by: Daniel Palmer
Tested-by: Guenter Roeck
Acked-by: SeongJae Park
Tested-by: Hajime Tazaki
Cc: Brendan Jackman
Cc: Johannes Weiner
Cc: Michal Hocko
Cc: Suren Baghdasaryan
Cc: Zi Yan
Signed-off-by: Andrew Morton
---
 mm/page_alloc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index f6586f165b89..c380f063e8b7 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5924,7 +5924,7 @@ static int zone_batchsize(struct zone *zone)
 	 * recycled, this leads to the once large chunks of space being
 	 * fragmented and becoming unavailable for high-order allocations.
 	 */
-	return 0;
+	return 1;
 #endif
 }

From 7622292d2a4c4de36144a30b12a0d0f70d35f2c1 Mon Sep 17 00:00:00 2001
From: Randy Dunlap
Date: Wed, 17 Dec 2025 22:09:21 -0800
Subject: [PATCH 23/27] sparse: update MAINTAINERS info

Chris Li is back as sparse maintainer. See
https://git.kernel.org/pub/scm/devel/sparse/sparse.git/commit/?id=67f0a03cee4637e495151c48a02be642a158cbbb

Link: https://lkml.kernel.org/r/20251218060921.995516-1-rdunlap@infradead.org
Signed-off-by: Randy Dunlap
Cc: Christopher Li
Signed-off-by: Andrew Morton
---
 MAINTAINERS | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 2fa30b32411d..f81baa538916 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -24574,7 +24574,7 @@ F: drivers/tty/vcc.c
 F: include/linux/sunserialcore.h
 
 SPARSE CHECKER
-M: "Luc Van Oostenryck"
+M: Chris Li
 L: linux-sparse@vger.kernel.org
 S: Maintained
 W: https://sparse.docs.kernel.org/

From ffd042a23b798dfe2786c998038c1bf53ae818ef Mon Sep 17 00:00:00 2001
From: Dan Williams
Date: Fri, 19 Dec 2025 16:03:27 -0800
Subject: [PATCH 24/27] MAINTAINERS: notify the "Device Memory" community of
 memory hotplug changes

The recent episode of a warning regression in memremap_pages() [1]
highlights that relevant updates are being missed by folks that care
about core ZONE_DEVICE changes. Yes, CXL folks should pay more attention
to linux-mm@, but it also would not hurt to copy linux-cxl@, where most
Device Memory folks hang out, on memory hotplug changes by default.

Link: http://lore.kernel.org/20251219123717.39330-1-john@groves.net [1]
Link: https://lkml.kernel.org/r/20251220000327.3502994-1-dan.j.williams@intel.com
Signed-off-by: Dan Williams
Acked-by: Jonathan Cameron
Acked-by: John Groves
Cc: David Hildenbrand
Cc: Oscar Salvador
Cc: Davidlohr Bueso
Cc: Dave Jiang
Cc: Alison Schofield
Cc: Vishal Verma
Cc: Ira Weiny
Signed-off-by: Andrew Morton
---
 MAINTAINERS | 1 +
 1 file changed, 1 insertion(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index f81baa538916..04bdb732181b 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -16428,6 +16428,7 @@ MEMORY HOT(UN)PLUG
 M: David Hildenbrand
 M: Oscar Salvador
 L: linux-mm@kvack.org
+L: linux-cxl@vger.kernel.org
 S: Maintained
 F: Documentation/admin-guide/mm/memory-hotplug.rst
 F: Documentation/core-api/memory-hotplug.rst

From 077d925b60c320027dd64b69e0ab2dd2e00ed45c Mon Sep 17 00:00:00 2001
From: John Groves
Date: Fri, 19 Dec 2025 06:37:17 -0600
Subject: [PATCH 25/27] mm/memremap: fix spurious large folio warning for
 FS-DAX

This patch addresses a warning that I discovered while working on famfs,
which is an fs-dax file system that virtually always does PMD faults
(next famfs patch series coming after the holidays). However, XFS also
does PMD faults in fs-dax mode, and it also triggers the warning. It
takes some effort to get XFS to do a PMD fault, but instructions to
reproduce it are below.

The VM_WARN_ON_ONCE(folio_test_large(folio)) check in
free_zone_device_folio() incorrectly triggers for MEMORY_DEVICE_FS_DAX
when PMD (2MB) mappings are used.

FS-DAX legitimately creates large file-backed folios when handling PMD
faults. This is a core feature of FS-DAX that provides significant
performance benefits by mapping 2MB regions directly to persistent
memory. When these mappings are unmapped, the large folios are freed
through free_zone_device_folio(), which triggers the spurious warning.

The warning was introduced by commit d245f9b4ab80 ("mm/zone_device:
support large zone device private folios").
However, that commit did not account for FS-DAX file-backed folios, which
have always supported large (PMD-sized) mappings.

The check distinguishes between anonymous folios (which clear
AnonExclusive flags for each sub-page) and file-backed folios. For
file-backed folios, it assumes large folios are unexpected - but this
assumption is incorrect for FS-DAX.

The fix is to exempt MEMORY_DEVICE_FS_DAX from the large folio warning,
allowing FS-DAX to continue using PMD mappings without triggering false
warnings.

Link: https://lkml.kernel.org/r/20251219123717.39330-1-john@groves.net
Fixes: d245f9b4ab80 ("mm/zone_device: support large zone device private folios")
Signed-off-by: John Groves
Acked-by: David Hildenbrand (Red Hat)
Reviewed-by: Dan Williams
Tested-by: Alison Schofield
Cc: Alistair Popple
Cc: Balbir Singh
Cc: "Darrick J. Wong"
Cc: Gregory Price
Cc: Oscar Salvador
Signed-off-by: Andrew Morton
---
 mm/memremap.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/mm/memremap.c b/mm/memremap.c
index 4c2e0d68eb27..63c6ab4fdf08 100644
--- a/mm/memremap.c
+++ b/mm/memremap.c
@@ -427,8 +427,6 @@ void free_zone_device_folio(struct folio *folio)
 	if (folio_test_anon(folio)) {
 		for (i = 0; i < nr; i++)
 			__ClearPageAnonExclusive(folio_page(folio, i));
-	} else {
-		VM_WARN_ON_ONCE(folio_test_large(folio));
 	}
 
 	/*

From a76a5ae2c6c645005672c2caf2d49361c6f2500f Mon Sep 17 00:00:00 2001
From: Ran Xiaokai
Date: Fri, 19 Dec 2025 07:42:32 +0000
Subject: [PATCH 26/27] mm/page_owner: fix memory leak in
 page_owner_stack_fops->release()

The page_owner_stack_fops->open() callback invokes seq_open_private(),
therefore its corresponding ->release() callback must call
seq_release_private(). Otherwise it will cause a memory leak of struct
stack_print_ctx.

Link: https://lkml.kernel.org/r/20251219074232.136482-1-ranxiaokai627@163.com
Fixes: 765973a09803 ("mm,page_owner: display all stacks and their count")
Signed-off-by: Ran Xiaokai
Acked-by: Michal Hocko
Acked-by: Vlastimil Babka
Cc: Andrey Konovalov
Cc: Brendan Jackman
Cc: Johannes Weiner
Cc: Marco Elver
Cc: Suren Baghdasaryan
Cc: Zi Yan
Cc:
Signed-off-by: Andrew Morton
---
 mm/page_owner.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/page_owner.c b/mm/page_owner.c
index a70245684206..b3260f0c17ba 100644
--- a/mm/page_owner.c
+++ b/mm/page_owner.c
@@ -952,7 +952,7 @@ static const struct file_operations page_owner_stack_fops = {
 	.open = page_owner_stack_open,
 	.read = seq_read,
 	.llseek = seq_lseek,
-	.release = seq_release,
+	.release = seq_release_private,
 };
 
 static int page_owner_threshold_get(void *data, u64 *val)

From d6b5a8d6f142ad0a8e45181f06e70b4746c4abc3 Mon Sep 17 00:00:00 2001
From: Sasha Levin
Date: Sat, 20 Dec 2025 15:29:26 -0500
Subject: [PATCH 27/27] mm/ksm: fix pte_unmap_unlock of wrong address in
 break_ksm_pmd_entry

On ARM32 with HIGHMEM/HIGHPTE, break_ksm_pmd_entry() triggers a BUG
during KSM unmerging because pte_unmap_unlock() is passed a pointer that
may be beyond the mapped PTE page.

The issue occurs when the PTE iteration loop completes without finding a
KSM page. After the loop, 'ptep' has been incremented past the last PTE
entry. On ARM32 LPAE with 512 PTEs per page (512 * 8 = 4096 bytes), this
means ptep points to the next page, outside the kmap'd region.

When pte_unmap_unlock(ptep, ptl) calls kunmap_local(ptep), it unmaps the
wrong page address, leaving the original kmap slot still mapped.
The next kmap_local then finds this slot unexpectedly occupied: WARNING: mm/highmem.c:622 kunmap_local_indexed (address mismatch) kernel BUG at mm/highmem.c:564 __kmap_local_pfn_prot (slot not empty) Fix this by passing start_ptep to pte_unmap_unlock(), which always points within the originally mapped PTE page. Reproducer: Run LTP ksm03 test on ARM32 with HIGHMEM enabled. The test triggers KSM merging followed by unmerging (writing 0 then 2 to /sys/kernel/mm/ksm/run), which exercises break_ksm_pmd_entry(). Link: https://lkml.kernel.org/r/20251220202926.318366-1-sashal@kernel.org Fixes: 5d4939fc2258 ("ksm: perform a range-walk in break_ksm") Signed-off-by: Sasha Levin Assisted-by: claude-opus-4-5-20251101 Acked-by: David Hildenbrand (Red Hat) Reviewed-by: Chengming Zhou Cc: Pedro Demarchi Gomes Cc: xu xin Signed-off-by: Andrew Morton --- mm/ksm.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/mm/ksm.c b/mm/ksm.c index cfc182255c7b..2d89a7c8b4eb 100644 --- a/mm/ksm.c +++ b/mm/ksm.c @@ -650,7 +650,7 @@ static int break_ksm_pmd_entry(pmd_t *pmdp, unsigned long addr, unsigned long en } } out_unlock: - pte_unmap_unlock(ptep, ptl); + pte_unmap_unlock(start_ptep, ptl); return found; }
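As a closing illustration of the pattern this last fix enforces, here is
a minimal sketch of a PTE walk that unmaps correctly. It is illustrative
only: walk_ptes_sketch() is a hypothetical stand-in for
break_ksm_pmd_entry(), and the pte_present() test replaces the real
KSM-page check.

#include <linux/mm.h>
#include <linux/pgtable.h>

/*
 * Illustrative sketch, not the actual break_ksm_pmd_entry(). With
 * HIGHPTE, pte_offset_map_lock() kmaps the PTE page, so the pointer
 * handed back to pte_unmap_unlock() must lie inside that mapping. The
 * loop iterator may step one entry past the end of the PTE page, so the
 * original pointer must be saved and used for the unmap.
 */
static int walk_ptes_sketch(struct mm_struct *mm, pmd_t *pmdp,
			    unsigned long addr, unsigned long end)
{
	pte_t *start_ptep, *ptep;
	spinlock_t *ptl;
	int found = 0;

	start_ptep = pte_offset_map_lock(mm, pmdp, addr, &ptl);
	if (!start_ptep)
		return 0;

	for (ptep = start_ptep; addr < end; ptep++, addr += PAGE_SIZE) {
		/* Stand-in for the real per-PTE check. */
		if (pte_present(ptep_get(ptep))) {
			found = 1;
			break;
		}
	}

	/* Unmap via start_ptep, never ptep: ptep may point past the page. */
	pte_unmap_unlock(start_ptep, ptl);
	return found;
}

Without HIGHPTE, start_ptep and ptep both fall in the directly mapped
kernel address space and pte_unmap() is essentially a no-op, which is why
the bug only surfaces on ARM32 HIGHMEM/HIGHPTE configurations.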