mirror of
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
synced 2026-01-17 03:50:37 +00:00
6161 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
24172e0d79 |
arm64 fixes for -rc6
- Avoid sleeping in atomic context when changing linear map permissions
for DEBUG_PAGEALLOC or KFENCE.
- Rework printing of Spectre mitigation status to avoid hardlockup when
enabling per-task mitigations on the context-switch path.
- Reject kernel modules when instruction patching fails either due to
the DWARF-based SCS patching or because of an alternatives callback
residing outside of the core kernel text.
- Propagate error when updating kernel memory permissions in kprobes.
- Drop pointless, incorrect message when enabling the ACPI SPCR console.
- Use value-returning LSE instructions for per-cpu atomics to reduce
latency in SRCU locking routines.
-----BEGIN PGP SIGNATURE-----
iQFEBAABCgAuFiEEPxTL6PPUbjXGY88ct6xw3ITBYzQFAmkSAqQQHHdpbGxAa2Vy
bmVsLm9yZwAKCRC3rHDchMFjNHIpB/0SubZVLevJmInp4nu93ghHwu/8UhYG1Jpg
P4DayUJ0Ghnox6PMSNci4s2+RSQc8NbdF2I4kcJNa8v8kMt9sXDL87614nZXDVtK
FaEMK4PnnV3iFcQUr58kKWEf8cowG7gIi9Lq61InADAbZhQCDi/KAnlr5ydjF8hT
Ixo9PwIDOlWiBi6IwJRt1yWsswtNFOcDhor3boFL+e19jjbwmgCXqejbwb74KtK7
C5xcECzC8uHOuukn3Q0cZbKqpc+x9Nc98FnA44n9Ht+eoi/svEAYVeJuk1PJNGnA
viAv0DJ8QUP2OLYrMmuOPReg5+n/RL7i9rNXJvcBQDmRUOVmMPQh
=kjVk
-----END PGP SIGNATURE-----
Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Will Deacon:
"There's more here than I would ideally like at this stage, but there's
been a steady trickle of fixes and some of them took a few rounds of
review.
The bulk of the changes are fixing some fallout from the recent BBM
level two support which allows the linear map to be split from block
to page mappings at runtime, but inadvertently led to sleeping in
atomic context on some paths where the linear map was already mapped
with page granularity. The fix is simply to avoid splitting in those
cases but the implementation of that is a little involved.
The other interesting fix is addressing a catastophic performance
issue with our per-cpu atomics discovered by Paul in the SRCU locking
code but which took some interactions with the hardware folks to
resolve.
Summary:
- Avoid sleeping in atomic context when changing linear map
permissions for DEBUG_PAGEALLOC or KFENCE
- Rework printing of Spectre mitigation status to avoid hardlockup
when enabling per-task mitigations on the context-switch path
- Reject kernel modules when instruction patching fails either due to
the DWARF-based SCS patching or because of an alternatives callback
residing outside of the core kernel text
- Propagate error when updating kernel memory permissions in kprobes
- Drop pointless, incorrect message when enabling the ACPI SPCR
console
- Use value-returning LSE instructions for per-cpu atomics to reduce
latency in SRCU locking routines"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: Reject modules with internal alternative callbacks
arm64: Fail module loading if dynamic SCS patching fails
arm64: proton-pack: Fix hard lockup due to print in scheduler context
arm64: proton-pack: Drop print when !CONFIG_MITIGATE_SPECTRE_BRANCH_HISTORY
arm64: mm: Tidy up force_pte_mapping()
arm64: mm: Optimize range_split_to_ptes()
arm64: mm: Don't sleep in split_kernel_leaf_mapping() when in atomic context
arm64: kprobes: check the return value of set_memory_rox()
arm64: acpi: Drop message logging SPCR default console
Revert "ACPI: Suppress misleading SPCR console message when SPCR table is absent"
arm64: Use load LSE atomics for the non-return per-CPU atomic operations
|
||
|
|
8e8ae78896 |
arm64: Reject modules with internal alternative callbacks
During module loading, check if a callback function used by the alternatives specified in the '.altinstruction' ELF section (if present) is located in core kernel .text. If not fail module loading before callback is called. Reported-by: Fanqin Cui <cuifq1@chinatelecom.cn> Closes: https://lore.kernel.org/all/20250807072700.348514-1-fanqincui@163.com/ Signed-off-by: Adrian Barnaś <abarnas@google.com> Reviewed-by: Ard Biesheuvel <ardb@kernel.org> [will: Folded in 'noinstr' tweak from Mark] Signed-off-by: Will Deacon <will@kernel.org> |
||
|
|
6d4a0fbd34 |
arm64: Fail module loading if dynamic SCS patching fails
Disallow a module to load if SCS dynamic patching fails for its code. For module loading, instead of running a dry-run to check for patching errors, try to run patching in the first run and propagate any errors so module loading will fail. Signed-off-by: Adrian Barnaś <abarnas@google.com> Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Will Deacon <will@kernel.org> |
||
|
|
7f16357378 |
arm64: proton-pack: Fix hard lockup due to print in scheduler context
Relocate the printk() calls from spectre_v4_mitigations_off() and spectre_v2_mitigations_off() into setup_system_capabilities() function, preventing hard lockups caused by printk calls in scheduler context: | _raw_spin_lock_nested+168 | ttwu_queue+180 (rq_lock(rq, &rf); 2nd acquiring the rq->__lock) | try_to_wake_up+548 | wake_up_process+32 | __up+88 | up+100 | __up_console_sem+96 | console_unlock+696 | vprintk_emit+428 | vprintk_default+64 | vprintk_func+220 | printk+104 | spectre_v4_enable_task_mitigation+344 | __switch_to+100 | __schedule+1028 (rq_lock(rq, &rf); 1st acquiring the rq->__lock) | schedule_idle+48 | do_idle+388 | cpu_startup_entry+44 | secondary_start_kernel+352 Suggested-by: Mark Rutland <mark.rutland@arm.com> Suggested-by: Catalin Marinas <catalin.marinas@arm.com> Suggested-by: Will Deacon <will@kernel.org> Signed-off-by: shechenglong <shechenglong@xfusion.com> Signed-off-by: Will Deacon <will@kernel.org> |
||
|
|
ce2b3a50ad |
arm64: mm: Don't sleep in split_kernel_leaf_mapping() when in atomic context
It has been reported that split_kernel_leaf_mapping() is trying to sleep in non-sleepable context. It does this when acquiring the pgtable_split_lock mutex, when either CONFIG_DEBUG_PAGEALLOC or CONFIG_KFENCE are enabled, which change linear map permissions within softirq context during memory allocation and/or freeing. All other paths into this function are called from sleepable context and so are safe. But it turns out that the memory for which these 2 features may attempt to modify the permissions is always mapped by pte, so there is no need to attempt to split the mapping. So let's exit early in these cases and avoid attempting to take the mutex. There is one wrinkle to this approach; late-initialized kfence allocates it's pool from the buddy which may be block mapped. So we must hook that allocation and convert it to pte-mappings up front. Previously this was done as a side-effect of kfence protecting all the individual pages in its pool at init-time, but this no longer works due to the added early exit path in split_kernel_leaf_mapping(). So instead, do this via the existing arch_kfence_init_pool() arch hook, and reuse the existing linear_map_split_to_ptes() infrastructure. Closes: https://lore.kernel.org/all/f24b9032-0ec9-47b1-8b95-c0eeac7a31c5@roeck-us.net/ Fixes: a166563e7ec3 ("arm64: mm: support large block mapping when rodata=full") Reported-by: Guenter Roeck <linux@roeck-us.net> Tested-by: Guenter Roeck <groeck@google.com> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com> Reviewed-by: David Hildenbrand (Red Hat) <david@kernel.org> Reviewed-by: Yang Shi <yang@os.amperecomputing.com> Signed-off-by: Will Deacon <will@kernel.org> |
||
|
|
535fdfc5a2 |
arm64: Use load LSE atomics for the non-return per-CPU atomic operations
The non-return per-CPU this_cpu_*() atomic operations are implemented as
STADD/STCLR/STSET when FEAT_LSE is available. On many microarchitecture
implementations, these instructions tend to be executed "far" in the
interconnect or memory subsystem (unless the data is already in the L1
cache). This is in general more efficient when there is contention as it
avoids bouncing cache lines between CPUs. The load atomics (e.g. LDADD
without XZR as destination), OTOH, tend to be executed "near" with the
data loaded into the L1 cache.
STADD executed back to back as in srcu_read_{lock,unlock}*() incur an
additional overhead due to the default posting behaviour on several CPU
implementations. Since the per-CPU atomics are unlikely to be used
concurrently on the same memory location, encourage the hardware to to
execute them "near" by issuing load atomics - LDADD/LDCLR/LDSET - with
the destination register unused (but not XZR).
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/e7d539ed-ced0-4b96-8ecd-048a5b803b85@paulmck-laptop
Reported-by: Paul E. McKenney <paulmck@kernel.org>
Tested-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Will Deacon <will@kernel.org>
Reviewed-by: Palmer Dabbelt <palmer@dabbelt.com>
[will: Add comment and link to the discussion thread]
Signed-off-by: Will Deacon <will@kernel.org>
|
||
|
|
266ee584e5 |
arm64 fixes:
- Do not make a clean PTE dirty in pte_mkwrite()
The Arm architecture, for backwards compatibility reasons (ARMv8.0
before in-hardware dirty bit management - DBM), uses the PTE_RDONLY
bit to mean !dirty while the PTE_WRITE bit means DBM enabled. The
arm64 pte_mkwrite() simply clears the PTE_RDONLY bit and this
inadvertently makes the PTE pte_hw_dirty(). Most places making a PTE
writable also invoke pte_mkdirty() but do_swap_page() does not and we
end up with dirty, freshly swapped in, writeable pages.
- Do not warn if the destination page is already MTE-tagged in
copy_highpage()
In the majority of the cases, a destination page copied into is
freshly allocated without the PG_mte_tagged flag set. However, the
folio migration may be restarted if __folio_migrate_mapping() failed,
triggering the benign WARN_ON_ONCE().
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEE5RElWfyWxS+3PLO2a9axLQDIXvEFAmj6bJwACgkQa9axLQDI
XvF/Kg//TumWHlM3o07ol2yhK+EXUTMW8DNwzGwKAqPKl19iElZ+F3yVSfgXOxAi
hjZk9qIPGNokKulKX0drJahuhaXVW30ynh9EX6xPhXN6+faqCv8JR7zSABvMXm9E
FYmqEz678RNkIrfSn0uJw6kF3Dakn6IrZbarD2WRWg9kcwQIZCHfdxbiSbuBKQ+m
oXk5pjndTjuKujMWAhZbmuWGPNc1EfzulwhgpARgQNUxtvToLR/aS+TLykGKXPTL
GieHWR3/AiN1Jhwvsp4ElxLvnM1cau/QhnMX9kCAUjwRfnPBUa+jt/OPziO7s9Gh
NeImrwdtSBi/B5a7GaozZwsyeTAahrmSpaTjzQOrw3jr+Scn7seXtRJMFwYBiXNv
7GGDKsNXc6qmmrdC1Epqvzk6EDFpv2SPkeXIUJVxxaM7qpkhTgxnScqZzaYYmTu+
0CPKMmVtRfHr4GJqkhFcfHQMEHtxpOsA803t4w5iIgVQQQciEcHM7v21h5PWBIqO
vZydWITX/cGk2rwXtfZ4EATG1fl8jt1gwUD6FSUz73gBueLD1DHBhS3AkRCptdeR
AraXxdxdUZAH/7k1k0WOzBCoVBmbT8yjyTdKLH7KlHVaHIQDp+dmcR9mj1VoF5VP
sQXl9pG/WqrIz7pBBWEW83eeSlDK0mD7R9n8k9yHpsEy/ditFXI=
=3tTN
-----END PGP SIGNATURE-----
Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Catalin Marinas:
- Do not make a clean PTE dirty in pte_mkwrite()
The Arm architecture, for backwards compatibility reasons (ARMv8.0
before in-hardware dirty bit management - DBM), uses the PTE_RDONLY
bit to mean !dirty while the PTE_WRITE bit means DBM enabled. The
arm64 pte_mkwrite() simply clears the PTE_RDONLY bit and this
inadvertently makes the PTE pte_hw_dirty(). Most places making a PTE
writable also invoke pte_mkdirty() but do_swap_page() does not and we
end up with dirty, freshly swapped in, writeable pages.
- Do not warn if the destination page is already MTE-tagged in
copy_highpage()
In the majority of the cases, a destination page copied into is
freshly allocated without the PG_mte_tagged flag set. However, the
folio migration may be restarted if __folio_migrate_mapping() failed,
triggering the benign WARN_ON_ONCE().
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: mte: Do not warn if the page is already tagged in copy_highpage()
arm64, mm: avoid always making PTE dirty in pte_mkwrite()
|
||
|
|
143937ca51 |
arm64, mm: avoid always making PTE dirty in pte_mkwrite()
Current pte_mkwrite_novma() makes PTE dirty unconditionally. This may
mark some pages that are never written dirty wrongly. For example,
do_swap_page() may map the exclusive pages with writable and clean PTEs
if the VMA is writable and the page fault is for read access.
However, current pte_mkwrite_novma() implementation always dirties the
PTE. This may cause unnecessary disk writing if the pages are
never written before being reclaimed.
So, change pte_mkwrite_novma() to clear the PTE_RDONLY bit only if the
PTE_DIRTY bit is set to make it possible to make the PTE writable and
clean.
The current behavior was introduced in commit 73e86cb03cf2 ("arm64:
Move PTE_RDONLY bit handling out of set_pte_at()"). Before that,
pte_mkwrite() only sets the PTE_WRITE bit, while set_pte_at() only
clears the PTE_RDONLY bit if both the PTE_WRITE and the PTE_DIRTY bits
are set.
To test the performance impact of the patch, on an arm64 server
machine, run 16 redis-server processes on socket 1 and 16
memtier_benchmark processes on socket 0 with mostly get
transactions (that is, redis-server will mostly read memory only).
The memory footprint of redis-server is larger than the available
memory, so swap out/in will be triggered. Test results show that the
patch can avoid most swapping out because the pages are mostly clean.
And the benchmark throughput improves ~23.9% in the test.
Fixes: 73e86cb03cf2 ("arm64: Move PTE_RDONLY bit handling out of set_pte_at()")
Signed-off-by: Huang Ying <ying.huang@linux.alibaba.com>
Cc: Will Deacon <will@kernel.org>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Gavin Shan <gshan@redhat.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Yicong Yang <yangyicong@hisilicon.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
||
|
|
02e5f74ef0 |
ARM:
- Fix the handling of ZCR_EL2 in NV VMs
- Pick the correct translation regime when doing a PTW on
the back of a SEA
- Prevent userspace from injecting an event into a vcpu that isn't
initialised yet
- Move timer save/restore to the sysreg handling code, fixing EL2 timer
access in the process
- Add FGT-based trapping of MDSCR_EL1 to reduce the overhead of debug
- Fix trapping configuration when the host isn't GICv3
- Improve the detection of HCR_EL2.E2H being RES1
- Drop a spurious 'break' statement in the S1 PTW
- Don't try to access SPE when owned by EL3
Documentation updates:
- Document the failure modes of event injection
- Document that a GICv3 guest can be created on a GICv5 host
with FEAT_GCIE_LEGACY
Selftest improvements:
- Add a selftest for the effective value of HCR_EL2.AMO
- Address build warning in the timer selftest when building with clang
- Teach irqfd selftests about non-x86 architectures
- Add missing sysregs to the set_id_regs selftest
- Fix vcpu allocation in the vgic_lpi_stress selftest
- Correctly enable interrupts in the vgic_lpi_stress selftest
x86:
- Expand the KVM_PRE_FAULT_MEMORY selftest to add a regression test for the
bug fixed by commit 3ccbf6f47098 ("KVM: x86/mmu: Return -EAGAIN if userspace
deletes/moves memslot during prefault")
- Don't try to get PMU capabilities from perf when running a CPU with hybrid
CPUs/PMUs, as perf will rightly WARN.
guest_memfd:
- Rework KVM_CAP_GUEST_MEMFD_MMAP (newly introduced in 6.18) into a more
generic KVM_CAP_GUEST_MEMFD_FLAGS
- Add a guest_memfd INIT_SHARED flag and require userspace to explicitly set
said flag to initialize memory as SHARED, irrespective of MMAP. The
behavior merged in 6.18 is that enabling mmap() implicitly initializes
memory as SHARED, which would result in an ABI collision for x86 CoCo VMs
as their memory is currently always initialized PRIVATE.
- Allow mmap() on guest_memfd for x86 CoCo VMs, i.e. on VMs with private
memory, to enable testing such setups, i.e. to hopefully flush out any
other lurking ABI issues before 6.18 is officially released.
- Add testcases to the guest_memfd selftest to cover guest_memfd without MMAP,
and host userspace accesses to mmap()'d private memory.
-----BEGIN PGP SIGNATURE-----
iQFIBAABCgAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmjzqVIUHHBib256aW5p
QHJlZGhhdC5jb20ACgkQv/vSX3jHroO+qQgArc7XXmoiHQfTmdqbFL+1ipzfqd/c
SHJghONWVNKaSm0EsH72iEokmUyI8HssllaBuaGEAT/1F6YmRFwSSFgUG+N02rah
pL5ShCG2fPVxHal9ZJ04M4DYWPPClmcE2myfQ6k9kwcMgCRK2BdSRRnKH3XfOKrY
jAFNZVBCeODcnSvjOyxK2QFEt7J97H1AoAxOORvdqFmRqVIEQNJA/3Hx51wPfkwD
UnCQiNaPinDMxuuwvcmlYsIrQhGaqO4de1Kx0A4ZkSQqFUcyhvB6Qa+DoApz/IBw
qsFLqoR/1XXJ90wxutSTFzfjHM/SU6fhj57Cl9dAHI3pgnssC1iUvEt9Iw==
=dvAj
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm fixes from Paolo Bonzini:
"ARM:
- Fix the handling of ZCR_EL2 in NV VMs
- Pick the correct translation regime when doing a PTW on the back of
a SEA
- Prevent userspace from injecting an event into a vcpu that isn't
initialised yet
- Move timer save/restore to the sysreg handling code, fixing EL2
timer access in the process
- Add FGT-based trapping of MDSCR_EL1 to reduce the overhead of debug
- Fix trapping configuration when the host isn't GICv3
- Improve the detection of HCR_EL2.E2H being RES1
- Drop a spurious 'break' statement in the S1 PTW
- Don't try to access SPE when owned by EL3
Documentation updates:
- Document the failure modes of event injection
- Document that a GICv3 guest can be created on a GICv5 host with
FEAT_GCIE_LEGACY
Selftest improvements:
- Add a selftest for the effective value of HCR_EL2.AMO
- Address build warning in the timer selftest when building with
clang
- Teach irqfd selftests about non-x86 architectures
- Add missing sysregs to the set_id_regs selftest
- Fix vcpu allocation in the vgic_lpi_stress selftest
- Correctly enable interrupts in the vgic_lpi_stress selftest
x86:
- Expand the KVM_PRE_FAULT_MEMORY selftest to add a regression test
for the bug fixed by commit 3ccbf6f47098 ("KVM: x86/mmu: Return
-EAGAIN if userspace deletes/moves memslot during prefault")
- Don't try to get PMU capabilities from perf when running a CPU with
hybrid CPUs/PMUs, as perf will rightly WARN.
guest_memfd:
- Rework KVM_CAP_GUEST_MEMFD_MMAP (newly introduced in 6.18) into a
more generic KVM_CAP_GUEST_MEMFD_FLAGS
- Add a guest_memfd INIT_SHARED flag and require userspace to
explicitly set said flag to initialize memory as SHARED,
irrespective of MMAP.
The behavior merged in 6.18 is that enabling mmap() implicitly
initializes memory as SHARED, which would result in an ABI
collision for x86 CoCo VMs as their memory is currently always
initialized PRIVATE.
- Allow mmap() on guest_memfd for x86 CoCo VMs, i.e. on VMs with
private memory, to enable testing such setups, i.e. to hopefully
flush out any other lurking ABI issues before 6.18 is officially
released.
- Add testcases to the guest_memfd selftest to cover guest_memfd
without MMAP, and host userspace accesses to mmap()'d private
memory"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (46 commits)
arm64: Revamp HCR_EL2.E2H RES1 detection
KVM: arm64: nv: Use FGT write trap of MDSCR_EL1 when available
KVM: arm64: Compute per-vCPU FGTs at vcpu_load()
KVM: arm64: selftests: Fix misleading comment about virtual timer encoding
KVM: arm64: selftests: Add an E2H=0-specific configuration to get_reg_list
KVM: arm64: selftests: Make dependencies on VHE-specific registers explicit
KVM: arm64: Kill leftovers of ad-hoc timer userspace access
KVM: arm64: Fix WFxT handling of nested virt
KVM: arm64: Move CNT*CT_EL0 userspace accessors to generic infrastructure
KVM: arm64: Move CNT*_CVAL_EL0 userspace accessors to generic infrastructure
KVM: arm64: Move CNT*_CTL_EL0 userspace accessors to generic infrastructure
KVM: arm64: Add timer UAPI workaround to sysreg infrastructure
KVM: arm64: Make timer_set_offset() generally accessible
KVM: arm64: Replace timer context vcpu pointer with timer_id
KVM: arm64: Introduce timer_context_to_vcpu() helper
KVM: arm64: Hide CNTHV_*_EL2 from userspace for nVHE guests
Documentation: KVM: Update GICv3 docs for GICv5 hosts
KVM: arm64: gic-v3: Only set ICH_HCR traps for v2-on-v3 or v3 guests
KVM: arm64: selftests: Actually enable IRQs in vgic_lpi_stress
KVM: arm64: selftests: Allocate vcpus with correct size
...
|
||
|
|
e9ad390a48 |
arm64/sysreg: Fix GIC CDEOI instruction encoding
The GIC CDEOI system instruction requires the Rt field to be set to 0b11111
otherwise the instruction behaviour becomes CONSTRAINED UNPREDICTABLE.
Currenly, its usage is encoded as a system register write, with a constant
0 value:
write_sysreg_s(0, GICV5_OP_GIC_CDEOI)
While compiling with GCC, the 0 constant value, through these asm
constraints and modifiers ('x' modifier and 'Z' constraint combo):
asm volatile(__msr_s(r, "%x0") : : "rZ" (__val));
forces the compiler to issue the XZR register for the MSR operation (ie
that corresponds to Rt == 0b11111) issuing the right instruction encoding.
Unfortunately LLVM does not yet understand that modifier/constraint
combo so it ends up issuing a different register from XZR for the MSR
source, which in turns means that it encodes the GIC CDEOI instruction
wrongly and the instruction behaviour becomes CONSTRAINED UNPREDICTABLE
that we must prevent.
Add a conditional to write_sysreg_s() macro that detects whether it
is passed a constant 0 value and issues an MSR write with XZR as source
register - explicitly doing what the asm modifier/constraint is meant to
achieve through constraints/modifiers, fixing the LLVM compilation issue.
Fixes: 7ec80fb3f025 ("irqchip/gic-v5: Add GICv5 PPI support")
Suggested-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Lorenzo Pieralisi <lpieralisi@kernel.org>
Acked-by: Marc Zyngier <maz@kernel.org>
Cc: stable@vger.kernel.org
Cc: Sascha Bischoff <sascha.bischoff@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
||
|
|
ca88ecdce5 |
arm64: Revamp HCR_EL2.E2H RES1 detection
We currently have two ways to identify CPUs that only implement FEAT_VHE and not FEAT_E2H0: - either they advertise it via ID_AA64MMFR4_EL1.E2H0, - or the HCR_EL2.E2H bit is RAO/WI However, there is a third category of "cpus" that fall between these two cases: on CPUs that do not implement FEAT_FGT, it is IMPDEF whether an access to ID_AA64MMFR4_EL1 can trap to EL2 when the register value is zero. A consequence of this is that on systems such as Neoverse V2, a NV guest cannot reliably detect that it is in a VHE-only configuration (E2H is writable, and ID_AA64MMFR0_EL1 is 0), despite the hypervisor's best effort to repaint the id register. Replace the RAO/WI test by a sequence that makes use of the VHE register remnapping between EL1 and EL2 to detect this situation, and work out whether we get the VHE behaviour even after having set HCR_EL2.E2H to 0. This solves the NV problem, and provides a more reliable acid test for CPUs that do not completely follow the letter of the architecture while providing a RES1 behaviour for HCR_EL2.E2H. Suggested-by: Mark Rutland <mark.rutland@arm.com> Acked-by: Mark Rutland <mark.rutland@arm.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Tested-by: Jan Kotas <jank@cadence.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/15A85F2B-1A0C-4FA7-9FE4-EEC2203CC09E@global.cadence.com |
||
|
|
fb10ddf35c |
KVM: arm64: Compute per-vCPU FGTs at vcpu_load()
To date KVM has used the fine-grained traps for the sake of UNDEF enforcement (so-called FGUs), meaning the constituent parts could be computed on a per-VM basis and folded into the effective value when programmed. Prepare for traps changing based on the vCPU context by computing the whole mess of them at vcpu_load(). Aggressively inline all the helpers to preserve the build-time checks that were there before. Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Reviewed-by: Joey Gouly <joey.gouly@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> |
||
|
|
971199ad2a |
arm64 fixes for -rc1
- Preserve old 'tt_core' UAPI for Hisilicon L3C PMU driver. - Ensure linear alias of kprobes instruction page is not writable. - Fix kernel stack unwinding from BPF. - Fix build warnings from the Fujitsu uncore PMU documentation. - Fix hang with deferred 'struct page' initialisation and MTE. - Consolidate KPTI page-table re-writing code. -----BEGIN PGP SIGNATURE----- iQFEBAABCgAuFiEEPxTL6PPUbjXGY88ct6xw3ITBYzQFAmjj0Q0QHHdpbGxAa2Vy bmVsLm9yZwAKCRC3rHDchMFjNJRvCACkC6isoQt+EdcS658ulgQJux9dqqJZq5uk IdSFmorYd+MZfRcXe0HQ/rSJcAHzZH2gS1MvSeD1JfILPnY1ZkCFWHxsDdKK0Yc7 Sn+3xYw07qpBiq/hCvf9udTCl61vJveexUazV+FgOd9cT+cQOtkqW98uggjOx6Ry HeECTovZ6NR23pB9NSu5l6lMbAOKZvbDuBYDOujYn/X5jP4v6qlGJTdB2YOA/qPy C3aOcqCOWuET6uQ5iGeXWEB+RF6QCNuqjgQYo745Q+yKFdfZ60wokP8Fv2GERzlu LO8YkYCzZRJF+lBbL8KfTjRE/fC6f0rHx9Zensm7FH9WsaDO/mWw =xrAV -----END PGP SIGNATURE----- Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 fixes from Will Deacon: - Preserve old 'tt_core' UAPI for Hisilicon L3C PMU driver - Ensure linear alias of kprobes instruction page is not writable - Fix kernel stack unwinding from BPF - Fix build warnings from the Fujitsu uncore PMU documentation - Fix hang with deferred 'struct page' initialisation and MTE - Consolidate KPTI page-table re-writing code * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: arm64: mte: Do not flag the zero page as PG_mte_tagged docs: perf: Fujitsu: Fix htmldocs build warnings and errors arm64: mm: Move KPTI helpers to mmu.c tracing: Fix the bug where bpf_get_stackid returns -EFAULT on the ARM64 arm64: kprobes: call set_memory_rox() for kprobe page drivers/perf: hisi: Add tt_core_deprecated for compatibility |
||
|
|
f3826aa996 |
guest_memfd:
* Add support for host userspace mapping of guest_memfd-backed memory for VM
types that do NOT use support KVM_MEMORY_ATTRIBUTE_PRIVATE (which isn't
precisely the same thing as CoCo VMs, since x86's SEV-MEM and SEV-ES have
no way to detect private vs. shared).
This lays the groundwork for removal of guest memory from the kernel direct
map, as well as for limited mmap() for guest_memfd-backed memory.
For more information see:
* a6ad54137af9 ("Merge branch 'guest-memfd-mmap' into HEAD", 2025-08-27)
* https://github.com/firecracker-microvm/firecracker/tree/feature/secret-hiding
(guest_memfd in Firecracker)
* https://lore.kernel.org/all/20250221160728.1584559-1-roypat@amazon.co.uk/
(direct map removal)
* https://lore.kernel.org/all/20250328153133.3504118-1-tabba@google.com/
(mmap support)
ARM:
* Add support for FF-A 1.2 as the secure memory conduit for pKVM,
allowing more registers to be used as part of the message payload.
* Change the way pKVM allocates its VM handles, making sure that the
privileged hypervisor is never tricked into using uninitialised
data.
* Speed up MMIO range registration by avoiding unnecessary RCU
synchronisation, which results in VMs starting much quicker.
* Add the dump of the instruction stream when panic-ing in the EL2
payload, just like the rest of the kernel has always done. This will
hopefully help debugging non-VHE setups.
* Add 52bit PA support to the stage-1 page-table walker, and make use
of it to populate the fault level reported to the guest on failing
to translate a stage-1 walk.
* Add NV support to the GICv3-on-GICv5 emulation code, ensuring
feature parity for guests, irrespective of the host platform.
* Fix some really ugly architecture problems when dealing with debug
in a nested VM. This has some bad performance impacts, but is at
least correct.
* Add enough infrastructure to be able to disable EL2 features and
give effective values to the EL2 control registers. This then allows
a bunch of features to be turned off, which helps cross-host
migration.
* Large rework of the selftest infrastructure to allow most tests to
transparently run at EL2. This is the first step towards enabling
NV testing.
* Various fixes and improvements all over the map, including one BE
fix, just in time for the removal of the feature.
LoongArch:
* Detect page table walk feature on new hardware
* Add sign extension with kernel MMIO/IOCSR emulation
* Improve in-kernel IPI emulation
* Improve in-kernel PCH-PIC emulation
* Move kvm_iocsr tracepoint out of generic code
RISC-V:
* Added SBI FWFT extension for Guest/VM with misaligned delegation and
pointer masking PMLEN features
* Added ONE_REG interface for SBI FWFT extension
* Added Zicbop and bfloat16 extensions for Guest/VM
* Enabled more common KVM selftests for RISC-V
* Added SBI v3.0 PMU enhancements in KVM and perf driver
s390:
* Improve interrupt cpu for wakeup, in particular the heuristic to decide
which vCPU to deliver a floating interrupt to.
* Clear the PTE when discarding a swapped page because of CMMA; this
bug was introduced in 6.16 when refactoring gmap code.
x86 selftests:
* Add #DE coverage in the fastops test (the only exception that's guest-
triggerable in fastop-emulated instructions).
* Fix PMU selftests errors encountered on Granite Rapids (GNR), Sierra
Forest (SRF) and Clearwater Forest (CWF).
* Minor cleanups and improvements
x86 (guest side):
* For the legacy PCI hole (memory between TOLUD and 4GiB) to UC when
overriding guest MTRR for TDX/SNP to fix an issue where ACPI auto-mapping
could map devices as WB and prevent the device drivers from mapping their
devices with UC/UC-.
* Make kvm_async_pf_task_wake() a local static helper and remove its
export.
* Use native qspinlocks when running in a VM with dedicated vCPU=>pCPU
bindings even when PV_UNHALT is unsupported.
Generic:
* Remove a redundant __GFP_NOWARN from kvm_setup_async_pf() as __GFP_NOWARN is
now included in GFP_NOWAIT.
-----BEGIN PGP SIGNATURE-----
iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmjcGSkUHHBib256aW5p
QHJlZGhhdC5jb20ACgkQv/vSX3jHroPSPAgAnJDswU4fZ5YdJr6jGzsbSQ6utlIV
FeEltLKQIM7Aq/uvL6PLN5Kx1Pb/d9r9ag39mDT6lq9fOfJdOLjJr2SBXPTCsrPS
6hyNL1mlgo5qzs54T8dkMbQThlSgA4zaehsc0zl8vnwil6ygoAdrtTHqZm6V0hu/
F/sVlikCsLix1hC0KtzwscyWYcjWtXfVoi9eU5WY6ALpQaVXfRUtwyOhGDkldr+m
i3iDiGiLAZ5Iu3igUCIOEzSSQY0FgLJpzbwJAeUxIvomDkHGJLaR14ijvM+NkRZi
FBo2CLbjrwXb56Rbh2ABcq0CGJ3EiU3L+CC34UaRLzbtl/2BtpetkC3irA==
=fyov
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm updates from Paolo Bonzini:
"This excludes the bulk of the x86 changes, which I will send
separately. They have two not complex but relatively unusual conflicts
so I will wait for other dust to settle.
guest_memfd:
- Add support for host userspace mapping of guest_memfd-backed memory
for VM types that do NOT use support KVM_MEMORY_ATTRIBUTE_PRIVATE
(which isn't precisely the same thing as CoCo VMs, since x86's
SEV-MEM and SEV-ES have no way to detect private vs. shared).
This lays the groundwork for removal of guest memory from the
kernel direct map, as well as for limited mmap() for
guest_memfd-backed memory.
For more information see:
- commit a6ad54137af9 ("Merge branch 'guest-memfd-mmap' into HEAD")
- guest_memfd in Firecracker:
https://github.com/firecracker-microvm/firecracker/tree/feature/secret-hiding
- direct map removal:
https://lore.kernel.org/all/20250221160728.1584559-1-roypat@amazon.co.uk/
- mmap support:
https://lore.kernel.org/all/20250328153133.3504118-1-tabba@google.com/
ARM:
- Add support for FF-A 1.2 as the secure memory conduit for pKVM,
allowing more registers to be used as part of the message payload.
- Change the way pKVM allocates its VM handles, making sure that the
privileged hypervisor is never tricked into using uninitialised
data.
- Speed up MMIO range registration by avoiding unnecessary RCU
synchronisation, which results in VMs starting much quicker.
- Add the dump of the instruction stream when panic-ing in the EL2
payload, just like the rest of the kernel has always done. This
will hopefully help debugging non-VHE setups.
- Add 52bit PA support to the stage-1 page-table walker, and make use
of it to populate the fault level reported to the guest on failing
to translate a stage-1 walk.
- Add NV support to the GICv3-on-GICv5 emulation code, ensuring
feature parity for guests, irrespective of the host platform.
- Fix some really ugly architecture problems when dealing with debug
in a nested VM. This has some bad performance impacts, but is at
least correct.
- Add enough infrastructure to be able to disable EL2 features and
give effective values to the EL2 control registers. This then
allows a bunch of features to be turned off, which helps cross-host
migration.
- Large rework of the selftest infrastructure to allow most tests to
transparently run at EL2. This is the first step towards enabling
NV testing.
- Various fixes and improvements all over the map, including one BE
fix, just in time for the removal of the feature.
LoongArch:
- Detect page table walk feature on new hardware
- Add sign extension with kernel MMIO/IOCSR emulation
- Improve in-kernel IPI emulation
- Improve in-kernel PCH-PIC emulation
- Move kvm_iocsr tracepoint out of generic code
RISC-V:
- Added SBI FWFT extension for Guest/VM with misaligned delegation
and pointer masking PMLEN features
- Added ONE_REG interface for SBI FWFT extension
- Added Zicbop and bfloat16 extensions for Guest/VM
- Enabled more common KVM selftests for RISC-V
- Added SBI v3.0 PMU enhancements in KVM and perf driver
s390:
- Improve interrupt cpu for wakeup, in particular the heuristic to
decide which vCPU to deliver a floating interrupt to.
- Clear the PTE when discarding a swapped page because of CMMA; this
bug was introduced in 6.16 when refactoring gmap code.
x86 selftests:
- Add #DE coverage in the fastops test (the only exception that's
guest- triggerable in fastop-emulated instructions).
- Fix PMU selftests errors encountered on Granite Rapids (GNR),
Sierra Forest (SRF) and Clearwater Forest (CWF).
- Minor cleanups and improvements
x86 (guest side):
- For the legacy PCI hole (memory between TOLUD and 4GiB) to UC when
overriding guest MTRR for TDX/SNP to fix an issue where ACPI
auto-mapping could map devices as WB and prevent the device drivers
from mapping their devices with UC/UC-.
- Make kvm_async_pf_task_wake() a local static helper and remove its
export.
- Use native qspinlocks when running in a VM with dedicated
vCPU=>pCPU bindings even when PV_UNHALT is unsupported.
Generic:
- Remove a redundant __GFP_NOWARN from kvm_setup_async_pf() as
__GFP_NOWARN is now included in GFP_NOWAIT.
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (178 commits)
KVM: s390: Fix to clear PTE when discarding a swapped page
KVM: arm64: selftests: Cover ID_AA64ISAR3_EL1 in set_id_regs
KVM: arm64: selftests: Remove a duplicate register listing in set_id_regs
KVM: arm64: selftests: Cope with arch silliness in EL2 selftest
KVM: arm64: selftests: Add basic test for running in VHE EL2
KVM: arm64: selftests: Enable EL2 by default
KVM: arm64: selftests: Initialize HCR_EL2
KVM: arm64: selftests: Use the vCPU attr for setting nr of PMU counters
KVM: arm64: selftests: Use hyp timer IRQs when test runs at EL2
KVM: arm64: selftests: Select SMCCC conduit based on current EL
KVM: arm64: selftests: Provide helper for getting default vCPU target
KVM: arm64: selftests: Alias EL1 registers to EL2 counterparts
KVM: arm64: selftests: Create a VGICv3 for 'default' VMs
KVM: arm64: selftests: Add unsanitised helpers for VGICv3 creation
KVM: arm64: selftests: Add helper to check for VGICv3 support
KVM: arm64: selftests: Initialize VGICv3 only once
KVM: arm64: selftests: Provide kvm_arch_vm_post_create() in library code
KVM: selftests: Add ex_str() to print human friendly name of exception vectors
selftests/kvm: remove stale TODO in xapic_state_test
KVM: selftests: Handle Intel Atom errata that leads to PMU event overcount
...
|
||
|
|
8804d970fa |
Summary of significant series in this pull request:
- The 3 patch series "mm, swap: improve cluster scan strategy" from
Kairui Song improves performance and reduces the failure rate of swap
cluster allocation.
- The 4 patch series "support large align and nid in Rust allocators"
from Vitaly Wool permits Rust allocators to set NUMA node and large
alignment when perforning slub and vmalloc reallocs.
- The 2 patch series "mm/damon/vaddr: support stat-purpose DAMOS" from
Yueyang Pan extend DAMOS_STAT's handling of the DAMON operations sets
for virtual address spaces for ops-level DAMOS filters.
- The 3 patch series "execute PROCMAP_QUERY ioctl under per-vma lock"
from Suren Baghdasaryan reduces mmap_lock contention during reads of
/proc/pid/maps.
- The 2 patch series "mm/mincore: minor clean up for swap cache
checking" from Kairui Song performs some cleanup in the swap code.
- The 11 patch series "mm: vm_normal_page*() improvements" from David
Hildenbrand provides code cleanup in the pagemap code.
- The 5 patch series "add persistent huge zero folio support" from
Pankaj Raghav provides a block layer speedup by optionalls making the
huge_zero_pagepersistent, instead of releasing it when its refcount
falls to zero.
- The 3 patch series "kho: fixes and cleanups" from Mike Rapoport adds a
few touchups to the recently added Kexec Handover feature.
- The 10 patch series "mm: make mm->flags a bitmap and 64-bit on all
arches" from Lorenzo Stoakes turns mm_struct.flags into a bitmap. To
end the constant struggle with space shortage on 32-bit conflicting with
64-bit's needs.
- The 2 patch series "mm/swapfile.c and swap.h cleanup" from Chris Li
cleans up some swap code.
- The 7 patch series "selftests/mm: Fix false positives and skip
unsupported tests" from Donet Tom fixes a few things in our selftests
code.
- The 7 patch series "prctl: extend PR_SET_THP_DISABLE to only provide
THPs when advised" from David Hildenbrand "allows individual processes
to opt-out of THP=always into THP=madvise, without affecting other
workloads on the system".
It's a long story - the [1/N] changelog spells out the considerations.
- The 11 patch series "Add and use memdesc_flags_t" from Matthew Wilcox
gets us started on the memdesc project. Please see
https://kernelnewbies.org/MatthewWilcox/Memdescs and
https://blogs.oracle.com/linux/post/introducing-memdesc.
- The 3 patch series "Tiny optimization for large read operations" from
Chi Zhiling improves the efficiency of the pagecache read path.
- The 5 patch series "Better split_huge_page_test result check" from Zi
Yan improves our folio splitting selftest code.
- The 2 patch series "test that rmap behaves as expected" from Wei Yang
adds some rmap selftests.
- The 3 patch series "remove write_cache_pages()" from Christoph Hellwig
removes that function and converts its two remaining callers.
- The 2 patch series "selftests/mm: uffd-stress fixes" from Dev Jain
fixes some UFFD selftests issues.
- The 3 patch series "introduce kernel file mapped folios" from Boris
Burkov introduces the concept of "kernel file pages". Using these
permits btrfs to account its metadata pages to the root cgroup, rather
than to the cgroups of random inappropriate tasks.
- The 2 patch series "mm/pageblock: improve readability of some
pageblock handling" from Wei Yang provides some readability improvements
to the page allocator code.
- The 11 patch series "mm/damon: support ARM32 with LPAE" from SeongJae
Park teaches DAMON to understand arm32 highmem.
- The 4 patch series "tools: testing: Use existing atomic.h for
vma/maple tests" from Brendan Jackman performs some code cleanups and
deduplication under tools/testing/.
- The 2 patch series "maple_tree: Fix testing for 32bit compiles" from
Liam Howlett fixes a couple of 32-bit issues in
tools/testing/radix-tree.c.
- The 2 patch series "kasan: unify kasan_enabled() and remove
arch-specific implementations" from Sabyrzhan Tasbolatov moves KASAN
arch-specific initialization code into a common arch-neutral
implementation.
- The 3 patch series "mm: remove zpool" from Johannes Weiner removes
zspool - an indirection layer which now only redirects to a single thing
(zsmalloc).
- The 2 patch series "mm: task_stack: Stack handling cleanups" from
Pasha Tatashin makes a couple of cleanups in the fork code.
- The 37 patch series "mm: remove nth_page()" from David Hildenbrand
makes rather a lot of adjustments at various nth_page() callsites,
eventually permitting the removal of that undesirable helper function.
- The 2 patch series "introduce kasan.write_only option in hw-tags" from
Yeoreum Yun creates a KASAN read-only mode for ARM, using that
architecture's memory tagging feature. It is felt that a read-only mode
KASAN is suitable for use in production systems rather than debug-only.
- The 3 patch series "mm: hugetlb: cleanup hugetlb folio allocation"
from Kefeng Wang does some tidying in the hugetlb folio allocation code.
- The 12 patch series "mm: establish const-correctness for pointer
parameters" from Max Kellermann makes quite a number of the MM API
functions more accurate about the constness of their arguments. This
was getting in the way of subsystems (in this case CEPH) when they
attempt to improving their own const/non-const accuracy.
- The 7 patch series "Cleanup free_pages() misuse" from Vishal Moola
fixes a number of code sites which were confused over when to use
free_pages() vs __free_pages().
- The 3 patch series "Add Rust abstraction for Maple Trees" from Alice
Ryhl makes the mapletree code accessible to Rust. Required by nouveau
and by its forthcoming successor: the new Rust Nova driver.
- The 2 patch series "selftests/mm: split_huge_page_test:
split_pte_mapped_thp improvements" from David Hildenbrand adds a fix and
some cleanups to the thp selftesting code.
- The 14 patch series "mm, swap: introduce swap table as swap cache
(phase I)" from Chris Li and Kairui Song is the first step along the
path to implementing "swap tables" - a new approach to swap allocation
and state tracking which is expected to yield speed and space
improvements. This patchset itself yields a 5-20% performance benefit
in some situations.
- The 3 patch series "Some ptdesc cleanups" from Matthew Wilcox utilizes
the new memdesc layer to clean up the ptdesc code a little.
- The 3 patch series "Fix va_high_addr_switch.sh test failure" from
Chunyu Hu fixes some issues in our 5-level pagetable selftesting code.
- The 2 patch series "Minor fixes for memory allocation profiling" from
Suren Baghdasaryan addresses a couple of minor issues in relatively new
memory allocation profiling feature.
- The 3 patch series "Small cleanups" from Matthew Wilcox has a few
cleanups in preparation for more memdesc work.
- The 2 patch series "mm/damon: add addr_unit for DAMON_LRU_SORT and
DAMON_RECLAIM" from Quanmin Yan makes some changes to DAMON in
furtherance of supporting arm highmem.
- The 2 patch series "selftests/mm: Add -Wunreachable-code and fix
warnings" from Muhammad Anjum adds that compiler check to selftests code
and fixes the fallout, by removing dead code.
- The 10 patch series "Improvements to Victim Process Thawing and OOM
Reaper Traversal Order" from zhongjinji makes a number of improvements
in the OOM killer: mainly thawing a more appropriate group of victim
threads so they can release resources.
- The 5 patch series "mm/damon: misc fixups and improvements for 6.18"
from SeongJae Park is a bunch of small and unrelated fixups for DAMON.
- The 7 patch series "mm/damon: define and use DAMON initialization
check function" from SeongJae Park implement reliability and
maintainability improvements to a recently-added bug fix.
- The 2 patch series "mm/damon/stat: expose auto-tuned intervals and
non-idle ages" from SeongJae Park provides additional transparency to
userspace clients of the DAMON_STAT information.
- The 2 patch series "Expand scope of khugepaged anonymous collapse"
from Dev Jain removes some constraints on khubepaged's collapsing of
anon VMAs. It also increases the success rate of MADV_COLLAPSE against
an anon vma.
- The 2 patch series "mm: do not assume file == vma->vm_file in
compat_vma_mmap_prepare()" from Lorenzo Stoakes moves us further towards
removal of file_operations.mmap(). This patchset concentrates upon
clearing up the treatment of stacked filesystems.
- The 6 patch series "mm: Improve mlock tracking for large folios" from
Kiryl Shutsemau provides some fixes and improvements to mlock's tracking
of large folios. /proc/meminfo's "Mlocked" field became more accurate.
- The 2 patch series "mm/ksm: Fix incorrect accounting of KSM counters
during fork" from Donet Tom fixes several user-visible KSM stats
inaccuracies across forks and adds selftest code to verify these
counters.
- The 2 patch series "mm_slot: fix the usage of mm_slot_entry" from Wei
Yang addresses some potential but presently benign issues in KSM's
mm_slot handling.
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCaN3cywAKCRDdBJ7gKXxA
jtaPAQDmIuIu7+XnVUK5V11hsQ/5QtsUeLHV3OsAn4yW5/3dEQD/UddRU08ePN+1
2VRB0EwkLAdfMWW7TfiNZ+yhuoiL/AA=
=4mhY
-----END PGP SIGNATURE-----
Merge tag 'mm-stable-2025-10-01-19-00' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM updates from Andrew Morton:
- "mm, swap: improve cluster scan strategy" from Kairui Song improves
performance and reduces the failure rate of swap cluster allocation
- "support large align and nid in Rust allocators" from Vitaly Wool
permits Rust allocators to set NUMA node and large alignment when
perforning slub and vmalloc reallocs
- "mm/damon/vaddr: support stat-purpose DAMOS" from Yueyang Pan extend
DAMOS_STAT's handling of the DAMON operations sets for virtual
address spaces for ops-level DAMOS filters
- "execute PROCMAP_QUERY ioctl under per-vma lock" from Suren
Baghdasaryan reduces mmap_lock contention during reads of
/proc/pid/maps
- "mm/mincore: minor clean up for swap cache checking" from Kairui Song
performs some cleanup in the swap code
- "mm: vm_normal_page*() improvements" from David Hildenbrand provides
code cleanup in the pagemap code
- "add persistent huge zero folio support" from Pankaj Raghav provides
a block layer speedup by optionalls making the
huge_zero_pagepersistent, instead of releasing it when its refcount
falls to zero
- "kho: fixes and cleanups" from Mike Rapoport adds a few touchups to
the recently added Kexec Handover feature
- "mm: make mm->flags a bitmap and 64-bit on all arches" from Lorenzo
Stoakes turns mm_struct.flags into a bitmap. To end the constant
struggle with space shortage on 32-bit conflicting with 64-bit's
needs
- "mm/swapfile.c and swap.h cleanup" from Chris Li cleans up some swap
code
- "selftests/mm: Fix false positives and skip unsupported tests" from
Donet Tom fixes a few things in our selftests code
- "prctl: extend PR_SET_THP_DISABLE to only provide THPs when advised"
from David Hildenbrand "allows individual processes to opt-out of
THP=always into THP=madvise, without affecting other workloads on the
system".
It's a long story - the [1/N] changelog spells out the considerations
- "Add and use memdesc_flags_t" from Matthew Wilcox gets us started on
the memdesc project. Please see
https://kernelnewbies.org/MatthewWilcox/Memdescs and
https://blogs.oracle.com/linux/post/introducing-memdesc
- "Tiny optimization for large read operations" from Chi Zhiling
improves the efficiency of the pagecache read path
- "Better split_huge_page_test result check" from Zi Yan improves our
folio splitting selftest code
- "test that rmap behaves as expected" from Wei Yang adds some rmap
selftests
- "remove write_cache_pages()" from Christoph Hellwig removes that
function and converts its two remaining callers
- "selftests/mm: uffd-stress fixes" from Dev Jain fixes some UFFD
selftests issues
- "introduce kernel file mapped folios" from Boris Burkov introduces
the concept of "kernel file pages". Using these permits btrfs to
account its metadata pages to the root cgroup, rather than to the
cgroups of random inappropriate tasks
- "mm/pageblock: improve readability of some pageblock handling" from
Wei Yang provides some readability improvements to the page allocator
code
- "mm/damon: support ARM32 with LPAE" from SeongJae Park teaches DAMON
to understand arm32 highmem
- "tools: testing: Use existing atomic.h for vma/maple tests" from
Brendan Jackman performs some code cleanups and deduplication under
tools/testing/
- "maple_tree: Fix testing for 32bit compiles" from Liam Howlett fixes
a couple of 32-bit issues in tools/testing/radix-tree.c
- "kasan: unify kasan_enabled() and remove arch-specific
implementations" from Sabyrzhan Tasbolatov moves KASAN arch-specific
initialization code into a common arch-neutral implementation
- "mm: remove zpool" from Johannes Weiner removes zspool - an
indirection layer which now only redirects to a single thing
(zsmalloc)
- "mm: task_stack: Stack handling cleanups" from Pasha Tatashin makes a
couple of cleanups in the fork code
- "mm: remove nth_page()" from David Hildenbrand makes rather a lot of
adjustments at various nth_page() callsites, eventually permitting
the removal of that undesirable helper function
- "introduce kasan.write_only option in hw-tags" from Yeoreum Yun
creates a KASAN read-only mode for ARM, using that architecture's
memory tagging feature. It is felt that a read-only mode KASAN is
suitable for use in production systems rather than debug-only
- "mm: hugetlb: cleanup hugetlb folio allocation" from Kefeng Wang does
some tidying in the hugetlb folio allocation code
- "mm: establish const-correctness for pointer parameters" from Max
Kellermann makes quite a number of the MM API functions more accurate
about the constness of their arguments. This was getting in the way
of subsystems (in this case CEPH) when they attempt to improving
their own const/non-const accuracy
- "Cleanup free_pages() misuse" from Vishal Moola fixes a number of
code sites which were confused over when to use free_pages() vs
__free_pages()
- "Add Rust abstraction for Maple Trees" from Alice Ryhl makes the
mapletree code accessible to Rust. Required by nouveau and by its
forthcoming successor: the new Rust Nova driver
- "selftests/mm: split_huge_page_test: split_pte_mapped_thp
improvements" from David Hildenbrand adds a fix and some cleanups to
the thp selftesting code
- "mm, swap: introduce swap table as swap cache (phase I)" from Chris
Li and Kairui Song is the first step along the path to implementing
"swap tables" - a new approach to swap allocation and state tracking
which is expected to yield speed and space improvements. This
patchset itself yields a 5-20% performance benefit in some situations
- "Some ptdesc cleanups" from Matthew Wilcox utilizes the new memdesc
layer to clean up the ptdesc code a little
- "Fix va_high_addr_switch.sh test failure" from Chunyu Hu fixes some
issues in our 5-level pagetable selftesting code
- "Minor fixes for memory allocation profiling" from Suren Baghdasaryan
addresses a couple of minor issues in relatively new memory
allocation profiling feature
- "Small cleanups" from Matthew Wilcox has a few cleanups in
preparation for more memdesc work
- "mm/damon: add addr_unit for DAMON_LRU_SORT and DAMON_RECLAIM" from
Quanmin Yan makes some changes to DAMON in furtherance of supporting
arm highmem
- "selftests/mm: Add -Wunreachable-code and fix warnings" from Muhammad
Anjum adds that compiler check to selftests code and fixes the
fallout, by removing dead code
- "Improvements to Victim Process Thawing and OOM Reaper Traversal
Order" from zhongjinji makes a number of improvements in the OOM
killer: mainly thawing a more appropriate group of victim threads so
they can release resources
- "mm/damon: misc fixups and improvements for 6.18" from SeongJae Park
is a bunch of small and unrelated fixups for DAMON
- "mm/damon: define and use DAMON initialization check function" from
SeongJae Park implement reliability and maintainability improvements
to a recently-added bug fix
- "mm/damon/stat: expose auto-tuned intervals and non-idle ages" from
SeongJae Park provides additional transparency to userspace clients
of the DAMON_STAT information
- "Expand scope of khugepaged anonymous collapse" from Dev Jain removes
some constraints on khubepaged's collapsing of anon VMAs. It also
increases the success rate of MADV_COLLAPSE against an anon vma
- "mm: do not assume file == vma->vm_file in compat_vma_mmap_prepare()"
from Lorenzo Stoakes moves us further towards removal of
file_operations.mmap(). This patchset concentrates upon clearing up
the treatment of stacked filesystems
- "mm: Improve mlock tracking for large folios" from Kiryl Shutsemau
provides some fixes and improvements to mlock's tracking of large
folios. /proc/meminfo's "Mlocked" field became more accurate
- "mm/ksm: Fix incorrect accounting of KSM counters during fork" from
Donet Tom fixes several user-visible KSM stats inaccuracies across
forks and adds selftest code to verify these counters
- "mm_slot: fix the usage of mm_slot_entry" from Wei Yang addresses
some potential but presently benign issues in KSM's mm_slot handling
* tag 'mm-stable-2025-10-01-19-00' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (372 commits)
mm: swap: check for stable address space before operating on the VMA
mm: convert folio_page() back to a macro
mm/khugepaged: use start_addr/addr for improved readability
hugetlbfs: skip VMAs without shareable locks in hugetlb_vmdelete_list
alloc_tag: fix boot failure due to NULL pointer dereference
mm: silence data-race in update_hiwater_rss
mm/memory-failure: don't select MEMORY_ISOLATION
mm/khugepaged: remove definition of struct khugepaged_mm_slot
mm/ksm: get mm_slot by mm_slot_entry() when slot is !NULL
hugetlb: increase number of reserving hugepages via cmdline
selftests/mm: add fork inheritance test for ksm_merging_pages counter
mm/ksm: fix incorrect KSM counter handling in mm_struct during fork
drivers/base/node: fix double free in register_one_node()
mm: remove PMD alignment constraint in execmem_vmalloc()
mm/memory_hotplug: fix typo 'esecially' -> 'especially'
mm/rmap: improve mlock tracking for large folios
mm/filemap: map entire large folio faultaround
mm/fault: try to map the entire file folio in finish_fault()
mm/rmap: mlock large folios in try_to_unmap_one()
mm/rmap: fix a mlock race condition in folio_referenced_one()
...
|
||
|
|
4b81e2eb9e |
Updates for the VDSO subsystem:
- Further consolidation of the VDSO infrastructure and the common data
store.
- Simplification of the related Kconfig logic
- Improve the VDSO selftest suite
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmjaUJgTHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYoT5VD/9l+TP42vJzwygPArefVM9QijAc4k+g
iptaz5M+Lhk0BlwMUu502Tp1zcwGq5529LlE2P72DuYoZRtZKSFCGES2mye0JiAs
A5Vk8d1zicg3IIGjYgw39shA9ghzuzV3gZdgHVqQ6024uSeKxUY3SqF156GGfPFG
azXWDV7i4RGQvKLm2ye8m7JNYCE3P9f8Wh+GvPaq7qBlW1MsMCWC4FP5qu4iStf3
mol9uA/EM2zSDwkpIpq6P5L2TpIH1dGDNGxVf/HG3J16UhNXzzXv/rxuKHKPNGWm
MQZ8WdPjF+uwdUsNXaImC6aCE0hoQ1Ga7GE2YA3pMQYfySeiJ/fA5I1KfhlUCiHD
oUgnb6PQfZjaBDrIODKcrFHAYc5pTOyJXnJ6q1Lc1yrOfoVcIpMaRXHO/phCeuwI
61gfb9ggbFEH/7cL26Z9XGSqh4BDCKVP5Z+y98OnNEzV74ScpX13/szB3VgdN6LW
tmTOeMPYCo4FQoPw8OV1xkeYOKD2JyqH+garnxDZNFjaVaeUkzlAMKz+1JUjxRZj
UpUhZ8y5KFVjgKkqZHxhucr9cnwV/xd4n+IzZeG4l6XZp4VyirG+5QpAikGYe+Jq
WFU4Ao1xvevyizrJEhoGJKznwtrNrsy6AQ+wLQxYiRR4aTnng8n9T/ry2DVvhJf8
diYWxuRTR7KMxg==
=i3UB
-----END PGP SIGNATURE-----
Merge tag 'timers-vdso-2025-09-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull VDSO updates from Thomas Gleixner:
- Further consolidation of the VDSO infrastructure and the common data
store
- Simplification of the related Kconfig logic
- Improve the VDSO selftest suite
* tag 'timers-vdso-2025-09-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
selftests: vDSO: Drop vdso_test_clock_getres
selftests: vDSO: vdso_test_abi: Add tests for clock_gettime64()
selftests: vDSO: vdso_test_abi: Test CPUTIME clocks
selftests: vDSO: vdso_test_abi: Use explicit indices for name array
selftests: vDSO: vdso_test_abi: Drop clock availability tests
selftests: vDSO: vdso_test_abi: Use ksft_finished()
selftests: vDSO: vdso_test_abi: Correctly skip whole test with missing vDSO
selftests: vDSO: Fix -Wunitialized in powerpc VDSO_CALL() wrapper
vdso: Add struct __kernel_old_timeval forward declaration to gettime.h
vdso: Gate VDSO_GETRANDOM behind HAVE_GENERIC_VDSO
vdso: Drop Kconfig GENERIC_VDSO_TIME_NS
vdso: Drop Kconfig GENERIC_VDSO_DATA_STORE
vdso: Drop kconfig GENERIC_COMPAT_VDSO
vdso: Drop kconfig GENERIC_VDSO_32
riscv: vdso: Untangle Kconfig logic
time: Build generic update_vsyscall() only with generic time vDSO
vdso/gettimeofday: Remove !CONFIG_TIME_NS stubs
vdso: Move ENABLE_COMPAT_VDSO from core to arm64
ARM: VDSO: Remove cntvct_ok global variable
vdso/datastore: Gate time data behind CONFIG_GENERIC_GETTIMEOFDAY
|
||
|
|
924ccf1d09 |
KVM/riscv changes for 6.18
- Added SBI FWFT extension for Guest/VM with misaligned delegation and pointer masking PMLEN features - Added ONE_REG interface for SBI FWFT extension - Added Zicbop and bfloat16 extensions for Guest/VM - Enabled more common KVM selftests for RISC-V such as access_tracking_perf_test, dirty_log_perf_test, memslot_modification_stress_test, memslot_perf_test, mmu_stress_test, and rseq_test - Added SBI v3.0 PMU enhancements in KVM and perf driver -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEZdn75s5e6LHDQ+f/rUjsVaLHLAcFAmjU1XEACgkQrUjsVaLH LAfahQ//aPSwG0jbPwRSmWtoqdm38VvKYuWOOxeCjKyas5w0cZPkMQCh3zV/Ks92 YKNIe6wlDeLIlT+qIzAi5VLwHHiC3ggoJsuNhf7/qmJJvks+1quvl1clUDkVGbRu G+FKFkRpBa1HF2O3NASnvBnRLYzXa/wKOuArwsY5DEG4+irnJU3AegbqRuhyPBFr VX+LwpfUUHipFGg/0ivflsVBjcz42Y3i1VQ4oXzuHqsRn3Ig3eSy9dGTHIOqJ0A3 leNbDUSdZxnMlj3Sfab7hH4Vxlr8kiCEgMogHyYpnlyPvG8zJobMWDE9Ziy86D7f 130zMcHuF0ZkeuaKX+3o2L7yGNTnN/JNtns7VtClRShGqSA8Dtn2xLhnvyray7RH CIYjv//34z1BjWyCBfuN5kFIfX5K7cNQHlDP7RVfgw91k7rbCGCJF743nkxz1WBX 98G05/Rnmn+KCtU8t2pRoG9Qkq+bE3iz8Ka8thmiUNciqO78nn/p7a6OEMcjn7jH C+VUNST519UBGCjLehBrCZuUOtrPRozHz7Adx3VTJT5wBX2hd/QLoNoMssEi9VmS 1j/abh7HJcbe7aTUDWUhOK9I+bPHJRsbeNyL5r9jpoC2Xg5Njh/V3aQZ/uuLzO5+ pYjciRrrmdZ30xkInulWI6wORHkmki34RRmsM9gqYIJBMV+Rcxg= =BCyU -----END PGP SIGNATURE----- Merge tag 'kvm-riscv-6.18-1' of https://github.com/kvm-riscv/linux into HEAD KVM/riscv changes for 6.18 - Added SBI FWFT extension for Guest/VM with misaligned delegation and pointer masking PMLEN features - Added ONE_REG interface for SBI FWFT extension - Added Zicbop and bfloat16 extensions for Guest/VM - Enabled more common KVM selftests for RISC-V such as access_tracking_perf_test, dirty_log_perf_test, memslot_modification_stress_test, memslot_perf_test, mmu_stress_test, and rseq_test - Added SBI v3.0 PMU enhancements in KVM and perf driver |
||
|
|
924ebaefce |
KVM/arm64 updates for 6.18
- Add support for FF-A 1.2 as the secure memory conduit for pKVM, allowing more registers to be used as part of the message payload. - Change the way pKVM allocates its VM handles, making sure that the privileged hypervisor is never tricked into using uninitialised data. - Speed up MMIO range registration by avoiding unnecessary RCU synchronisation, which results in VMs starting much quicker. - Add the dump of the instruction stream when panic-ing in the EL2 payload, just like the rest of the kernel has always done. This will hopefully help debugging non-VHE setups. - Add 52bit PA support to the stage-1 page-table walker, and make use of it to populate the fault level reported to the guest on failing to translate a stage-1 walk. - Add NV support to the GICv3-on-GICv5 emulation code, ensuring feature parity for guests, irrespective of the host platform. - Fix some really ugly architecture problems when dealing with debug in a nested VM. This has some bad performance impacts, but is at least correct. - Add enough infrastructure to be able to disable EL2 features and give effective values to the EL2 control registers. This then allows a bunch of features to be turned off, which helps cross-host migration. - Large rework of the selftest infrastructure to allow most tests to transparently run at EL2. This is the first step towards enabling NV testing. - Various fixes and improvements all over the map, including one BE fix, just in time for the removal of the feature. -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEn9UcU+C1Yxj9lZw9I9DQutE9ekMFAmjVhSsACgkQI9DQutE9 ekOSIg//YdbA/zo17GrLnJnlGdUnS0e3/357n3e5Lypx1UFRDTmNacpVPw4VG/jt eVpQn7AgYwyvKfCq46eD+hBBqNv1XTn4DeXttv7CmVqhCRythsEvDkTBWSt7oYUZ xfYXCMKNhqUElH4AbYYx3y7nb2E9/KVGr+NBn6Vf5c14OZ3MGVc/fyp4jM1ih5dR kcV2onAYohlIGvFEyZMBtJ+jkYkqfIbfxfqCL0RAET0aEBFcmM1aXybWZj47hlLM f2j+E6cFQ0ZzUt+3pFhT75wo43lHGtIFDjVd60uishyU+NXTVvqRmXDTRU4k546W 18HHX1yijbzuXIatVhVRo2hIq3jKU37T9wtj46BejbDHRdAPENEyN/Qopm7rNS+X mCwOT7He6KR+H4rU6nFaTcsS7bNRCvIbZP9i9zb6NElbvXu5QnM8BUQsYFCDUa/n xtbtQlckbo/7zeoUsBDrGj2XmCf0d45FTHb7fdWOYEmMSmJhXYpUKdM4JcLyhKoQ DD0ox2S+pt2lwNw3XOSABdES0KJxCvDASAMIgn2h2sGpY8FxsBcVW/BufXopdafG UeInxWaILp2iCDM4tH2GLjKqlvMAOwcA+mAEZToXypxlJAnYA6J1pXCF8WEaM6+D BGrLli8Zd8JRs87byq6K7tp8oLNzZdliJp73j5jfOHTJA4MnvFI= =iAL3 -----END PGP SIGNATURE----- Merge tag 'kvmarm-6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 updates for 6.18 - Add support for FF-A 1.2 as the secure memory conduit for pKVM, allowing more registers to be used as part of the message payload. - Change the way pKVM allocates its VM handles, making sure that the privileged hypervisor is never tricked into using uninitialised data. - Speed up MMIO range registration by avoiding unnecessary RCU synchronisation, which results in VMs starting much quicker. - Add the dump of the instruction stream when panic-ing in the EL2 payload, just like the rest of the kernel has always done. This will hopefully help debugging non-VHE setups. - Add 52bit PA support to the stage-1 page-table walker, and make use of it to populate the fault level reported to the guest on failing to translate a stage-1 walk. - Add NV support to the GICv3-on-GICv5 emulation code, ensuring feature parity for guests, irrespective of the host platform. - Fix some really ugly architecture problems when dealing with debug in a nested VM. This has some bad performance impacts, but is at least correct. - Add enough infrastructure to be able to disable EL2 features and give effective values to the EL2 control registers. This then allows a bunch of features to be turned off, which helps cross-host migration. - Large rework of the selftest infrastructure to allow most tests to transparently run at EL2. This is the first step towards enabling NV testing. - Various fixes and improvements all over the map, including one BE fix, just in time for the removal of the feature. |
||
|
|
feafee2845 |
arm64 updates for 6.18
Confidential computing:
- Add support for accepting secrets from firmware (e.g. ACPI CCEL)
and mapping them with appropriate attributes.
CPU features:
- Advertise atomic floating-point instructions to userspace.
- Extend Spectre workarounds to cover additional Arm CPU variants.
- Extend list of CPUs that support break-before-make level 2 and
guarantee not to generate TLB conflict aborts for changes of mapping
granularity (BBML2_NOABORT).
- Add GCS support to our uprobes implementation.
Documentation:
- Remove bogus SME documentation concerning register state when
entering/exiting streaming mode.
Entry code:
- Switch over to the generic IRQ entry code (GENERIC_IRQ_ENTRY).
- Micro-optimise syscall entry path with a compiler branch hint.
Memory management:
- Enable huge mappings in vmalloc space even when kernel page-table
dumping is enabled.
- Tidy up the types used in our early MMU setup code.
- Rework rodata= for closer parity with the behaviour on x86.
- For CPUs implementing BBML2_NOABORT, utilise block mappings in the
linear map even when rodata= applies to virtual aliases.
- Don't re-allocate the virtual region between '_text' and '_stext',
as doing so confused tools parsing /proc/vmcore.
Miscellaneous:
- Clean-up Kconfig menuconfig text for architecture features.
- Avoid redundant bitmap_empty() during determination of supported
SME vector lengths.
- Re-enable warnings when building the 32-bit vDSO object.
- Avoid breaking our eggs at the wrong end.
Perf and PMUs:
- Support for v3 of the Hisilicon L3C PMU.
- Support for Hisilicon's MN and NoC PMUs.
- Support for Fujitsu's Uncore PMU.
- Support for SPE's extended event filtering feature.
- Preparatory work to enable data source filtering in SPE.
- Support for multiple lanes in the DWC PCIe PMU.
- Support for i.MX94 in the IMX DDR PMU driver.
- MAINTAINERS update (Thank you, Yicong).
- Minor driver fixes (PERF_IDX2OFF() overflow, CMN register offsets).
Selftests:
- Add basic LSFE check to the existing hwcaps test.
- Support nolibc in GCS tests.
- Extend SVE ptrace test to pass unsupported regsets and invalid vector
lengths.
- Minor cleanups (typos, cosmetic changes).
System registers:
- Fix ID_PFR1_EL1 definition.
- Fix incorrect signedness of some fields in ID_AA64MMFR4_EL1.
- Sync TCR_EL1 definition with the latest Arm ARM (L.b).
- Be stricter about the input fed into our AWK sysreg generator script.
- Typo fixes and removal of redundant definitions.
ACPI, EFI and PSCI:
- Decouple Arm's "Software Delegated Exception Interface" (SDEI)
support from the ACPI GHES code so that it can be used by platforms
booted with device-tree.
- Remove unnecessary per-CPU tracking of the FPSIMD state across EFI
runtime calls.
- Fix a node refcount imbalance in the PSCI device-tree code.
CPU Features:
- Ensure register sanitisation is applied to fields in ID_AA64MMFR4.
- Expose AIDR_EL1 to userspace via sysfs, primarily so that KVM guests
can reliably query the underlying CPU types from the VMM.
- Re-enabling of SME support (CONFIG_ARM64_SME) as a result of fixes
to our context-switching, signal handling and ptrace code.
-----BEGIN PGP SIGNATURE-----
iQFEBAABCgAuFiEEPxTL6PPUbjXGY88ct6xw3ITBYzQFAmjWlMIQHHdpbGxAa2Vy
bmVsLm9yZwAKCRC3rHDchMFjNIVYB/sHaFYmToEaK4ldMfTn4ZQ8yr9ZuTMLr/ji
zOm7wULP08wKIcp6t+1N6e7A8Wb3YSErxTywc4b9MqctIel1WvBMewjJd38xb2qO
hFCuRuWemfyt6Rw/EGMCP54yueLzKbnN4q+/Aks8jedWtS+AXY6McGCJOzMjuECL
zKb68Hqqb09YQ2b2BPSAiU9g42rotiIYppLaLbEdxUnw/eqfEN3wSrl92/LuBlqH
r2wZDnJwA5q8Iy9HWlnhg4Jax0Cs86jSJPIQLB6v3WWhc3HR49pm5cqYzYkUht9L
nA6eaWJiBl/+k3S+ftPKNzU8n04r15lqyNZE2FYfRk+0BPSacJOf
=kO15
-----END PGP SIGNATURE-----
Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 updates from Will Deacon:
"There's good stuff across the board, including some nice mm
improvements for CPUs with the 'noabort' BBML2 feature and a clever
patch to allow ptdump to play nicely with block mappings in the
vmalloc area.
Confidential computing:
- Add support for accepting secrets from firmware (e.g. ACPI CCEL)
and mapping them with appropriate attributes.
CPU features:
- Advertise atomic floating-point instructions to userspace
- Extend Spectre workarounds to cover additional Arm CPU variants
- Extend list of CPUs that support break-before-make level 2 and
guarantee not to generate TLB conflict aborts for changes of
mapping granularity (BBML2_NOABORT)
- Add GCS support to our uprobes implementation.
Documentation:
- Remove bogus SME documentation concerning register state when
entering/exiting streaming mode.
Entry code:
- Switch over to the generic IRQ entry code (GENERIC_IRQ_ENTRY)
- Micro-optimise syscall entry path with a compiler branch hint.
Memory management:
- Enable huge mappings in vmalloc space even when kernel page-table
dumping is enabled
- Tidy up the types used in our early MMU setup code
- Rework rodata= for closer parity with the behaviour on x86
- For CPUs implementing BBML2_NOABORT, utilise block mappings in the
linear map even when rodata= applies to virtual aliases
- Don't re-allocate the virtual region between '_text' and '_stext',
as doing so confused tools parsing /proc/vmcore.
Miscellaneous:
- Clean-up Kconfig menuconfig text for architecture features
- Avoid redundant bitmap_empty() during determination of supported
SME vector lengths
- Re-enable warnings when building the 32-bit vDSO object
- Avoid breaking our eggs at the wrong end.
Perf and PMUs:
- Support for v3 of the Hisilicon L3C PMU
- Support for Hisilicon's MN and NoC PMUs
- Support for Fujitsu's Uncore PMU
- Support for SPE's extended event filtering feature
- Preparatory work to enable data source filtering in SPE
- Support for multiple lanes in the DWC PCIe PMU
- Support for i.MX94 in the IMX DDR PMU driver
- MAINTAINERS update (Thank you, Yicong)
- Minor driver fixes (PERF_IDX2OFF() overflow, CMN register offsets).
Selftests:
- Add basic LSFE check to the existing hwcaps test
- Support nolibc in GCS tests
- Extend SVE ptrace test to pass unsupported regsets and invalid
vector lengths
- Minor cleanups (typos, cosmetic changes).
System registers:
- Fix ID_PFR1_EL1 definition
- Fix incorrect signedness of some fields in ID_AA64MMFR4_EL1
- Sync TCR_EL1 definition with the latest Arm ARM (L.b)
- Be stricter about the input fed into our AWK sysreg generator
script
- Typo fixes and removal of redundant definitions.
ACPI, EFI and PSCI:
- Decouple Arm's "Software Delegated Exception Interface" (SDEI)
support from the ACPI GHES code so that it can be used by platforms
booted with device-tree
- Remove unnecessary per-CPU tracking of the FPSIMD state across EFI
runtime calls
- Fix a node refcount imbalance in the PSCI device-tree code.
CPU Features:
- Ensure register sanitisation is applied to fields in ID_AA64MMFR4
- Expose AIDR_EL1 to userspace via sysfs, primarily so that KVM
guests can reliably query the underlying CPU types from the VMM
- Re-enabling of SME support (CONFIG_ARM64_SME) as a result of fixes
to our context-switching, signal handling and ptrace code"
* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (93 commits)
arm64: cpufeature: Remove duplicate asm/mmu.h header
arm64: Kconfig: Make CPU_BIG_ENDIAN depend on BROKEN
perf/dwc_pcie: Fix use of uninitialized variable
arm/syscalls: mark syscall invocation as likely in invoke_syscall
Documentation: hisi-pmu: Add introduction to HiSilicon V3 PMU
Documentation: hisi-pmu: Fix of minor format error
drivers/perf: hisi: Add support for L3C PMU v3
drivers/perf: hisi: Refactor the event configuration of L3C PMU
drivers/perf: hisi: Extend the field of tt_core
drivers/perf: hisi: Extract the event filter check of L3C PMU
drivers/perf: hisi: Simplify the probe process of each L3C PMU version
drivers/perf: hisi: Export hisi_uncore_pmu_isr()
drivers/perf: hisi: Relax the event ID check in the framework
perf: Fujitsu: Add the Uncore PMU driver
arm64: map [_text, _stext) virtual address range non-executable+read-only
arm64/sysreg: Update TCR_EL1 register
arm64: Enable vmalloc-huge with ptdump
arm64: cpufeature: add Neoverse-V3AE to BBML2 allow list
arm64: errata: Apply workarounds for Neoverse-V3AE
arm64: cputype: Add Neoverse-V3AE definitions
...
|
||
|
|
200b0d2508 |
arm64: mm: Move KPTI helpers to mmu.c
create_kpti_ng_temp_pgd() is currently defined (as an alias) in
mmu.c without matching declaration in a header; instead cpufeature.c
makes its own declaration. This is clearly not pretty, and as commit
ceca927c86e6 ("arm64: mm: Fix CFI failure due to kpti_ng_pgd_alloc
function signature") showed, it also makes it very easy for the
prototypes to go out of sync.
All this would be much simpler if kpti_install_ng_mappings() and
associated functions lived in mmu.c, where they logically belong.
This is what this patch does:
- Move kpti_install_ng_mappings() and associated functions from
cpufeature.c to mmu.c, add a declaration to <asm/mmu.h>
- Remove create_kpti_ng_temp_pgd() and just call
__create_pgd_mapping_locked() directly instead
- Mark all these functions __init
- Move __initdata after kpti_ng_temp_alloc (as suggested by
checkpatch)
Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com>
Reviewed-by: Yeoreum Yun <yeoreum.yun@arm.com>
[will: Fix conflicts with init_idmap_kpti_bbml2_flag()]
Signed-off-by: Will Deacon <will@kernel.org>
|
||
|
|
fd2f74f8f3 |
tracing: Fix the bug where bpf_get_stackid returns -EFAULT on the ARM64
When using bpf_program__attach_kprobe_multi_opts on ARM64 to hook a BPF program
that contains the bpf_get_stackid function, the BPF program fails
to obtain the stack trace and returns -EFAULT.
This is because ftrace_partial_regs omits the configuration of the pstate register,
leaving pstate at the default value of 0. When get_perf_callchain executes,
it uses user_mode(regs) to determine whether it is in kernel mode.
This leads to a misjudgment that the code is in user mode,
so perf_callchain_kernel is not executed and the function returns directly.
As a result, trace->nr becomes 0, and finally -EFAULT is returned.
Therefore, the assignment of the pstate register is added here.
Fixes: b9b55c8912ce ("tracing: Add ftrace_partial_regs() for converting ftrace_regs to pt_regs")
Closes: https://lore.kernel.org/bpf/20250919071902.554223-1-yangfeng59949@163.com/
Signed-off-by: Feng Yang <yangfeng@kylinos.cn>
Tested-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
|
||
|
|
4e4e36dce3 |
Merge branch 'for-next/uprobes' into for-next/core
* for-next/uprobes: arm64: probes: Fix incorrect bl/blr address and register usage uprobes: uprobe_warn should use passed task arm64: Kconfig: Remove GCS restrictions on UPROBES arm64: uprobes: Add GCS support to uretprobes arm64: probes: Add GCS support to bl/blr/ret arm64: uaccess: Add additional userspace GCS accessors arm64: uaccess: Move existing GCS accessors definitions to gcs.h arm64: probes: Break ret out from bl/blr |
||
|
|
c7c7eb4f0e |
Merge branch 'for-next/sysregs' into for-next/core
* for-next/sysregs: arm64/sysreg: Update TCR_EL1 register arm64: sysreg: Add validation checks to sysreg header generation script arm64: sysreg: Correct sign definitions for EIESB and DoubleLock arm64: sysreg: Fix and tidy up sysreg field definitions |
||
|
|
f2d64a22fa |
Merge branch 'for-next/perf' into for-next/core
* for-next/perf: (29 commits) perf/dwc_pcie: Fix use of uninitialized variable Documentation: hisi-pmu: Add introduction to HiSilicon V3 PMU Documentation: hisi-pmu: Fix of minor format error drivers/perf: hisi: Add support for L3C PMU v3 drivers/perf: hisi: Refactor the event configuration of L3C PMU drivers/perf: hisi: Extend the field of tt_core drivers/perf: hisi: Extract the event filter check of L3C PMU drivers/perf: hisi: Simplify the probe process of each L3C PMU version drivers/perf: hisi: Export hisi_uncore_pmu_isr() drivers/perf: hisi: Relax the event ID check in the framework perf: Fujitsu: Add the Uncore PMU driver perf/arm-cmn: Fix CMN S3 DTM offset perf: arm_spe: Prevent overflow in PERF_IDX2OFF() coresight: trbe: Prevent overflow in PERF_IDX2OFF() MAINTAINERS: Remove myself from HiSilicon PMU maintainers drivers/perf: hisi: Add support for HiSilicon MN PMU driver drivers/perf: hisi: Add support for HiSilicon NoC PMU perf: arm_pmuv3: Factor out PMCCNTR_EL0 use conditions arm64/boot: Enable EL2 requirements for SPE_FEAT_FDS arm64/boot: Factor out a macro to check SPE version ... |
||
|
|
77dfca70ba |
Merge branch 'for-next/mm' into for-next/core
* for-next/mm: arm64: map [_text, _stext) virtual address range non-executable+read-only arm64: Enable vmalloc-huge with ptdump arm64: mm: split linear mapping if BBML2 unsupported on secondary CPUs arm64: mm: support large block mapping when rodata=full arm64: Enable permission change on arm64 kernel block mappings arm64/Kconfig: Remove CONFIG_RODATA_FULL_DEFAULT_ENABLED arm64: mm: Rework the 'rodata=' options arm64: mm: Represent physical memory with phys_addr_t and resource_size_t arm64: mm: Make map_fdt() return mapped pointer arm64: mm: Cast start/end markers to char *, not u64 |
||
|
|
7df73a0049 |
Merge branch 'for-next/entry' into for-next/core
* for-next/entry: arm/syscalls: mark syscall invocation as likely in invoke_syscall arm64: entry: Switch to generic IRQ entry arm64: entry: Move arm64_preempt_schedule_irq() into __exit_to_kernel_mode() arm64: entry: Refactor preempt_schedule_irq() check code entry: Add arch_irqentry_exit_need_resched() for arm64 arm64: entry: Use preempt_count() and need_resched() helper arm64: entry: Rework arm64_preempt_schedule_irq() arm64: entry: Refactor the entry and exit for exceptions from EL1 arm64: ptrace: Replace interrupts_enabled() with regs_irqs_disabled() |
||
|
|
3d751c56c9 |
Merge branch 'for-next/cpufeature' into for-next/core
* for-next/cpufeature: arm64: cpufeature: add Neoverse-V3AE to BBML2 allow list arm64: errata: Apply workarounds for Neoverse-V3AE arm64: cputype: Add Neoverse-V3AE definitions arm64: cpufeature: add AmpereOne to BBML2 allow list arm64: cpufeature: Add Olympus MIDR to BBML2 allow list arm64: cputype: Add NVIDIA Olympus definitions arm64: cputype: Remove duplicate Cortex-X1C definitions arm64: errata: Expand speculative SSBS workaround for Cortex-A720AE arm64: cputype: Add Cortex-A720AE definitions arm64/hwcap: Add hwcap for FEAT_LSFE |
||
|
|
5647d32f51 |
Merge branch 'for-next/cca' into for-next/core
* for-next/cca: arm64: acpi: Enable ACPI CCEL support arm64: Enable EFI secret area Securityfs support arm64: realm: ioremap: Allow mapping memory as encrypted |
||
|
|
14f158552e |
arm64/sysreg: Update TCR_EL1 register
Update TCR_EL1 register fields as per latest ARM ARM DDI 0487 L.B and while here drop an explicit sysreg definition SYS_TCR_EL1 from sysreg.h, which is now redundant. Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Marc Zyngier <maz@kernel.org> Cc: Mark Brown <broonie@kernel.org> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-kernel@vger.kernel.org Reviewed-by: Mark Brown <broonie@kernel.org> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Signed-off-by: Will Deacon <will@kernel.org> |
||
|
|
fa93b45fd3 |
arm64: Enable vmalloc-huge with ptdump
Our goal is to move towards enabling vmalloc-huge by default on arm64 so
as to reduce TLB pressure. Therefore, we need a way to analyze the portion
of block mappings in vmalloc space we can get on a production system; this
can be done through ptdump, but currently we disable vmalloc-huge if
CONFIG_PTDUMP_DEBUGFS is on. The reason is that lazy freeing of kernel
pagetables via vmap_try_huge_pxd() may race with ptdump, so ptdump
may dereference a bogus address.
To solve this, we need to synchronize ptdump_walk() and ptdump_check_wx()
with pud_free_pmd_page() and pmd_free_pte_page().
Since this race is very unlikely to happen in practice, we do not want to
penalize the vmalloc pagetable tearing path by taking the init_mm
mmap_lock. Therefore, we use static keys. ptdump_walk() and
ptdump_check_wx() are the pagetable walkers; they will enable the static
key - upon observing that, the vmalloc pagetable tearing path will get
patched in with an mmap_read_lock/unlock sequence. A combination of the
patched-in mmap_read_lock/unlock, the acquire semantics of
static_branch_inc(), and the barriers in __flush_tlb_kernel_pgtable()
ensures that ptdump will never get a hold on the address of a freed PMD
or PTE table.
We can verify the correctness of the algorithm via the following litmus
test (thanks to James Houghton and Will Deacon):
AArch64 ptdump
Variant=Ifetch
{
uint64_t pud=0xa110c;
uint64_t pmd;
0:X0=label:"P1:L0"; 0:X1=instr:"NOP"; 0:X2=lock; 0:X3=pud; 0:X4=pmd;
1:X1=0xdead; 1:X2=lock; 1:X3=pud; 1:X4=pmd;
}
P0 | P1 ;
(* static_key_enable *) | (* pud_free_pmd_page *) ;
STR W1, [X0] | LDR X9, [X3] ;
DC CVAU,X0 | STR XZR, [X3] ;
DSB ISH | DSB ISH ;
IC IVAU,X0 | ISB ;
DSB ISH | ;
ISB | (* static key *) ;
| L0: ;
(* mmap_lock *) | B out1 ;
Lwlock: | ;
MOV W7, #1 | (* mmap_lock *) ;
SWPA W7, W8, [X2] | Lrlock: ;
| MOV W7, #1 ;
| SWPA W7, W8, [X2] ;
(* walk pgtable *) | ;
LDR X9, [X3] | (* mmap_unlock *) ;
CBZ X9, out0 | STLR WZR, [X2] ;
EOR X10, X9, X9 | ;
LDR X11, [X4, X10] | out1: ;
| EOR X10, X9, X9 ;
out0: | STR X1, [X4, X10] ;
exists (0:X8=0 /\ 1:X8=0 /\ (* Lock acquisitions succeed *)
0:X9=0xa110c /\ (* P0 sees the valid PUD ...*)
0:X11=0xdead) (* ... but the freed PMD *)
For an approximate written proof of why this algorithm works, please read
the code comment in [1], which is now removed for the sake of simplicity.
mm-selftests pass. No issues were observed while parallelly running
test_vmalloc.sh (which stresses the vmalloc subsystem),
and cat /sys/kernel/debug/{kernel_page_tables, check_wx_pages} in a loop.
Link: https://lore.kernel.org/all/20250723161827.15802-1-dev.jain@arm.com/ [1]
Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
Signed-off-by: Dev Jain <dev.jain@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
|
||
|
|
3bbf004c48 |
arm64: cputype: Add Neoverse-V3AE definitions
Add cputype definitions for Neoverse-V3AE. These will be used for errata detection in subsequent patches. These values can be found in the Neoverse-V3AE TRM: https://developer.arm.com/documentation/SDEN-2615521/9-0/ ... in section A.6.1 ("MIDR_EL1, Main ID Register"). Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: James Morse <james.morse@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com> Signed-off-by: Will Deacon <will@kernel.org> |
||
|
|
31d8edb535 |
kasan/hw-tags: introduce kasan.write_only option
Patch series "introduce kasan.write_only option in hw-tags", v8. Hardware tag based KASAN is implemented using the Memory Tagging Extension (MTE) feature. MTE is built on top of the ARMv8.0 virtual address tagging TBI (Top Byte Ignore) feature and allows software to access a 4-bit allocation tag for each 16-byte granule in the physical address space. A logical tag is derived from bits 59-56 of the virtual address used for the memory access. A CPU with MTE enabled will compare the logical tag against the allocation tag and potentially raise an tag check fault on mismatch, subject to system registers configuration. Since ARMv8.9, FEAT_MTE_STORE_ONLY can be used to restrict raise of tag check fault on store operation only. Using this feature (FEAT_MTE_STORE_ONLY), introduce KASAN write-only mode which restricts KASAN check write (store) operation only. This mode omits KASAN check for read (fetch/load) operation. Therefore, it might be used not only debugging purpose but also in normal environment. This patch (of 2): Since Armv8.9, FEATURE_MTE_STORE_ONLY feature is introduced to restrict raise of tag check fault on store operation only. Introduce KASAN write only mode based on this feature. KASAN write only mode restricts KASAN checks operation for write only and omits the checks for fetch/read operations when accessing memory. So it might be used not only debugging enviroment but also normal enviroment to check memory safty. This features can be controlled with "kasan.write_only" arguments. When "kasan.write_only=on", KASAN checks write operation only otherwise KASAN checks all operations. This changes the MTE_STORE_ONLY feature as BOOT_CPU_FEATURE like ARM64_MTE_ASYMM so that makes it initialise in kasan_init_hw_tags() with other function together. Link: https://lkml.kernel.org/r/20250916222755.466009-1-yeoreum.yun@arm.com Link: https://lkml.kernel.org/r/20250916222755.466009-2-yeoreum.yun@arm.com Signed-off-by: Yeoreum Yun <yeoreum.yun@arm.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Konovalov <andreyknvl@gmail.com> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Breno Leitao <leitao@debian.org> Cc: David Hildenbrand <david@redhat.com> Cc: Dmitriy Vyukov <dvyukov@google.com> Cc: D Scott Phillips <scott@os.amperecomputing.com> Cc: Hardevsinh Palaniya <hardevsinh.palaniya@siliconsignals.io> Cc: James Morse <james.morse@arm.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Kalesh Singh <kaleshsingh@google.com> Cc: levi.yun <yeoreum.yun@arm.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Mark Brown <broonie@kernel.org> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Pankaj Gupta <pankaj.gupta@amd.com> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Yang Shi <yang@os.amperecomputing.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
||
|
|
181ce6b01a |
Merge branch kvm-arm64/misc-6.18 into kvmarm-master/next
* kvm-arm64/misc-6.18: : . : . : Misc improvements and bug fixes: : : - Fix XN handling in the S2 page table dumper : (20250809135356.1003520-1-r09922117@csie.ntu.edu.tw) : : - Fix sanitity checks for huge mapping with pKVM running np guests : (20250815162655.121108-1-ben.horgan@arm.com) : : - Fix use of TRBE when KVM is disabled, and Linux running under : a lesser hypervisor (20250902-etm_crash-v2-1-aa9713a7306b@oss.qualcomm.com) : : - Fix out of date MTE-related comments (20250915155234.196288-1-alexandru.elisei@arm.com) : : - Fix PSCI BE support when running a NV guest (20250916161103.1040727-1-maz@kernel.org) : : - Fix page reference leak when refusing to map a page due to mismatched attributes : (20250917130737.2139403-1-tabba@google.com) : : - Add trap handling for PMSDSFR_EL1 : (20250901-james-perf-feat_spe_eft-v8-7-2e2738f24559@linaro.org) : : - Add advertisement from FEAT_LSFE (Large System Float Extension) : (20250918-arm64-lsfe-v4-1-0abc712101c7@kernel.org) : . KVM: arm64: Expose FEAT_LSFE to guests KVM: arm64: Add trap configs for PMSDSFR_EL1 KVM: arm64: Fix page leak in user_mem_abort() KVM: arm64: Fix kvm_vcpu_{set,is}_be() to deal with EL2 state KVM: arm64: Update stale comment for sanitise_mte_tags() KVM: arm64: Return early from trace helpers when KVM isn't available KVM: arm64: Fix debug checking for np-guests using huge mappings KVM: arm64: ptdump: Don't test PTE_VALID alongside other attributes Signed-off-by: Marc Zyngier <maz@kernel.org> |
||
|
|
47f15744fc |
Merge branch kvm-arm64/nv-misc-6.18 into kvmarm-master/next
* kvm-arm64/nv-misc-6.18:
: .
: Various NV-related fixes:
:
: - Relax KVM's SError injection to consider that HCR_EL2.AMO's
: effective value is 1 when HCR_EL2.{E2H,TGE)=={1,0}.
: (20250918164632.410404-1-oliver.upton@linux.dev)
:
: - Allow userspace to disable some S2 base granule sizes
: (20250918165505.415017-1-oliver.upton@linux.dev)
: .
KVM: arm64: nv: Allow userspace to de-feature stage-2 TGRANs
KVM: arm64: nv: Treat AMO as 1 when at EL2 and {E2H,TGE} = {1, 0}
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
||
|
|
f01c7baa16 |
Merge branch kvm-arm64/nv-debug into kvmarm-master/next
* kvm-arm64/nv-debug: : . : Fix handling of MDSCR_EL1 in NV context, which is unfortunately : mishandled by the architecture. Patches courtesy of Oliver Upton : (20250917203125.283116-2-oliver.upton@linux.dev) : . KVM: arm64: nv: Apply guest's MDCR traps in nested context KVM: arm64: nv: Trap debug registers when in hyp context Signed-off-by: Marc Zyngier <maz@kernel.org> |
||
|
|
8cba6c8b87 |
Merge branch kvm-arm64/52bit-at into kvmarm-master/next
* kvm-arm64/52bit-at: : . : Upgrade the S1 page table walker to support 52bit PA, and use it to : report the fault level when taking a S2 fault on S1PTW, which is required : by the architecture (20250915114451.660351-1-maz@kernel.org). : . KVM: arm64: selftest: Expand external_aborts test to look for TTW levels KVM: arm64: Populate level on S1PTW SEA injection KVM: arm64: Add S1 IPA to page table level walker KVM: arm64: Add filtering hook to S1 page table walk KVM: arm64: Don't switch MMU on translation from non-NV context KVM: arm64: Allow EL1 control registers to be accessed from the CPU state KVM: arm64: Allow use of S1 PTW for non-NV vcpus KVM: arm64: Report faults from S1 walk setup at the expected start level KVM: arm64: Expand valid block mappings to FEAT_LPA/LPA2 support KVM: arm64: Populate PAR_EL1 with 52bit addresses KVM: arm64: Compute shareability for LPA2 KVM: arm64: Pass the walk_info structure to compute_par_s1() KVM: arm64: Decouple output address from the PT descriptor KVM: arm64: Compute 52bit TTBR address and alignment KVM: arm64: Account for 52bit when computing maximum OA KVM: arm64: Add helper computing the state of 52bit PA support Signed-off-by: Marc Zyngier <maz@kernel.org> |
||
|
|
b8e625167a |
KVM: arm64: Add S1 IPA to page table level walker
Use the filtering hook infrastructure to implement a new walker that, for a given VA and an IPA, returns the level of the first occurence of this IPA in the walk from that VA. This will be used to improve our SEA syndrome reporting. Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org> |
||
|
|
0c5471408c |
KVM: arm64: Add filtering hook to S1 page table walk
Add a filtering hook that can get called on each level of the walk, and providing access to the full state. Crucially, this is called *before* the access is made, so that it is possible to track down the level of a faulting access. Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org> |
||
|
|
c0cc438046 |
KVM: arm64: Compute shareability for LPA2
LPA2 gets the memory access shareability from TCR_ELx instead of getting it form the descriptors. Store it in the walk info struct so that it is passed around and evaluated as required. Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org> |
||
|
|
23cf13def0 |
KVM: arm64: Account for 52bit when computing maximum OA
Adjust the computation of the max OA to account for 52bit PAs. Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org> |
||
|
|
0090c0a247 |
KVM: arm64: Add helper computing the state of 52bit PA support
Track whether the guest is using 52bit PAs, either LPA or LPA2. This further simplifies the handling of LVA for 4k and 16k pages, as LPA2 implies LVA in this case. Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org> |
||
|
|
3df6979d22 |
arm64: mm: split linear mapping if BBML2 unsupported on secondary CPUs
The kernel linear mapping is painted in very early stage of system boot. The cpufeature has not been finalized yet at this point. So the linear mapping is determined by the capability of boot CPU only. If the boot CPU supports BBML2, large block mappings will be used for linear mapping. But the secondary CPUs may not support BBML2, so repaint the linear mapping if large block mapping is used and the secondary CPUs don't support BBML2 once cpufeature is finalized on all CPUs. If the boot CPU doesn't support BBML2 or the secondary CPUs have the same BBML2 capability with the boot CPU, repainting the linear mapping is not needed. Repainting is implemented by the boot CPU, which we know supports BBML2, so it is safe for the live mapping size to change for this CPU. The linear map region is walked using the pagewalk API and any discovered large leaf mappings are split to pte mappings using the existing helper functions. Since the repainting is performed inside of a stop_machine(), we must use GFP_ATOMIC to allocate the extra intermediate pgtables. But since we are still early in boot, it is expected that there is plenty of memory available so we will never need to sleep for reclaim, and so GFP_ATOMIC is acceptable here. The secondary CPUs are all put into a waiting area with the idmap in TTBR0 and reserved map in TTBR1 while this is performed since they cannot be allowed to observe any size changes on the live mappings. Some of this infrastructure is reused from the kpti case. Specifically we share the same flag (was __idmap_kpti_flag, now idmap_kpti_bbml2_flag) since it means we don't have to reserve any extra pgtable memory to idmap the extra flag. Co-developed-by: Yang Shi <yang@os.amperecomputing.com> Signed-off-by: Yang Shi <yang@os.amperecomputing.com> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Will Deacon <will@kernel.org> |
||
|
|
557c82a448 |
KVM: arm64: Add trap configs for PMSDSFR_EL1
SPE data source filtering (SPE_FEAT_FDS) adds a new register PMSDSFR_EL1, add the trap configs for it. PMSNEVFR_EL1 was also missing its VNCR offset so add it along with PMSDSFR_EL1. Tested-by: Leo Yan <leo.yan@arm.com> Signed-off-by: James Clark <james.clark@linaro.org> Reviewed-by: Joey Gouly <joey.gouly@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> |
||
|
|
ff37a41db8 |
KVM: arm64: nv: Treat AMO as 1 when at EL2 and {E2H,TGE} = {1, 0}
SErrors are not deliverable at EL2 when the effective value of
HCR_EL2.{TGE,AMO} = {0, 0}. This is bothersome to deal with in nested
as we need to use auxiliary pending state to track the pending vSError
since HCR_EL2.VSE has no mechanism for honoring the guest HCR. On top of
that, we have no way of making that auxiliary pending state visible in
ISR_EL1.
A defect against the architecture now allows an implementation to treat
HCR_EL2.AMO as 1 when HCR_EL2.{E2H,TGE} = {1, 0}. Let's do exactly that,
meaning SErrors are always deliverable at EL2 for the typical E2H=RES1
VM.
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
||
|
|
9e8a3df3e7 |
arm64: Enable EFI secret area Securityfs support
Enable EFI COCO secrets support. Provide the ioremap_encrypted() support required by the driver. Cc: Sami Mujawar <sami.mujawar@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Aneesh Kumar K.V <aneesh.kumar@kernel.org> Cc: Steven Price <steven.price@arm.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Tested-by: Sami Mujawar <sami.mujawar@arm.com> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Will Deacon <will@kernel.org> |
||
|
|
fa84e534c3 |
arm64: realm: ioremap: Allow mapping memory as encrypted
For ioremap(), so far we only checked if it was a device (RIPAS_DEV) to choose an encrypted vs decrypted mapping. However, we may have firmware reserved memory regions exposed to the OS (e.g., EFI Coco Secret Securityfs, ACPI CCEL). We need to make sure that anything that is RIPAS_RAM (i.e., Guest protected memory with RMM guarantees) are also mapped as encrypted. Rephrasing the above, anything that is not RIPAS_EMPTY is guaranteed to be protected by the RMM. Thus we choose encrypted mapping for anything that is not RIPAS_EMPTY. While at it, rename the helper function __arm64_is_protected_mmio => arm64_rsi_is_protected to clearly indicate that this not an arm64 generic helper, but something to do with Realms. Cc: Sami Mujawar <sami.mujawar@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Aneesh Kumar K.V <aneesh.kumar@kernel.org> Cc: Steven Price <steven.price@arm.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Reviewed-by: Steven Price <steven.price@arm.com> Tested-by: Sami Mujawar <sami.mujawar@arm.com> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Will Deacon <will@kernel.org> |
||
|
|
a166563e7e |
arm64: mm: support large block mapping when rodata=full
When rodata=full is specified, kernel linear mapping has to be mapped at
PTE level since large page table can't be split due to break-before-make
rule on ARM64.
This resulted in a couple of problems:
- performance degradation
- more TLB pressure
- memory waste for kernel page table
With FEAT_BBM level 2 support, splitting large block page table to
smaller ones doesn't need to make the page table entry invalid anymore.
This allows kernel split large block mapping on the fly.
Add kernel page table split support and use large block mapping by
default when FEAT_BBM level 2 is supported for rodata=full. When
changing permissions for kernel linear mapping, the page table will be
split to smaller size.
The machine without FEAT_BBM level 2 will fallback to have kernel linear
mapping PTE-mapped when rodata=full.
With this we saw significant performance boost with some benchmarks and
much less memory consumption on my AmpereOne machine (192 cores, 1P)
with 256GB memory.
* Memory use after boot
Before:
MemTotal: 258988984 kB
MemFree: 254821700 kB
After:
MemTotal: 259505132 kB
MemFree: 255410264 kB
Around 500MB more memory are free to use. The larger the machine, the
more memory saved.
* Memcached
We saw performance degradation when running Memcached benchmark with
rodata=full vs rodata=on. Our profiling pointed to kernel TLB pressure.
With this patchset we saw ops/sec is increased by around 3.5%, P99
latency is reduced by around 9.6%.
The gain mainly came from reduced kernel TLB misses. The kernel TLB
MPKI is reduced by 28.5%.
The benchmark data is now on par with rodata=on too.
* Disk encryption (dm-crypt) benchmark
Ran fio benchmark with the below command on a 128G ramdisk (ext4) with
disk encryption (by dm-crypt).
fio --directory=/data --random_generator=lfsr --norandommap \
--randrepeat 1 --status-interval=999 --rw=write --bs=4k --loops=1 \
--ioengine=sync --iodepth=1 --numjobs=1 --fsync_on_close=1 \
--group_reporting --thread --name=iops-test-job --eta-newline=1 \
--size 100G
The IOPS is increased by 90% - 150% (the variance is high, but the worst
number of good case is around 90% more than the best number of bad
case). The bandwidth is increased and the avg clat is reduced
proportionally.
* Sequential file read
Read 100G file sequentially on XFS (xfs_io read with page cache
populated). The bandwidth is increased by 150%.
Co-developed-by: Ryan Roberts <ryan.roberts@arm.com>
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Yang Shi <yang@os.amperecomputing.com>
Signed-off-by: Will Deacon <will@kernel.org>
|
||
|
|
e185c8a0d8 |
arm64: cputype: Add NVIDIA Olympus definitions
Add cpu part and model macro definitions for NVIDIA Olympus core. Signed-off-by: Shanker Donthineni <sdonthineni@nvidia.com> Signed-off-by: Will Deacon <will@kernel.org> |
||
|
|
52b49bd6de |
arm64: cputype: Remove duplicate Cortex-X1C definitions
We currently have duplicate definitions for ARM_CPU_PART_CORTEX_X1C and
MIDR_CORTEX_X1C as a result of commits:
58d245e03c324d08 ("arm64: cputype: Add Cortex-X1C definitions")
efe676a1a7554219 ("arm64: proton-pack: Add new CPUs 'k' values for branch mitigation")
Due to inconsistent sorting when adding entries, there was no textual
conflict between the two patches.
Delete the duplicate definitions added by the latter commit.
The definitions in general are largely (but not entirely) in order of
the MIDR_EL1.PartNum value rather than by CPU name, and the remaining
Cortex-X1C definitions appear later in the list.
For now I haven't sorted the remaining MIDR definitions to minimize
churn. I intend to perform some larger cleanup of these in the near
future which should supersede that anyhow.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
|
||
|
|
4a68408842 |
KVM: arm64: nv: Trap debug registers when in hyp context
In case you haven't realized it yet, the architecture is _slightly_ broken in the context of nested virt. Here we have another example of FEAT_NV2 redirecting a sysreg (MDSCR_EL1) to memory that actually affects execution at vEL2. Fortunately, MDCR_EL2.TDA provides the necessary traps to hide this mess at the expense of unnecessarily trapping the breakpoint/watchpoint registers. Yes, FEAT_FGT gives us a precise trap but let's just opt for obvious correctness to start. Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org> |