1
0
mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2026-01-17 20:10:49 +00:00

163 Commits

Author SHA1 Message Date
Sebastian Chlad
4cdde87d72 selftests: cgroup: Use values_close_report in test_cpu
Convert test_cpu to use the newly added values_close_report() helper
to print detailed diagnostics when a tolerance check fails. This
provides clearer insight into deviations while run in the CI.

Signed-off-by: Sebastian Chlad <sebastian.chlad@suse.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2025-10-15 05:00:59 -10:00
Sebastian Chlad
3f9c60f4d3 selftests: cgroup: add values_close_report helper
Some cgroup selftests, such as test_cpu, occasionally fail by a very
small margin and if run in the CI context, it is useful to have detailed
diagnostic output to understand the deviation.

Introduce a values_close_report() helper which performs the same
comparison as values_close(), but prints detailed information when the
values differ beyond the allowed tolerance.

Signed-off-by: Sebastian Chlad <sebastian.chlad@suse.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2025-10-15 05:00:49 -10:00
Michal Koutný
3b0dec689a selftests: cgroup: Make test_pids backwards compatible
The predicates in test expect event counting from 73e75e6fc352b
("cgroup/pids: Separate semantics of pids.events related to pids.max")
and the test would fail on older kernels. We want to have one version of
tests for all, so detect the feature and skip the test on old kernels.
(The test could even switch to check v1 semantics based on the flag but
keep it simple for now.)

Fixes: 9f34c566027b6 ("selftests: cgroup: Add basic tests for pids controller")
Signed-off-by: Michal Koutný <mkoutny@suse.com>
Tested-by: Sebastian Chlad <sebastian.chlad@suse.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2025-08-27 05:59:52 -10:00
Tiffany Yang
7b281a4582 cgroup: selftests: Add tests for freezer time
Test cgroup v2 freezer time stat. Freezer time accounting should
be independent of other cgroups in the hierarchy and should increase
iff a cgroup is CGRP_FREEZE (regardless of whether it reaches
CGRP_FROZEN).

Skip these tests on systems without freeze time accounting.

Signed-off-by: Tiffany Yang <ynaffit@google.com>
Cc: Michal Koutný <mkoutny@suse.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2025-08-22 07:50:52 -10:00
Shashank Balaji
954bacce36 selftests/cgroup: fix cpu.max tests
Current cpu.max tests (both the normal one and the nested one) are broken.

They setup cpu.max with 1000 us quota and the default period (100,000 us).
A cpu hog is run for a duration of 1s as per wall clock time. This corresponds
to 10 periods, hence an expected usage of 10,000 us. We want the measured
usage (as per cpu.stat) to be close to 10,000 us.

Previously, this approximate equality test was done by
`!values_close(usage_usec, expected_usage_usec, 95)`: if the absolute
difference between usage_usec and expected_usage_usec is greater than 95% of
their sum, then we pass. And expected_usage_usec was set to 1,000,000 us.
Mathematically, this translates to the following being true for pass:

	|usage - expected_usage| > (usage + expected_usage)*0.95

	If usage > expected_usage:
		usage - expected_usage > (usage + expected_usage)*0.95
		0.05*usage > 1.95*expected_usage
		usage > 39*expected_usage = 39s

	If usage < expected_usage:
		expected_usage - usage > (usage + expected_usage)*0.95
		0.05*expected_usage > 1.95*usage
		usage < 0.0256*expected_usage = 25,600 us

Combined,

	Pass if usage < 25,600 us or > 39 s,

which makes no sense given that all we need is for usage_usec to be close to
10,000 us.

Fix this by explicitly calcuating the expected usage duration based on the
configured quota, default period, and the duration, and compare usage_usec
and expected_usage_usec using values_close() with a 10% error margin.

Also, use snprintf to get the quota string to write to cpu.max instead of
hardcoding the quota, ensuring a single source of truth.

Remove the check comparing user_usec and expected_usage_usec, since on running
this test modified with printfs, it's seen that user_usec and usage_usec can
regularly exceed the theoretical expected_usage_usec:

	$ sudo ./test_cpu
	user: 10485, usage: 10485, expected: 10000
	ok 1 test_cpucg_max
	user: 11127, usage: 11127, expected: 10000
	ok 2 test_cpucg_max_nested
	$ sudo ./test_cpu
	user: 10286, usage: 10286, expected: 10000
	ok 1 test_cpucg_max
	user: 10404, usage: 11271, expected: 10000
	ok 2 test_cpucg_max_nested

Hence, a values_close() check of usage_usec and expected_usage_usec is
sufficient.

Fixes: a79906570f9646ae17 ("cgroup: Add test_cpucg_max_nested() testcase")
Fixes: 889ab8113ef1386c57 ("cgroup: Add test_cpucg_max() testcase")
Acked-by: Michal Koutný <mkoutny@suse.com>
Signed-off-by: Shashank Balaji <shashank.mahadasyam@sony.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2025-07-17 08:12:19 -10:00
Sebastian Chlad
e07caae735 selftests: cgroup: Fix missing newline in test_zswap_writeback_one
Fixes malformed test output due to missing newline

Signed-off-by: Sebastian Chlad <sebastian.chlad@suse.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2025-07-12 07:35:30 -10:00
Sebastian Chlad
c7d7713e36 selftests: cgroup: Allow longer timeout for kmem_dead_cgroups cleanup
The test_kmem_dead_cgroups test currently assumes that RCU and
memory reclaim will complete within 5 seconds. In some environments
this timeout may be insufficient, leading to spurious test failures.

This patch introduces max_time set to 20 which is then used in the
test. After 5th sec the debug message is printed to indicate the
cleanup is still ongoing.

In the system under test with 16 CPUs the original test was failing
most of the time and the cleanup time took usually approx. 6sec.
Further tests were conducted with and without do_rcu_barrier and the
results (respectively) are as follow:
quantiles 0  0.25  0.5  0.75  1
          1    2    3    8    20 (mean = 4.7667)
          3    5    8    8    20 (mean = 7.6667)

Acked-by: Michal Koutny <mkoutny@suse.com>
Signed-off-by: Sebastian Chlad <sebastian.chlad@suse.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2025-07-12 07:34:49 -10:00
Michal Koutný
5da4f9db98 selftests: cgroup: Fix compilation on pre-cgroupns kernels
The test would be skipped because of nsdelegate, so the defined value is
not used (0 is always acceptable).

Signed-off-by: Michal Koutný <mkoutny@suse.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2025-06-17 08:13:25 -10:00
Michal Koutný
d74cd7864f selftests: cgroup: Optionally set up v1 environment
Use the missing mount of the unifier hierarchy as a hint of legacy
system and prepare our own named v1 hierarchy for tests.

The code is only in test_core.c and not cgroup_util.c because other
selftests are related to controllers which will be exposed on v2
hierarchy but named hierarchies are only v1 thing.

Signed-off-by: Michal Koutný <mkoutny@suse.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2025-06-17 08:13:16 -10:00
Michal Koutný
dd7588e455 selftests: cgroup: Add support for named v1 hierarchies in test_core
This comes useful when using selftests from mainline on older
kernels/setups that still rely on cgroup v1.
The core tests that rely on v2 specific features are skipped, the
remaining ones are adjusted to work with a v1 hierarchy.

The expected output on v1 system:
	ok 1 # SKIP test_cgcore_internal_process_constraint
	ok 2 # SKIP test_cgcore_top_down_constraint_enable
	ok 3 # SKIP test_cgcore_top_down_constraint_disable
	ok 4 # SKIP test_cgcore_no_internal_process_constraint_on_threads
	ok 5 # SKIP test_cgcore_parent_becomes_threaded
	ok 6 # SKIP test_cgcore_invalid_domain
	ok 7 # SKIP test_cgcore_populated
	ok 8 test_cgcore_proc_migration
	ok 9 test_cgcore_thread_migration
	ok 10 test_cgcore_destroy
	ok 11 test_cgcore_lesser_euid_open
	ok 12 # SKIP test_cgcore_lesser_ns_open

Signed-off-by: Michal Koutný <mkoutny@suse.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2025-06-17 08:13:09 -10:00
Michal Koutný
0925275a17 selftests: cgroup_util: Add helpers for testing named v1 hierarchies
Non-functional change, the control variable will be wired in a separate
commit.

Signed-off-by: Michal Koutný <mkoutny@suse.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2025-06-17 08:13:01 -10:00
Linus Torvalds
7f9039c524 Generic:
* Clean up locking of all vCPUs for a VM by using the *_nest_lock()
   family of functions, and move duplicated code to virt/kvm/.
   kernel/ patches acked by Peter Zijlstra.
 
 * Add MGLRU support to the access tracking perf test.
 
 ARM fixes:
 
 * Make the irqbypass hooks resilient to changes in the GSI<->MSI
   routing, avoiding behind stale vLPI mappings being left behind. The
   fix is to resolve the VGIC IRQ using the host IRQ (which is stable)
   and nuking the vLPI mapping upon a routing change.
 
 * Close another VGIC race where vCPU creation races with VGIC
   creation, leading to in-flight vCPUs entering the kernel w/o private
   IRQs allocated.
 
 * Fix a build issue triggered by the recently added workaround for
   Ampere's AC04_CPU_23 erratum.
 
 * Correctly sign-extend the VA when emulating a TLBI instruction
   potentially targeting a VNCR mapping.
 
 * Avoid dereferencing a NULL pointer in the VGIC debug code, which can
   happen if the device doesn't have any mapping yet.
 
 s390:
 
 * Fix interaction between some filesystems and Secure Execution
 
 * Some cleanups and refactorings, preparing for an upcoming big series
 
 x86:
 
 * Wait for target vCPU to acknowledge KVM_REQ_UPDATE_PROTECTED_GUEST_STATE to
   fix a race between AP destroy and VMRUN.
 
 * Decrypt and dump the VMSA in dump_vmcb() if debugging enabled for the VM.
 
 * Refine and harden handling of spurious faults.
 
 * Add support for ALLOWED_SEV_FEATURES.
 
 * Add #VMGEXIT to the set of handlers special cased for CONFIG_RETPOLINE=y.
 
 * Treat DEBUGCTL[5:2] as reserved to pave the way for virtualizing features
   that utilize those bits.
 
 * Don't account temporary allocations in sev_send_update_data().
 
 * Add support for KVM_CAP_X86_BUS_LOCK_EXIT on SVM, via Bus Lock Threshold.
 
 * Unify virtualization of IBRS on nested VM-Exit, and cross-vCPU IBPB, between
   SVM and VMX.
 
 * Advertise support to userspace for WRMSRNS and PREFETCHI.
 
 * Rescan I/O APIC routes after handling EOI that needed to be intercepted due
   to the old/previous routing, but not the new/current routing.
 
 * Add a module param to control and enumerate support for device posted
   interrupts.
 
 * Fix a potential overflow with nested virt on Intel systems running 32-bit kernels.
 
 * Flush shadow VMCSes on emergency reboot.
 
 * Add support for SNP to the various SEV selftests.
 
 * Add a selftest to verify fastops instructions via forced emulation.
 
 * Refine and optimize KVM's software processing of the posted interrupt bitmap, and share
   the harvesting code between KVM and the kernel's Posted MSI handler
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmg9TjwUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroOUxQf7B7nnWqIKd7jSkGzSD6YsSX9TXktr
 2tJIOfWM3zNYg5GRCidg+m4Y5+DqQWd3Hi5hH2P9wUw7RNuOjOFsDe+y0VBr8ysE
 ve39t/yp+mYalNmHVFl8s3dBDgrIeGKiz+Wgw3zCQIBZ18rJE1dREhv37RlYZ3a2
 wSvuObe8sVpCTyKIowDs1xUi7qJUBoopMSuqfleSHawRrcgCpV99U8/KNFF5plLH
 7fXOBAHHniVCVc+mqQN2wxtVJDhST+U3TaU4GwlKy9Yevr+iibdOXffveeIgNEU4
 D6q1F2zKp6UdV3+p8hxyaTTbiCVDqsp9WOgY/0I/f+CddYn0WVZgOlR+ow==
 =mYFL
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull more kvm updates from Paolo Bonzini:
  Generic:

   - Clean up locking of all vCPUs for a VM by using the *_nest_lock()
     family of functions, and move duplicated code to virt/kvm/. kernel/
     patches acked by Peter Zijlstra

   - Add MGLRU support to the access tracking perf test

  ARM fixes:

   - Make the irqbypass hooks resilient to changes in the GSI<->MSI
     routing, avoiding behind stale vLPI mappings being left behind. The
     fix is to resolve the VGIC IRQ using the host IRQ (which is stable)
     and nuking the vLPI mapping upon a routing change

   - Close another VGIC race where vCPU creation races with VGIC
     creation, leading to in-flight vCPUs entering the kernel w/o
     private IRQs allocated

   - Fix a build issue triggered by the recently added workaround for
     Ampere's AC04_CPU_23 erratum

   - Correctly sign-extend the VA when emulating a TLBI instruction
     potentially targeting a VNCR mapping

   - Avoid dereferencing a NULL pointer in the VGIC debug code, which
     can happen if the device doesn't have any mapping yet

  s390:

   - Fix interaction between some filesystems and Secure Execution

   - Some cleanups and refactorings, preparing for an upcoming big
     series

  x86:

   - Wait for target vCPU to ack KVM_REQ_UPDATE_PROTECTED_GUEST_STATE
     to fix a race between AP destroy and VMRUN

   - Decrypt and dump the VMSA in dump_vmcb() if debugging enabled for
     the VM

   - Refine and harden handling of spurious faults

   - Add support for ALLOWED_SEV_FEATURES

   - Add #VMGEXIT to the set of handlers special cased for
     CONFIG_RETPOLINE=y

   - Treat DEBUGCTL[5:2] as reserved to pave the way for virtualizing
     features that utilize those bits

   - Don't account temporary allocations in sev_send_update_data()

   - Add support for KVM_CAP_X86_BUS_LOCK_EXIT on SVM, via Bus Lock
     Threshold

   - Unify virtualization of IBRS on nested VM-Exit, and cross-vCPU
     IBPB, between SVM and VMX

   - Advertise support to userspace for WRMSRNS and PREFETCHI

   - Rescan I/O APIC routes after handling EOI that needed to be
     intercepted due to the old/previous routing, but not the
     new/current routing

   - Add a module param to control and enumerate support for device
     posted interrupts

   - Fix a potential overflow with nested virt on Intel systems running
     32-bit kernels

   - Flush shadow VMCSes on emergency reboot

   - Add support for SNP to the various SEV selftests

   - Add a selftest to verify fastops instructions via forced emulation

   - Refine and optimize KVM's software processing of the posted
     interrupt bitmap, and share the harvesting code between KVM and the
     kernel's Posted MSI handler"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (93 commits)
  rtmutex_api: provide correct extern functions
  KVM: arm64: vgic-debug: Avoid dereferencing NULL ITE pointer
  KVM: arm64: vgic-init: Plug vCPU vs. VGIC creation race
  KVM: arm64: Unmap vLPIs affected by changes to GSI routing information
  KVM: arm64: Resolve vLPI by host IRQ in vgic_v4_unset_forwarding()
  KVM: arm64: Protect vLPI translation with vgic_irq::irq_lock
  KVM: arm64: Use lock guard in vgic_v4_set_forwarding()
  KVM: arm64: Mask out non-VA bits from TLBI VA* on VNCR invalidation
  arm64: sysreg: Drag linux/kconfig.h to work around vdso build issue
  KVM: s390: Simplify and move pv code
  KVM: s390: Refactor and split some gmap helpers
  KVM: s390: Remove unneeded srcu lock
  s390: Remove unneeded includes
  s390/uv: Improve splitting of large folios that cannot be split while dirty
  s390/uv: Always return 0 from s390_wiggle_split_folio() if successful
  s390/uv: Don't return 0 from make_hva_secure() if the operation was not successful
  rust: add helper for mutex_trylock
  RISC-V: KVM: use kvm_trylock_all_vcpus when locking all vCPUs
  KVM: arm64: use kvm_trylock_all_vcpus when locking all vCPUs
  x86: KVM: SVM: use kvm_lock_all_vcpus instead of a custom implementation
  ...
2025-06-02 12:24:58 -07:00
Sean Christopherson
38e1dd5781 cgroup: selftests: Add API to find root of specific controller
Add an API in the cgroups library to find the root of a specific
controller.  KVM selftests will use the API to find the memory controller.

Search for the controller on both v1 and v2 mounts, as KVM selftests'
usage will be completely oblivious of v1 versus v2.

Signed-off-by: James Houghton <jthoughton@google.com>
Link: https://lore.kernel.org/r/20250508184649.2576210-6-jthoughton@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-05-16 11:45:15 -07:00
James Houghton
2c754a84ff cgroup: selftests: Move cgroup_util into its own library
KVM selftests will soon need to use some of the cgroup creation and
deletion functionality from cgroup_util.

Suggested-by: David Matlack <dmatlack@google.com>
Signed-off-by: James Houghton <jthoughton@google.com>
Acked-by: Tejun Heo <tj@kernel.org>
Link: https://lore.kernel.org/r/20250508184649.2576210-5-jthoughton@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-05-16 11:45:15 -07:00
Sean Christopherson
3a7f9e518c cgroup: selftests: Move memcontrol specific helpers out of common cgroup_util.c
Move a handful of helpers out of cgroup_util.c and into test_memcontrol.c
that have nothing to with cgroups in general, in anticipation of making
cgroup_util.c a generic library that can be used by other selftests.

Make read_text() and write_text() non-static so test_memcontrol.c can
use them.

Signed-off-by: James Houghton <jthoughton@google.com>
Acked-by: Michal Koutný <mkoutny@suse.com>
Link: https://lore.kernel.org/r/20250508184649.2576210-4-jthoughton@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-05-16 11:45:14 -07:00
Waiman Long
d2def68ae0 selftests: memcg: increase error tolerance of child memory.current check in test_memcg_protection()
The test_memcg_protection() function is used for the test_memcg_min and
test_memcg_low sub-tests.  This function generates a set of parent/child
cgroups like:

  parent:  memory.min/low = 50M
  child 0: memory.min/low = 75M,  memory.current = 50M
  child 1: memory.min/low = 25M,  memory.current = 50M
  child 2: memory.min/low = 0,    memory.current = 50M

After applying memory pressure, the function expects the following actual
memory usages.

  parent:  memory.current ~= 50M
  child 0: memory.current ~= 29M
  child 1: memory.current ~= 21M
  child 2: memory.current ~= 0

In reality, the actual memory usages can differ quite a bit from the
expected values.  It uses an error tolerance of 10% with the
values_close() helper.

Both the test_memcg_min and test_memcg_low sub-tests can fail sporadically
because the actual memory usage exceeds the 10% error tolerance.  Below
are a sample of the usage data of the tests runs that fail.

  Child   Actual usage    Expected usage    %err
  -----   ------------    --------------    ----
    1       16990208         22020096      -12.9%
    1       17252352         22020096      -12.1%
    0       37699584         30408704      +10.7%
    1       14368768         22020096      -21.0%
    1       16871424         22020096      -13.2%

The current 10% error tolerenace might be right at the time
test_memcontrol.c was first introduced in v4.18 kernel, but memory reclaim
have certainly evolved quite a bit since then which may result in a bit
more run-to-run variation than previously expected.

Increase the error tolerance to 15% for child 0 and 20% for child 1 to
minimize the chance of this type of failure.  The tolerance is bigger for
child 1 because an upswing in child 0 corresponds to a smaller %err than a
similar downswing in child 1 due to the way %err is used in
values_close().

Before this patch, a 100 test runs of test_memcontrol produced the
following results:

     17 not ok 1 test_memcg_min
     22 not ok 2 test_memcg_low

After applying this patch, there were no test failure for test_memcg_min
and test_memcg_low in 100 test runs.  However, these tests may still fail
once in a while if the memory usage goes beyond the newly extended range.

Link: https://lkml.kernel.org/r/20250502010443.106022-3-longman@redhat.com
Signed-off-by: Waiman Long <longman@redhat.com>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Michal Koutný <mkoutny@suse.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-13 16:28:07 -07:00
Waiman Long
fa6b8b5d9f selftests: memcg: allow low event with no memory.low and memory_recursiveprot on
Patch series "memcg: Fix test_memcg_min/low test failures", v8.

The test_memcontrol selftest consistently fails its test_memcg_low
sub-test (with memory_recursiveprot enabled) and sporadically fails its
test_memcg_min sub-test.  This patchset fixes the test_memcg_min and
test_memcg_low failures by adjusting the test_memcontrol selftest to fix
these test failures.


This patch (of 8):

The test_memcontrol selftest consistently fails its test_memcg_low
sub-test due to the fact that its 3rd test child cgroup which have a
memmory.low of 0 have low event count.  This happens when
memory_recursiveprot mount option is enabled which is the default setting
used by systemd to mount cgroup2 filesystem.

This issue was originally fixed by commit cdc69458a5f3 ("cgroup: account
for memory_recursiveprot in test_memcg_low()").  It was later reverted by
commit 1d09069f5313 ("selftests: memcg: expect no low events in
unprotected sibling") expecting the memory reclaim code would be fixed. 
However, it turns out the unprotected cgroup may still have some residual
effective memory.low protection depending on the memory.low settings in
its parent and its siblings.  As a result, low events may still be
triggered.

One way to fix the test failure is to revert the revert commit.  However,
Michal suggested that it might be better to ignore the low event count
with memory_recursiveprot enabled as low event may or may not happen
depending on the actual test configuration.

Modify the test_memcontrol.c to ignore low event in the 3rd child cgroup
with memory_recursiveprot on.

The 4th child cgroup has no memory usage and so has an effective low of 0.
It has no low event count because the mem_cgroup_below_low() check in
shrink_node_memcgs() is skipped as mem_cgroup_below_min() returns true. 
If we ever change mem_cgroup_below_min() in such a way that it no longer
skips the no usage case, we will have to add code to explicitly skip it.

With this patch applied, the test_memcg_low sub-test finishes successfully
without failure in most cases.  Though both test_memcg_low and
test_memcg_min sub-tests may still fail occasionally if the memory.current
values fall outside of the expected ranges.

Link: https://lkml.kernel.org/r/20250502010443.106022-1-longman@redhat.com
Link: https://lkml.kernel.org/r/20250502010443.106022-2-longman@redhat.com
Signed-off-by: Waiman Long <longman@redhat.com>
Suggested-by: Michal Koutný <mkoutny@suse.com>
Acked-by: Michal Koutný <mkoutny@suse.com>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Waiman Long <longman@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-13 16:28:07 -07:00
Waiman Long
e8a457b735 selftest/cgroup: Add a remote partition transition test to test_cpuset_prs.sh
The current cgroup directory layout for running the partition state
transition tests is mainly suitable for testing local partitions as
well as with a mix of local and remote partitions. It is not that
suitable for doing extensive remote partition and nested remote/local
partition testing.

Add a new set of remote partition tests REMOTE_TEST_MATRIX with another
cgroup directory structure more tailored for remote partition testing
to provide better code coverage.

Also add a few new test cases as well as adjusting existig ones for
the original TEST_MATRIX.

Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2025-03-31 13:28:19 -10:00
Waiman Long
b2b2b4d058 selftest/cgroup: Clean up and restructure test_cpuset_prs.sh
Cleaning up the test_cpuset_prs.sh script and restructure some of the
functions so that a new test matrix with a different cgroup directory
structure can be added in the next patch.

Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2025-03-31 13:28:05 -10:00
Waiman Long
65046b5e0a selftest/cgroup: Update test_cpuset_prs.sh to use | as effective CPUs and state separator
Currently, ',' is used as the cgroup separator of the expected effective
CPUs and partition root states in the test matrix. However, ',' can be
part of the output of the cpuset.cpus*.effective and cpuset.cpus.isolated
files. Change the separator to '|' so that ',' can appear as part of
the expected values.

Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2025-03-31 13:27:49 -10:00
Waiman Long
f62a5d3936 cgroup/cpuset: Remove remote_partition_check() & make update_cpumasks_hier() handle remote partition
Currently, changes in exclusive CPUs are being handled in
remote_partition_check() by disabling conflicting remote partitions.
However, that may lead to results unexpected by the users. Fix
this problem by removing remote_partition_check() and making
update_cpumasks_hier() handle changes in descendant remote partitions
properly.

The compute_effective_exclusive_cpumask() function is enhanced to check
the exclusive_cpus and effective_xcpus from siblings and excluded them
in its effective exclusive CPUs computation and return a value to show if
there is any sibling conflicts.  This is somewhat like the cpu_exclusive
flag check in validate_change(). This is the initial step to enable us
to retire the use of cpu_exclusive flag in cgroup v2 in the future.

One of the tests in the TEST_MATRIX of the test_cpuset_prs.sh
script has to be updated due to changes in the way a child remote
partition root is being handled (updated instead of invalidation)
in update_cpumasks_hier().

Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2025-03-31 13:25:56 -10:00
Bharadwaj Raju
fd07912411 selftests/cgroup: use bash in test_cpuset_v1_hp.sh
The script uses non-POSIX features like `[[` for conditionals and hence
does not work when run with a POSIX /bin/sh.

Change the shebang to /bin/bash instead, like the other tests in cgroup.

Signed-off-by: Bharadwaj Raju <bharadwaj.raju777@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2025-02-04 09:36:54 -10:00
Waiman Long
9b496a8bbe cgroup/cpuset: Prevent leakage of isolated CPUs into sched domains
Isolated CPUs are not allowed to be used in a non-isolated partition.
The only exception is the top cpuset which is allowed to contain boot
time isolated CPUs.

Commit ccac8e8de99c ("cgroup/cpuset: Fix remote root partition creation
problem") introduces a simplified scheme of including only partition
roots in sched domain generation. However, it does not properly account
for this exception case. This can result in leakage of isolated CPUs
into a sched domain.

Fix it by making sure that isolated CPUs are excluded from the top
cpuset before generating sched domains.

Also update the way the boot time isolated CPUs are handled in
test_cpuset_prs.sh to make sure that those isolated CPUs are really
isolated instead of just skipping them in the tests.

Fixes: ccac8e8de99c ("cgroup/cpuset: Fix remote root partition creation problem")
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2024-12-11 05:45:52 -10:00
Xiu Jianfeng
11312c86f9 selftests/cgroup: Fix compile error in test_cpu.c
When compiling the cgroup selftests with the following command:

make -C tools/testing/selftests/cgroup/

the compiler complains as below:

test_cpu.c: In function ‘test_cpucg_nice’:
test_cpu.c:284:39: error: incompatible type for argument 2 of ‘hog_cpus_timed’
  284 |                 hog_cpus_timed(cpucg, param);
      |                                       ^~~~~
      |                                       |
      |                                       struct cpu_hog_func_param
test_cpu.c:132:53: note: expected ‘void *’ but argument is of type ‘struct cpu_hog_func_param’
  132 | static int hog_cpus_timed(const char *cgroup, void *arg)
      |                                               ~~~~~~^~~

Fix it by passing the address of param to hog_cpus_timed().

Fixes: 2e82c0d4562a ("cgroup/rstat: Selftests for niced CPU statistics")
Signed-off-by: Xiu Jianfeng <xiujianfeng@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2024-10-14 15:05:39 -10:00
Joshua Hahn
2e82c0d456 cgroup/rstat: Selftests for niced CPU statistics
Creates a cgroup with a single nice CPU hog process running.
fork() is called to generate the nice process because un-nicing is
not possible (see man nice(3)). If fork() was not used to generate
the CPU hog, we would run the rest of the cgroup selftest suite as a
nice process.

Signed-off-by: Joshua Hahn <joshua.hahnjy@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2024-10-08 08:50:54 -10:00
Linus Torvalds
617a814f14 ALong with the usual shower of singleton patches, notable patch series in
this pull request are:
 
 "Align kvrealloc() with krealloc()" from Danilo Krummrich.  Adds
 consistency to the APIs and behaviour of these two core allocation
 functions.  This also simplifies/enables Rustification.
 
 "Some cleanups for shmem" from Baolin Wang.  No functional changes - mode
 code reuse, better function naming, logic simplifications.
 
 "mm: some small page fault cleanups" from Josef Bacik.  No functional
 changes - code cleanups only.
 
 "Various memory tiering fixes" from Zi Yan.  A small fix and a little
 cleanup.
 
 "mm/swap: remove boilerplate" from Yu Zhao.  Code cleanups and
 simplifications and .text shrinkage.
 
 "Kernel stack usage histogram" from Pasha Tatashin and Shakeel Butt.  This
 is a feature, it adds new feilds to /proc/vmstat such as
 
     $ grep kstack /proc/vmstat
     kstack_1k 3
     kstack_2k 188
     kstack_4k 11391
     kstack_8k 243
     kstack_16k 0
 
 which tells us that 11391 processes used 4k of stack while none at all
 used 16k.  Useful for some system tuning things, but partivularly useful
 for "the dynamic kernel stack project".
 
 "kmemleak: support for percpu memory leak detect" from Pavel Tikhomirov.
 Teaches kmemleak to detect leaksage of percpu memory.
 
 "mm: memcg: page counters optimizations" from Roman Gushchin.  "3
 independent small optimizations of page counters".
 
 "mm: split PTE/PMD PT table Kconfig cleanups+clarifications" from David
 Hildenbrand.  Improves PTE/PMD splitlock detection, makes powerpc/8xx work
 correctly by design rather than by accident.
 
 "mm: remove arch_make_page_accessible()" from David Hildenbrand.  Some
 folio conversions which make arch_make_page_accessible() unneeded.
 
 "mm, memcg: cg2 memory{.swap,}.peak write handlers" fro David Finkel.
 Cleans up and fixes our handling of the resetting of the cgroup/process
 peak-memory-use detector.
 
 "Make core VMA operations internal and testable" from Lorenzo Stoakes.
 Rationalizaion and encapsulation of the VMA manipulation APIs.  With a
 view to better enable testing of the VMA functions, even from a
 userspace-only harness.
 
 "mm: zswap: fixes for global shrinker" from Takero Funaki.  Fix issues in
 the zswap global shrinker, resulting in improved performance.
 
 "mm: print the promo watermark in zoneinfo" from Kaiyang Zhao.  Fill in
 some missing info in /proc/zoneinfo.
 
 "mm: replace follow_page() by folio_walk" from David Hildenbrand.  Code
 cleanups and rationalizations (conversion to folio_walk()) resulting in
 the removal of follow_page().
 
 "improving dynamic zswap shrinker protection scheme" from Nhat Pham.  Some
 tuning to improve zswap's dynamic shrinker.  Significant reductions in
 swapin and improvements in performance are shown.
 
 "mm: Fix several issues with unaccepted memory" from Kirill Shutemov.
 Improvements to the new unaccepted memory feature,
 
 "mm/mprotect: Fix dax puds" from Peter Xu.  Implements mprotect on DAX
 PUDs.  This was missing, although nobody seems to have notied yet.
 
 "Introduce a store type enum for the Maple tree" from Sidhartha Kumar.
 Cleanups and modest performance improvements for the maple tree library
 code.
 
 "memcg: further decouple v1 code from v2" from Shakeel Butt.  Move more
 cgroup v1 remnants away from the v2 memcg code.
 
 "memcg: initiate deprecation of v1 features" from Shakeel Butt.  Adds
 various warnings telling users that memcg v1 features are deprecated.
 
 "mm: swap: mTHP swap allocator base on swap cluster order" from Chris Li.
 Greatly improves the success rate of the mTHP swap allocation.
 
 "mm: introduce numa_memblks" from Mike Rapoport.  Moves various disparate
 per-arch implementations of numa_memblk code into generic code.
 
 "mm: batch free swaps for zap_pte_range()" from Barry Song.  Greatly
 improves the performance of munmap() of swap-filled ptes.
 
 "support large folio swap-out and swap-in for shmem" from Baolin Wang.
 With this series we no longer split shmem large folios into simgle-page
 folios when swapping out shmem.
 
 "mm/hugetlb: alloc/free gigantic folios" from Yu Zhao.  Nice performance
 improvements and code reductions for gigantic folios.
 
 "support shmem mTHP collapse" from Baolin Wang.  Adds support for
 khugepaged's collapsing of shmem mTHP folios.
 
 "mm: Optimize mseal checks" from Pedro Falcato.  Fixes an mprotect()
 performance regression due to the addition of mseal().
 
 "Increase the number of bits available in page_type" from Matthew Wilcox.
 Increases the number of bits available in page_type!
 
 "Simplify the page flags a little" from Matthew Wilcox.  Many legacy page
 flags are now folio flags, so the page-based flags and their
 accessors/mutators can be removed.
 
 "mm: store zero pages to be swapped out in a bitmap" from Usama Arif.  An
 optimization which permits us to avoid writing/reading zero-filled zswap
 pages to backing store.
 
 "Avoid MAP_FIXED gap exposure" from Liam Howlett.  Fixes a race window
 which occurs when a MAP_FIXED operqtion is occurring during an unrelated
 vma tree walk.
 
 "mm: remove vma_merge()" from Lorenzo Stoakes.  Major rotorooting of the
 vma_merge() functionality, making ot cleaner, more testable and better
 tested.
 
 "misc fixups for DAMON {self,kunit} tests" from SeongJae Park.  Minor
 fixups of DAMON selftests and kunit tests.
 
 "mm: memory_hotplug: improve do_migrate_range()" from Kefeng Wang.  Code
 cleanups and folio conversions.
 
 "Shmem mTHP controls and stats improvements" from Ryan Roberts.  Cleanups
 for shmem controls and stats.
 
 "mm: count the number of anonymous THPs per size" from Barry Song.  Expose
 additional anon THP stats to userspace for improved tuning.
 
 "mm: finish isolate/putback_lru_page()" from Kefeng Wang: more folio
 conversions and removal of now-unused page-based APIs.
 
 "replace per-quota region priorities histogram buffer with per-context
 one" from SeongJae Park.  DAMON histogram rationalization.
 
 "Docs/damon: update GitHub repo URLs and maintainer-profile" from SeongJae
 Park.  DAMON documentation updates.
 
 "mm/vdpa: correct misuse of non-direct-reclaim __GFP_NOFAIL and improve
 related doc and warn" from Jason Wang: fixes usage of page allocator
 __GFP_NOFAIL and GFP_ATOMIC flags.
 
 "mm: split underused THPs" from Yu Zhao.  Improve THP=always policy - this
 was overprovisioning THPs in sparsely accessed memory areas.
 
 "zram: introduce custom comp backends API" frm Sergey Senozhatsky.  Add
 support for zram run-time compression algorithm tuning.
 
 "mm: Care about shadow stack guard gap when getting an unmapped area" from
 Mark Brown.  Fix up the various arch_get_unmapped_area() implementations
 to better respect guard areas.
 
 "Improve mem_cgroup_iter()" from Kinsey Ho.  Improve the reliability of
 mem_cgroup_iter() and various code cleanups.
 
 "mm: Support huge pfnmaps" from Peter Xu.  Extends the usage of huge
 pfnmap support.
 
 "resource: Fix region_intersects() vs add_memory_driver_managed()" from
 Huang Ying.  Fix a bug in region_intersects() for systems with CXL memory.
 
 "mm: hwpoison: two more poison recovery" from Kefeng Wang.  Teaches a
 couple more code paths to correctly recover from the encountering of
 poisoned memry.
 
 "mm: enable large folios swap-in support" from Barry Song.  Support the
 swapin of mTHP memory into appropriately-sized folios, rather than into
 single-page folios.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZu1BBwAKCRDdBJ7gKXxA
 jlWNAQDYlqQLun7bgsAN4sSvi27VUuWv1q70jlMXTfmjJAvQqwD/fBFVR6IOOiw7
 AkDbKWP2k0hWPiNJBGwoqxdHHx09Xgo=
 =s0T+
 -----END PGP SIGNATURE-----

Merge tag 'mm-stable-2024-09-20-02-31' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull MM updates from Andrew Morton:
 "Along with the usual shower of singleton patches, notable patch series
  in this pull request are:

   - "Align kvrealloc() with krealloc()" from Danilo Krummrich. Adds
     consistency to the APIs and behaviour of these two core allocation
     functions. This also simplifies/enables Rustification.

   - "Some cleanups for shmem" from Baolin Wang. No functional changes -
     mode code reuse, better function naming, logic simplifications.

   - "mm: some small page fault cleanups" from Josef Bacik. No
     functional changes - code cleanups only.

   - "Various memory tiering fixes" from Zi Yan. A small fix and a
     little cleanup.

   - "mm/swap: remove boilerplate" from Yu Zhao. Code cleanups and
     simplifications and .text shrinkage.

   - "Kernel stack usage histogram" from Pasha Tatashin and Shakeel
     Butt. This is a feature, it adds new feilds to /proc/vmstat such as

       $ grep kstack /proc/vmstat
       kstack_1k 3
       kstack_2k 188
       kstack_4k 11391
       kstack_8k 243
       kstack_16k 0

     which tells us that 11391 processes used 4k of stack while none at
     all used 16k. Useful for some system tuning things, but
     partivularly useful for "the dynamic kernel stack project".

   - "kmemleak: support for percpu memory leak detect" from Pavel
     Tikhomirov. Teaches kmemleak to detect leaksage of percpu memory.

   - "mm: memcg: page counters optimizations" from Roman Gushchin. "3
     independent small optimizations of page counters".

   - "mm: split PTE/PMD PT table Kconfig cleanups+clarifications" from
     David Hildenbrand. Improves PTE/PMD splitlock detection, makes
     powerpc/8xx work correctly by design rather than by accident.

   - "mm: remove arch_make_page_accessible()" from David Hildenbrand.
     Some folio conversions which make arch_make_page_accessible()
     unneeded.

   - "mm, memcg: cg2 memory{.swap,}.peak write handlers" fro David
     Finkel. Cleans up and fixes our handling of the resetting of the
     cgroup/process peak-memory-use detector.

   - "Make core VMA operations internal and testable" from Lorenzo
     Stoakes. Rationalizaion and encapsulation of the VMA manipulation
     APIs. With a view to better enable testing of the VMA functions,
     even from a userspace-only harness.

   - "mm: zswap: fixes for global shrinker" from Takero Funaki. Fix
     issues in the zswap global shrinker, resulting in improved
     performance.

   - "mm: print the promo watermark in zoneinfo" from Kaiyang Zhao. Fill
     in some missing info in /proc/zoneinfo.

   - "mm: replace follow_page() by folio_walk" from David Hildenbrand.
     Code cleanups and rationalizations (conversion to folio_walk())
     resulting in the removal of follow_page().

   - "improving dynamic zswap shrinker protection scheme" from Nhat
     Pham. Some tuning to improve zswap's dynamic shrinker. Significant
     reductions in swapin and improvements in performance are shown.

   - "mm: Fix several issues with unaccepted memory" from Kirill
     Shutemov. Improvements to the new unaccepted memory feature,

   - "mm/mprotect: Fix dax puds" from Peter Xu. Implements mprotect on
     DAX PUDs. This was missing, although nobody seems to have notied
     yet.

   - "Introduce a store type enum for the Maple tree" from Sidhartha
     Kumar. Cleanups and modest performance improvements for the maple
     tree library code.

   - "memcg: further decouple v1 code from v2" from Shakeel Butt. Move
     more cgroup v1 remnants away from the v2 memcg code.

   - "memcg: initiate deprecation of v1 features" from Shakeel Butt.
     Adds various warnings telling users that memcg v1 features are
     deprecated.

   - "mm: swap: mTHP swap allocator base on swap cluster order" from
     Chris Li. Greatly improves the success rate of the mTHP swap
     allocation.

   - "mm: introduce numa_memblks" from Mike Rapoport. Moves various
     disparate per-arch implementations of numa_memblk code into generic
     code.

   - "mm: batch free swaps for zap_pte_range()" from Barry Song. Greatly
     improves the performance of munmap() of swap-filled ptes.

   - "support large folio swap-out and swap-in for shmem" from Baolin
     Wang. With this series we no longer split shmem large folios into
     simgle-page folios when swapping out shmem.

   - "mm/hugetlb: alloc/free gigantic folios" from Yu Zhao. Nice
     performance improvements and code reductions for gigantic folios.

   - "support shmem mTHP collapse" from Baolin Wang. Adds support for
     khugepaged's collapsing of shmem mTHP folios.

   - "mm: Optimize mseal checks" from Pedro Falcato. Fixes an mprotect()
     performance regression due to the addition of mseal().

   - "Increase the number of bits available in page_type" from Matthew
     Wilcox. Increases the number of bits available in page_type!

   - "Simplify the page flags a little" from Matthew Wilcox. Many legacy
     page flags are now folio flags, so the page-based flags and their
     accessors/mutators can be removed.

   - "mm: store zero pages to be swapped out in a bitmap" from Usama
     Arif. An optimization which permits us to avoid writing/reading
     zero-filled zswap pages to backing store.

   - "Avoid MAP_FIXED gap exposure" from Liam Howlett. Fixes a race
     window which occurs when a MAP_FIXED operqtion is occurring during
     an unrelated vma tree walk.

   - "mm: remove vma_merge()" from Lorenzo Stoakes. Major rotorooting of
     the vma_merge() functionality, making ot cleaner, more testable and
     better tested.

   - "misc fixups for DAMON {self,kunit} tests" from SeongJae Park.
     Minor fixups of DAMON selftests and kunit tests.

   - "mm: memory_hotplug: improve do_migrate_range()" from Kefeng Wang.
     Code cleanups and folio conversions.

   - "Shmem mTHP controls and stats improvements" from Ryan Roberts.
     Cleanups for shmem controls and stats.

   - "mm: count the number of anonymous THPs per size" from Barry Song.
     Expose additional anon THP stats to userspace for improved tuning.

   - "mm: finish isolate/putback_lru_page()" from Kefeng Wang: more
     folio conversions and removal of now-unused page-based APIs.

   - "replace per-quota region priorities histogram buffer with
     per-context one" from SeongJae Park. DAMON histogram
     rationalization.

   - "Docs/damon: update GitHub repo URLs and maintainer-profile" from
     SeongJae Park. DAMON documentation updates.

   - "mm/vdpa: correct misuse of non-direct-reclaim __GFP_NOFAIL and
     improve related doc and warn" from Jason Wang: fixes usage of page
     allocator __GFP_NOFAIL and GFP_ATOMIC flags.

   - "mm: split underused THPs" from Yu Zhao. Improve THP=always policy.
     This was overprovisioning THPs in sparsely accessed memory areas.

   - "zram: introduce custom comp backends API" frm Sergey Senozhatsky.
     Add support for zram run-time compression algorithm tuning.

   - "mm: Care about shadow stack guard gap when getting an unmapped
     area" from Mark Brown. Fix up the various arch_get_unmapped_area()
     implementations to better respect guard areas.

   - "Improve mem_cgroup_iter()" from Kinsey Ho. Improve the reliability
     of mem_cgroup_iter() and various code cleanups.

   - "mm: Support huge pfnmaps" from Peter Xu. Extends the usage of huge
     pfnmap support.

   - "resource: Fix region_intersects() vs add_memory_driver_managed()"
     from Huang Ying. Fix a bug in region_intersects() for systems with
     CXL memory.

   - "mm: hwpoison: two more poison recovery" from Kefeng Wang. Teaches
     a couple more code paths to correctly recover from the encountering
     of poisoned memry.

   - "mm: enable large folios swap-in support" from Barry Song. Support
     the swapin of mTHP memory into appropriately-sized folios, rather
     than into single-page folios"

* tag 'mm-stable-2024-09-20-02-31' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (416 commits)
  zram: free secondary algorithms names
  uprobes: turn xol_area->pages[2] into xol_area->page
  uprobes: introduce the global struct vm_special_mapping xol_mapping
  Revert "uprobes: use vm_special_mapping close() functionality"
  mm: support large folios swap-in for sync io devices
  mm: add nr argument in mem_cgroup_swapin_uncharge_swap() helper to support large folios
  mm: fix swap_read_folio_zeromap() for large folios with partial zeromap
  mm/debug_vm_pgtable: Use pxdp_get() for accessing page table entries
  set_memory: add __must_check to generic stubs
  mm/vma: return the exact errno in vms_gather_munmap_vmas()
  memcg: cleanup with !CONFIG_MEMCG_V1
  mm/show_mem.c: report alloc tags in human readable units
  mm: support poison recovery from copy_present_page()
  mm: support poison recovery from do_cow_fault()
  resource, kunit: add test case for region_intersects()
  resource: make alloc_free_mem_region() works for iomem_resource
  mm: z3fold: deprecate CONFIG_Z3FOLD
  vfio/pci: implement huge_fault support
  mm/arm64: support large pfn mappings
  mm/x86: support large pfn mappings
  ...
2024-09-21 07:29:05 -07:00
Mike Yuan
fd06ce2ce4 selftests: test_zswap: add test for hierarchical zswap.writeback
Ensure that zswap.writeback check goes up the cgroup tree, i.e.  is
hierarchical.  Create a subcgroup which has zswap.writeback set to 1, and
the upper hierarchy's restrictions shall apply.

Link: https://lkml.kernel.org/r/20240823162506.12117-2-me@yhndnzj.com
Signed-off-by: Mike Yuan <me@yhndnzj.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Michal Koutný <mkoutny@suse.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Nhat Pham <nphamcs@gmail.com>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Yosry Ahmed <yosryahmed@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-03 21:15:47 -07:00
David Finkel
d075bccec0 mm, memcg: cg2 memory{.swap,}.peak write tests
Extend two existing tests to cover extracting memory usage through the
newly mutable memory.peak and memory.swap.peak handlers.

In particular, make sure to exercise adding and removing watchers with
overlapping lifetimes so the less-trivial logic gets tested.

The new/updated tests attempt to detect a lack of the write handler by
fstat'ing the memory.peak and memory.swap.peak files and skip the tests if
that's the case.  Additionally, skip if the file doesn't exist at all.

[davidf@vimeo.com: update tests]
  Link: https://lkml.kernel.org/r/20240730231304.761942-3-davidf@vimeo.com
Link: https://lkml.kernel.org/r/20240729143743.34236-3-davidf@vimeo.com
Signed-off-by: David Finkel <davidf@vimeo.com>
Acked-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Michal Koutný <mkoutny@suse.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Waiman Long <longman@redhat.com>
Cc: Zefan Li <lizefan.x@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-01 20:25:53 -07:00
Chen Ridong
3f9319c691 cgroup/cpuset: add sefltest for cpuset v1
There is only hotplug test for cpuset v1, just add base read/write test
for cpuset v1.

Signed-off-by: Chen Ridong <chenridong@huawei.com>
Acked-by: Waiman Long <longman@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2024-08-30 10:00:17 -10:00
Waiman Long
43a17fcfcc selftest/cgroup: Make test_cpuset_prs.sh deal with pre-isolated CPUs
Since isolated CPUs can be reserved at boot time via the "isolcpus"
boot command line option, these pre-isolated CPUs may interfere with
testing done by test_cpuset_prs.sh.

With the previous commit that incorporates those boot time isolated CPUs
into "cpuset.cpus.isolated", we can check for those before testing is
started to make sure that there will be no interference.  Otherwise,
this test will be skipped if incorrect test failure can happen.

As "cpuset.cpus.isolated" is now available in a non cgroup_debug kernel,
we don't need to check for its existence anymore.

Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2024-08-30 09:23:39 -10:00
Waiman Long
92841d6e23 selftest/cgroup: Add new test cases to test_cpuset_prs.sh
Add new test cases to test_cpuset_prs.sh to cover corner cases reported
in previous fix commits.

Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2024-08-05 10:57:42 -10:00
Linus Torvalds
fbc90c042c - 875fa64577da ("mm/hugetlb_vmemmap: fix race with speculative PFN
walkers") is known to cause a performance regression
   (https://lore.kernel.org/all/3acefad9-96e5-4681-8014-827d6be71c7a@linux.ibm.com/T/#mfa809800a7862fb5bdf834c6f71a3a5113eb83ff).
   Yu has a fix which I'll send along later via the hotfixes branch.
 
 - In the series "mm: Avoid possible overflows in dirty throttling" Jan
   Kara addresses a couple of issues in the writeback throttling code.
   These fixes are also targetted at -stable kernels.
 
 - Ryusuke Konishi's series "nilfs2: fix potential issues related to
   reserved inodes" does that.  This should actually be in the
   mm-nonmm-stable tree, along with the many other nilfs2 patches.  My bad.
 
 - More folio conversions from Kefeng Wang in the series "mm: convert to
   folio_alloc_mpol()"
 
 - Kemeng Shi has sent some cleanups to the writeback code in the series
   "Add helper functions to remove repeated code and improve readability of
   cgroup writeback"
 
 - Kairui Song has made the swap code a little smaller and a little
   faster in the series "mm/swap: clean up and optimize swap cache index".
 
 - In the series "mm/memory: cleanly support zeropage in
   vm_insert_page*(), vm_map_pages*() and vmf_insert_mixed()" David
   Hildenbrand has reworked the rather sketchy handling of the use of the
   zeropage in MAP_SHARED mappings.  I don't see any runtime effects here -
   more a cleanup/understandability/maintainablity thing.
 
 - Dev Jain has improved selftests/mm/va_high_addr_switch.c's handling of
   higher addresses, for aarch64.  The (poorly named) series is
   "Restructure va_high_addr_switch".
 
 - The core TLB handling code gets some cleanups and possible slight
   optimizations in Bang Li's series "Add update_mmu_tlb_range() to
   simplify code".
 
 - Jane Chu has improved the handling of our
   fake-an-unrecoverable-memory-error testing feature MADV_HWPOISON in the
   series "Enhance soft hwpoison handling and injection".
 
 - Jeff Johnson has sent a billion patches everywhere to add
   MODULE_DESCRIPTION() to everything.  Some landed in this pull.
 
 - In the series "mm: cleanup MIGRATE_SYNC_NO_COPY mode", Kefeng Wang has
   simplified migration's use of hardware-offload memory copying.
 
 - Yosry Ahmed performs more folio API conversions in his series "mm:
   zswap: trivial folio conversions".
 
 - In the series "large folios swap-in: handle refault cases first",
   Chuanhua Han inches us forward in the handling of large pages in the
   swap code.  This is a cleanup and optimization, working toward the end
   objective of full support of large folio swapin/out.
 
 - In the series "mm,swap: cleanup VMA based swap readahead window
   calculation", Huang Ying has contributed some cleanups and a possible
   fixlet to his VMA based swap readahead code.
 
 - In the series "add mTHP support for anonymous shmem" Baolin Wang has
   taught anonymous shmem mappings to use multisize THP.  By default this
   is a no-op - users must opt in vis sysfs controls.  Dramatic
   improvements in pagefault latency are realized.
 
 - David Hildenbrand has some cleanups to our remaining use of
   page_mapcount() in the series "fs/proc: move page_mapcount() to
   fs/proc/internal.h".
 
 - David also has some highmem accounting cleanups in the series
   "mm/highmem: don't track highmem pages manually".
 
 - Build-time fixes and cleanups from John Hubbard in the series
   "cleanups, fixes, and progress towards avoiding "make headers"".
 
 - Cleanups and consolidation of the core pagemap handling from Barry
   Song in the series "mm: introduce pmd|pte_needs_soft_dirty_wp helpers
   and utilize them".
 
 - Lance Yang's series "Reclaim lazyfree THP without splitting" has
   reduced the latency of the reclaim of pmd-mapped THPs under fairly
   common circumstances.  A 10x speedup is seen in a microbenchmark.
 
   It does this by punting to aother CPU but I guess that's a win unless
   all CPUs are pegged.
 
 - hugetlb_cgroup cleanups from Xiu Jianfeng in the series
   "mm/hugetlb_cgroup: rework on cftypes".
 
 - Miaohe Lin's series "Some cleanups for memory-failure" does just that
   thing.
 
 - Is anyone reading this stuff?  If so, email me!
 
 - Someone other than SeongJae has developed a DAMON feature in Honggyu
   Kim's series "DAMON based tiered memory management for CXL memory".
   This adds DAMON features which may be used to help determine the
   efficiency of our placement of CXL/PCIe attached DRAM.
 
 - DAMON user API centralization and simplificatio work in SeongJae
   Park's series "mm/damon: introduce DAMON parameters online commit
   function".
 
 - In the series "mm: page_type, zsmalloc and page_mapcount_reset()"
   David Hildenbrand does some maintenance work on zsmalloc - partially
   modernizing its use of pageframe fields.
 
 - Kefeng Wang provides more folio conversions in the series "mm: remove
   page_maybe_dma_pinned() and page_mkclean()".
 
 - More cleanup from David Hildenbrand, this time in the series
   "mm/memory_hotplug: use PageOffline() instead of PageReserved() for
   !ZONE_DEVICE".  It "enlightens memory hotplug more about PageOffline()
   pages" and permits the removal of some virtio-mem hacks.
 
 - Barry Song's series "mm: clarify folio_add_new_anon_rmap() and
   __folio_add_anon_rmap()" is a cleanup to the anon folio handling in
   preparation for mTHP (multisize THP) swapin.
 
 - Kefeng Wang's series "mm: improve clear and copy user folio"
   implements more folio conversions, this time in the area of large folio
   userspace copying.
 
 - The series "Docs/mm/damon/maintaier-profile: document a mailing tool
   and community meetup series" tells people how to get better involved
   with other DAMON developers.  From SeongJae Park.
 
 - A large series ("kmsan: Enable on s390") from Ilya Leoshkevich does
   that.
 
 - David Hildenbrand sends along more cleanups, this time against the
   migration code.  The series is "mm/migrate: move NUMA hinting fault
   folio isolation + checks under PTL".
 
 - Jan Kara has found quite a lot of strangenesses and minor errors in
   the readahead code.  He addresses this in the series "mm: Fix various
   readahead quirks".
 
 - SeongJae Park's series "selftests/damon: test DAMOS tried regions and
   {min,max}_nr_regions" adds features and addresses errors in DAMON's self
   testing code.
 
 - Gavin Shan has found a userspace-triggerable WARN in the pagecache
   code.  The series "mm/filemap: Limit page cache size to that supported
   by xarray" addresses this.  The series is marked cc:stable.
 
 - Chengming Zhou's series "mm/ksm: cmp_and_merge_page() optimizations
   and cleanup" cleans up and slightly optimizes KSM.
 
 - Roman Gushchin has separated the memcg-v1 and memcg-v2 code - lots of
   code motion.  The series (which also makes the memcg-v1 code
   Kconfigurable) are
 
   "mm: memcg: separate legacy cgroup v1 code and put under config
   option" and
   "mm: memcg: put cgroup v1-specific memcg data under CONFIG_MEMCG_V1"
 
 - Dan Schatzberg's series "Add swappiness argument to memory.reclaim"
   adds an additional feature to this cgroup-v2 control file.
 
 - The series "Userspace controls soft-offline pages" from Jiaqi Yan
   permits userspace to stop the kernel's automatic treatment of excessive
   correctable memory errors.  In order to permit userspace to monitor and
   handle this situation.
 
 - Kefeng Wang's series "mm: migrate: support poison recover from migrate
   folio" teaches the kernel to appropriately handle migration from
   poisoned source folios rather than simply panicing.
 
 - SeongJae Park's series "Docs/damon: minor fixups and improvements"
   does those things.
 
 - In the series "mm/zsmalloc: change back to per-size_class lock"
   Chengming Zhou improves zsmalloc's scalability and memory utilization.
 
 - Vivek Kasireddy's series "mm/gup: Introduce memfd_pin_folios() for
   pinning memfd folios" makes the GUP code use FOLL_PIN rather than bare
   refcount increments.  So these paes can first be moved aside if they
   reside in the movable zone or a CMA block.
 
 - Andrii Nakryiko has added a binary ioctl()-based API to /proc/pid/maps
   for much faster reading of vma information.  The series is "query VMAs
   from /proc/<pid>/maps".
 
 - In the series "mm: introduce per-order mTHP split counters" Lance Yang
   improves the kernel's presentation of developer information related to
   multisize THP splitting.
 
 - Michael Ellerman has developed the series "Reimplement huge pages
   without hugepd on powerpc (8xx, e500, book3s/64)".  This permits
   userspace to use all available huge page sizes.
 
 - In the series "revert unconditional slab and page allocator fault
   injection calls" Vlastimil Babka removes a performance-affecting and not
   very useful feature from slab fault injection.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZp2C+QAKCRDdBJ7gKXxA
 joTkAQDvjqOoFStqk4GU3OXMYB7WCU/ZQMFG0iuu1EEwTVDZ4QEA8CnG7seek1R3
 xEoo+vw0sWWeLV3qzsxnCA1BJ8cTJA8=
 =z0Lf
 -----END PGP SIGNATURE-----

Merge tag 'mm-stable-2024-07-21-14-50' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull MM updates from Andrew Morton:

 - In the series "mm: Avoid possible overflows in dirty throttling" Jan
   Kara addresses a couple of issues in the writeback throttling code.
   These fixes are also targetted at -stable kernels.

 - Ryusuke Konishi's series "nilfs2: fix potential issues related to
   reserved inodes" does that. This should actually be in the
   mm-nonmm-stable tree, along with the many other nilfs2 patches. My
   bad.

 - More folio conversions from Kefeng Wang in the series "mm: convert to
   folio_alloc_mpol()"

 - Kemeng Shi has sent some cleanups to the writeback code in the series
   "Add helper functions to remove repeated code and improve readability
   of cgroup writeback"

 - Kairui Song has made the swap code a little smaller and a little
   faster in the series "mm/swap: clean up and optimize swap cache
   index".

 - In the series "mm/memory: cleanly support zeropage in
   vm_insert_page*(), vm_map_pages*() and vmf_insert_mixed()" David
   Hildenbrand has reworked the rather sketchy handling of the use of
   the zeropage in MAP_SHARED mappings. I don't see any runtime effects
   here - more a cleanup/understandability/maintainablity thing.

 - Dev Jain has improved selftests/mm/va_high_addr_switch.c's handling
   of higher addresses, for aarch64. The (poorly named) series is
   "Restructure va_high_addr_switch".

 - The core TLB handling code gets some cleanups and possible slight
   optimizations in Bang Li's series "Add update_mmu_tlb_range() to
   simplify code".

 - Jane Chu has improved the handling of our
   fake-an-unrecoverable-memory-error testing feature MADV_HWPOISON in
   the series "Enhance soft hwpoison handling and injection".

 - Jeff Johnson has sent a billion patches everywhere to add
   MODULE_DESCRIPTION() to everything. Some landed in this pull.

 - In the series "mm: cleanup MIGRATE_SYNC_NO_COPY mode", Kefeng Wang
   has simplified migration's use of hardware-offload memory copying.

 - Yosry Ahmed performs more folio API conversions in his series "mm:
   zswap: trivial folio conversions".

 - In the series "large folios swap-in: handle refault cases first",
   Chuanhua Han inches us forward in the handling of large pages in the
   swap code. This is a cleanup and optimization, working toward the end
   objective of full support of large folio swapin/out.

 - In the series "mm,swap: cleanup VMA based swap readahead window
   calculation", Huang Ying has contributed some cleanups and a possible
   fixlet to his VMA based swap readahead code.

 - In the series "add mTHP support for anonymous shmem" Baolin Wang has
   taught anonymous shmem mappings to use multisize THP. By default this
   is a no-op - users must opt in vis sysfs controls. Dramatic
   improvements in pagefault latency are realized.

 - David Hildenbrand has some cleanups to our remaining use of
   page_mapcount() in the series "fs/proc: move page_mapcount() to
   fs/proc/internal.h".

 - David also has some highmem accounting cleanups in the series
   "mm/highmem: don't track highmem pages manually".

 - Build-time fixes and cleanups from John Hubbard in the series
   "cleanups, fixes, and progress towards avoiding "make headers"".

 - Cleanups and consolidation of the core pagemap handling from Barry
   Song in the series "mm: introduce pmd|pte_needs_soft_dirty_wp helpers
   and utilize them".

 - Lance Yang's series "Reclaim lazyfree THP without splitting" has
   reduced the latency of the reclaim of pmd-mapped THPs under fairly
   common circumstances. A 10x speedup is seen in a microbenchmark.

   It does this by punting to aother CPU but I guess that's a win unless
   all CPUs are pegged.

 - hugetlb_cgroup cleanups from Xiu Jianfeng in the series
   "mm/hugetlb_cgroup: rework on cftypes".

 - Miaohe Lin's series "Some cleanups for memory-failure" does just that
   thing.

 - Someone other than SeongJae has developed a DAMON feature in Honggyu
   Kim's series "DAMON based tiered memory management for CXL memory".
   This adds DAMON features which may be used to help determine the
   efficiency of our placement of CXL/PCIe attached DRAM.

 - DAMON user API centralization and simplificatio work in SeongJae
   Park's series "mm/damon: introduce DAMON parameters online commit
   function".

 - In the series "mm: page_type, zsmalloc and page_mapcount_reset()"
   David Hildenbrand does some maintenance work on zsmalloc - partially
   modernizing its use of pageframe fields.

 - Kefeng Wang provides more folio conversions in the series "mm: remove
   page_maybe_dma_pinned() and page_mkclean()".

 - More cleanup from David Hildenbrand, this time in the series
   "mm/memory_hotplug: use PageOffline() instead of PageReserved() for
   !ZONE_DEVICE". It "enlightens memory hotplug more about PageOffline()
   pages" and permits the removal of some virtio-mem hacks.

 - Barry Song's series "mm: clarify folio_add_new_anon_rmap() and
   __folio_add_anon_rmap()" is a cleanup to the anon folio handling in
   preparation for mTHP (multisize THP) swapin.

 - Kefeng Wang's series "mm: improve clear and copy user folio"
   implements more folio conversions, this time in the area of large
   folio userspace copying.

 - The series "Docs/mm/damon/maintaier-profile: document a mailing tool
   and community meetup series" tells people how to get better involved
   with other DAMON developers. From SeongJae Park.

 - A large series ("kmsan: Enable on s390") from Ilya Leoshkevich does
   that.

 - David Hildenbrand sends along more cleanups, this time against the
   migration code. The series is "mm/migrate: move NUMA hinting fault
   folio isolation + checks under PTL".

 - Jan Kara has found quite a lot of strangenesses and minor errors in
   the readahead code. He addresses this in the series "mm: Fix various
   readahead quirks".

 - SeongJae Park's series "selftests/damon: test DAMOS tried regions and
   {min,max}_nr_regions" adds features and addresses errors in DAMON's
   self testing code.

 - Gavin Shan has found a userspace-triggerable WARN in the pagecache
   code. The series "mm/filemap: Limit page cache size to that supported
   by xarray" addresses this. The series is marked cc:stable.

 - Chengming Zhou's series "mm/ksm: cmp_and_merge_page() optimizations
   and cleanup" cleans up and slightly optimizes KSM.

 - Roman Gushchin has separated the memcg-v1 and memcg-v2 code - lots of
   code motion. The series (which also makes the memcg-v1 code
   Kconfigurable) are "mm: memcg: separate legacy cgroup v1 code and put
   under config option" and "mm: memcg: put cgroup v1-specific memcg
   data under CONFIG_MEMCG_V1"

 - Dan Schatzberg's series "Add swappiness argument to memory.reclaim"
   adds an additional feature to this cgroup-v2 control file.

 - The series "Userspace controls soft-offline pages" from Jiaqi Yan
   permits userspace to stop the kernel's automatic treatment of
   excessive correctable memory errors. In order to permit userspace to
   monitor and handle this situation.

 - Kefeng Wang's series "mm: migrate: support poison recover from
   migrate folio" teaches the kernel to appropriately handle migration
   from poisoned source folios rather than simply panicing.

 - SeongJae Park's series "Docs/damon: minor fixups and improvements"
   does those things.

 - In the series "mm/zsmalloc: change back to per-size_class lock"
   Chengming Zhou improves zsmalloc's scalability and memory
   utilization.

 - Vivek Kasireddy's series "mm/gup: Introduce memfd_pin_folios() for
   pinning memfd folios" makes the GUP code use FOLL_PIN rather than
   bare refcount increments. So these paes can first be moved aside if
   they reside in the movable zone or a CMA block.

 - Andrii Nakryiko has added a binary ioctl()-based API to
   /proc/pid/maps for much faster reading of vma information. The series
   is "query VMAs from /proc/<pid>/maps".

 - In the series "mm: introduce per-order mTHP split counters" Lance
   Yang improves the kernel's presentation of developer information
   related to multisize THP splitting.

 - Michael Ellerman has developed the series "Reimplement huge pages
   without hugepd on powerpc (8xx, e500, book3s/64)". This permits
   userspace to use all available huge page sizes.

 - In the series "revert unconditional slab and page allocator fault
   injection calls" Vlastimil Babka removes a performance-affecting and
   not very useful feature from slab fault injection.

* tag 'mm-stable-2024-07-21-14-50' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (411 commits)
  mm/mglru: fix ineffective protection calculation
  mm/zswap: fix a white space issue
  mm/hugetlb: fix kernel NULL pointer dereference when migrating hugetlb folio
  mm/hugetlb: fix possible recursive locking detected warning
  mm/gup: clear the LRU flag of a page before adding to LRU batch
  mm/numa_balancing: teach mpol_to_str about the balancing mode
  mm: memcg1: convert charge move flags to unsigned long long
  alloc_tag: fix page_ext_get/page_ext_put sequence during page splitting
  lib: reuse page_ext_data() to obtain codetag_ref
  lib: add missing newline character in the warning message
  mm/mglru: fix overshooting shrinker memory
  mm/mglru: fix div-by-zero in vmpressure_calc_level()
  mm/kmemleak: replace strncpy() with strscpy()
  mm, page_alloc: put should_fail_alloc_page() back behing CONFIG_FAIL_PAGE_ALLOC
  mm, slab: put should_failslab() back behind CONFIG_SHOULD_FAILSLAB
  mm: ignore data-race in __swap_writepage
  hugetlbfs: ensure generic_hugetlb_get_unmapped_area() returns higher address than mmap_min_addr
  mm: shmem: rename mTHP shmem counters
  mm: swap_state: use folio_alloc_mpol() in __read_swap_cache_async()
  mm/migrate: putback split folios when numa hint migration fails
  ...
2024-07-21 17:15:46 -07:00
Johannes Weiner
3a3b7fec39 mm: remove CONFIG_MEMCG_KMEM
CONFIG_MEMCG_KMEM used to be a user-visible option for whether slab
tracking is enabled.  It has been default-enabled and equivalent to
CONFIG_MEMCG for almost a decade.  We've only grown more kernel memory
accounting sites since, and there is no imaginable cgroup usecase going
forward that wants to track user pages but not the multitude of
user-drivable kernel allocations.

Link: https://lkml.kernel.org/r/20240701153148.452230-1-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Shakeel Butt <shakeel.butt@linux.dev>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Muchun Song <muchun.song@linux.dev>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-07-10 12:14:54 -07:00
Waiman Long
1c0be3f7b2 selftest/cgroup: Update test_cpuset_prs.sh to match changes
Unlike the list of isolated CPUs, it is not easy to programamatically
determine what sched domains are being created by the scheduler just
by examinng the data in various kernfs filesystems. The easiest way
to get this information is by enabling /sys/kernel/debug/sched/verbose
file to make those information displayed in the console. This is also
what the test_cpuset_prs.sh script is doing when the -v flag is given.

It is rather hard to fetch the data from the console and compare it to
the expected result. An easier way is to dump the expected sched-domain
information out to the console so that they can be visually compared
with the actual sched domain data. However, this have to be done manually
by visual inspection and so will only be done once in a while.

Moreover the preceding cpuset commits also change the cpuset behavior
requiring corresponding chanages in some test cases as well as new test
cases to test the newly added functionality.

Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2024-06-19 07:37:38 -10:00
Waiman Long
43ee401453 selftest/cgroup: Fix test_cpuset_prs.sh problems reported by test robot
The test robot reported two different problems when running the
test_cpuset_prs.sh test.

 # ./test_cpuset_prs.sh: line 106: echo: write error: Input/output error
 #  :
 # Effective cpus changed to 0-1,4-7 after test 4!

The write error is caused by writing to /dev/console. It looks like
some systems may not have /dev/console configured or in a writeable
state. Fix this by checking the existence of /dev/console before
attempting to write it.

After the completion of each test run, the script will check if the
cpuset state is reset back to the original state. That usually takes a
while to happen. The test script inserts some artificial delay to make
sure that the reset has completed. The current setting is about 80ms.
That may not be enough in some cases especially if the test system is
slow. Double it to 160ms to minimize the chance of this type of failure.

Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202406141712.dbbaa8fd-oliver.sang@intel.com
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2024-06-19 07:37:37 -10:00
Michal Koutný
9f34c56602 selftests: cgroup: Add basic tests for pids controller
This commit adds (and wires in) new test program for checking basic pids
controller functionality -- restricting tasks in a cgroup and correct
event counting.

Signed-off-by: Michal Koutný <mkoutny@suse.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2024-05-26 08:45:10 -10:00
Michal Koutný
7055968559 selftests: cgroup: Lexicographic order in Makefile
This will reduce number of conflicts when modifying the lists.

Signed-off-by: Michal Koutný <mkoutny@suse.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2024-05-26 08:45:10 -10:00
Linus Torvalds
70ec81c2e2 linux_kselftest-next-6.10-rc1-fixes
This kselftest fixes update for Linux 6.10-rc1 consists of
 reverts framework change to add D_GNU_SOURCE to KHDR_INCLUDES
 to Makefile, lib.mk, and kselftest_harness.h and follow-on
 changes to cgroup and sgx test as they are causing build
 failures and warnings.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEPZKym/RZuOCGeA/kCwJExA0NQxwFAmZLb3sACgkQCwJExA0N
 QxweSQ//WiDfJtObms7pDos6Q5DBnq2seGgK3+JUcAbBhNETPLzYBHSnvVnixRmi
 +QccdtTlPYE98vAk7GCt9zuBr82SV1z9VaNHVst+4jZeaSkmNFHGMS/ZSEE1x79g
 z4qL1C1eA0qPzmW17vgUv08+Lm5WKxt4RIsZEYWBi7tnu+MbfcaPaTbGZ85o/tG0
 IVZluGfVVaci7CQnpYhRD4hsJOWUC2H/YrvCNpjm6rGUFu+4Jm7rvHJPO+Ef6yCT
 tjKc5wReHRR2xT2+i4FoFjGqrAMsM/xakdiVpQWUlHUSm5/Nboqw7B73eNm/N2YL
 KvoJKBCuUPjmPKsoDGyUSORyx7j5RJnmdbGbFISOnZGpDVDkwSJVQDPJKeW9WqfS
 ONYVRRyjdift7BVcbFIpnde6TD7lCCv8sYrr2SpZv1wHybhMmrL0ojoTOOO93Xcd
 agpHGHjIOqmn+EGmptYeFEKKHr/GX+r6cQK0csmNhuSajHxeohwpzzs/v9gUisQ0
 BB5dC2Mo7wj1CES8DYKucFSSyCmZruzpyKtEz7Ndj1SAYzZqqYQV41kMDdlGYDE4
 qsHNVpA5DOFLOV8wOX59XjSu+UnJDjFyluWSaj66lJ1gA9qpYEifWD0l851LZ1YG
 hcCTd90Hbuh9XlOPmQEciVF98tJFrO0ho6vQu41jgjLPvJ4Y8H8=
 =Qisj
 -----END PGP SIGNATURE-----

Merge tag 'linux_kselftest-next-6.10-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest

Pull kselftest fixes from Shuah Khan:
 "Revert framework change to add D_GNU_SOURCE to KHDR_INCLUDES to
  Makefile, lib.mk, and kselftest_harness.h and follow-on changes to
  cgroup and sgx test as they are causing build failures and warnings"

* tag 'linux_kselftest-next-6.10-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
  Revert "selftests/cgroup: Drop define _GNU_SOURCE"
  Revert "selftests/sgx: Include KHDR_INCLUDES in Makefile"
  Revert "selftests: Compile kselftest headers with -D_GNU_SOURCE"
2024-05-20 14:49:39 -07:00
Shuah Khan
a97853f25b Revert "selftests/cgroup: Drop define _GNU_SOURCE"
This reverts commit c1457d9aad5ee2feafcf85aa9a58ab50500159d2.

The framework change to add D_GNU_SOURCE to KHDR_INCLUDES
to Makefile, lib.mk, and kselftest_harness.h is reverted
as it is causing build failures and warnings.

Revert this change as this change depends on the framework
change.

Reported-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2024-05-20 09:00:15 -06:00
Linus Torvalds
61307b7be4 The usual shower of singleton fixes and minor series all over MM,
documented (hopefully adequately) in the respective changelogs.  Notable
 series include:
 
 - Lucas Stach has provided some page-mapping
   cleanup/consolidation/maintainability work in the series "mm/treewide:
   Remove pXd_huge() API".
 
 - In the series "Allow migrate on protnone reference with
   MPOL_PREFERRED_MANY policy", Donet Tom has optimized mempolicy's
   MPOL_PREFERRED_MANY mode, yielding almost doubled performance in one
   test.
 
 - In their series "Memory allocation profiling" Kent Overstreet and
   Suren Baghdasaryan have contributed a means of determining (via
   /proc/allocinfo) whereabouts in the kernel memory is being allocated:
   number of calls and amount of memory.
 
 - Matthew Wilcox has provided the series "Various significant MM
   patches" which does a number of rather unrelated things, but in largely
   similar code sites.
 
 - In his series "mm: page_alloc: freelist migratetype hygiene" Johannes
   Weiner has fixed the page allocator's handling of migratetype requests,
   with resulting improvements in compaction efficiency.
 
 - In the series "make the hugetlb migration strategy consistent" Baolin
   Wang has fixed a hugetlb migration issue, which should improve hugetlb
   allocation reliability.
 
 - Liu Shixin has hit an I/O meltdown caused by readahead in a
   memory-tight memcg.  Addressed in the series "Fix I/O high when memory
   almost met memcg limit".
 
 - In the series "mm/filemap: optimize folio adding and splitting" Kairui
   Song has optimized pagecache insertion, yielding ~10% performance
   improvement in one test.
 
 - Baoquan He has cleaned up and consolidated the early zone
   initialization code in the series "mm/mm_init.c: refactor
   free_area_init_core()".
 
 - Baoquan has also redone some MM initializatio code in the series
   "mm/init: minor clean up and improvement".
 
 - MM helper cleanups from Christoph Hellwig in his series "remove
   follow_pfn".
 
 - More cleanups from Matthew Wilcox in the series "Various page->flags
   cleanups".
 
 - Vlastimil Babka has contributed maintainability improvements in the
   series "memcg_kmem hooks refactoring".
 
 - More folio conversions and cleanups in Matthew Wilcox's series
 
 	"Convert huge_zero_page to huge_zero_folio"
 	"khugepaged folio conversions"
 	"Remove page_idle and page_young wrappers"
 	"Use folio APIs in procfs"
 	"Clean up __folio_put()"
 	"Some cleanups for memory-failure"
 	"Remove page_mapping()"
 	"More folio compat code removal"
 
 - David Hildenbrand chipped in with "fs/proc/task_mmu: convert hugetlb
   functions to work on folis".
 
 - Code consolidation and cleanup work related to GUP's handling of
   hugetlbs in Peter Xu's series "mm/gup: Unify hugetlb, part 2".
 
 - Rick Edgecombe has developed some fixes to stack guard gaps in the
   series "Cover a guard gap corner case".
 
 - Jinjiang Tu has fixed KSM's behaviour after a fork+exec in the series
   "mm/ksm: fix ksm exec support for prctl".
 
 - Baolin Wang has implemented NUMA balancing for multi-size THPs.  This
   is a simple first-cut implementation for now.  The series is "support
   multi-size THP numa balancing".
 
 - Cleanups to vma handling helper functions from Matthew Wilcox in the
   series "Unify vma_address and vma_pgoff_address".
 
 - Some selftests maintenance work from Dev Jain in the series
   "selftests/mm: mremap_test: Optimizations and style fixes".
 
 - Improvements to the swapping of multi-size THPs from Ryan Roberts in
   the series "Swap-out mTHP without splitting".
 
 - Kefeng Wang has significantly optimized the handling of arm64's
   permission page faults in the series
 
 	"arch/mm/fault: accelerate pagefault when badaccess"
 	"mm: remove arch's private VM_FAULT_BADMAP/BADACCESS"
 
 - GUP cleanups from David Hildenbrand in "mm/gup: consistently call it
   GUP-fast".
 
 - hugetlb fault code cleanups from Vishal Moola in "Hugetlb fault path to
   use struct vm_fault".
 
 - selftests build fixes from John Hubbard in the series "Fix
   selftests/mm build without requiring "make headers"".
 
 - Memory tiering fixes/improvements from Ho-Ren (Jack) Chuang in the
   series "Improved Memory Tier Creation for CPUless NUMA Nodes".  Fixes
   the initialization code so that migration between different memory types
   works as intended.
 
 - David Hildenbrand has improved follow_pte() and fixed an errant driver
   in the series "mm: follow_pte() improvements and acrn follow_pte()
   fixes".
 
 - David also did some cleanup work on large folio mapcounts in his
   series "mm: mapcount for large folios + page_mapcount() cleanups".
 
 - Folio conversions in KSM in Alex Shi's series "transfer page to folio
   in KSM".
 
 - Barry Song has added some sysfs stats for monitoring multi-size THP's
   in the series "mm: add per-order mTHP alloc and swpout counters".
 
 - Some zswap cleanups from Yosry Ahmed in the series "zswap same-filled
   and limit checking cleanups".
 
 - Matthew Wilcox has been looking at buffer_head code and found the
   documentation to be lacking.  The series is "Improve buffer head
   documentation".
 
 - Multi-size THPs get more work, this time from Lance Yang.  His series
   "mm/madvise: enhance lazyfreeing with mTHP in madvise_free" optimizes
   the freeing of these things.
 
 - Kemeng Shi has added more userspace-visible writeback instrumentation
   in the series "Improve visibility of writeback".
 
 - Kemeng Shi then sent some maintenance work on top in the series "Fix
   and cleanups to page-writeback".
 
 - Matthew Wilcox reduces mmap_lock traffic in the anon vma code in the
   series "Improve anon_vma scalability for anon VMAs".  Intel's test bot
   reported an improbable 3x improvement in one test.
 
 - SeongJae Park adds some DAMON feature work in the series
 
 	"mm/damon: add a DAMOS filter type for page granularity access recheck"
 	"selftests/damon: add DAMOS quota goal test"
 
 - Also some maintenance work in the series
 
 	"mm/damon/paddr: simplify page level access re-check for pageout"
 	"mm/damon: misc fixes and improvements"
 
 - David Hildenbrand has disabled some known-to-fail selftests ni the
   series "selftests: mm: cow: flag vmsplice() hugetlb tests as XFAIL".
 
 - memcg metadata storage optimizations from Shakeel Butt in "memcg:
   reduce memory consumption by memcg stats".
 
 - DAX fixes and maintenance work from Vishal Verma in the series
   "dax/bus.c: Fixups for dax-bus locking".
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZkgQYwAKCRDdBJ7gKXxA
 jrdKAP9WVJdpEcXxpoub/vVE0UWGtffr8foifi9bCwrQrGh5mgEAx7Yf0+d/oBZB
 nvA4E0DcPrUAFy144FNM0NTCb7u9vAw=
 =V3R/
 -----END PGP SIGNATURE-----

Merge tag 'mm-stable-2024-05-17-19-19' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull mm updates from Andrew Morton:
 "The usual shower of singleton fixes and minor series all over MM,
  documented (hopefully adequately) in the respective changelogs.
  Notable series include:

   - Lucas Stach has provided some page-mapping cleanup/consolidation/
     maintainability work in the series "mm/treewide: Remove pXd_huge()
     API".

   - In the series "Allow migrate on protnone reference with
     MPOL_PREFERRED_MANY policy", Donet Tom has optimized mempolicy's
     MPOL_PREFERRED_MANY mode, yielding almost doubled performance in
     one test.

   - In their series "Memory allocation profiling" Kent Overstreet and
     Suren Baghdasaryan have contributed a means of determining (via
     /proc/allocinfo) whereabouts in the kernel memory is being
     allocated: number of calls and amount of memory.

   - Matthew Wilcox has provided the series "Various significant MM
     patches" which does a number of rather unrelated things, but in
     largely similar code sites.

   - In his series "mm: page_alloc: freelist migratetype hygiene"
     Johannes Weiner has fixed the page allocator's handling of
     migratetype requests, with resulting improvements in compaction
     efficiency.

   - In the series "make the hugetlb migration strategy consistent"
     Baolin Wang has fixed a hugetlb migration issue, which should
     improve hugetlb allocation reliability.

   - Liu Shixin has hit an I/O meltdown caused by readahead in a
     memory-tight memcg. Addressed in the series "Fix I/O high when
     memory almost met memcg limit".

   - In the series "mm/filemap: optimize folio adding and splitting"
     Kairui Song has optimized pagecache insertion, yielding ~10%
     performance improvement in one test.

   - Baoquan He has cleaned up and consolidated the early zone
     initialization code in the series "mm/mm_init.c: refactor
     free_area_init_core()".

   - Baoquan has also redone some MM initializatio code in the series
     "mm/init: minor clean up and improvement".

   - MM helper cleanups from Christoph Hellwig in his series "remove
     follow_pfn".

   - More cleanups from Matthew Wilcox in the series "Various
     page->flags cleanups".

   - Vlastimil Babka has contributed maintainability improvements in the
     series "memcg_kmem hooks refactoring".

   - More folio conversions and cleanups in Matthew Wilcox's series:
	"Convert huge_zero_page to huge_zero_folio"
	"khugepaged folio conversions"
	"Remove page_idle and page_young wrappers"
	"Use folio APIs in procfs"
	"Clean up __folio_put()"
	"Some cleanups for memory-failure"
	"Remove page_mapping()"
	"More folio compat code removal"

   - David Hildenbrand chipped in with "fs/proc/task_mmu: convert
     hugetlb functions to work on folis".

   - Code consolidation and cleanup work related to GUP's handling of
     hugetlbs in Peter Xu's series "mm/gup: Unify hugetlb, part 2".

   - Rick Edgecombe has developed some fixes to stack guard gaps in the
     series "Cover a guard gap corner case".

   - Jinjiang Tu has fixed KSM's behaviour after a fork+exec in the
     series "mm/ksm: fix ksm exec support for prctl".

   - Baolin Wang has implemented NUMA balancing for multi-size THPs.
     This is a simple first-cut implementation for now. The series is
     "support multi-size THP numa balancing".

   - Cleanups to vma handling helper functions from Matthew Wilcox in
     the series "Unify vma_address and vma_pgoff_address".

   - Some selftests maintenance work from Dev Jain in the series
     "selftests/mm: mremap_test: Optimizations and style fixes".

   - Improvements to the swapping of multi-size THPs from Ryan Roberts
     in the series "Swap-out mTHP without splitting".

   - Kefeng Wang has significantly optimized the handling of arm64's
     permission page faults in the series
	"arch/mm/fault: accelerate pagefault when badaccess"
	"mm: remove arch's private VM_FAULT_BADMAP/BADACCESS"

   - GUP cleanups from David Hildenbrand in "mm/gup: consistently call
     it GUP-fast".

   - hugetlb fault code cleanups from Vishal Moola in "Hugetlb fault
     path to use struct vm_fault".

   - selftests build fixes from John Hubbard in the series "Fix
     selftests/mm build without requiring "make headers"".

   - Memory tiering fixes/improvements from Ho-Ren (Jack) Chuang in the
     series "Improved Memory Tier Creation for CPUless NUMA Nodes".
     Fixes the initialization code so that migration between different
     memory types works as intended.

   - David Hildenbrand has improved follow_pte() and fixed an errant
     driver in the series "mm: follow_pte() improvements and acrn
     follow_pte() fixes".

   - David also did some cleanup work on large folio mapcounts in his
     series "mm: mapcount for large folios + page_mapcount() cleanups".

   - Folio conversions in KSM in Alex Shi's series "transfer page to
     folio in KSM".

   - Barry Song has added some sysfs stats for monitoring multi-size
     THP's in the series "mm: add per-order mTHP alloc and swpout
     counters".

   - Some zswap cleanups from Yosry Ahmed in the series "zswap
     same-filled and limit checking cleanups".

   - Matthew Wilcox has been looking at buffer_head code and found the
     documentation to be lacking. The series is "Improve buffer head
     documentation".

   - Multi-size THPs get more work, this time from Lance Yang. His
     series "mm/madvise: enhance lazyfreeing with mTHP in madvise_free"
     optimizes the freeing of these things.

   - Kemeng Shi has added more userspace-visible writeback
     instrumentation in the series "Improve visibility of writeback".

   - Kemeng Shi then sent some maintenance work on top in the series
     "Fix and cleanups to page-writeback".

   - Matthew Wilcox reduces mmap_lock traffic in the anon vma code in
     the series "Improve anon_vma scalability for anon VMAs". Intel's
     test bot reported an improbable 3x improvement in one test.

   - SeongJae Park adds some DAMON feature work in the series
	"mm/damon: add a DAMOS filter type for page granularity access recheck"
	"selftests/damon: add DAMOS quota goal test"

   - Also some maintenance work in the series
	"mm/damon/paddr: simplify page level access re-check for pageout"
	"mm/damon: misc fixes and improvements"

   - David Hildenbrand has disabled some known-to-fail selftests ni the
     series "selftests: mm: cow: flag vmsplice() hugetlb tests as
     XFAIL".

   - memcg metadata storage optimizations from Shakeel Butt in "memcg:
     reduce memory consumption by memcg stats".

   - DAX fixes and maintenance work from Vishal Verma in the series
     "dax/bus.c: Fixups for dax-bus locking""

* tag 'mm-stable-2024-05-17-19-19' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (426 commits)
  memcg, oom: cleanup unused memcg_oom_gfp_mask and memcg_oom_order
  selftests/mm: hugetlb_madv_vs_map: avoid test skipping by querying hugepage size at runtime
  mm/hugetlb: add missing VM_FAULT_SET_HINDEX in hugetlb_wp
  mm/hugetlb: add missing VM_FAULT_SET_HINDEX in hugetlb_fault
  selftests: cgroup: add tests to verify the zswap writeback path
  mm: memcg: make alloc_mem_cgroup_per_node_info() return bool
  mm/damon/core: fix return value from damos_wmark_metric_value
  mm: do not update memcg stats for NR_{FILE/SHMEM}_PMDMAPPED
  selftests: cgroup: remove redundant enabling of memory controller
  Docs/mm/damon/maintainer-profile: allow posting patches based on damon/next tree
  Docs/mm/damon/maintainer-profile: change the maintainer's timezone from PST to PT
  Docs/mm/damon/design: use a list for supported filters
  Docs/admin-guide/mm/damon/usage: fix wrong schemes effective quota update command
  Docs/admin-guide/mm/damon/usage: fix wrong example of DAMOS filter matching sysfs file
  selftests/damon: classify tests for functionalities and regressions
  selftests/damon/_damon_sysfs: use 'is' instead of '==' for 'None'
  selftests/damon/_damon_sysfs: find sysfs mount point from /proc/mounts
  selftests/damon/_damon_sysfs: check errors from nr_schemes file reads
  mm/damon/core: initialize ->esz_bp from damos_quota_init_priv()
  selftests/damon: add a test for DAMOS quota goal
  ...
2024-05-19 09:21:03 -07:00
Edward Liaw
c1457d9aad selftests/cgroup: Drop define _GNU_SOURCE
_GNU_SOURCE is provided by lib.mk, so it should be dropped to prevent
redefinition warnings.

Signed-off-by: Edward Liaw <edliaw@google.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2024-05-13 07:02:36 -10:00
Usama Arif
158863e5d7 selftests: cgroup: add tests to verify the zswap writeback path
Attempt writeback with the below steps and check using memory.stat.zswpwb
if zswap writeback occurred:

1. Allocate memory.
2. Reclaim memory equal to the amount that was allocated in step 1.
   This will move it into zswap.
3. Save current zswap usage.
4. Move the memory allocated in step 1 back in from zswap.
5. Set zswap.max to half the amount that was recorded in step 3.
6. Attempt to reclaim memory equal to the amount that was allocated,
   this will either trigger writeback if it's enabled, or reclamation
   will fail if writeback is disabled as there isn't enough zswap
   space.

Link: https://lkml.kernel.org/r/20240508171359.1545744-1-usamaarif642@gmail.com
Signed-off-by: Usama Arif <usamaarif642@gmail.com>
Suggested-by: Nhat Pham <nphamcs@gmail.com>
Acked-by: Yosry Ahmed <yosryahmed@google.com>
Acked-by: Nhat Pham <nphamcs@gmail.com>
Cc: Chengming Zhou <chengming.zhou@linux.dev>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-05-11 15:41:36 -07:00
Usama Arif
e6b331ab0a selftests: cgroup: remove redundant enabling of memory controller
Memory controller is already enabled in main which invokes the test, hence
this does not need to be done in test_no_kmem_bypass.

Link: https://lkml.kernel.org/r/20240502200529.4193651-2-usamaarif642@gmail.com
Signed-off-by: Usama Arif <usamaarif642@gmail.com>
Acked-by: Yosry Ahmed <yosryahmed@google.com>
Cc: Chengming Zhou <chengming.zhou@linux.dev>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Nhat Pham <nphamcs@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-05-11 15:41:35 -07:00
John Hubbard
8f6d24a5db selftests/cgroup: fix uninitialized variables in test_zswap.c
First of all, in order to build with clang at all, one must first apply
Valentin Obst's build fix for LLVM [1]. Once that is done, then when
building with clang, via:

    make LLVM=1 -C tools/testing/selftests

...clang finds and warning about some uninitialized variables. Fix these
by initializing them.

[1] https://lore.kernel.org/all/20240329-selftests-libmk-llvm-rfc-v1-1-2f9ed7d1c49f@valentinobst.de/

Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Reviewed-by: Roman Gushchin <roman.gushchin@linux.dev>
Signed-off-by: Tejun Heo <tj@kernel.org>
2024-05-03 09:06:10 -10:00
John Hubbard
3309ca6f47 selftests/cgroup: cpu_hogger init: use {} instead of {NULL}
First of all, in order to build with clang at all, one must first apply
Valentin Obst's build fix for LLVM [1]. Once that is done, then when
building with clang, via:

    make LLVM=1 -C tools/testing/selftests

...clang generates warning here, because struct cpu_hogger has multiple
fields, and the code is initializing an array of these structs, and it
is incorrect to specify a single NULL value as the initializer.

Fix this by initializing with {}, so that the compiler knows to use
default initializer values for all fields in each array entry.

[1] https://lore.kernel.org/all/20240329-selftests-libmk-llvm-rfc-v1-1-2f9ed7d1c49f@valentinobst.de/

Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Reviewed-by: Roman Gushchin <roman.gushchin@linux.dev>
Signed-off-by: Tejun Heo <tj@kernel.org>
2024-05-03 09:06:09 -10:00
John Hubbard
0515089418 selftests/cgroup: fix clang warnings: uninitialized fd variable
First of all, in order to build with clang at all, one must first apply
Valentin Obst's build fix for LLVM [1]. Once that is done, then when
building with clang, via:

    make LLVM=1 -C tools/testing/selftests

...clang warns about fd being used uninitialized, in
test_memcg_reclaim()'s error handling path.

Fix this by initializing fd to -1.

[1] https://lore.kernel.org/all/20240329-selftests-libmk-llvm-rfc-v1-1-2f9ed7d1c49f@valentinobst.de/

Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Reviewed-by: Roman Gushchin <roman.gushchin@linux.dev>
Signed-off-by: Tejun Heo <tj@kernel.org>
2024-05-03 09:06:09 -10:00
John Hubbard
1da2363228 selftests/cgroup: fix clang build failures for abs() calls
First of all, in order to build with clang at all, one must first apply
Valentin Obst's build fix for LLVM [1]. Once that is done, then when
building with clang, via:

    make LLVM=1 -C tools/testing/selftests

...clang is pickier than gcc, about which version of abs(3) to call,
depending on the argument type:

   int abs(int j);
   long labs(long j);
   long long llabs(long long j);

...and this is causing both build failures and warnings, when running:

    make LLVM=1 -C tools/testing/selftests

Fix this by calling labs() in value_close(), because the arguments are
unambiguously "long" type.

[1] https://lore.kernel.org/all/20240329-selftests-libmk-llvm-rfc-v1-1-2f9ed7d1c49f@valentinobst.de/

Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Reviewed-by: Roman Gushchin <roman.gushchin@linux.dev>
Signed-off-by: Tejun Heo <tj@kernel.org>
2024-05-03 09:06:09 -10:00
Waiman Long
812c5945bd cgroup/cpuset: Add test_cpuset_v1_hp.sh
Add a simple test to verify that an empty v1 cpuset will force its tasks
to be moved to an ancestor node. It is based on the test case documented
in commit 76bb5ab8f6e3 ("cpuset: break kernfs active protection in
cpuset_write_resmask()").

Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2024-04-08 07:39:38 -10:00
Tianchen Ding
4793cb599b selftests: cgroup: skip test_cgcore_lesser_ns_open when cgroup2 mounted without nsdelegate
The test case test_cgcore_lesser_ns_open only tasks effect when cgroup2
is mounted with "nsdelegate" mount option. If it misses this option, or
is remounted without "nsdelegate", the test case will fail. For example,
running bpf/test_cgroup_storage first, and then run cgroup/test_core will
fail on test_cgcore_lesser_ns_open. Skip it if "nsdelegate" is not
detected in cgroup2 mount options.

Fixes: bf35a7879f1d ("selftests: cgroup: Test open-time cgroup namespace usage for migration checks")
Signed-off-by: Tianchen Ding <dtcccc@linux.alibaba.com>
Reviewed-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2024-04-03 09:42:13 -10:00
Nhat Pham
b93c28ff72 selftests: add zswapin and no zswap tests
Add a selftest to cover the zswapin code path, allocating more memory than
the cgroup limit to trigger swapout/zswapout, then reading the pages back
in memory several times.  This is inspired by a recently encountered
kernel crash on the zswapin path in our internal kernel, which went
undetected because of a lack of test coverage for this path.

Add a selftest to verify that when memory.zswap.max = 0, no pages can go
to the zswap pool for the cgroup.

[nphamcs@gmail.com: remove redundant comment, add success checks]
  Link: https://lkml.kernel.org/r/20240222043132.616320-1-nphamcs@gmail.com
Link: https://lkml.kernel.org/r/20240205225608.3083251-4-nphamcs@gmail.com
Signed-off-by: Nhat Pham <nphamcs@gmail.com>
Suggested-by: Rik van Riel <riel@surriel.com>
Suggested-by: Yosry Ahmed <yosryahmed@google.com>
Acked-by: Yosry Ahmed <yosryahmed@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Zefan Li <lizefan.x@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-02-22 15:27:16 -08:00