
RCU pull request for v6.18

This pull request contains the following branches, non-octopus merged:
 
 Documentation updates:
 
   - Update whatisRCU.rst and checklist.rst for recent RCU API additions.
 
   - Fix RCU documentation formatting and typos.
 
   - Replace dead Ottawa Linux Symposium links in RTFP.txt.
 
 Miscellaneous RCU updates:
 
   - Document that rcu_barrier() hurries RCU_LAZY callbacks.
 
   - Remove redundant interrupt disabling from
     rcu_preempt_deferred_qs_handler().
 
   - Move list_for_each_rcu from list.h to rculist.h, and adjust the
     include directive in kernel/cgroup/dmem.c accordingly.
 
   - Make initial set of changes to accommodate upcoming system_percpu_wq
     changes.
 
 SRCU updates:
 
   - Create an srcu_read_lock_fast_notrace() for eventual use in tracing,
     including adding guards.
 
   - Document the reliance on per-CPU operations as implicit RCU readers
     in __srcu_read_{,un}lock_fast().
 
   - Document the srcu_flip() function's memory-barrier D's relationship
     to SRCU-fast readers.
 
   - Remove a redundant preempt_disable() and preempt_enable() pair from
     srcu_gp_start_if_needed().
 
 Torture-test updates:
 
   - Fix jitter.sh spin time so that it actually varies as advertised.
     It is still quite coarse-grained, but at least it does now vary.
 
   - Update torture.sh help text to include the not-so-new --do-normal
     parameter, which permits (for example) testing KCSAN kernels without
     doing non-debug kernels.
 
   - Fix a number of false-positive diagnostics that were being triggered
     by rcutorture starting before boot completed.  Running multiple
     near-CPU-bound rcutorture processes when there is only the boot CPU
     is, after all, a bit excessive.
 
   - Substitute kcalloc() for kzalloc().
 
   - Remove a redundant kfree() and NULL out kfree()ed objects.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEbK7UrM+RBIrCoViJnr8S83LZ+4wFAmjWzFcTHHBhdWxtY2tA
 a2VybmVsLm9yZwAKCRCevxLzctn7jAUeD/4xp/3rvVlfX6UB1ax/lYbQopOm2Hns
 7DVO/lp8ih6jUFCRappw7do+jbU3EcP+76sGUd5qkKlRbJIierTHBrolULpKph/p
 hQ/LddPYIHg+bPtbq/vA6fFwFI/xwbBNQOMS1bWzzU5iWn/p/ETS1kWptz+g2wk5
 pOxx9PnH52Ls5BgCCnb8kGpUuG3cy63sFb52ORrN206EaUq59Q12CLA7aTge4QGV
 3fvBIv+U+chagELG0usoPPTC+fMUt8oj8vfVYyDqUPjoyxATDtvxAv7ORFI7ZmEm
 CVemKrHWEaVEERfXSSbMp0TpPrNnkB4SoYOGT6vakyjgpcQHLh0GMaUZfluvyROt
 nQNaQFbOLfh9GGBjZQpAu1Aa2aAvMcOxyrL1JPhvwFVT0G4hF6s4Zs0zLY7+MAPD
 XT0wSf79U4lTzlg4eNrwZazoFGIeUhwyH2X59yc04yXytM7QUBFw+7XIK/PA8Wn4
 LgJYrwn6poFinGT2HlwKPrIUKSfCpLS8ePutmMgMUqLhiVtKvoaE6S38h2T97d9i
 OuLFDVMm+j0awMadS3cJD5kqtGVbqnPsUuaC3OFlpyng8K+OHHTNCfcFMiMbhgdw
 w6cMioYdq/a9mLoN6T50ylvTOKeoKWxXI4X6q6x7vZYUztMpFby+b5mTTbR12dJs
 LSqHR8QGSIpNmQ==
 =FEf1
 -----END PGP SIGNATURE-----

Merge tag 'rcu.2025.09.26a' of git://git.kernel.org/pub/scm/linux/kernel/git/rcu/linux

Pull RCU updates from Paul McKenney:
 "Documentation updates:

   - Update whatisRCU.rst and checklist.rst for recent RCU API additions

   - Fix RCU documentation formatting and typos

   - Replace dead Ottawa Linux Symposium links in RTFP.txt

  Miscellaneous RCU updates:

   - Document that rcu_barrier() hurries RCU_LAZY callbacks

   - Remove redundant interrupt disabling from
     rcu_preempt_deferred_qs_handler()

   - Move list_for_each_rcu from list.h to rculist.h, and adjust the
     include directive in kernel/cgroup/dmem.c accordingly

   - Make initial set of changes to accommodate upcoming
     system_percpu_wq changes

  SRCU updates:

   - Create an srcu_read_lock_fast_notrace() for eventual use in
     tracing, including adding guards

   - Document the reliance on per-CPU operations as implicit RCU readers
     in __srcu_read_{,un}lock_fast()

   - Document the srcu_flip() function's memory-barrier D's relationship
     to SRCU-fast readers

   - Remove a redundant preempt_disable() and preempt_enable() pair from
     srcu_gp_start_if_needed()

  Torture-test updates:

   - Fix jitter.sh spin time so that it actually varies as advertised.
     It is still quite coarse-grained, but at least it does now vary

   - Update torture.sh help text to include the not-so-new --do-normal
     parameter, which permits (for example) testing KCSAN kernels
     without doing non-debug kernels

   - Fix a number of false-positive diagnostics that were being
     triggered by rcutorture starting before boot completed. Running
     multiple near-CPU-bound rcutorture processes when there is only the
     boot CPU is, after all, a bit excessive

   - Substitute kcalloc() for kzalloc()

   - Remove a redundant kfree() and NULL out kfree()ed objects"

* tag 'rcu.2025.09.26a' of git://git.kernel.org/pub/scm/linux/kernel/git/rcu/linux: (31 commits)
  rcu: WQ_UNBOUND added to sync_wq workqueue
  rcu: WQ_PERCPU added to alloc_workqueue users
  rcu: replace use of system_wq with system_percpu_wq
  refperf: Set reader_tasks to NULL after kfree()
  refperf: Remove redundant kfree() after torture_stop_kthread()
  srcu/tiny: Remove preempt_disable/enable() in srcu_gp_start_if_needed()
  srcu: Document srcu_flip() memory-barrier D relation to SRCU-fast
  srcu: Document __srcu_read_{,un}lock_fast() implicit RCU readers
  rculist: move list_for_each_rcu() to where it belongs
  refscale: Use kcalloc() instead of kzalloc()
  rcutorture: Use kcalloc() instead of kzalloc()
  docs: rcu: Replace multiple dead OLS links in RTFP.txt
  doc: Fix typo in RCU's torture.rst documentation
  Documentation: RCU: Retitle toctree index
  Documentation: RCU: Reduce toctree depth
  Documentation: RCU: Wrap kvm-remote.sh rerun snippet in literal code block
  rcu: docs: Requirements.rst: Abide by conventions of kernel documentation
  doc: Add RCU guards to checklist.rst
  doc: Update whatisRCU.rst for recent RCU API additions
  rcutorture: Delay forward-progress testing until boot completes
  ...
Linus Torvalds 2025-10-04 11:28:45 -07:00
commit 67da125e30
21 changed files with 315 additions and 128 deletions

Documentation/RCU/Design/Requirements/Requirements.rst

@@ -1973,9 +1973,7 @@ code, and the FQS loop, all of which refer to or modify this bookkeeping.
Note that grace period initialization (rcu_gp_init()) must carefully sequence
CPU hotplug scanning with grace period state changes. For example, the
following race could occur in rcu_gp_init() if rcu_seq_start() were to happen
after the CPU hotplug scanning.
.. code-block:: none
after the CPU hotplug scanning::
CPU0 (rcu_gp_init) CPU1 CPU2
--------------------- ---- ----
@@ -2008,22 +2006,22 @@ after the CPU hotplug scanning.
kfree(r1);
r2 = *r0; // USE-AFTER-FREE!
By incrementing gp_seq first, CPU1's RCU read-side critical section
By incrementing ``gp_seq`` first, CPU1's RCU read-side critical section
is guaranteed to not be missed by CPU2.
**Concurrent Quiescent State Reporting for Offline CPUs**
Concurrent Quiescent State Reporting for Offline CPUs
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RCU must ensure that CPUs going offline report quiescent states to avoid
blocking grace periods. This requires careful synchronization to handle
race conditions.
**Race condition causing Offline CPU to hang GP**
Race condition causing Offline CPU to hang GP
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
A race between CPU offlining and new GP initialization (gp_init) may occur
because `rcu_report_qs_rnp()` in `rcutree_report_cpu_dead()` must temporarily
release the `rcu_node` lock to wake the RCU grace-period kthread:
.. code-block:: none
A race between CPU offlining and new GP initialization (gp_init()) may occur
because rcu_report_qs_rnp() in rcutree_report_cpu_dead() must temporarily
release the ``rcu_node`` lock to wake the RCU grace-period kthread::
CPU1 (going offline) CPU0 (GP kthread)
-------------------- -----------------
@@ -2044,15 +2042,14 @@ release the `rcu_node` lock to wake the RCU grace-period kthread:
// Reacquire lock (but too late)
rnp->qsmaskinitnext &= ~mask // Finally clears bit
Without `ofl_lock`, the new grace period includes the offline CPU and waits
Without ``ofl_lock``, the new grace period includes the offline CPU and waits
forever for its quiescent state causing a GP hang.
**A solution with ofl_lock**
A solution with ofl_lock
^^^^^^^^^^^^^^^^^^^^^^^^
The `ofl_lock` (offline lock) prevents `rcu_gp_init()` from running during
the vulnerable window when `rcu_report_qs_rnp()` has released `rnp->lock`:
.. code-block:: none
The ``ofl_lock`` (offline lock) prevents rcu_gp_init() from running during
the vulnerable window when rcu_report_qs_rnp() has released ``rnp->lock``::
CPU0 (rcu_gp_init) CPU1 (rcutree_report_cpu_dead)
------------------ ------------------------------
@@ -2065,21 +2062,20 @@ the vulnerable window when `rcu_report_qs_rnp()` has released `rnp->lock`:
arch_spin_unlock(&ofl_lock) ---> // Now CPU1 can proceed
} // But snapshot already taken
**Another race causing GP hangs in rcu_gp_init(): Reporting QS for Now-offline CPUs**
Another race causing GP hangs in rcu_gp_init(): Reporting QS for Now-offline CPUs
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
After the first loop takes an atomic snapshot of online CPUs, as shown above,
the second loop in `rcu_gp_init()` detects CPUs that went offline between
releasing `ofl_lock` and acquiring the per-node `rnp->lock`. This detection is
crucial because:
the second loop in rcu_gp_init() detects CPUs that went offline between
releasing ``ofl_lock`` and acquiring the per-node ``rnp->lock``.
This detection is crucial because:
1. The CPU might have gone offline after the snapshot but before the second loop
2. The offline CPU cannot report its own QS if it's already dead
3. Without this detection, the grace period would wait forever for CPUs that
are now offline.
The second loop performs this detection safely:
.. code-block:: none
The second loop performs this detection safely::
rcu_for_each_node_breadth_first(rnp) {
raw_spin_lock_irqsave_rcu_node(rnp, flags);
@@ -2093,10 +2089,10 @@ The second loop performs this detection safely:
}
This approach ensures atomicity: quiescent state reporting for offline CPUs
happens either in `rcu_gp_init()` (second loop) or in `rcutree_report_cpu_dead()`,
never both and never neither. The `rnp->lock` held throughout the sequence
prevents races - `rcutree_report_cpu_dead()` also acquires this lock when
clearing `qsmaskinitnext`, ensuring mutual exclusion.
happens either in rcu_gp_init() (second loop) or in rcutree_report_cpu_dead(),
never both and never neither. The ``rnp->lock`` held throughout the sequence
prevents races - rcutree_report_cpu_dead() also acquires this lock when
clearing ``qsmaskinitnext``, ensuring mutual exclusion.
Scheduler and RCU
~~~~~~~~~~~~~~~~~

Documentation/RCU/RTFP.txt

@@ -641,7 +641,7 @@ Orran Krieger and Rusty Russell and Dipankar Sarma and Maneesh Soni"
,Month="July"
,Year="2001"
,note="Available:
\url{http://www.linuxsymposium.org/2001/abstracts/readcopy.php}
\url{https://kernel.org/doc/ols/2001/read-copy.pdf}
\url{http://www.rdrop.com/users/paulmck/RCU/rclock_OLS.2001.05.01c.pdf}
[Viewed June 23, 2004]"
,annotation={
@@ -1480,7 +1480,7 @@ Suparna Bhattacharya"
,Year="2006"
,pages="v2 123-138"
,note="Available:
\url{http://www.linuxsymposium.org/2006/view_abstract.php?content_key=184}
\url{https://kernel.org/doc/ols/2006/ols2006v2-pages-131-146.pdf}
\url{http://www.rdrop.com/users/paulmck/RCU/OLSrtRCU.2006.08.11a.pdf}
[Viewed January 1, 2007]"
,annotation={
@@ -1511,7 +1511,7 @@ Canis Rufus and Zoicon5 and Anome and Hal Eisen"
,Year="2006"
,pages="v2 249-254"
,note="Available:
\url{http://www.linuxsymposium.org/2006/view_abstract.php?content_key=184}
\url{https://kernel.org/doc/ols/2006/ols2006v2-pages-249-262.pdf}
[Viewed January 11, 2009]"
,annotation={
Uses RCU-protected radix tree for a lockless page cache.

Documentation/RCU/checklist.rst

@@ -69,7 +69,13 @@ over a rather long period of time, but improvements are always welcome!
Explicit disabling of preemption (preempt_disable(), for example)
can serve as rcu_read_lock_sched(), but is less readable and
prevents lockdep from detecting locking issues. Acquiring a
spinlock also enters an RCU read-side critical section.
raw spinlock also enters an RCU read-side critical section.
The guard(rcu)() and scoped_guard(rcu) primitives designate
the remainder of the current scope or the next statement,
respectively, as the RCU read-side critical section. Use of
these guards can be less error-prone than rcu_read_lock(),
rcu_read_unlock(), and friends.
Please note that you *cannot* rely on code known to be built
only in non-preemptible kernels. Such code can and will break,
@@ -405,9 +411,11 @@ over a rather long period of time, but improvements are always welcome!
13. Unlike most flavors of RCU, it *is* permissible to block in an
SRCU read-side critical section (demarked by srcu_read_lock()
and srcu_read_unlock()), hence the "SRCU": "sleepable RCU".
Please note that if you don't need to sleep in read-side critical
sections, you should be using RCU rather than SRCU, because RCU
is almost always faster and easier to use than is SRCU.
As with RCU, guard(srcu)() and scoped_guard(srcu) forms are
available, and often provide greater ease of use. Please note
that if you don't need to sleep in read-side critical sections,
you should be using RCU rather than SRCU, because RCU is almost
always faster and easier to use than is SRCU.
Also unlike other forms of RCU, explicit initialization and
cleanup is required either at build time via DEFINE_SRCU()
@@ -443,10 +451,13 @@ over a rather long period of time, but improvements are always welcome!
real-time workloads than is synchronize_rcu_expedited().
It is also permissible to sleep in RCU Tasks Trace read-side
critical sections, which are delimited by rcu_read_lock_trace() and
rcu_read_unlock_trace(). However, this is a specialized flavor
of RCU, and you should not use it without first checking with
its current users. In most cases, you should instead use SRCU.
critical sections, which are delimited by rcu_read_lock_trace()
and rcu_read_unlock_trace(). However, this is a specialized
flavor of RCU, and you should not use it without first checking
with its current users. In most cases, you should instead
use SRCU. As with RCU and SRCU, guard(rcu_tasks_trace)() and
scoped_guard(rcu_tasks_trace) are available, and often provide
greater ease of use.
Note that rcu_assign_pointer() relates to SRCU just as it does to
other forms of RCU, but instead of rcu_dereference() you should
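
[Editor's note: to illustrate the guard primitives this checklist change
documents, here is a minimal sketch. The pointer gp, struct foo, and both
functions are hypothetical stand-ins, not code from this series.]

#include <linux/rcupdate.h>

struct foo {
	int a;
};

static struct foo __rcu *gp;	/* hypothetical RCU-protected pointer */

/* Classic form: explicit bracketing, easy to unbalance on early return. */
static int read_a_classic(void)
{
	struct foo *p;
	int ret = -1;

	rcu_read_lock();
	p = rcu_dereference(gp);
	if (p)
		ret = p->a;
	rcu_read_unlock();
	return ret;
}

/* Guard form: the read-side critical section ends automatically at the
 * end of the scope, so there is no unlock call to forget. */
static int read_a_guarded(void)
{
	struct foo *p;

	guard(rcu)();
	p = rcu_dereference(gp);
	return p ? p->a : -1;
}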

Documentation/RCU/index.rst

@@ -1,13 +1,13 @@
.. SPDX-License-Identifier: GPL-2.0
.. _rcu_concepts:
.. _rcu_handbook:
============
RCU concepts
RCU Handbook
============
.. toctree::
:maxdepth: 3
:maxdepth: 2
checklist
lockdep

Documentation/RCU/torture.rst

@@ -344,7 +344,7 @@ painstaking and error-prone.
And this is why the kvm-remote.sh script exists.
If you the following command works::
If the following command works::
ssh system0 date
@@ -364,7 +364,7 @@ systems must come first.
The kvm.sh ``--dryrun scenarios`` argument is useful for working out
how many scenarios may be run in one batch across a group of systems.
You can also re-run a previous remote run in a manner similar to kvm.sh:
You can also re-run a previous remote run in a manner similar to kvm.sh::
kvm-remote.sh "system0 system1 system2 system3 system4 system5" \
tools/testing/selftests/rcutorture/res/2022.11.03-11.26.28-remote \

Documentation/RCU/whatisRCU.rst

@@ -1021,32 +1021,41 @@ RCU list traversal::
list_entry_rcu
list_entry_lockless
list_first_entry_rcu
list_first_or_null_rcu
list_tail_rcu
list_next_rcu
list_next_or_null_rcu
list_for_each_entry_rcu
list_for_each_entry_continue_rcu
list_for_each_entry_from_rcu
list_first_or_null_rcu
list_next_or_null_rcu
list_for_each_entry_lockless
hlist_first_rcu
hlist_next_rcu
hlist_pprev_rcu
hlist_for_each_entry_rcu
hlist_for_each_entry_rcu_notrace
hlist_for_each_entry_rcu_bh
hlist_for_each_entry_from_rcu
hlist_for_each_entry_continue_rcu
hlist_for_each_entry_continue_rcu_bh
hlist_nulls_first_rcu
hlist_nulls_next_rcu
hlist_nulls_for_each_entry_rcu
hlist_nulls_for_each_entry_safe
hlist_bl_first_rcu
hlist_bl_for_each_entry_rcu
RCU pointer/list update::
rcu_assign_pointer
rcu_replace_pointer
INIT_LIST_HEAD_RCU
list_add_rcu
list_add_tail_rcu
list_del_rcu
list_replace_rcu
list_splice_init_rcu
list_splice_tail_init_rcu
hlist_add_behind_rcu
hlist_add_before_rcu
hlist_add_head_rcu
@@ -1054,34 +1063,53 @@ RCU pointer/list update::
hlist_del_rcu
hlist_del_init_rcu
hlist_replace_rcu
list_splice_init_rcu
list_splice_tail_init_rcu
hlist_nulls_del_init_rcu
hlist_nulls_del_rcu
hlist_nulls_add_head_rcu
hlist_nulls_add_tail_rcu
hlist_nulls_add_fake
hlists_swap_heads_rcu
hlist_bl_add_head_rcu
hlist_bl_del_init_rcu
hlist_bl_del_rcu
hlist_bl_set_first_rcu
RCU::
Critical sections Grace period Barrier
Critical sections Grace period Barrier
rcu_read_lock synchronize_net rcu_barrier
rcu_read_unlock synchronize_rcu
rcu_dereference synchronize_rcu_expedited
rcu_read_lock_held call_rcu
rcu_dereference_check kfree_rcu
rcu_dereference_protected
rcu_read_lock synchronize_net rcu_barrier
rcu_read_unlock synchronize_rcu
guard(rcu)() synchronize_rcu_expedited
scoped_guard(rcu) synchronize_rcu_mult
rcu_dereference call_rcu
rcu_dereference_check call_rcu_hurry
rcu_dereference_protected kfree_rcu
rcu_read_lock_held kvfree_rcu
rcu_read_lock_any_held kfree_rcu_mightsleep
rcu_pointer_handoff cond_synchronize_rcu
unrcu_pointer cond_synchronize_rcu_full
cond_synchronize_rcu_expedited
cond_synchronize_rcu_expedited_full
get_completed_synchronize_rcu
get_completed_synchronize_rcu_full
get_state_synchronize_rcu
get_state_synchronize_rcu_full
poll_state_synchronize_rcu
poll_state_synchronize_rcu_full
same_state_synchronize_rcu
same_state_synchronize_rcu_full
start_poll_synchronize_rcu
start_poll_synchronize_rcu_full
start_poll_synchronize_rcu_expedited
start_poll_synchronize_rcu_expedited_full
bh::
Critical sections Grace period Barrier
rcu_read_lock_bh call_rcu rcu_barrier
rcu_read_unlock_bh synchronize_rcu
[local_bh_disable] synchronize_rcu_expedited
rcu_read_lock_bh [Same as RCU] [Same as RCU]
rcu_read_unlock_bh
[local_bh_disable]
[and friends]
rcu_dereference_bh
rcu_dereference_bh_check
@@ -1092,9 +1120,9 @@ sched::
Critical sections Grace period Barrier
rcu_read_lock_sched call_rcu rcu_barrier
rcu_read_unlock_sched synchronize_rcu
[preempt_disable] synchronize_rcu_expedited
rcu_read_lock_sched [Same as RCU] [Same as RCU]
rcu_read_unlock_sched
[preempt_disable]
[and friends]
rcu_read_lock_sched_notrace
rcu_read_unlock_sched_notrace
@@ -1104,46 +1132,104 @@ sched::
rcu_read_lock_sched_held
RCU: Initialization/cleanup/ordering::
RCU_INIT_POINTER
RCU_INITIALIZER
RCU_POINTER_INITIALIZER
init_rcu_head
destroy_rcu_head
init_rcu_head_on_stack
destroy_rcu_head_on_stack
SLAB_TYPESAFE_BY_RCU
RCU: Quiescent states and control::
cond_resched_tasks_rcu_qs
rcu_all_qs
rcu_softirq_qs_periodic
rcu_end_inkernel_boot
rcu_expedite_gp
rcu_gp_is_expedited
rcu_unexpedite_gp
rcu_cpu_stall_reset
rcu_head_after_call_rcu
rcu_is_watching
RCU-sync primitive::
rcu_sync_is_idle
rcu_sync_init
rcu_sync_enter
rcu_sync_exit
rcu_sync_dtor
RCU-Tasks::
Critical sections Grace period Barrier
Critical sections Grace period Barrier
N/A call_rcu_tasks rcu_barrier_tasks
N/A call_rcu_tasks rcu_barrier_tasks
synchronize_rcu_tasks
RCU-Tasks-Rude::
Critical sections Grace period Barrier
Critical sections Grace period Barrier
N/A N/A
synchronize_rcu_tasks_rude
N/A synchronize_rcu_tasks_rude rcu_barrier_tasks_rude
call_rcu_tasks_rude
RCU-Tasks-Trace::
Critical sections Grace period Barrier
Critical sections Grace period Barrier
rcu_read_lock_trace call_rcu_tasks_trace rcu_barrier_tasks_trace
rcu_read_lock_trace call_rcu_tasks_trace rcu_barrier_tasks_trace
rcu_read_unlock_trace synchronize_rcu_tasks_trace
guard(rcu_tasks_trace)()
scoped_guard(rcu_tasks_trace)
SRCU list traversal::
list_for_each_entry_srcu
hlist_for_each_entry_srcu
SRCU::
Critical sections Grace period Barrier
Critical sections Grace period Barrier
srcu_read_lock call_srcu srcu_barrier
srcu_read_unlock synchronize_srcu
srcu_dereference synchronize_srcu_expedited
srcu_read_lock call_srcu srcu_barrier
srcu_read_unlock synchronize_srcu
srcu_read_lock_fast synchronize_srcu_expedited
srcu_read_unlock_fast get_state_synchronize_srcu
srcu_read_lock_nmisafe start_poll_synchronize_srcu
srcu_read_unlock_nmisafe start_poll_synchronize_srcu_expedited
srcu_read_lock_notrace poll_state_synchronize_srcu
srcu_read_unlock_notrace
srcu_down_read
srcu_up_read
srcu_down_read_fast
srcu_up_read_fast
guard(srcu)()
scoped_guard(srcu)
srcu_read_lock_held
srcu_dereference
srcu_dereference_check
srcu_dereference_notrace
srcu_read_lock_held
SRCU: Initialization/cleanup::
SRCU: Initialization/cleanup/ordering::
DEFINE_SRCU
DEFINE_STATIC_SRCU
init_srcu_struct
cleanup_srcu_struct
smp_mb__after_srcu_read_unlock
All: lockdep-checked RCU utility APIs::
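
[Editor's note: since the tables above now list guard(srcu)() and
scoped_guard(srcu), here is a minimal sketch of an SRCU reader using the
guard form. my_srcu, gp, and struct foo are hypothetical.]

#include <linux/srcu.h>

struct foo {
	int a;
};

DEFINE_STATIC_SRCU(my_srcu);
static struct foo __rcu *gp;	/* hypothetical SRCU-protected pointer */

static int read_a(void)
{
	struct foo *p;

	guard(srcu)(&my_srcu);	/* srcu_read_lock()/srcu_read_unlock() pair */
	p = srcu_dereference(gp, &my_srcu);
	return p ? p->a : -1;	/* sleeping is permitted here, this being SRCU */
}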

include/linux/list.h

@@ -708,16 +708,6 @@ static inline void list_splice_tail_init(struct list_head *list,
#define list_for_each(pos, head) \
for (pos = (head)->next; !list_is_head(pos, (head)); pos = pos->next)
/**
* list_for_each_rcu - Iterate over a list in an RCU-safe fashion
* @pos: the &struct list_head to use as a loop cursor.
* @head: the head for your list.
*/
#define list_for_each_rcu(pos, head) \
for (pos = rcu_dereference((head)->next); \
!list_is_head(pos, (head)); \
pos = rcu_dereference(pos->next))
/**
* list_for_each_continue - continue iteration over a list
* @pos: the &struct list_head to use as a loop cursor.

include/linux/rculist.h

@@ -42,6 +42,16 @@ static inline void INIT_LIST_HEAD_RCU(struct list_head *list)
*/
#define list_bidir_prev_rcu(list) (*((struct list_head __rcu **)(&(list)->prev)))
/**
* list_for_each_rcu - Iterate over a list in an RCU-safe fashion
* @pos: the &struct list_head to use as a loop cursor.
* @head: the head for your list.
*/
#define list_for_each_rcu(pos, head) \
for (pos = rcu_dereference((head)->next); \
!list_is_head(pos, (head)); \
pos = rcu_dereference(pos->next))
/**
* list_tail_rcu - returns the prev pointer of the head of the list
* @head: the head of the list
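
[Editor's note: with the macro now living in rculist.h as shown above,
callers include <linux/rculist.h> rather than relying on list.h. A minimal
usage sketch follows; mylist and struct item are hypothetical.]

#include <linux/rculist.h>
#include <linux/rcupdate.h>

struct item {
	int value;
	struct list_head node;	/* linked into an RCU-protected list */
};

static LIST_HEAD(mylist);

static int sum_items(void)
{
	struct list_head *pos;
	int sum = 0;

	rcu_read_lock();	/* the macro's rcu_dereference() requires a reader */
	list_for_each_rcu(pos, &mylist)
		sum += list_entry(pos, struct item, node)->value;
	rcu_read_unlock();
	return sum;
}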

include/linux/srcu.h

@@ -275,12 +275,27 @@ static inline struct srcu_ctr __percpu *srcu_read_lock_fast(struct srcu_struct *
{
struct srcu_ctr __percpu *retval;
RCU_LOCKDEP_WARN(!rcu_is_watching(), "RCU must be watching srcu_read_lock_fast().");
srcu_check_read_flavor_force(ssp, SRCU_READ_FLAVOR_FAST);
retval = __srcu_read_lock_fast(ssp);
rcu_try_lock_acquire(&ssp->dep_map);
return retval;
}
/*
* Used by tracing, cannot be traced and cannot call lockdep.
* See srcu_read_lock_fast() for more information.
*/
static inline struct srcu_ctr __percpu *srcu_read_lock_fast_notrace(struct srcu_struct *ssp)
__acquires(ssp)
{
struct srcu_ctr __percpu *retval;
srcu_check_read_flavor_force(ssp, SRCU_READ_FLAVOR_FAST);
retval = __srcu_read_lock_fast(ssp);
return retval;
}
/**
* srcu_down_read_fast - register a new reader for an SRCU-protected structure.
* @ssp: srcu_struct in which to register the new reader.
@@ -295,6 +310,7 @@ static inline struct srcu_ctr __percpu *srcu_read_lock_fast(struct srcu_struct *
static inline struct srcu_ctr __percpu *srcu_down_read_fast(struct srcu_struct *ssp) __acquires(ssp)
{
WARN_ON_ONCE(IS_ENABLED(CONFIG_PROVE_RCU) && in_nmi());
RCU_LOCKDEP_WARN(!rcu_is_watching(), "RCU must be watching srcu_down_read_fast().");
srcu_check_read_flavor_force(ssp, SRCU_READ_FLAVOR_FAST);
return __srcu_read_lock_fast(ssp);
}
@@ -389,6 +405,18 @@ static inline void srcu_read_unlock_fast(struct srcu_struct *ssp, struct srcu_ct
srcu_check_read_flavor(ssp, SRCU_READ_FLAVOR_FAST);
srcu_lock_release(&ssp->dep_map);
__srcu_read_unlock_fast(ssp, scp);
RCU_LOCKDEP_WARN(!rcu_is_watching(), "RCU must be watching srcu_read_unlock_fast().");
}
/*
* Used by tracing, cannot be traced and cannot call lockdep.
* See srcu_read_unlock_fast() for more information.
*/
static inline void srcu_read_unlock_fast_notrace(struct srcu_struct *ssp,
struct srcu_ctr __percpu *scp) __releases(ssp)
{
srcu_check_read_flavor(ssp, SRCU_READ_FLAVOR_FAST);
__srcu_read_unlock_fast(ssp, scp);
}
/**
@@ -405,6 +433,7 @@ static inline void srcu_up_read_fast(struct srcu_struct *ssp, struct srcu_ctr __
WARN_ON_ONCE(IS_ENABLED(CONFIG_PROVE_RCU) && in_nmi());
srcu_check_read_flavor(ssp, SRCU_READ_FLAVOR_FAST);
__srcu_read_unlock_fast(ssp, scp);
RCU_LOCKDEP_WARN(!rcu_is_watching(), "RCU must be watching srcu_up_read_fast().");
}
/**
@@ -486,4 +515,9 @@ DEFINE_LOCK_GUARD_1(srcu_fast, struct srcu_struct,
srcu_read_unlock_fast(_T->lock, _T->scp),
struct srcu_ctr __percpu *scp)
DEFINE_LOCK_GUARD_1(srcu_fast_notrace, struct srcu_struct,
_T->scp = srcu_read_lock_fast_notrace(_T->lock),
srcu_read_unlock_fast_notrace(_T->lock, _T->scp),
struct srcu_ctr __percpu *scp)
#endif
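
[Editor's note: a minimal sketch of how the new notrace primitives pair up,
including the guard defined just above. my_srcu and the reader bodies are
hypothetical.]

#include <linux/srcu.h>

DEFINE_STATIC_SRCU(my_srcu);

/* Suitable for tracing paths: no tracing hooks, no lockdep calls. */
static notrace void reader(void)
{
	struct srcu_ctr __percpu *scp;

	scp = srcu_read_lock_fast_notrace(&my_srcu);
	/* ... read-side accesses, e.g. via srcu_dereference_notrace() ... */
	srcu_read_unlock_fast_notrace(&my_srcu, scp);
}

/* Equivalent, using the srcu_fast_notrace guard. */
static notrace void reader_guarded(void)
{
	guard(srcu_fast_notrace)(&my_srcu);
	/* ... read-side accesses ... */
}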

include/linux/srcutree.h

@@ -232,23 +232,40 @@ static inline struct srcu_ctr __percpu *__srcu_ctr_to_ptr(struct srcu_struct *ss
* srcu_read_unlock_fast().
*
* Note that both this_cpu_inc() and atomic_long_inc() are RCU read-side
* critical sections either because they disable interrupts, because they
* are a single instruction, or because they are a read-modify-write atomic
* operation, depending on the whims of the architecture.
* critical sections either because they disable interrupts, because
* they are a single instruction, or because they are read-modify-write
* atomic operations, depending on the whims of the architecture.
* This matters because the SRCU-fast grace-period mechanism uses either
* synchronize_rcu() or synchronize_rcu_expedited(), that is, RCU,
* *not* SRCU, in order to eliminate the need for the read-side smp_mb()
* invocations that are used by srcu_read_lock() and srcu_read_unlock().
* The __srcu_read_unlock_fast() function also relies on this same RCU
* (again, *not* SRCU) trick to eliminate the need for smp_mb().
*
* The key point behind this RCU trick is that if any part of a given
* RCU reader precedes the beginning of a given RCU grace period, then
* the entirety of that RCU reader and everything preceding it happens
* before the end of that same RCU grace period. Similarly, if any part
* of a given RCU reader follows the end of a given RCU grace period,
* then the entirety of that RCU reader and everything following it
* happens after the beginning of that same RCU grace period. Therefore,
* the operations labeled Y in __srcu_read_lock_fast() and those labeled Z
* in __srcu_read_unlock_fast() are ordered against the corresponding SRCU
* read-side critical section from the viewpoint of the SRCU grace period.
* This is all the ordering that is required, hence no calls to smp_mb().
*
* This means that __srcu_read_lock_fast() is not all that fast
* on architectures that support NMIs but do not supply NMI-safe
* implementations of this_cpu_inc().
*/
static inline struct srcu_ctr __percpu *__srcu_read_lock_fast(struct srcu_struct *ssp)
static inline struct srcu_ctr __percpu notrace *__srcu_read_lock_fast(struct srcu_struct *ssp)
{
struct srcu_ctr __percpu *scp = READ_ONCE(ssp->srcu_ctrp);
RCU_LOCKDEP_WARN(!rcu_is_watching(), "RCU must be watching srcu_read_lock_fast().");
if (!IS_ENABLED(CONFIG_NEED_SRCU_NMI_SAFE))
this_cpu_inc(scp->srcu_locks.counter); /* Y */
this_cpu_inc(scp->srcu_locks.counter); // Y, and implicit RCU reader.
else
atomic_long_inc(raw_cpu_ptr(&scp->srcu_locks)); /* Z */
atomic_long_inc(raw_cpu_ptr(&scp->srcu_locks)); // Y, and implicit RCU reader.
barrier(); /* Avoid leaking the critical section. */
return scp;
}
@@ -259,23 +276,17 @@ static inline struct srcu_ctr __percpu *__srcu_read_lock_fast(struct srcu_struct
* different CPU than that which was incremented by the corresponding
* srcu_read_lock_fast(), but it must be within the same task.
*
* Note that both this_cpu_inc() and atomic_long_inc() are RCU read-side
* critical sections either because they disable interrupts, because they
* are a single instruction, or because they are a read-modify-write atomic
* operation, depending on the whims of the architecture.
*
* This means that __srcu_read_unlock_fast() is not all that fast
* on architectures that support NMIs but do not supply NMI-safe
* implementations of this_cpu_inc().
* Please see the __srcu_read_lock_fast() function's header comment for
* information on implicit RCU readers and NMI safety.
*/
static inline void __srcu_read_unlock_fast(struct srcu_struct *ssp, struct srcu_ctr __percpu *scp)
static inline void notrace
__srcu_read_unlock_fast(struct srcu_struct *ssp, struct srcu_ctr __percpu *scp)
{
barrier(); /* Avoid leaking the critical section. */
if (!IS_ENABLED(CONFIG_NEED_SRCU_NMI_SAFE))
this_cpu_inc(scp->srcu_unlocks.counter); /* Z */
this_cpu_inc(scp->srcu_unlocks.counter); // Z, and implicit RCU reader.
else
atomic_long_inc(raw_cpu_ptr(&scp->srcu_unlocks)); /* Z */
RCU_LOCKDEP_WARN(!rcu_is_watching(), "RCU must be watching srcu_read_unlock_fast().");
atomic_long_inc(raw_cpu_ptr(&scp->srcu_unlocks)); // Z, and implicit RCU reader.
}
void __srcu_check_read_flavor(struct srcu_struct *ssp, int read_flavor);

kernel/cgroup/dmem.c

@@ -14,6 +14,7 @@
#include <linux/mutex.h>
#include <linux/page_counter.h>
#include <linux/parser.h>
#include <linux/rculist.h>
#include <linux/slab.h>
struct dmem_cgroup_region {

kernel/rcu/rcutorture.c

@@ -1528,7 +1528,7 @@ static void do_rtws_sync(struct torture_random_state *trsp, void (*sync)(void))
static int
rcu_torture_writer(void *arg)
{
bool boot_ended;
bool booting_still = false;
bool can_expedite = !rcu_gp_is_expedited() && !rcu_gp_is_normal();
unsigned long cookie;
struct rcu_gp_oldstate cookie_full;
@@ -1539,6 +1539,7 @@ rcu_torture_writer(void *arg)
struct rcu_gp_oldstate gp_snap1_full;
int i;
int idx;
unsigned long j;
int oldnice = task_nice(current);
struct rcu_gp_oldstate *rgo = NULL;
int rgo_size = 0;
@@ -1571,16 +1572,26 @@ rcu_torture_writer(void *arg)
return 0;
}
if (cur_ops->poll_active > 0) {
ulo = kzalloc(cur_ops->poll_active * sizeof(ulo[0]), GFP_KERNEL);
ulo = kcalloc(cur_ops->poll_active, sizeof(*ulo), GFP_KERNEL);
if (!WARN_ON(!ulo))
ulo_size = cur_ops->poll_active;
}
if (cur_ops->poll_active_full > 0) {
rgo = kzalloc(cur_ops->poll_active_full * sizeof(rgo[0]), GFP_KERNEL);
rgo = kcalloc(cur_ops->poll_active_full, sizeof(*rgo), GFP_KERNEL);
if (!WARN_ON(!rgo))
rgo_size = cur_ops->poll_active_full;
}
// If the system is still booting, let it finish.
j = jiffies;
while (!torture_must_stop() && !rcu_inkernel_boot_has_ended()) {
booting_still = true;
schedule_timeout_interruptible(HZ);
}
if (booting_still)
pr_alert("%s" TORTURE_FLAG " Waited %lu jiffies for boot to complete.\n",
torture_type, jiffies - j);
do {
rcu_torture_writer_state = RTWS_FIXED_DELAY;
torture_hrtimeout_us(500, 1000, &rand);
@@ -1769,13 +1780,11 @@ rcu_torture_writer(void *arg)
!rcu_gp_is_normal();
}
rcu_torture_writer_state = RTWS_STUTTER;
boot_ended = rcu_inkernel_boot_has_ended();
stutter_waited = stutter_wait("rcu_torture_writer");
if (stutter_waited &&
!atomic_read(&rcu_fwd_cb_nodelay) &&
!cur_ops->slow_gps &&
!torture_must_stop() &&
boot_ended &&
time_after(jiffies, stallsdone))
for (i = 0; i < ARRAY_SIZE(rcu_tortures); i++)
if (list_empty(&rcu_tortures[i].rtort_free) &&
@@ -2437,7 +2446,8 @@ rcu_torture_reader(void *arg)
torture_hrtimeout_us(500, 1000, &rand);
lastsleep = jiffies + 10;
}
while (torture_num_online_cpus() < mynumonline && !torture_must_stop())
while (!torture_must_stop() &&
(torture_num_online_cpus() < mynumonline || !rcu_inkernel_boot_has_ended()))
schedule_timeout_interruptible(HZ / 5);
stutter_wait("rcu_torture_reader");
} while (!torture_must_stop());
@@ -2756,7 +2766,8 @@ rcu_torture_stats_print(void)
cur_ops->stats();
if (rtcv_snap == rcu_torture_current_version &&
rcu_access_pointer(rcu_torture_current) &&
!rcu_stall_is_suppressed()) {
!rcu_stall_is_suppressed() &&
rcu_inkernel_boot_has_ended()) {
int __maybe_unused flags = 0;
unsigned long __maybe_unused gp_seq = 0;
@@ -3446,6 +3457,8 @@ static int rcu_torture_fwd_prog(void *args)
int tested_tries = 0;
VERBOSE_TOROUT_STRING("rcu_torture_fwd_progress task started");
while (!rcu_inkernel_boot_has_ended())
schedule_timeout_interruptible(HZ / 10);
rcu_bind_current_to_nocb();
if (!IS_ENABLED(CONFIG_SMP) || !IS_ENABLED(CONFIG_RCU_BOOST))
set_user_nice(current, MAX_NICE);

kernel/rcu/refscale.c

@@ -1021,7 +1021,7 @@ static int main_func(void *arg)
set_user_nice(current, MAX_NICE);
VERBOSE_SCALEOUT("main_func task started");
result_avg = kzalloc(nruns * sizeof(*result_avg), GFP_KERNEL);
result_avg = kcalloc(nruns, sizeof(*result_avg), GFP_KERNEL);
buf = kzalloc(800 + 64, GFP_KERNEL);
if (!result_avg || !buf) {
SCALEOUT_ERRSTRING("out of memory");
@@ -1133,9 +1133,9 @@ ref_scale_cleanup(void)
reader_tasks[i].task);
}
kfree(reader_tasks);
reader_tasks = NULL;
torture_stop_kthread("main_task", main_task);
kfree(main_task);
// Do scale-type-specific cleanup operations.
if (cur_ops->cleanup != NULL)
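
[Editor's note: the kcalloc() substitutions above are an overflow-hardening
idiom rather than a behavioral change. A sketch with a hypothetical array:]

#include <linux/slab.h>

static u64 *alloc_results(size_t n)
{
	/* kzalloc() form: the caller multiplies, so a huge n can wrap
	 * around size_t and silently yield a too-small allocation:
	 *
	 *	return kzalloc(n * sizeof(u64), GFP_KERNEL);
	 *
	 * kcalloc() form: the multiplication is overflow-checked and
	 * returns NULL rather than a truncated buffer: */
	return kcalloc(n, sizeof(u64), GFP_KERNEL);
}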

kernel/rcu/srcutiny.c

@@ -176,10 +176,9 @@ static void srcu_gp_start_if_needed(struct srcu_struct *ssp)
{
unsigned long cookie;
preempt_disable(); // Needed for PREEMPT_LAZY
lockdep_assert_preemption_disabled(); // Needed for PREEMPT_LAZY
cookie = get_state_synchronize_srcu(ssp);
if (ULONG_CMP_GE(READ_ONCE(ssp->srcu_idx_max), cookie)) {
preempt_enable();
return;
}
WRITE_ONCE(ssp->srcu_idx_max, cookie);
@@ -189,7 +188,6 @@ static void srcu_gp_start_if_needed(struct srcu_struct *ssp)
else if (list_empty(&ssp->srcu_work.entry))
list_add(&ssp->srcu_work.entry, &srcu_boot_list);
}
preempt_enable();
}
/*

kernel/rcu/srcutree.c

@@ -1168,6 +1168,16 @@ static void srcu_flip(struct srcu_struct *ssp)
* counter update. Note that both this memory barrier and the
* one in srcu_readers_active_idx_check() provide the guarantee
* for __srcu_read_lock().
*
* Note that this is a performance optimization, in which we spend
* an otherwise unnecessary smp_mb() in order to reduce the number
* of full per-CPU-variable scans in srcu_readers_lock_idx() and
* srcu_readers_unlock_idx(). But this performance optimization
* is not so optimal for SRCU-fast, where we would be spending
* not smp_mb(), but rather synchronize_rcu(). At the same time,
* the overhead of the smp_mb() is in the noise, so there is no
* point in omitting it in the SRCU-fast case. So the same code
* is executed either way.
*/
smp_mb(); /* D */ /* Pairs with C. */
}

kernel/rcu/tasks.h

@@ -553,13 +553,13 @@ static void rcu_tasks_invoke_cbs(struct rcu_tasks *rtp, struct rcu_tasks_percpu
rtpcp_next = rtp->rtpcp_array[index];
if (rtpcp_next->cpu < smp_load_acquire(&rtp->percpu_dequeue_lim)) {
cpuwq = rcu_cpu_beenfullyonline(rtpcp_next->cpu) ? rtpcp_next->cpu : WORK_CPU_UNBOUND;
queue_work_on(cpuwq, system_wq, &rtpcp_next->rtp_work);
queue_work_on(cpuwq, system_percpu_wq, &rtpcp_next->rtp_work);
index++;
if (index < num_possible_cpus()) {
rtpcp_next = rtp->rtpcp_array[index];
if (rtpcp_next->cpu < smp_load_acquire(&rtp->percpu_dequeue_lim)) {
cpuwq = rcu_cpu_beenfullyonline(rtpcp_next->cpu) ? rtpcp_next->cpu : WORK_CPU_UNBOUND;
queue_work_on(cpuwq, system_wq, &rtpcp_next->rtp_work);
queue_work_on(cpuwq, system_percpu_wq, &rtpcp_next->rtp_work);
}
}
}

kernel/rcu/tree.c

@@ -3800,6 +3800,11 @@ static void rcu_barrier_handler(void *cpu_in)
* to complete. For example, if there are no RCU callbacks queued anywhere
* in the system, then rcu_barrier() is within its rights to return
* immediately, without waiting for anything, much less an RCU grace period.
* In fact, rcu_barrier() will normally not result in any RCU grace periods
* beyond those that were already destined to be executed.
*
* In kernels built with CONFIG_RCU_LAZY=y, this function also hurries all
* pending lazy RCU callbacks.
*/
void rcu_barrier(void)
{
@@ -4885,10 +4890,10 @@ void __init rcu_init(void)
rcutree_online_cpu(cpu);
/* Create workqueue for Tree SRCU and for expedited GPs. */
rcu_gp_wq = alloc_workqueue("rcu_gp", WQ_MEM_RECLAIM, 0);
rcu_gp_wq = alloc_workqueue("rcu_gp", WQ_MEM_RECLAIM | WQ_PERCPU, 0);
WARN_ON(!rcu_gp_wq);
sync_wq = alloc_workqueue("sync_wq", WQ_MEM_RECLAIM, 0);
sync_wq = alloc_workqueue("sync_wq", WQ_MEM_RECLAIM | WQ_UNBOUND, 0);
WARN_ON(!sync_wq);
/* Respect if explicitly disabled via a boot parameter. */
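
[Editor's note: tying the rcu_barrier() comment above to a concrete use, a
hedged sketch; struct foo, free_foo(), and the teardown function are
hypothetical.]

#include <linux/rcupdate.h>
#include <linux/slab.h>

struct foo {
	int a;
	struct rcu_head rh;
};

static void free_foo(struct rcu_head *rhp)
{
	kfree(container_of(rhp, struct foo, rh));
}

static void retire_foo(struct foo *fp)
{
	/* With CONFIG_RCU_LAZY=y, this callback may be batched lazily... */
	call_rcu(&fp->rh, free_foo);
}

static void teardown(void)
{
	/* ...but rcu_barrier() hurries it: on return, free_foo() has run
	 * for every callback queued above. */
	rcu_barrier();
}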

kernel/rcu/tree_plugin.h

@@ -626,11 +626,10 @@ notrace void rcu_preempt_deferred_qs(struct task_struct *t)
*/
static void rcu_preempt_deferred_qs_handler(struct irq_work *iwp)
{
unsigned long flags;
struct rcu_data *rdp;
lockdep_assert_irqs_disabled();
rdp = container_of(iwp, struct rcu_data, defer_qs_iw);
local_irq_save(flags);
/*
* If the IRQ work handler happens to run in the middle of RCU read-side
@@ -647,8 +646,6 @@ static void rcu_preempt_deferred_qs_handler(struct irq_work *iwp)
*/
if (rcu_preempt_depth() > 0)
WRITE_ONCE(rdp->defer_qs_iw_pending, DEFER_QS_IDLE);
local_irq_restore(flags);
}
/*

kernel/torture.c

@@ -359,6 +359,8 @@
torture_hrtimeout_jiffies(onoff_holdoff, &rand);
VERBOSE_TOROUT_STRING("torture_onoff end holdoff");
}
while (!rcu_inkernel_boot_has_ended())
schedule_timeout_interruptible(HZ / 10);
while (!torture_must_stop()) {
if (disable_onoff_at_boot && !rcu_inkernel_boot_has_ended()) {
torture_hrtimeout_jiffies(HZ / 10, &rand);
@@ -797,8 +799,9 @@ static unsigned long torture_init_jiffies;
static void
torture_print_module_parms(void)
{
pr_alert("torture module --- %s: disable_onoff_at_boot=%d ftrace_dump_at_shutdown=%d verbose_sleep_frequency=%d verbose_sleep_duration=%d random_shuffle=%d\n",
torture_type, disable_onoff_at_boot, ftrace_dump_at_shutdown, verbose_sleep_frequency, verbose_sleep_duration, random_shuffle);
pr_alert("torture module --- %s: disable_onoff_at_boot=%d ftrace_dump_at_shutdown=%d verbose_sleep_frequency=%d verbose_sleep_duration=%d random_shuffle=%d%s\n",
torture_type, disable_onoff_at_boot, ftrace_dump_at_shutdown, verbose_sleep_frequency, verbose_sleep_duration, random_shuffle,
rcu_inkernel_boot_has_ended() ? "" : " still booting");
}
/*

tools/testing/selftests/rcutorture/bin/jitter.sh

@@ -39,6 +39,22 @@
fi
done
# Uses global variables startsecs, startns, endsecs, endns, and limit.
# Exit code is success for time not yet elapsed and failure otherwise.
function timecheck {
local done=`awk -v limit=$limit \
-v startsecs=$startsecs \
-v startns=$startns \
-v endsecs=$endsecs \
-v endns=$endns < /dev/null '
BEGIN {
delta = (endsecs - startsecs) * 1000 * 1000;
delta += int((endns - startns) / 1000);
print delta >= limit;
}'`
return $done
}
while :
do
# Check for done.
@@ -85,15 +101,20 @@ do
n=$(($n+1))
sleep .$sleeptime
# Spin a random duration
# Spin a random duration, but with rather coarse granularity.
limit=`awk -v me=$me -v n=$n -v spinmax=$spinmax 'BEGIN {
srand(n + me + systime());
printf("%06d", int(rand() * spinmax));
}' < /dev/null`
n=$(($n+1))
for i in {1..$limit}
startsecs=`date +%s`
startns=`date +%N`
endsecs=$startsecs
endns=$startns
while timecheck
do
echo > /dev/null
endsecs=`date +%s`
endns=`date +%N`
done
done

tools/testing/selftests/rcutorture/bin/torture.sh

@@ -94,6 +94,7 @@ usage () {
echo " --do-kvfree / --do-no-kvfree / --no-kvfree"
echo " --do-locktorture / --do-no-locktorture / --no-locktorture"
echo " --do-none"
echo " --do-normal / --do-no-normal / --no-normal"
echo " --do-rcuscale / --do-no-rcuscale / --no-rcuscale"
echo " --do-rcutasksflavors / --do-no-rcutasksflavors / --no-rcutasksflavors"
echo " --do-rcutorture / --do-no-rcutorture / --no-rcutorture"