This is the first half of the driver changes:
- A treewide interface change to the "syscore" operations for
power management, as a preparation for future Tegra specific
changes.
- Reset controller updates with added drivers for LAN969x, eic770
and RZ/G3S SoCs.
- Protection of system controller registers on Renesas and Google SoCs,
to prevent trivially triggering a system crash from e.g. debugfs
access.
- soc_device identification updates on Nvidia, Exynos and Mediatek
- debugfs support in the ST STM32 firewall driver
- Minor updates for SoC drivers on AMD/Xilinx, Renesas, Allwinner, TI
- Cleanups for memory controller support on Nvidia and Renesas
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEo6/YBQwIrVS28WGKmmx57+YAGNkFAmky/8gACgkQmmx57+YA
GNlqohAApPTLM6Q4gf1cIcsTVaP0uxx9CBgupCGuT5ORrOMKBghVWjTOTSxeEAab
UQF465QwYUUu602GH34UmRaY9CKW2bMIsfmkgmxNB4Y4Qd7yCgQNJ/h/TnN0rBH+
qTeEsRH/hax4miSNsh0oOZfVkZkg+23VF02d1VL0CcaX7y4oT45RPBQugrNx/gNS
fHfVwgIq8vJ8WyrmM1h2nv1i1vgSzEy50B3kY674BBw83FcJTafNLvD7N5DSgD1H
/I/2xeyEpb+oL1VfeHcXZaX/jf04O+cmvSzBi+MOH1tI3MpdxJib1vEYBdggoOWN
K/FFGgsOY+DNmJPpSnPTTu8UpzksS8SxGBP7M9Q8roKZwA2c9wLotxySvjki5yv8
2zvabRdzbrSaoYwsH9QnZdQ2hVkJ9W8MESu8PevD3yMNuFUzledPDWW0N1SbGm78
0ZdB6NPdaBZYHMNMRdFhN8P275/Mx5e0XWN9oYMQqjPooH7YkyT7hJWz6ao2PCJP
8mDmnW1RzL+LWf7mJ25ZEtS+YjmKA/PVmogRrGurKCadvdxXqCF09KNljICHhmmu
t0KB4dqw02OXLPvBk21qCi0zL56w1JDgqtS8suFvDYo9sCceeAbAcmpyoUOFj2N+
Upn976tb4iqFrr9mFswpmCJWPpqJkU+A+KnKsIRPU7N4kSrP35I=
=HvlN
-----END PGP SIGNATURE-----
Merge tag 'soc-drivers-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
Pull SoC driver updates from Arnd Bergmann:
"This is the first half of the driver changes:
- A treewide interface change to the "syscore" operations for power
management, as a preparation for future Tegra specific changes
- Reset controller updates with added drivers for LAN969x, eic770 and
RZ/G3S SoCs
- Protection of system controller registers on Renesas and Google
SoCs, to prevent trivially triggering a system crash from e.g.
debugfs access
- soc_device identification updates on Nvidia, Exynos and Mediatek
- debugfs support in the ST STM32 firewall driver
- Minor updates for SoC drivers on AMD/Xilinx, Renesas, Allwinner, TI
- Cleanups for memory controller support on Nvidia and Renesas"
* tag 'soc-drivers-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (114 commits)
memory: tegra186-emc: Fix missing put_bpmp
Documentation: reset: Remove reset_controller_add_lookup()
reset: fix BIT macro reference
reset: rzg2l-usbphy-ctrl: Fix a NULL vs IS_ERR() bug in probe
reset: th1520: Support reset controllers in more subsystems
reset: th1520: Prepare for supporting multiple controllers
dt-bindings: reset: thead,th1520-reset: Add controllers for more subsys
dt-bindings: reset: thead,th1520-reset: Remove non-VO-subsystem resets
reset: remove legacy reset lookup code
clk: davinci: psc: drop unused reset lookup
reset: rzg2l-usbphy-ctrl: Add support for RZ/G3S SoC
reset: rzg2l-usbphy-ctrl: Add support for USB PWRRDY
dt-bindings: reset: renesas,rzg2l-usbphy-ctrl: Document RZ/G3S support
reset: eswin: Add eic7700 reset driver
dt-bindings: reset: eswin: Documentation for eic7700 SoC
reset: sparx5: add LAN969x support
dt-bindings: reset: microchip: Add LAN969x support
soc: rockchip: grf: Add select correct PWM implementation on RK3368
soc/tegra: pmc: Add USB wake events for Tegra234
amba: tegra-ahb: Fix device leak on SMMU enable
...
The macro for_each_console_srcu iterates over all registered consoles. It's
implied that all registered consoles have CON_ENABLED flag set, making
the check for the flag unnecessary. Call console_is_usable function to
fully verify if the given console is usable before calling the ->unblank
callback.
Signed-off-by: Marcos Paulo de Souza <mpdesouza@suse.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Link: https://patch.msgid.link/20251121-printk-cleanup-part2-v2-3-57b8b78647f4@suse.com
Signed-off-by: Petr Mladek <pmladek@suse.com>
With commit ("printk: Avoid scheduling irq_work on suspend") the
implementation of printk_get_console_flush_type() was modified to
avoid offloading when irq_work should be blocked during suspend.
Since printk uses the returned flush type to determine what
flushing methods are used, this was thought to be sufficient for
avoiding irq_work usage during the suspend phase.
However, vprintk_emit() implements a hack to support
printk_deferred(). In this hack, the returned flush type is
adjusted to make sure no legacy direct printing occurs when
printk_deferred() was used.
Because of this hack, the legacy offloading flushing method can
still be used, causing irq_work to be queued when it should not
be.
Adjust the vprintk_emit() hack to also consider
@console_irqwork_blocked so that legacy offloading will not be
chosen when irq_work should be blocked.
Link: https://lore.kernel.org/lkml/87fra90xv4.fsf@jogness.linutronix.de
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Fixes: 26873e3e7f0c ("printk: Avoid scheduling irq_work on suspend")
Reviewed-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Allowing irq_work to be scheduled while trying to suspend has shown
to cause problems as some architectures interpret the pending
interrupts as a reason to not suspend. This became a problem for
printk() with the introduction of NBCON consoles. With every
printk() call, NBCON console printing kthreads are woken by queueing
irq_work. This means that irq_work continues to be queued due to
printk() calls late in the suspend procedure.
Avoid this problem by preventing printk() from queueing irq_work
once console suspending has begun. This applies to triggering NBCON
and legacy deferred printing as well as klogd waiters.
Since triggering of NBCON threaded printing relies on irq_work, the
pr_flush() within console_suspend_all() is used to perform the final
flushing before suspending consoles and blocking irq_work queueing.
NBCON consoles that are not suspended (due to the usage of the
"no_console_suspend" boot argument) transition to atomic flushing.
Introduce a new global variable @console_irqwork_blocked to flag
when irq_work queueing is to be avoided. The flag is used by
printk_get_console_flush_type() to avoid allowing deferred printing
and switch NBCON consoles to atomic flushing. It is also used by
vprintk_emit() to avoid klogd waking.
Add WARN_ON_ONCE(console_irqwork_blocked) to the irq_work queuing
functions to catch any code that attempts to queue printk irq_work
during the suspending/resuming procedure.
Cc: stable@vger.kernel.org # 6.13.x because no drivers in 6.12.x
Fixes: 6b93bb41f6ea ("printk: Add non-BKL (nbcon) console basic infrastructure")
Closes: https://lore.kernel.org/lkml/DB9PR04MB8429E7DDF2D93C2695DE401D92C4A@DB9PR04MB8429.eurprd04.prod.outlook.com
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Tested-by: Sherry Sun <sherry.sun@nxp.com>
Link: https://patch.msgid.link/20251113160351.113031-3-john.ogness@linutronix.de
Signed-off-by: Petr Mladek <pmladek@suse.com>
Currently printk_trigger_flush() only triggers legacy offloaded
flushing, even if that may not be the appropriate method to flush
for currently registered consoles. (The function predates the
NBCON consoles.)
Since commit 6690d6b52726 ("printk: Add helper for flush type
logic") there is printk_get_console_flush_type(), which also
considers NBCON consoles and reports all the methods of flushing
appropriate based on the system state and consoles available.
Update printk_trigger_flush() to use
printk_get_console_flush_type() to appropriately flush registered
consoles.
Suggested-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/stable/20251113160351.113031-2-john.ogness%40linutronix.de
Tested-by: Sherry Sun <sherry.sun@nxp.com>
Link: https://patch.msgid.link/20251113160351.113031-2-john.ogness@linutronix.de
Signed-off-by: Petr Mladek <pmladek@suse.com>
Several drivers can benefit from registering per-instance data along
with the syscore operations. To achieve this, move the modifiable fields
out of the syscore_ops structure and into a separate struct syscore that
can be registered with the framework. Add a void * driver data field for
drivers to store contextual data that will be passed to the syscore ops.
Acked-by: Rafael J. Wysocki (Intel) <rafael@kernel.org>
Signed-off-by: Thierry Reding <treding@nvidia.com>
The decision whether some more space is needed is tricky in the printk
ring buffer code:
1. The given lpos values might overflow. A subtraction must be used
instead of a simple "lower than" check.
2. Another CPU might reuse the space in the mean time. It can be
detected when the subtraction is bigger than DATA_SIZE(data_ring).
3. There is exactly enough space when the result of the subtraction
is zero. But more space is needed when the result is exactly
DATA_SIZE(data_ring).
Add a helper function to make sure that the check is done correctly
in all situations. Also it helps to make the code consistent and
better documented.
Suggested-by: John Ogness <john.ogness@linutronix.de>
Link: https://lore.kernel.org/r/87tsz7iea2.fsf@jogness.linutronix.de
Reviewed-by: John Ogness <john.ogness@linutronix.de>
Link: https://patch.msgid.link/20251107194720.1231457-3-pmladek@suse.com
[pmladek@suse.com: Updated wording as suggested by John]
Signed-off-by: Petr Mladek <pmladek@suse.com>
There may be console drivers that have not yet figured out a way
to implement safe atomic printing (->write_atomic() callback).
These drivers could choose to only implement threaded printing
(->write_thread() callback), but then it is guaranteed that _no_
output will be printed during panic. Not even attempted.
As a result, developers may be tempted to implement unsafe
->write_atomic() callbacks and/or implement some sort of custom
deferred printing trickery to try to make it work. This goes
against the principle intention of the nbcon API as well as
endangers other nbcon drivers that are doing things correctly
(safely).
As a compromise, allow nbcon drivers to implement unsafe
->write_atomic() callbacks by providing a new console flag
CON_NBCON_ATOMIC_UNSAFE. When specified, the ->write_atomic()
callback for that console will _only_ be called during the
final "hope and pray" flush attempt at the end of a panic:
nbcon_atomic_flush_unsafe().
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Link: https://lore.kernel.org/lkml/b2qps3uywhmjaym4mht2wpxul4yqtuuayeoq4iv4k3zf5wdgh3@tocu6c7mj4lt
Reviewed-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/all/swdpckuwwlv3uiessmtnf2jwlx3jusw6u7fpk5iggqo4t2vdws@7rpjso4gr7qp/ [1]
Link: https://lore.kernel.org/all/20251103-fix_netpoll_aa-v4-1-4cfecdf6da7c@debian.org/ [2]
Link: https://patch.msgid.link/20251027161212.334219-2-john.ogness@linutronix.de
[pmladek@suse.com: Fix build with rework/nbcon-in-kdb branch.]
Signed-off-by: Petr Mladek <pmladek@suse.com>
printk() tries to flush messages with NBCON_PRIO_EMERGENCY on
nbcon consoles immediately. It might take seconds to flush all
pending lines on slow serial consoles. Note that there might be
hundreds of messages, for example:
[ 3.771531][ T1] pci 0000:3e:08.1: [8086:324
** replaying previous printk message **
[ 3.771531][ T1] pci 0000:3e:08.1: [8086:3246] type 00 class 0x088000 PCIe Root Complex Integrated Endpoint
[ ... more than 2000 lines, about 200kB messages ... ]
[ 3.837752][ T1] pci 0000:20:01.0: Adding to iommu group 18
[ 3.837851][ T
** replaying previous printk message **
[ 3.837851][ T1] pci 0000:20:03.0: Adding to iommu group 19
[ 3.837946][ T1] pci 0000:20:05.0: Adding to iommu group 20
[ ... more than 500 messages for iommu groups 21-590 ...]
[ 3.912932][ T1] pci 0000:f6:00.1: Adding to iommu group 591
[ 3.913070][ T1] pci 0000:f6:00.2: Adding to iommu group 592
[ 3.913243][ T1] DMAR: Intel(R) Virtualization Technology for Directed I/O
[ 3.913245][ T1] PCI-DMA: Using software bounce buffering for IO (SWIOTLB)
[ 3.913245][ T1] software IO TLB: mapped [mem 0x000000004f000000-0x0000000053000000] (64MB)
[ 3.913324][ T1] RAPL PMU: API unit is 2^-32 Joules, 3 fixed counters, 655360 ms ovfl timer
[ 3.913325][ T1] RAPL PMU: hw unit of domain package 2^-14 Joules
[ 3.913326][ T1] RAPL PMU: hw unit of domain dram 2^-14 Joules
[ 3.913327][ T1] RAPL PMU: hw unit of domain psys 2^-0 Joules
[ 3.933486][ T1] ------------[ cut here ]------------
[ 3.933488][ T1] WARNING: CPU: 2 PID: 1 at arch/x86/events/intel/uncore.c:1156 uncore_pci_pmu_register+0x15e/0x180
[ 3.930291][ C0] watchdog: Watchdog detected hard LOCKUP on cpu 0
[ 3.930291][ C0] Kernel panic - not syncing: Hard LOCKUP
[...]
[ 3.930291][ C0] CPU: 0 UID: 0 PID: 18 Comm: pr/ttyS0 Not tainted...
[...]
[ 3.930291][ C0] RIP: 0010:nbcon_reacquire_nobuf+0x11/0x50
[ 3.930291][ C0] Call Trace:
[...]
[ 3.930291][ C0] <TASK>
[ 3.930291][ C0] serial8250_console_write+0x16d/0x5c0
[ 3.930291][ C0] nbcon_emit_next_record+0x22c/0x250
[ 3.930291][ C0] nbcon_emit_one+0x93/0xe0
[ 3.930291][ C0] nbcon_kthread_func+0x13c/0x1c0
The are visible two takeovers of the console ownership:
- The 1st one is triggered by the "WARNING: CPU: 2 PID: 1 at
arch/x86/..." line printed with NBCON_PRIO_EMERGENCY.
- The 2nd one is triggered by the "Kernel panic - not syncing:
Hard LOCKUP" line printed with NBCON_PRIO_PANIC.
There are more than 2500 lines, at about 240kB, emitted between
the takeover and the 1st "WARNING" line in the emergency context.
This amount of pending messages had to be flushed by
nbcon_atomic_flush_pending() when WARN() printed its first line.
The atomic flush was holding the nbcon console context for too long so
that it triggered hard lockup on the CPU running the printk kthread
"pr/ttyS0". The kthread needed to reacquire the console ownership
for restoring the original serial port state in serial8250_console_write().
Prevent the hardlockup by releasing the nbcon console ownership after
each emitted record.
Note that __nbcon_atomic_flush_pending_con() used to hold the console
ownership all the time because it blocked the printk kthread. Otherwise
the kthread tried to flush the messages in parallel which caused repeated
takeovers and more replayed messages.
It is not longer a problem because the repeated takeovers are blocked
by the counter of emergency contexts, see nbcon_cpu_emergency_cnt.
Link: https://lore.kernel.org/all/aNQO-zl3k1l4ENfy@pathway.suse.cz
Reviewed-by: Andrew Murray <amurray@thegoodpenguin.co.uk>
Reviewed-by: John Ogness <john.ogness@linutronix.de>
Link: https://patch.msgid.link/20250926124912.243464-4-pmladek@suse.com
Signed-off-by: Petr Mladek <pmladek@suse.com>
The printk kthread might be running when there is a panic in progress.
But it is not able to acquire the console ownership any longer.
Prevent the desperate attempts to acquire the ownership and allow sleeping
in panic. It would make it behave the same as when there is any CPU
in an emergency context.
Reviewed-by: Andrew Murray <amurray@thegoodpenguin.co.uk>
Reviewed-by: John Ogness <john.ogness@linutronix.de>
Link: https://patch.msgid.link/20250926124912.243464-3-pmladek@suse.com
[pmladek@suse.com: Rebased on top of 6.18-rc1 (panic_in_progress() moved to panic.c)]
Signed-off-by: Petr Mladek <pmladek@suse.com>
In emergency contexts, printk() tries to flush messages directly even
on nbcon consoles. And it is allowed to takeover the console ownership
and interrupt the printk kthread in the middle of a message.
Only one takeover and one repeated message should be enough in most
situations. The first emergency message flushes the backlog and printk
kthreads get to sleep. Next emergency messages are flushed directly
and printk() does not wake up the kthreads.
However, the one takeover is not guaranteed. Any printk() in normal
context on another CPU could wake up the kthreads. Or a new emergency
message might be added before the kthreads get to sleep. Note that
the interrupted .write_thread() callbacks usually have to call
nbcon_reacquire_nobuf() and restore the original device setting
before checking for pending messages.
The risk of the repeated takeovers will be even bigger because
__nbcon_atomic_flush_pending_con is going to release the console
ownership after each emitted record. It will be needed to prevent
hardlockup reports on other CPUs which are busy waiting for
the context ownership, for example, by nbcon_reacquire_nobuf() or
__uart_port_nbcon_acquire().
The repeated takeovers break the output, for example:
[ 5042.650211][ T2220] Call Trace:
[ 5042.6511
** replaying previous printk message **
[ 5042.651192][ T2220] <TASK>
[ 5042.652160][ T2220] kunit_run_
** replaying previous printk message **
[ 5042.652160][ T2220] kunit_run_tests+0x72/0x90
[ 5042.653340][ T22
** replaying previous printk message **
[ 5042.653340][ T2220] ? srso_alias_return_thunk+0x5/0xfbef5
[ 5042.654628][ T2220] ? stack_trace_save+0x4d/0x70
[ 5042.6553
** replaying previous printk message **
[ 5042.655394][ T2220] ? srso_alias_return_thunk+0x5/0xfbef5
[ 5042.656713][ T2220] ? save_trace+0x5b/0x180
A more robust solution is to block the printk kthread entirely whenever
*any* CPU enters an emergency context. This ensures that critical messages
can be flushed without contention from the normal, non-atomic printing
path.
Link: https://lore.kernel.org/all/aNQO-zl3k1l4ENfy@pathway.suse.cz
Reviewed-by: Andrew Murray <amurray@thegoodpenguin.co.uk>
Reviewed-by: John Ogness <john.ogness@linutronix.de>
Link: https://patch.msgid.link/20250926124912.243464-2-pmladek@suse.com
[pmladek@suse.com: Added changes proposed by John Ogness]
Signed-off-by: Petr Mladek <pmladek@suse.com>
printk_legacy_map is used to hide lock nesting violations caused by
legacy drivers and is using the wrong override type. LD_WAIT_SLEEP is
for always sleeping lock types such as mutex_t. LD_WAIT_CONFIG is for
lock type which are sleeping while spinning on PREEMPT_RT such as
spinlock_t.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Reviewed-by: John Ogness <john.ogness@linutronix.de>
Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Link: https://patch.msgid.link/20251026150726.GA23223@redhat.com
[pmladek@suse.com: Fixed indentation.]
Signed-off-by: Petr Mladek <pmladek@suse.com>
This function will be used in the next patch to allow a driver to set
both the message and message length of a nbcon_write_context. This is
necessary because the function also initializes the ->unsafe_takeover
struct member. By using this helper we ensure that the struct is
initialized correctly.
Reviewed-by: Petr Mladek <pmladek@suse.com>
Reviewed-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Marcos Paulo de Souza <mpdesouza@suse.com>
Link: https://patch.msgid.link/20251016-nbcon-kgdboc-v6-4-866aac60a80e@suse.com
Signed-off-by: Petr Mladek <pmladek@suse.com>
KDB can interrupt any console to execute the "mirrored printing" at any
time, so add an exception to nbcon_context_try_acquire_direct to allow
to get the context if the current CPU is the same as kdb_printf_cpu.
This change will be necessary for the next patch, which fixes
kdb_msg_write to work with NBCON consoles by calling ->write_atomic on
such consoles. But to print it first needs to acquire the ownership of
the console, so nbcon_context_try_acquire_direct is fixed here.
Reviewed-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Marcos Paulo de Souza <mpdesouza@suse.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Link: https://patch.msgid.link/20251016-nbcon-kgdboc-v6-3-866aac60a80e@suse.com
[pmladek@suse.com: Fix compilation with !CONFIG_KGDB_KDB.]
Signed-off-by: Petr Mladek <pmladek@suse.com>
These helpers will be used when calling console->write_atomic on
KDB code in the next patch. It's basically the same implementation
as nbcon_device_try_acquire, but using NBCON_PRIO_EMERGENCY when
acquiring the context.
If the acquire succeeds, the message and message length are assigned to
nbcon_write_context so ->write_atomic can print the message.
After release try to flush the console since there may be a backlog of
messages in the ringbuffer. The kthread console printers do not get a
chance to run while kdb is active.
Reviewed-by: Petr Mladek <pmladek@suse.com>
Reviewed-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Marcos Paulo de Souza <mpdesouza@suse.com>
Link: https://patch.msgid.link/20251016-nbcon-kgdboc-v6-2-866aac60a80e@suse.com
Signed-off-by: Petr Mladek <pmladek@suse.com>
The helper will be used on KDB code in the next commits.
Reviewed-by: Petr Mladek <pmladek@suse.com>
Reviewed-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Marcos Paulo de Souza <mpdesouza@suse.com>
Link: https://patch.msgid.link/20251016-nbcon-kgdboc-v6-1-866aac60a80e@suse.com
Signed-off-by: Petr Mladek <pmladek@suse.com>
The legacy printer kthread uses console_lock and
__console_flush_and_unlock to flush records to the console. This
approach results in the console_lock being held for the entire
duration of a flush. This can result in large waiting times for
those waiting for console_lock especially where there is a large
volume of records or where the console is slow (e.g. serial). This
contention is observed during boot, as the call to filp_open in
console_on_rootfs will delay progression to userspace until any
in-flight flush is completed.
Let's instead use console_flush_one_record and release/reacquire
the console_lock between records.
On a PocketBeagle 2, with the following boot args:
"console=ttyS2,9600 initcall_debug=1 loglevel=10"
Without this patch:
[ 5.613166] +console_on_rootfs/filp_open
[ 5.643473] mmc1: SDHCI controller on fa00000.mmc [fa00000.mmc] using ADMA 64-bit
[ 5.643823] probe of fa00000.mmc returned 0 after 258244 usecs
[ 5.710520] mmc1: new UHS-I speed SDR104 SDHC card at address 5048
[ 5.721976] mmcblk1: mmc1:5048 SD32G 29.7 GiB
[ 5.747258] mmcblk1: p1 p2
[ 5.753324] probe of mmc1:5048 returned 0 after 40002 usecs
[ 15.595240] ti_sci_pm_domains 44043000.system-controller:power-controller: sync_state() pending due to 30040000.pruss
[ 15.595282] ti_sci_pm_domains 44043000.system-controller:power-controller: sync_state() pending due to e010000.watchdog
[ 15.595297] ti_sci_pm_domains 44043000.system-controller:power-controller: sync_state() pending due to e000000.watchdog
[ 15.595437] ti_sci_pm_domains 44043000.system-controller:power-controller: sync_state() pending due to 30300000.crc
[ 146.275961] -console_on_rootfs/filp_open ...
and with:
[ 5.477122] +console_on_rootfs/filp_open
[ 5.595814] mmc1: SDHCI controller on fa00000.mmc [fa00000.mmc] using ADMA 64-bit
[ 5.596181] probe of fa00000.mmc returned 0 after 312757 usecs
[ 5.662813] mmc1: new UHS-I speed SDR104 SDHC card at address 5048
[ 5.674367] mmcblk1: mmc1:5048 SD32G 29.7 GiB
[ 5.699320] mmcblk1: p1 p2
[ 5.705494] probe of mmc1:5048 returned 0 after 39987 usecs
[ 6.418682] -console_on_rootfs/filp_open ...
...
[ 15.593509] ti_sci_pm_domains 44043000.system-controller:power-controller: sync_state() pending due to 30040000.pruss
[ 15.593551] ti_sci_pm_domains 44043000.system-controller:power-controller: sync_state() pending due to e010000.watchdog
[ 15.593566] ti_sci_pm_domains 44043000.system-controller:power-controller: sync_state() pending due to e000000.watchdog
[ 15.593704] ti_sci_pm_domains 44043000.system-controller:power-controller: sync_state() pending due to 30300000.crc
Where I've added a printk surrounding the call in console_on_rootfs
to filp_open.
Suggested-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Andrew Murray <amurray@thegoodpenguin.co.uk>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Reviewed-by: John Ogness <john.ogness@linutronix.de>
Link: https://patch.msgid.link/20251020-printk_legacy_thread_console_lock-v3-3-00f1f0ac055a@thegoodpenguin.co.uk
[pmladek@suse.com: Fixed ordering of variable definition suggested by John.]
Signed-off-by: Petr Mladek <pmladek@suse.com>
console_flush_one_record() and console_flush_all() duplicate several
checks. They both want to tell the caller that consoles are not
longer usable in this context because it has lost the lock or
the lock has to be reserved for the panic CPU.
Remove the duplication by changing the semantic of the function
console_flush_one_record() return value and parameters.
The function will return true when it is able to do the job. It means
that there is at least one usable console. And the flushing was
not interrupted by a takeover or panic_on_other_cpu().
Also replace the @any_usable parameter with @try_again. The @try_again
parameter will be set to true when the function could do the job
and at least one console made a progress.
Motivation:
The callers need to know when
+ they should continue flushing => @try_again
+ when the console is flushed => can_do_the_job(return) && !@try_again
+ when @next_seq is valid => same as flushed
+ when lost console_lock => @takeover
The proposed change makes it clear when the function can do
the job. It simplifies the answer for the other questions.
Also the return value from console_flush_one_record() can
be used as return value from console_flush_all().
Reviewed-by: John Ogness <john.ogness@linutronix.de>
Link: https://patch.msgid.link/20251020-printk_legacy_thread_console_lock-v3-2-00f1f0ac055a@thegoodpenguin.co.uk
[pmladek@suse.com: Fixed type of any_usable variable reported by John]
Signed-off-by: Petr Mladek <pmladek@suse.com>
console_flush_all prints all remaining records to all usable consoles
whilst its caller holds console_lock. This can result in large waiting
times for those waiting for console_lock especially where there is a
large volume of records or where the console is slow (e.g. serial).
Let's extract the parts of this function which print a single record
into a new function named console_flush_one_record. This can later
be used for functions that will release and reacquire console_lock
between records.
This commit should not change existing functionality.
Reviewed-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Andrew Murray <amurray@thegoodpenguin.co.uk>
Reviewed-by: John Ogness <john.ogness@linutronix.de>
Link: https://patch.msgid.link/20251020-printk_legacy_thread_console_lock-v3-1-00f1f0ac055a@thegoodpenguin.co.uk
Signed-off-by: Petr Mladek <pmladek@suse.com>
Previously, data blocks that perfectly fit the data ring buffer would
get wrapped around to the beginning for no reason since the calculated
offset of the next data block would belong to the next wrap. Since this
offset is not actually part of the data block, but rather the offset of
where the next data block is going to start, there is no reason to
include it when deciding whether the current block fits the buffer.
Signed-off-by: Daniil Tatianin <d-tatianin@yandex-team.ru>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Tested-by: Petr Mladek <pmladek@suse.com>
Reviewed-by: John Ogness <john.ogness@linutronix.de>
Tested-by: John Ogness <john.ogness@linutronix.de>
Link: https://patch.msgid.link/20250905144152.9137-2-d-tatianin@yandex-team.ru
[pmladek@suse.com: Updated indentation.]
Signed-off-by: Petr Mladek <pmladek@suse.com>
-----BEGIN PGP SIGNATURE-----
iQJPBAABCAA5FiEESH4wyp42V4tXvYsjUqAMR0iAlPIFAmjeOX8bFIAAAAAABAAO
bWFudTIsMi41KzEuMTEsMiwyAAoJEFKgDEdIgJTyM7MP/0mD0oJa/8DRiZ0r1qLM
xt/mCZIXhbqfJdCIaLEpfL35qPI59PWvrECGxwzL7CXOHlORxoCW98LoUDkigRKW
e93mv8xSBaac11QrMSy32cEMSpzXmcNjz5u/02AUveXpxERr82nLgGwKY8eF0eAy
7wF+N4bGHPShEy2J7kVFVrgZompFQLDgPB3/GycSdKsZd7ss9XiKa/EWvPJbEGOD
CNC0MlAd5QDxVXxhioC9g9tZaUENZrqxys1vfROFx8RlaobRa2Vt/TEJQ5WCzmKo
JO8JdojD03h0U2DdFE/lgTjf/lz8hTiaRaTifNnhUofqvzFKupzcv9L2xq/zmpfC
4dMW6YOObD/FVeNVsgVC16/1WDKl+XfyogAmtTBjSNoPd6WoSczZsfubxGc30lW4
AqJ2OpPa+CXq2elL+Qb0dLVekMhNytjfYdyFK/FP0DXrkoV4FMHdfNLW58TR7Pcq
iN1MNEkEs4dWSbn2xJzKlFQyVq2/t6A6l5N/kodZ1xLWiwFYzeA3FTOVzrOlXnde
IAYh3iH2WoTiHjZB/oomhnXKCMeTIvMYsGQQl4y+XqRHIwpEjSjcm8/M3hvlMyJE
m0D6cpW8IRLoZ0raxqJPVpXR3WZsLJm2InGMrfY/tPWTU7L5uxmF51GPXYKdUmpZ
NUgzh2cOwSWWU4UHt/AiD9cd
=uFn0
-----END PGP SIGNATURE-----
Merge tag 'printk-for-6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux
Pull printk updates from Petr Mladek:
- Add KUnit test for the printk ring buffer
- Fix the check of the maximal record size which is allowed to be
stored into the printk ring buffer. It prevents corruptions of the
ring buffer.
Note that printk() is on the safe side. The messages are limited by
1kB buffer and are always small enough for the minimal log buffer
size 4kB, see CONFIG_LOG_BUF_SHIFT definition.
* tag 'printk-for-6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux:
printk: ringbuffer: Fix data block max size check
printk: kunit: support offstack cpumask
printk: kunit: Fix __counted_by() in struct prbtest_rbdata
printk: ringbuffer: Explain why the KUnit test ignores failed writes
printk: ringbuffer: Add KUnit test
Currently data_check_size() limits data blocks to a maximum size of
the full buffer minus an ID (long integer):
max_size <= DATA_SIZE(data_ring) - sizeof(long)
However, this is not an appropriate limit due to the nature of
wrapping data blocks. For example, if a data block is larger than
half the buffer:
size = (DATA_SIZE(data_ring) / 2) + 8
and begins exactly in the middle of the buffer, then:
- the data block will wrap
- the ID will be stored at exactly half of the buffer
- the record data begins at the beginning of the buffer
- the record data ends 8 bytes _past_ exactly half of the buffer
The record overwrites itself, i.e. needs more space than the full
buffer!
Luckily printk() is not vulnerable to this problem because
truncate_msg() limits printk-messages to 1/4 of the ringbuffer.
Indeed, by adjusting the printk_ringbuffer KUnit test, which does not
use printk() and its truncate_msg() check, it is easy to see that the
ringbuffer becomes corrupted for records larger than half the buffer
size.
The corruption occurs because data_push_tail() expects it will never
be requested to push the tail beyond the head.
Avoid this problem by adjusting data_check_size() to limit record
sizes to half the buffer size. Also add WARN_ON_ONCE() before
relevant data_push_tail() calls to validate that there are no such
illegal requests. WARN_ON_ONCE() is used, rather than just adding
extra checks to data_push_tail() because it is considered a bug to
attempt such illegal actions.
Link: https://lore.kernel.org/lkml/aMLrGCQSyC8odlFZ@pathway.suse.cz
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Replace quoted includes of panic.h with `#include <linux/panic.h>` for
consistency across the kernel.
Link: https://lkml.kernel.org/r/20250829051312.33773-1-wangjinchao600@gmail.com
Signed-off-by: Jinchao Wang <wangjinchao600@gmail.com>
Reviewed-by: John Ogness <john.ogness@linutronix.de>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Cc: Qianqiang Liu <qianqiang.liu@163.com>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The helper this_cpu_in_panic() duplicated logic already provided by
panic_on_this_cpu().
Remove this_cpu_in_panic() and switch all users to panic_on_this_cpu().
This simplifies the code and avoids having two helpers for the same check.
Link: https://lkml.kernel.org/r/20250825022947.1596226-8-wangjinchao600@gmail.com
Signed-off-by: Jinchao Wang <wangjinchao600@gmail.com>
Cc: Anna Schumaker <anna.schumaker@oracle.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: "Darrick J. Wong" <djwong@kernel.org>
Cc: Dave Young <dyoung@redhat.com>
Cc: Doug Anderson <dianders@chromium.org>
Cc: "Guilherme G. Piccoli" <gpiccoli@igalia.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Joanthan Cameron <Jonathan.Cameron@huawei.com>
Cc: Joel Granados <joel.granados@kernel.org>
Cc: John Ogness <john.ogness@linutronix.de>
Cc: Kees Cook <kees@kernel.org>
Cc: Li Huafei <lihuafei1@huawei.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Luo Gengkun <luogengkun@huaweicloud.com>
Cc: Max Kellermann <max.kellermann@ionos.com>
Cc: Nam Cao <namcao@linutronix.de>
Cc: oushixiong <oushixiong@kylinos.cn>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Qianqiang Liu <qianqiang.liu@163.com>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Sohil Mehta <sohil.mehta@intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Thomas Gleinxer <tglx@linutronix.de>
Cc: Thomas Zimemrmann <tzimmermann@suse.de>
Cc: Thorsten Blum <thorsten.blum@linux.dev>
Cc: Ville Syrjala <ville.syrjala@linux.intel.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Yicong Yang <yangyicong@hisilicon.com>
Cc: Yunhui Cui <cuiyunhui@bytedance.com>
Cc: Yury Norov (NVIDIA) <yury.norov@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
nbcon_context_try_acquire() compared panic_cpu directly with
smp_processor_id(). This open-coded check is now provided by
panic_on_this_cpu().
Switch to panic_on_this_cpu() to simplify the code and improve readability.
Link: https://lkml.kernel.org/r/20250825022947.1596226-7-wangjinchao600@gmail.com
Signed-off-by: Jinchao Wang <wangjinchao600@gmail.com>
Cc: Anna Schumaker <anna.schumaker@oracle.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: "Darrick J. Wong" <djwong@kernel.org>
Cc: Dave Young <dyoung@redhat.com>
Cc: Doug Anderson <dianders@chromium.org>
Cc: "Guilherme G. Piccoli" <gpiccoli@igalia.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Joanthan Cameron <Jonathan.Cameron@huawei.com>
Cc: Joel Granados <joel.granados@kernel.org>
Cc: John Ogness <john.ogness@linutronix.de>
Cc: Kees Cook <kees@kernel.org>
Cc: Li Huafei <lihuafei1@huawei.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Luo Gengkun <luogengkun@huaweicloud.com>
Cc: Max Kellermann <max.kellermann@ionos.com>
Cc: Nam Cao <namcao@linutronix.de>
Cc: oushixiong <oushixiong@kylinos.cn>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Qianqiang Liu <qianqiang.liu@163.com>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Sohil Mehta <sohil.mehta@intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Thomas Gleinxer <tglx@linutronix.de>
Cc: Thomas Zimemrmann <tzimmermann@suse.de>
Cc: Thorsten Blum <thorsten.blum@linux.dev>
Cc: Ville Syrjala <ville.syrjala@linux.intel.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Yicong Yang <yangyicong@hisilicon.com>
Cc: Yunhui Cui <cuiyunhui@bytedance.com>
Cc: Yury Norov (NVIDIA) <yury.norov@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Patch series "panic: introduce panic status function family", v2.
This series introduces a family of helper functions to manage panic state
and updates existing code to use them.
Before this series, panic state helpers were scattered and inconsistent.
For example, panic_in_progress() was defined in printk/printk.c, not in
panic.c or panic.h. As a result, developers had to look in unexpected
places to understand or re-use panic state logic. Other checks were open-
coded, duplicating logic across panic, crash, and watchdog paths.
The new helpers centralize the functionality in panic.c/panic.h:
- panic_try_start()
- panic_reset()
- panic_in_progress()
- panic_on_this_cpu()
- panic_on_other_cpu()
Patches 1–8 add the helpers and convert panic/crash and printk/nbcon
code to use them.
Patch 9 fixes a bug in the watchdog subsystem by skipping checks when a
panic is in progress, avoiding interference with the panic CPU.
Together, this makes panic state handling simpler, more discoverable, and
more robust.
This patch (of 9):
This patch introduces four new helper functions to abstract the management
of the panic_cpu variable. These functions will be used in subsequent
patches to refactor existing code.
The direct use of panic_cpu can be error-prone and ambiguous, as it
requires manual checks to determine which CPU is handling the panic. The
new helpers clarify intent:
panic_try_start():
Atomically sets the current CPU as the panicking CPU.
panic_reset():
Reset panic_cpu to PANIC_CPU_INVALID.
panic_in_progress():
Checks if a panic has been triggered.
panic_on_this_cpu():
Returns true if the current CPU is the panic originator.
panic_on_other_cpu():
Returns true if a panic is on another CPU.
This change lays the groundwork for improved code readability
and robustness in the panic handling subsystem.
Link: https://lkml.kernel.org/r/20250825022947.1596226-1-wangjinchao600@gmail.com
Link: https://lkml.kernel.org/r/20250825022947.1596226-2-wangjinchao600@gmail.com
Signed-off-by: Jinchao Wang <wangjinchao600@gmail.com>
Cc: Anna Schumaker <anna.schumaker@oracle.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: "Darrick J. Wong" <djwong@kernel.org>
Cc: Dave Young <dyoung@redhat.com>
Cc: Doug Anderson <dianders@chromium.org>
Cc: "Guilherme G. Piccoli" <gpiccoli@igalia.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Joanthan Cameron <Jonathan.Cameron@huawei.com>
Cc: Joel Granados <joel.granados@kernel.org>
Cc: John Ogness <john.ogness@linutronix.de>
Cc: Kees Cook <kees@kernel.org>
Cc: Li Huafei <lihuafei1@huawei.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Luo Gengkun <luogengkun@huaweicloud.com>
Cc: Max Kellermann <max.kellermann@ionos.com>
Cc: Nam Cao <namcao@linutronix.de>
Cc: oushixiong <oushixiong@kylinos.cn>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Qianqiang Liu <qianqiang.liu@163.com>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Sohil Mehta <sohil.mehta@intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Thomas Gleinxer <tglx@linutronix.de>
Cc: Thomas Zimemrmann <tzimmermann@suse.de>
Cc: Thorsten Blum <thorsten.blum@linux.dev>
Cc: Ville Syrjala <ville.syrjala@linux.intel.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Yicong Yang <yangyicong@hisilicon.com>
Cc: Yunhui Cui <cuiyunhui@bytedance.com>
Cc: Yury Norov (NVIDIA) <yury.norov@gmail.com>b
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
For large values of CONFIG_NR_CPUS, the newly added kunit test fails
to build:
kernel/printk/printk_ringbuffer_kunit_test.c: In function 'test_readerwriter':
kernel/printk/printk_ringbuffer_kunit_test.c:279:1: error: the frame size of 1432 bytes is larger than 1280 bytes [-Werror=frame-larger-than=]
Change this to use cpumask_var_t and allocate it dynamically when
CONFIG_CPUMASK_OFFSTACK is set.
The variable has to be released via a KUnit action wrapper so that it is
freed when the test fails and gets aborted. The parameter type is hardcoded
to "struct cpumask *" because the macro KUNIT_DEFINE_ACTION_WRAPPER()
does not accept an array. But the function does nothing when
CONFIG_CPUMASK_OFFSTACK is not set anyway.
Fixes: 5ea2bcdfbf46 ("printk: ringbuffer: Add KUnit test")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/all/20250620192554.2234184-1-arnd@kernel.org # v1
[pmladek@suse.com: Correctly handle allocation failures and freeing using KUnit test API.]
Link: https://lore.kernel.org/all/20250702095157.110916-3-pmladek@suse.com # v2
Signed-off-by: Petr Mladek <pmladek@suse.com>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEESH4wyp42V4tXvYsjUqAMR0iAlPIFAmiQpykACgkQUqAMR0iA
lPJcrg/9Hez6+zO7LECCn5VkuK5oJWR5CyCfwx14ki8UF38djQGU2frckI5837rE
MnVoEBexZunK5SXy4MAy7bTCitzR+lMqNtP5uq9J2ovlSPtNlfuJRDr7uGQLDtSS
M5KZ1qsZnhgwLYeNhfVVToHgp+OwIQb2GcgYmYc8k03fUI1NQpdxIM46DzoTj+06
x6qgrNsmmJbm8E73VWBByJAEFoq9ugjny8Rt+tYMi/CmhgZpp0ZyF1r5dYfYX/KS
VS8UQY//aZOFhNsQUAXwP7Ym00CYRgTg7Na+MHivYLXmYGH2gF6tWQhX/eEgHKcJ
RTmUbLFx70fdBbjJMxv2k8vyMk2sy6sTfJHPqM/NS/Fb0tSPBXQJG/EexzfoqiBc
wcjgOPkeALIosVdFdTqXxjoIGOP8rqsU4t6Y6WFjJlWK04SBVjxBUofytRdQSxkG
5Sb0rFVGKrKIkXaVkt4byPa1/BDpfNhfKMYPtQ56pv2VNUgzfye4prUAZHE5pLnK
8nixeeMtKDFFCBpn6rG5wZW7k2mK5FrWGZUfdfxdK1gWQ1y0kqGy5wa3lNZLcxlH
l3AtOYoDeWM2DjDVO6WCj8ambEWkbjbGg7tC9TI3F0NvRJSYytTb6npMqb3Gwhcb
U4NgT+Ho0GJ/5BLUye8HMfhvrGoCfRCeptHtEFXAK7pzKyjc0+c=
=Mocd
-----END PGP SIGNATURE-----
Merge tag 'printk-for-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux
Pull printk updates from Petr Mladek:
- Add new "hash_pointers=[auto|always|never]" boot parameter to force
the hashing even with "slab_debug" enabled
- Allow to stop CPU, after losing nbcon console ownership during
panic(), even without proper NMI
- Allow to use the printk kthread immediately even for the 1st
registered nbcon
- Compiler warning removal
* tag 'printk-for-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux:
printk: nbcon: Allow reacquire during panic
printk: Allow to use the printk kthread immediately even for 1st nbcon
slab: Decouple slab_debug and no_hash_pointers
vsprintf: Use __diag macros to disable '-Wsuggest-attribute=format'
compiler-gcc.h: Introduce __diag_GCC_all
__counted_by() has to point to a variable which defines the size
of the related array. The code must never access the array
beyond this limit.
struct prbtest_rbdata currently stores the length of the string.
And the code access the array beyond the limit when writing
or reading the trailing '\0'.
Store the size of the string, including the trailing '\0' if
we wanted to keep __counted_by().
Consistently use "_size" suffix when the trailing '\0' is counted.
Note that MAX_RBDATA_TEXT_SIZE was originally used to limit
the text length.
When touching the code, make sure that @text_size produced by
get_random_u32_inclusive() stays within the limits.
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Closes: https://lore.kernel.org/r/eaea66b9-266a-46e7-980d-33f40ad4b215@sabinyo.mountain
Suggested-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Reviewed-by: John Ogness <john.ogness@linutronix.de>
Reviewed-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Link: https://patch.msgid.link/20250702095157.110916-4-pmladek@suse.com
Signed-off-by: Petr Mladek <pmladek@suse.com>
The KUnit test ignores prb_reserve() failures on purpose. It tries
to push the ringbuffer beyond limits.
Note that it is a know problem that writes might fail in this situation.
printk() tries to prevent this problem by:
+ allocating big enough data buffer, see log_buf_add_cpu().
+ allocating enough descriptors by using small enough average
record, see PRB_AVGBITS.
+ storing the record with disabled interrupts, see vprintk_store().
Also the amount of printk() messages is always somehow bound in
practice. And they are serialized when they are printed from
many CPUs on purpose, for example, when printing backtraces.
Reviewed-by: John Ogness <john.ogness@linutronix.de>
Reviewed-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Link: https://patch.msgid.link/20250702095157.110916-2-pmladek@suse.com
Signed-off-by: Petr Mladek <pmladek@suse.com>
vprintk_deferred() is useful for implementing runtime verification
reactors. Make it public.
Signed-off-by: Nam Cao <namcao@linutronix.de>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
The KUnit test validates the correct operation of the ringbuffer.
A separate dedicated ringbuffer is used so that the global printk
ringbuffer is not touched.
Co-developed-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Link: https://patch.msgid.link/20250612-printk-ringbuffer-test-v3-1-550c088ee368@linutronix.de
Signed-off-by: Petr Mladek <pmladek@suse.com>
If a console printer is interrupted during panic, it will never
be able to reacquire ownership in order to perform and cleanup.
That in itself is not a problem, since the non-panic CPU will
simply quiesce in an endless loop within nbcon_reacquire_nobuf().
However, in this state, platforms that do not support a true NMI
to interrupt the quiesced CPU will not be able to shutdown that
CPU from within panic(). This then causes problems for such as
being unable to load and run a kdump kernel.
Fix this by allowing non-panic CPUs to reacquire ownership using
a direct acquire. Then the non-panic CPUs can successfullyl exit
the nbcon_reacquire_nobuf() loop and the console driver can
perform any necessary cleanup. But more importantly, the CPU is
no longer quiesced and is free to process any interrupts
necessary for panic() to shutdown the CPU.
All other forms of acquire are still not allowed for non-panic
CPUs since it is safer to have them avoid gaining console
ownership that is not strictly necessary.
Reported-by: Michael Kelley <mhklinux@outlook.com>
Closes: https://lore.kernel.org/r/SN6PR02MB4157A4C5E8CB219A75263A17D46DA@SN6PR02MB4157.namprd02.prod.outlook.com
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Tested-by: Michael Kelley <mhklinux@outlook.com>
Link: https://patch.msgid.link/20250606185549.900611-1-john.ogness@linutronix.de
Signed-off-by: Petr Mladek <pmladek@suse.com>
The kthreads for nbcon consoles are created by nbcon_alloc() at
the beginning of the console registration. But it currently works
only for the 2nd or later nbcon console because the code checks
@printk_kthreads_running.
The kthread for the 1st registered nbcon console is created at the very
end of register_console() by printk_kthreads_check_locked(). As a result,
the entire log is replayed synchronously when the "enabled" message
gets printed. It might block the boot for a long time with a slow serial
console.
Prevent the synchronous flush by creating the kthread even for the 1st
nbcon console when it is safe (kthreads ready and no boot consoles).
Also inform printk() to use the kthread by setting
@printk_kthreads_running. Note that the kthreads already must be
running when it is safe and this is not the 1st nbcon console.
Symmetrically, clear @printk_kthreads_running when the last nbcon
console was unregistered by nbcon_free(). This requires updating
@have_nbcon_console before nbcon_free() gets called.
Note that there is _no_ problem when the 1st nbcon console replaces boot
consoles. In this case, the kthread will be started at the end
of registration after the boot consoles are removed. But the console
does not reply the entire log buffer in this case. Note that
the flag CON_PRINTBUFFER is always cleared when the boot consoles are
removed and vice versa.
Closes: https://lore.kernel.org/r/20250514173514.2117832-1-mcobb@thegoodpenguin.co.uk
Tested-by: Michael Cobb <mcobb@thegoodpenguin.co.uk>
Reviewed-by: John Ogness <john.ogness@linutronix.de>
Link: https://patch.msgid.link/20250604142045.253301-1-pmladek@suse.com
Signed-off-by: Petr Mladek <pmladek@suse.com>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEESH4wyp42V4tXvYsjUqAMR0iAlPIFAmflJC4ACgkQUqAMR0iA
lPIEMBAAsIJmQPHdZUQ49tkMGMoQPU9gnrrDZbRzN9IWwMQQymwEobSEkJkrx7O3
MFQHc5Z29cAxDqrWDCG5hrH0P3rd994pLPgC/XZC9+UbksYIkIrudCqaPaekLVaw
BQmBZnZRe7PltOiDpx/TxQ8WQgNlAmZpLpIRTB3ePHZpikq/mf4CGcoYXkrNij1j
E7uzurLZssJwXasRkWOVNvaOFyVcxrjn78H0JoT7mVeh2IUkWyMALX4IyDeD6Uvh
SCe19/6W3Uyeh/jpJ6+gjuirVy/Vam/2GXeer4yKpgSQBtsqIaGyki4Cc6Xy12gU
71nRPTggysZXNnqZ40qw68Irb9JGAo63qFDwWf3OQI+yQQWHyGZIinxr+2xupwWF
ZgmDJtvUsWKHT4+WmB9+MVTGW7Jy8wKq2u/Haf8xCJxs2yhI0J8iUvQi3CPAMdRH
Q8G9UJP29Lfj0hdKVT2/wVNGxZKIZ8c6v1vR6Rd1hJ0Zqp4Oj+XGr0btE8Ah5LYv
GI/M00LCdTNT2kBbbw8lekovUIs2stiRl6hfKoQFR11oiZwFDUePLDdJH2GiJAZm
g9fPs6YuBERfPtzUeo9a9UbSd6oPTLgLNBEklpHUTWilzO8E/Jh+ZXyvQwj7eXyq
Hi4OnShn1uxL68o6DJfKsF2qW8S2U5g61JrU3D1CaQ98Derrp0o=
=lNTZ
-----END PGP SIGNATURE-----
Merge tag 'printk-for-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux
Pull printk updates from Petr Mladek:
- New option "printk.debug_non_panic_cpus" allows to store printk
messages from non-panic CPUs during panic. It might be useful when
panic() fails. It is disabled by default because it increases the
chance to see the messages printed before panic() and on the
panic-CPU.
- New build option "CONFIG_NULL_TTY_DEFAULT_CONSOLE" allows to build
kernel without the virtual terminal support which prefers ttynull
over serial console.
- Do not unblank suspended consoles.
- Some code clean up.
* tag 'printk-for-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux:
printk/panic: Add option to allow non-panic CPUs to write to the ring buffer.
printk: Add an option to allow ttynull to be a default console device
printk: Check CON_SUSPEND when unblanking a console
printk: Rename console_start to console_resume
printk: Rename console_stop to console_suspend
printk: Rename resume_console to console_resume_all
printk: Rename suspend_console to console_suspend_all
Commit 779dbc2e78d7 ("printk: Avoid non-panic CPUs writing to ringbuffer")
aimed to isolate panic-related messages. However, when panic() itself
malfunctions, messages from non-panic CPUs become crucial for debugging.
While commit bcc954c6caba ("printk/panic: Allow cpu backtraces to
be written into ringbuffer during panic") enables non-panic CPU
backtraces, it may not provide sufficient diagnostic information.
Introduce the "debug_non_panic_cpus" command-line option, enabling
non-panic CPU messages to be stored in the ring buffer during a panic.
This also prevents discarding non-finalized messages from non-panic CPUs
during console flushing, providing a more comprehensive view of system
state during critical failures.
Link: https://lore.kernel.org/all/Z8cLEkqLL2IOyNIj@pathway/
Signed-off-by: Donghyeok Choe <d7271.choe@samsung.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20250318022320.2428155-1-d7271.choe@samsung.com
[pmladek@suse.com: Added documentation, added module_parameter, removed printk_ prefix.]
Tested-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
The new option is CONFIG_NULL_TTY_DEFAULT_CONSOLE.
if enabled, and CONFIG_VT is disabled, ttynull will become the default
primary console device.
ttynull will be the only console device usually with this option enabled.
Some architectures do call add_preferred_console() which may add another
console though.
Motivation:
Many distributions ship with CONFIG_VT enabled. On tested desktop hardware
if CONFIG_VT is disabled, the default console device falls back to
/dev/ttyS0 instead of /dev/tty.
This could cause issues in user space, and hardware problems:
1. The user space issues include the case where /dev/ttyS0 is
disconnected, and the TCGETS ioctl, which some user space libraries use
as a probe to determine if a file is a tty, is called on /dev/console and
fails. Programs that call isatty() on /dev/console and get an incorrect
false value may skip expected logging to /dev/console.
2. The hardware issues include the case if a user has a science instrument
or other device connected to the /dev/ttyS0 port, and they were to upgrade
to a kernel that is disabling the CONFIG_VT option, kernel logs will then
be sent to the device connected to /dev/ttyS0 unless they edit their
kernel command line manually.
The new CONFIG_NULL_TTY_DEFAULT_CONSOLE option will give users and
distribution maintainers an option to avoid this. Disabling CONFIG_VT and
enabling CONFIG_NULL_TTY_DEFAULT_CONSOLE will ensure the default kernel
console behavior is not dependent on hardware configuration by default, and
avoid unexpected new behavior on devices connected to the /dev/ttyS0 serial
port.
Reviewed-by: Petr Mladek <pmladek@suse.com>
Tested-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Adam Simonelli <adamsimonelli@gmail.com>
Link: https://lore.kernel.org/r/20250314160749.3286153-2-adamsimonelli@gmail.com
[pmladek@suse.com: Fixed indentation of the commit message.]
Signed-off-by: Petr Mladek <pmladek@suse.com>
The commit 9e70a5e109a4 ("printk: Add per-console suspended state")
introduced the CON_SUSPENDED flag for consoles. The suspended consoles
will stop receiving messages, so don't unblank suspended consoles
because it won't be showing anything either way.
Signed-off-by: Marcos Paulo de Souza <mpdesouza@suse.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Reviewed-by: John Ogness <john.ogness@linutronix.de>
Link: https://lore.kernel.org/r/20250226-printk-renaming-v1-5-0b878577f2e6@suse.com
Signed-off-by: Petr Mladek <pmladek@suse.com>
The intent of console_start was to resume a previously suspended console,
so rename it accordingly.
Signed-off-by: Marcos Paulo de Souza <mpdesouza@suse.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Reviewed-by: John Ogness <john.ogness@linutronix.de>
Link: https://lore.kernel.org/r/20250226-printk-renaming-v1-4-0b878577f2e6@suse.com
[pmladek@suse.com: Fixed typo in the commit message. Updated also new drm_log.c.]
Signed-off-by: Petr Mladek <pmladek@suse.com>