#8566 HIGH 9.1.0: Improved crash and hang diagnosis tools.
Zarro Boogs per Child
bugtracker at laptop.org
Mon Sep 22 12:33:54 EDT 2008
#8566: Improved crash and hang diagnosis tools.
----------------------------+-----------------------------------------------
Reporter: mstone | Owner: JordanCrouse
Type: defect | Status: new
Priority: high | Milestone: 9.1.0
Component: not assigned | Version: not specified
Resolution: | Keywords:
Next_action: design | Verified: 0
Blockedby: | Blocking:
----------------------------+-----------------------------------------------
Comment(by JordanCrouse):
Here are a few options off the top of my head:
1. Use a reserved chunk of memory that both the firmware and the kernel
understand and respect. Kernel log messages will go into that chunk of
memory in a circular buffer fashion. When the system crashes or hangs, a
watchdog timer can reboot the system, at which time the user can use the
firmware to read the message log. The log should survive a brief reboot,
assuming that the firmware respects the chunk of memory and doesn't clear
it during RAM init. PROS - fairly easy to implement, all we need is
communication between OFW and the kernel and the bits in the kernel to
write to the log buffer. CONS - The logs will not survive anything more
than a brief power outage.
2. Second verse, same as the first except this time the logs will be
written to the flash before rebooting. This can be accomplished by using
a software watchdog and kernel oops mechanism to write the log to the NAND
before kicking the bucket. PROS - survives until the next boot, and could
even be analyzed in the kernel. CONS - more difficult to implement,
wastes NAND space, and write to NAND may be unreliable depending on the
state of the machine. Would require userland tools to read the log (we
could possibly use a debugfs hook).
3. Following a hang/crash, the software watchdog and oops engine could
invoke kdump which could then read the kernel log buffer (or even the
aforementioned memory block). For further bonus points, the kdump could
invoke an OFW interpreter living in the filesystem. PROS - doesn't involve
any new interfaces between kernel and user. CONS - we will have to learn
to kexec on watchdog (probably not hard), but then we would need to have a
kdump infrastructure in place. Execing an OFW binary may or may not be
more complex.
4. Final option - leave OFW resident and jump back to it following a hang
or oops and let OFW do its thing. PROS - we've wanted OFW resident for a
long time. CONS - memory usage, difficulty of implementation.
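
To make option 1 a bit more concrete, here is a rough sketch of what the
shared log region could look like. This is only an illustration, not a
proposed kernel or OFW interface: the struct layout, the magic value, the
buffer size and all of the crashlog_* names are made up for the example, and
a real version would live at a reserved physical address agreed on by both
sides instead of in an ordinary C struct.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical values; the real marker and size would be agreed on
 * between the firmware and the kernel. */
#define CRASHLOG_MAGIC     0x4F4C5043u
#define CRASHLOG_DATA_SIZE 4096u

struct crashlog {
    uint32_t magic;    /* marks the region as holding a valid log */
    uint32_t head;     /* next write offset into data[] */
    uint32_t wrapped;  /* nonzero once the writer has wrapped around */
    char data[CRASHLOG_DATA_SIZE];
};

/* Reader and writer both check this before trusting the region. */
static int crashlog_valid(const struct crashlog *log)
{
    assert(log != NULL);
    return log->magic == CRASHLOG_MAGIC && log->head < CRASHLOG_DATA_SIZE;
}

/* Writer side: start a fresh log (for example, once the previous one
 * has been read out). */
static void crashlog_reset(struct crashlog *log)
{
    assert(log != NULL);
    memset(log, 0, sizeof(*log));
    log->magic = CRASHLOG_MAGIC;
}

/* Writer side: append bytes, overwriting the oldest data when full. */
static void crashlog_write(struct crashlog *log, const char *msg, size_t len)
{
    assert(log != NULL && msg != NULL);
    for (size_t i = 0; i < len; i++) {
        log->data[log->head] = msg[i];
        log->head = (log->head + 1) % CRASHLOG_DATA_SIZE;
        if (log->head == 0)
            log->wrapped = 1;
    }
}

/* Reader side: copy the log, oldest byte first, into out. Returns the
 * number of bytes copied, or 0 if the region does not hold a valid log.
 * If out is too small, only the oldest outlen bytes are copied. */
static size_t crashlog_read(const struct crashlog *log, char *out,
                            size_t outlen)
{
    assert(log != NULL && out != NULL);
    if (!crashlog_valid(log))
        return 0;

    size_t start = log->wrapped ? log->head : 0;
    size_t count = log->wrapped ? CRASHLOG_DATA_SIZE : log->head;
    if (count > outlen)
        count = outlen;

    for (size_t i = 0; i < count; i++)
        out[i] = log->data[(start + i) % CRASHLOG_DATA_SIZE];
    return count;
}
```

The magic field is what lets the reader tell a surviving log from RAM that
lost its contents: after a longer power outage the check would most likely
fail and the reader would report nothing instead of garbage. The same layout
could presumably serve as the source buffer for options 2 and 3 as well.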
As far as the logging is concerned, this fresh news out of LPC might be of
interest:
http://lkml.org/lkml/2008/9/19/275
And finally, the coreboot folks are also interested in a similar
arrangement, so I'm sure they would be interested in listening in on our
discussions.
--
Ticket URL: <http://dev.laptop.org/ticket/8566#comment:1>
One Laptop Per Child <http://laptop.org/>
OLPC bug tracking system