#8566 HIGH 9.1.0: Improved crash and hang diagnosis tools.

Zarro Boogs per Child bugtracker at laptop.org
Mon Sep 22 12:33:54 EDT 2008


#8566: Improved crash and hang diagnosis tools.
----------------------------+-----------------------------------------------
   Reporter:  mstone        |       Owner:  JordanCrouse 
       Type:  defect        |      Status:  new          
   Priority:  high          |   Milestone:  9.1.0        
  Component:  not assigned  |     Version:  not specified
 Resolution:                |    Keywords:               
Next_action:  design        |    Verified:  0            
  Blockedby:                |    Blocking:               
----------------------------+-----------------------------------------------

Comment(by JordanCrouse):

 Here are a few options off the top of my head:

 1.  Use a reserved chunk of memory that both the firmware and the kernel
 understand and respect.  Kernel log messages will go into that chunk of
 memory in a circular buffer fashion.  When the system crashes or hangs, a
 watchdog timer can reboot the system, at which time the user can use the
 firmware to read the message log.  The log should survive a brief reboot,
 assuming that the firmware respects the chunk of memory and doesn't clear
 it during RAM init.  PROS - fairly easy to implement, all we need is
 communication between OFW and the kernel and the bits in the kernel to
 write to the log buffer.  CONS - The logs will not survive anything more
 than a brief power outage.

 2.  Second verse, same as the first except this time the logs will be
 written to the flash before rebooting.  This can be accomplished by using
 a software watchdog and kernel oops mechanism to write the log to the NAND
 before kicking the bucket.  PROS - survives until the next boot, and could
 even be analyzed in the kernel.  CONS - more difficult to implement,
 wastes NAND space, and write to NAND may be unreliable depending on the
 state of the machine.  Would require userland tools to read the log (we
 could possibly use a debugfs hook).

 3.  Following a hang/crash, the software watchdog and oops engine could
 invoke kdump which could then read the kernel log buffer (or even the
 aforementioned memory block).  For further bonus points, the kdump could
 invoke an OFW interpreter living in the filesystem.  PROS - doesn't involve
 any new interfaces between kernel and user.  CONS - we will have to learn
 to kexec on watchdog (probably not hard), but then we would need to have a
 kdump infrastructure in place.  Execing an OFW binary may or may not be
 more complex.

 4.  Final option - leave OFW resident and jump back to it following a hang
 or oops and let OFW do its thing.  PROS - we've wanted OFW resident for a
 long time.  CONS - memory usage, difficulty of implementation.
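
 To make option 1 a little more concrete, here is a rough sketch in C of
 what the shared circular log could look like.  Everything in it is an
 assumption for illustration only: the struct layout, the magic value, the
 buffer size and the function names are all hypothetical, and an ordinary
 struct stands in for the reserved chunk of memory.  A real version would
 need a physical address and layout agreed between OFW and the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical marker used to tell a valid log from uninitialized RAM. */
#define LOG_MAGIC 0x4C4F4721u
/* Tiny size for illustration; kept a power of two. */
#define LOG_SIZE  16u

/* Hypothetical layout of the reserved region shared with the firmware. */
struct crash_log {
    uint32_t magic;           /* LOG_MAGIC once the region is initialized */
    uint32_t head;            /* total number of bytes ever written */
    char     data[LOG_SIZE];  /* circular storage for log bytes */
};

/* Initialize the region only if it does not already hold a valid log,
 * so that messages from before a warm reboot are preserved. */
static void log_init(struct crash_log *log)
{
    assert(log != NULL);
    if (log->magic != LOG_MAGIC) {
        memset(log, 0, sizeof *log);
        log->magic = LOG_MAGIC;
    }
}

/* Append a message, overwriting the oldest bytes once the buffer is full.
 * Overflow of the 32-bit head counter is ignored in this sketch. */
static void log_write(struct crash_log *log, const char *msg)
{
    for (; *msg != '\0'; msg++) {
        log->data[log->head % LOG_SIZE] = *msg;
        log->head++;
    }
}

/* Copy the surviving bytes, oldest first, into out (at least LOG_SIZE + 1
 * bytes) and return how many were copied.  This is the part the firmware
 * would do after the watchdog reboots the machine. */
static size_t log_read(const struct crash_log *log, char *out)
{
    uint32_t len, start, i;

    if (log->magic != LOG_MAGIC) {
        out[0] = '\0';
        return 0;
    }
    len = log->head < LOG_SIZE ? log->head : LOG_SIZE;
    start = log->head - len;
    for (i = 0; i < len; i++)
        out[i] = log->data[(start + i) % LOG_SIZE];
    out[len] = '\0';
    return len;
}
```

 The magic value is what lets the reader distinguish a log that survived
 the reboot from memory that lost its contents.  Calling log_init() again
 on a region that still carries the magic leaves the old messages alone,
 which is the behavior we would want from the kernel on the next boot.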

 As far as the logging is concerned, this fresh news out of LPC might be of
 interest:
 http://lkml.org/lkml/2008/9/19/275

 And finally, the coreboot folks are also interested in a similar
 arrangement, so I'm sure they would be interested in listening in on our
 discussions.

-- 
Ticket URL: <http://dev.laptop.org/ticket/8566#comment:1>
One Laptop Per Child <http://laptop.org/>
OLPC bug tracking system

