Reason for the "one dot" hang found!
bernie at codewiz.org
Thu Jun 10 17:32:52 EDT 2010
On Thu, 10-06-2010 at 16:53 -0300, Daniel Drake wrote:
> > 1 tty1 Ss+ 0:02 /sbin/init
> > 945 ? Ss 0:00 /bin/sh -e -c ?runlevel --set S >/dev/null || true???/
> > 950 ? S 0:00 \_ /bin/bash /etc/rc.d/rc.sysinit
> > 1597 ? D 0:00 \_ modprobe scsi_wait_scan
> I strongly doubt this is the issue. This is a very simple module.
> Note your other blocked process:
> > 1035 ? D< 0:00 /sbin/modprobe -b pci:v000011ABd00004102sv000011ABsd00
> This one also has a lower process ID, suggesting that it was run first.
> I suspect there is a crash/hang within this module, and at this point,
> attempting to load any other module (scsi_wait_scan or otherwise) will
> hang, due to contention on a lock, corruption, a dead kernel thread,
> or something like that.
OK, that makes sense: if one module hangs during init, any subsequent
invocation of modprobe would also hang.
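Both stuck modprobe processes in the listing quoted above have a STAT starting with D (uninterruptible sleep). A small filter can pick such tasks out of `ps ax` output, lowest PID first, since the lowest PID is usually the one that started first (PIDs can wrap, so treat this as a hint only). This is a minimal sketch; the helper name `blocked_tasks` is hypothetical and it assumes the default `ps ax` column layout (PID TTY STAT TIME COMMAND):

```sh
# blocked_tasks (hypothetical helper): read "ps ax" output on stdin and
# print the PIDs of tasks whose STAT column (field 3) starts with D,
# i.e. tasks in uninterruptible sleep, sorted by ascending PID.
blocked_tasks() {
    awk '$3 ~ /^D/ { print $1 }' | sort -n
}
```

Usage: `ps ax | blocked_tasks`. Fed the process lines quoted above, it prints 1035 followed by 1597.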
> My suggested next steps in diagnosis:
> 1. Identify which device is pci:v000011ABd00004102
> Anyone can do this on any XO-1 with: lspci -vd 11ab:4102
> I'm pretty sure it's part of the CAFE chip, but I don't have an XO to check.
It's the camera controller. Hence, the other module being loaded must be
Looking at the initialization of cafe_ccic, there seems to be a
complicated dance of mutexes and spin locks, plus a kernel thread and a
bunch of sleeps. All the ingredients for a good deadlock are present :-)
Jonathan, can you make your best guess?
> 2. Look at dmesg at the point of the crash
> Considering that you got a process tree, I guess you can also run some
> other commands at the point of the hang?
> Run "dmesg" and capture output.
I did, but there was nothing interesting in dmesg, which is what I would
expect from a pure locking bug. Moreover, CONFIG_DEBUG_MUTEXES is turned
Perhaps interestingly, on regular boots, I can see some psmouse
initialization messages intermixed with the cafe_ccic ones.
> 3. Capture a kernel task dump at the point of the crash
> echo t > /proc/sysrq-trigger
> The task dump will appear in kernel logs (dmesg).
OK, I'll do it as soon as I see the hang again.
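Steps 2 and 3 can be combined into one capture. This is a minimal sketch, assuming a root shell on another virtual console and a kernel built with CONFIG_MAGIC_SYSRQ (without it, /proc/sysrq-trigger does not exist); the function name `capture_task_dump` and the default report path are hypothetical:

```sh
# capture_task_dump (hypothetical helper): trigger a SysRq task dump and
# save the kernel log to a file. Optional argument: report path.
capture_task_dump() {
    report=${1:-/tmp/task-dump.txt}   # arbitrary default location
    if [ ! -w /proc/sysrq-trigger ]; then
        echo "cannot write /proc/sysrq-trigger (need root and SysRq support)" >&2
        return 1
    fi
    # "t" asks the kernel to log the state and stack of every task,
    # so tasks stuck in D state show where they are blocked.
    echo t > /proc/sysrq-trigger
    # The dump lands in the kernel log buffer; read it back with dmesg.
    dmesg > "$report"
    echo "task dump saved to $report"
}
```

Saving to a file rather than scrolling the console keeps the stacks of both blocked modprobe processes available for later comparison.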
BTW: this bug seems easier to trigger by forcing a shutdown while
data is being written to disk.
// Bernie Innocenti - http://codewiz.org/
\X/ Sugar Labs - http://sugarlabs.org/
More information about the Devel