Reason for the "one dot" hang found!

Thu Jun 10 15:53:53 EDT 2010

On 10 June 2010 10:58, Bernie Innocenti <bernie at codewiz.org> wrote:
> Hello,
>
> with the serial cable Richard gave me, I figured out what's causing a
> rare lockup during boot which has been riddling the XO-1 since when we
> moved to F11.
>
> The /etc/rc.sysinit script contains this line:
>
>  # Sync waiting for storage.
>  { rmmod scsi_wait_scan ; modprobe scsi_wait_scan ; rmmod  scsi_wait_scan ; } >/dev/null 2>&1
>
> It gets executed while udev is loading modules in parallel. Apparently,
> something in the kernel ends up dead-locking on module load:
>
>
>   1 tty1     Ss+    0:02 /sbin/init
>  945 ?        Ss     0:00 /bin/sh -e -c ?runlevel --set S >/dev/null || true???/
>  950 ?        S      0:00  \_ /bin/bash /etc/rc.d/rc.sysinit
> 1597 ?        D      0:00      \_ modprobe scsi_wait_scan

I strongly doubt this is the issue. This is a very simple module.

Note your other blocked process:

> 1035 ?        D<     0:00 /sbin/modprobe -b pci:v000011ABd00004102sv000011ABsd00

This one also has a lower process ID, suggesting that it was run first.
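
For anyone who wants to repeat that check, here is a minimal sketch that pulls the blocked (state D) tasks out of a ps listing and orders them by PID. The function name blocked_tasks is made up for illustration, and PID order is only a rough proxy for start order; it is reasonable this early in boot, before PIDs can wrap.

```shell
#!/bin/sh
# List tasks in uninterruptible sleep (STAT starting with "D"),
# lowest PID first. Reads "PID STAT COMMAND..." lines on stdin, so it
# works on live ps output or on a capture saved from the serial console.
# blocked_tasks is a hypothetical helper name, not an existing tool.
blocked_tasks() {
    awk '$2 ~ /^D/ { print }' | sort -n -k1,1
}

# Live usage (assumes a procps-style ps):
#   ps -eo pid=,stat=,args= | blocked_tasks
```

Fed the two modprobe lines from the listing above, it prints PID 1035 ahead of PID 1597.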

I suspect there is a crash/hang within the module being loaded for this
device, and that from this point on, attempting to load any other module
(scsi_wait_scan or otherwise) will hang, due to contention on a lock,
corruption, a dead kernel thread, or something like that.

My suggested next steps in diagnosis:
 1. Identify which device is pci:v000011ABd00004102
Anyone can do this on any XO-1 with: lspci -vd 11ab:4102
I'm pretty sure it's a part of the CAFE chip, but I don't have an XO to check.

 2. Look at dmesg at the point of the hang
Since you got a process tree, I guess you can also run other commands
at the point of the hang?
Run "dmesg" and capture the output.

 3. Capture a kernel task dump at the point of the hang
echo t > /proc/sysrq-trigger
The task dump will appear in kernel logs (dmesg).
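
The three steps above can be sketched as one script, to be run as root from the serial console once the hang has happened. This is only a sketch under assumptions: OUT, modalias_to_id and collect are names invented here, and it assumes some writable location exists at that point in boot (adjust OUT if /tmp is not writable yet).

```shell
#!/bin/sh
# Sketch of the three diagnostic steps; run as root on the hung machine.
# OUT is a hypothetical output directory, not something from this thread.
OUT=${OUT:-/tmp/one-dot-hang}

# Turn a PCI modalias prefix such as pci:v000011ABd00004102 into the
# vendor:device form that "lspci -d" accepts (11ab:4102).
modalias_to_id() {
    printf '%s\n' "$1" |
        sed -n 's/^pci:v0000\([0-9A-Fa-f]\{4\}\)d0000\([0-9A-Fa-f]\{4\}\).*/\1:\2/p' |
        tr 'A-F' 'a-f'
}

# Run steps 1-3 and save each output to its own file under $OUT.
collect() {
    mkdir -p "$OUT" || return 1
    # Step 1: identify the device behind the modalias.
    lspci -vd "$(modalias_to_id "$1")" > "$OUT/lspci.txt" 2>&1
    # Step 2: kernel log as it stands at the point of the hang.
    dmesg > "$OUT/dmesg.txt"
    # Step 3: ask the kernel for a task dump, then save the log again.
    echo t > /proc/sysrq-trigger
    dmesg > "$OUT/dmesg-task-dump.txt"
}

# Usage: collect pci:v000011ABd00004102
```

The task dump can be long, so if the kernel log buffer is small the start of it may be lost from dmesg; the serial console should still show all of it.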

Daniel