Reason for the "one dot" hang found!
Daniel Drake
dsd at laptop.org
Thu Jun 10 15:53:53 EDT 2010
On 10 June 2010 10:58, Bernie Innocenti <bernie at codewiz.org> wrote:
> Hello,
>
> with the serial cable Richard gave me, I figured out what's causing a
> rare lockup during boot which has been riddling the XO-1 since when we
> moved to F11.
>
> The /etc/rc.sysinit script contains this line:
>
> # Sync waiting for storage.
> { rmmod scsi_wait_scan ; modprobe scsi_wait_scan ; rmmod scsi_wait_scan ; } >/dev/null 2>&1
>
> It gets executed while udev is loading modules in parallel. Apparently,
> something in the kernel ends up dead-locking on module load:
>
>
> 1 tty1 Ss+ 0:02 /sbin/init
> 945 ? Ss 0:00 /bin/sh -e -c ?runlevel --set S >/dev/null || true???/
> 950 ? S 0:00 \_ /bin/bash /etc/rc.d/rc.sysinit
> 1597 ? D 0:00 \_ modprobe scsi_wait_scan
I strongly doubt this is the issue. This is a very simple module.
Note your other blocked process:
> 1035 ? D< 0:00 /sbin/modprobe -b pci:v000011ABd00004102sv000011ABsd00
This one also has a lower process ID, suggesting that it was run first.
I suspect there is a crash/hang within this module, and at this point,
attempting to load any other module (scsi_wait_scan or otherwise) will
hang, due to contention on a lock, corruption, a dead kernel thread,
or something like that.
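
As a rough sketch of how to pull the blocked tasks out of a process
listing like the one you pasted: the helper below (blocked_procs is just
a name I made up) assumes "ps ax"-style columns (PID TTY STAT TIME
COMMAND), keeps only tasks whose STAT starts with "D", and sorts them by
PID. PID order is only a hint about start order, but this early in boot
it should be reasonably reliable.

```shell
#!/bin/sh
# Hypothetical helper: read "ps ax"-style output on stdin and keep only
# tasks in uninterruptible sleep (STAT column begins with "D"),
# sorted numerically by PID so the lowest PID is listed first.
blocked_procs() {
    awk '$3 ~ /^D/' | sort -n -k1,1
}

# Typical use on a live system:
#   ps ax | blocked_procs
```

On the listing above, this would show the PCI modprobe (1035) ahead of
the scsi_wait_scan one (1597).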
My suggested next steps in diagnosis:

1. Identify which device is pci:v000011ABd00004102
   Anyone can do this on any XO-1 with: lspci -vd 11ab:4102
   I'm pretty sure it's a part of the CAFE chip, but I don't have an XO to check.

2. Look at dmesg at the point of the crash
   Considering that you got a process tree, I guess you can also run some
   other commands at the point of the hang?
   Run "dmesg" and capture the output.

3. Capture a kernel task dump at the point of the crash
   echo t > /proc/sysrq-trigger
   The task dump will appear in the kernel logs (dmesg).
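
In case it helps, here is a minimal sketch tying the steps together. The
function name modalias_to_pci_id and the output path are made up for
illustration; the only assumption about the modalias is that it starts
with "pci:v" followed by an 8-digit vendor ID and "d" followed by an
8-digit device ID, as in the string above.

```shell
#!/bin/sh
# Hypothetical helper: turn a PCI modalias such as
# "pci:v000011ABd00004102sv..." into the lowercase "vendor:device"
# form that lspci -d takes (here "11ab:4102").
# Prints nothing if the argument does not look like a PCI modalias.
modalias_to_pci_id() {
    printf '%s\n' "$1" |
        sed -n 's/^pci:v0000\([0-9A-Fa-f]\{4\}\)d0000\([0-9A-Fa-f]\{4\}\).*$/\1:\2/p' |
        tr 'A-F' 'a-f'
}

# At the point of the hang (as root), something along these lines:
#   lspci -vd "$(modalias_to_pci_id pci:v000011ABd00004102)"
#   echo t > /proc/sysrq-trigger
#   dmesg > /tmp/hang-dmesg.txt    # output path is just an example
```

Running dmesg after the sysrq trigger means a single capture contains
both the earlier kernel messages and the task dump.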
Daniel