[OLPC/Paraguay] Debugging NetworkManager-0.7.2.995 + Power Management

Bernie Innocenti bernie at codewiz.org
Tue Mar 30 13:07:55 EDT 2010


On Tue, 2010-03-30 at 09:16 -0700, Dan Williams wrote:
> Yeah, I haven't been able to reproduce that, but I only have an XO1.5
> with me (which doesn't have mesh), and I think the issue is timing
> related.  So it's good that you can reproduce it. That gives us a chance
> to fix it.

We were able to reproduce the issue on the XO-1.5 as well, using os115
from OLPC:

 http://build.laptop.org/10.2.0/os115

Just leave the automatic power management on and let the system go to
sleep a few times.


> So what is going on is that the mesh device and the wifi device are
> obviously the "same" device, because they really share the same silicon.
> So we need to make sure the mesh device knows about its companion wifi
> device.
> 
> Can I get some /var/log/messages logging from NetworkManager over a
> suspend/resume cycle?  I'd like to see if the kernel removes either the
> wifi or the mesh device and then adds it back after resume, or whether
> the device sticks around throughout the entire cycle.

From what I've seen in gdb after the segfault, it looked very much like
the is_companion() callback had been invoked on an object that had
already been freed. The GObject class was zeroed out.

I'll wait for Martin (tch) to come back from lunch to send you the full
log.


> This segfault indicates one of two things:
> 
> 1) a reference counting issue; there's a missing g_object_ref()
> somewhere, which means that the Mesh object is getting unreffed one too
> many times, leading to its destruction
>
> 2) the kernel is removing the underlying device and there's a missing
> idle, timeout, or signal clear command.  The mesh device listens for a
> number of signals of other objects, but when the mesh object gets
> destroyed, we need to remember to stop listening for those signals or
> we'll end up in the signal handler after the object is destroyed
> ("use-after-free").

We looked for paths where the signal is being attached and removed, and
it *looked* like removal was being done correctly on disposal. Tch added
some debug output to track the life-cycle of objects.
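
For reference, the pattern we were checking for can be sketched in plain
C as below. This is only an illustration under stated assumptions, not
NetworkManager code: Manager, MeshDevice, the handler table, and every
function name here are hypothetical stand-ins for the GObject signal
machinery, and the device names are just labels. In real GObject code the
disconnect step would be a g_signal_handler_disconnect() call made from
the object's dispose handler.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins; none of this is the NetworkManager or GObject API. */
#define MAX_HANDLERS 8

typedef struct { const char *name; } Device;
typedef void (*DeviceAddedFn)(Device *added, void *user_data);

typedef struct {
    DeviceAddedFn fn;
    void *user_data;
    int in_use;
} Handler;

/* Plays the role of the object that emits a "device-added" signal. */
typedef struct {
    Handler handlers[MAX_HANDLERS];
} Manager;

/* Plays the role of the mesh device listening for its companion. */
typedef struct {
    Device base;
    Manager *manager;
    int handler_id;   /* -1 when not connected */
    int checks;       /* how many times the callback ran */
} MeshDevice;

/* Register a callback; returns a handler id, or -1 if the table is full. */
static int manager_connect(Manager *m, DeviceAddedFn fn, void *user_data) {
    for (int i = 0; i < MAX_HANDLERS; i++) {
        if (!m->handlers[i].in_use) {
            m->handlers[i] = (Handler){ fn, user_data, 1 };
            return i;
        }
    }
    return -1;
}

static void manager_disconnect(Manager *m, int id) {
    if (id >= 0 && id < MAX_HANDLERS)
        m->handlers[id].in_use = 0;
}

/* Invoke every connected callback; returns how many were called. */
static int manager_emit_device_added(Manager *m, Device *added) {
    int called = 0;
    for (int i = 0; i < MAX_HANDLERS; i++) {
        if (m->handlers[i].in_use) {
            m->handlers[i].fn(added, m->handlers[i].user_data);
            called++;
        }
    }
    return called;
}

/* Callback: dereferences user_data, so it must never run after free(). */
static void mesh_on_device_added(Device *added, void *user_data) {
    MeshDevice *mesh = user_data;
    (void)added;
    mesh->checks++;
}

static MeshDevice *mesh_new(Manager *m) {
    MeshDevice *mesh = calloc(1, sizeof *mesh);
    if (!mesh)
        return NULL;
    mesh->base.name = "mesh";
    mesh->manager = m;
    mesh->handler_id = manager_connect(m, mesh_on_device_added, mesh);
    assert(mesh->handler_id >= 0);
    return mesh;
}

/* Disconnect first, then free. Skipping the disconnect leaves a dangling
 * user_data pointer in the manager: the use-after-free case. */
static void mesh_dispose(MeshDevice *mesh) {
    manager_disconnect(mesh->manager, mesh->handler_id);
    mesh->handler_id = -1;
    free(mesh);
}

/* Returns (callbacks run before dispose) * 10 + (callbacks run after). */
int demo_dispose_then_emit(void) {
    Manager manager = {0};
    Device wifi = { "wifi" };
    MeshDevice *mesh = mesh_new(&manager);
    if (!mesh)
        return -1;
    int before = manager_emit_device_added(&manager, &wifi);
    mesh_dispose(mesh);
    int after = manager_emit_device_added(&manager, &wifi);
    return before * 10 + after;
}
```

If the manager_disconnect() call were dropped from mesh_dispose(), the
second emit would call mesh_on_device_added() with a pointer to freed
memory, which matches the kind of crash described above.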


> Can I also get a backtrace of the crash?  Bernie's backtrace didn't have
> debugging info, which you clearly have.

Tch will follow up with the complete backtrace; he has a debugging
environment where NM was built from source.

-- 
   // Bernie Innocenti - http://codewiz.org/
 \X/  Sugar Labs       - http://sugarlabs.org/



