isolating cause of ticket 5848

Bill Mccormick billmcc at nortel.com
Tue Jun 10 17:25:09 EDT 2008


Hey guys,

I've tracked down some of the code that's causing this problem.  (The
syslog file is invaluable for debugging NM problems, in case you have to
do this again.)  NM is waiting for a netlink callback, specifically in
nm-device-802-11-mesh-olpc.c on lines 291-307.  When NM doesn't get the
expected callback, a timer (called the association timer) expires and NM
stops trying to set up the P2P mesh.

<snip>
		if (iwe->cmd == SIOCGIWAP) {
			addr = iwe->u.ap_addr.sa_data;
			if (   !memcmp (addr, badaddr1, ETH_ALEN)
			    || !memcmp (addr, badaddr2, ETH_ALEN)
			    || !memcmp (addr, badaddr3, ETH_ALEN)) {
				/* disassociated */
			} else {
				/* associated */
				GSource * source = g_idle_source_new ();
				if (source) {
					nm_info ("%s: Got association; scheduling association handler",
					         nm_device_get_iface (NM_DEVICE (self)));
					g_object_ref (self);
					g_source_set_priority (source, G_PRIORITY_HIGH_IDLE);
					g_source_set_callback (source, handle_association_event, self, NULL);
					g_source_attach (source, nm_device_get_main_context (NM_DEVICE (self)));
					g_source_unref (source);
				}
			}
</snip>

handle_association_event() cancels the association timer, so if this
SIOCGIWAP message isn't received, or is received with an 'invalid' MAC,
then the timer never gets cancelled and NM gives up on the mesh setup.
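To make the failure mode concrete, here is a rough standalone sketch of
that logic with GLib and NM stripped out.  It is not the NM code: the
struct, the function names and the three 'invalid' patterns (all 0x00,
all 0xFF, all 0x44) are my assumptions for illustration, since the
snippet above doesn't show how badaddr1..3 are defined.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAC_LEN 6  /* same size as ETH_ALEN */

/* ASSUMPTION: stand-ins for badaddr1..3; the real values are not shown above. */
static const unsigned char bad_all_00[MAC_LEN] = {0x00, 0x00, 0x00, 0x00, 0x00, 0x00};
static const unsigned char bad_all_ff[MAC_LEN] = {0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF};
static const unsigned char bad_all_44[MAC_LEN] = {0x44, 0x44, 0x44, 0x44, 0x44, 0x44};

/* Hypothetical state: is the association timer pending, and did we give up? */
struct mesh_assoc {
	bool timer_armed;
	bool gave_up;
};

/* True when the address in an SIOCGIWAP event counts as "associated". */
static bool is_associated_addr (const unsigned char *addr)
{
	assert (addr != NULL);
	return memcmp (addr, bad_all_00, MAC_LEN) != 0
	    && memcmp (addr, bad_all_ff, MAC_LEN) != 0
	    && memcmp (addr, bad_all_44, MAC_LEN) != 0;
}

/* Called for each SIOCGIWAP event: only a valid address cancels the timer. */
static void on_ap_event (struct mesh_assoc *s, const unsigned char *addr)
{
	if (is_associated_addr (addr))
		s->timer_armed = false;
}

/* Called when the association timeout elapses. */
static void on_timeout (struct mesh_assoc *s)
{
	if (s->timer_armed) {
		s->timer_armed = false;
		s->gave_up = true;   /* mesh setup is abandoned */
	}
}
```

With timer_armed set, an event carrying an all-zero address followed by
on_timeout() leaves gave_up set, which is the symptom in the ticket.  Any
other address clears the timer first and the timeout does nothing.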

Looking at the wireless.h header file, it looks like SIOCGIWAP is
normally used to get access point MAC addresses.   In this case, I
think NM is expecting to get the MAC address of msh0.

Now it gets kinda confusing here.   SIOCGIWAP isn't actually used in an
ioctl() call; rather, this is a message delivered via netlink, and
SIOCGIWAP is the value of the iw_event.cmd field (see wireless.h again).
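As a rough illustration of what "the value of the cmd field" means, here
is a sketch that walks a buffer of wireless events and pulls the address
out of the first SIOCGIWAP one.  It assumes the buffer is the wireless
event payload already extracted from the netlink message, and it assumes
a simplified packed layout (16-bit len, 16-bit cmd, then a sockaddr whose
first two bytes are the family).  The function and macro names are mine,
and the real iw_event handling has alignment details this ignores.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define EV_HDR_LEN    4        /* ASSUMPTION: packed 16-bit len + 16-bit cmd */
#define EV_SIOCGIWAP  0x8B15   /* value of SIOCGIWAP in wireless.h */
#define SA_FAMILY_LEN 2        /* sa_family precedes sa_data in a sockaddr */
#define MAC_LEN       6

/* Returns 1 and fills 'out' if an SIOCGIWAP event is found, else 0. */
static int find_ap_event (const uint8_t *buf, size_t buflen, uint8_t out[MAC_LEN])
{
	size_t pos = 0;

	assert (buf != NULL && out != NULL);

	while (buflen - pos >= EV_HDR_LEN) {
		uint16_t len, cmd;

		/* memcpy avoids unaligned reads from the byte stream. */
		memcpy (&len, buf + pos, sizeof (len));
		memcpy (&cmd, buf + pos + 2, sizeof (cmd));

		/* Stop on a truncated or malformed event. */
		if (len < EV_HDR_LEN || len > buflen - pos)
			break;

		if (cmd == EV_SIOCGIWAP
		    && len >= EV_HDR_LEN + SA_FAMILY_LEN + MAC_LEN) {
			memcpy (out, buf + pos + EV_HDR_LEN + SA_FAMILY_LEN, MAC_LEN);
			return 1;
		}

		pos += len;   /* skip to the next event in the stream */
	}
	return 0;
}
```

If the driver never emits the event, or the buffer is cut short, this
returns 0, which corresponds to the "message isn't received" case above.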

Javier, could you advise what this message is used for on the mesh
interface?

I suspect that we need to build a debug version of NetworkManager with a
couple of extra logs enabled to isolate this to the driver versus NM.
But it sounds like nobody can reproduce it with the later loads with the
newer driver version, so maybe we could attribute it to one of the
driver fixes?   If the right USB message from the nw processor
containing this info got lost, it would cause these symptoms, and I know
the USB flow control seems to work a lot better in the 706 build.

Can you advise where I would find a view of the OLPC-specific content in
libertas?   I'm still getting used to the OLPC source management...

And if anyone on the list is still experiencing this problem, let me
know...

thanks,

Bill McCormick
Open innovation lab
Nortel
ESN 393-6298
External (613) 763-6298 

 

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: good-nm-messages.txt
URL: <http://lists.laptop.org/pipermail/devel/attachments/20080610/90756bcf/attachment.txt>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: 5848-nm-messages.txt
URL: <http://lists.laptop.org/pipermail/devel/attachments/20080610/90756bcf/attachment-0001.txt>

