[OLPC Networking] TCP is broken in mesh mode

Bill Mccormick billmcc at nortel.com
Tue Jun 10 17:36:57 EDT 2008


If you have time to reproduce this, I would be interested in the device
forwarding table info.

If not, I think we'll be looking at the same area in the next few days,
although we're more in a test mode than a fix mode right now.

You can get the forwarding table using the different iwpriv commands
documented here:

http://wiki.laptop.org/go/Wireless_Driver_README

It's all MAC-based at this level, so we'd also need the MAC addresses of
the devices involved.

best regards,

Bill

 

-----Original Message-----
From: devel-bounces at lists.laptop.org
[mailto:devel-bounces at lists.laptop.org] On Behalf Of Polychronis
Ypodimatopoulos
Sent: Tuesday, June 10, 2008 3:20 PM
To: bens at alum.mit.edu
Cc: olpc at collabora.co.uk; networking at lists.laptop.org; OLPC Developer's
List
Subject: Re: [OLPC Networking] TCP is broken in mesh mode

nice report.

Benjamin M. Schwartz wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Dear Networking experts,
>
> I have been fighting for several months with the fact that invitations
> often seem not to work, when running on a serverless mesh.  The
> symptoms are quite strange.  If an invitation works once between two 
> laptops, it continues to work between them reliably.  If it fails 
> once, it continues to fail between them consistently. Sometimes, in 
> the same place, invitations will work on one mesh channel and not on 
> another.  The same two XOs may be reliably successful in a particular 
> high-noise environment, and consistently fail in an area of virtual 
> radio silence, as well as the reverse.
>
> Even when invitations fail, other presence information continues to 
> flow correctly.  Even activity sharing continues to work beautifully.
>
> With some help from Daf, we managed to get a tcpdump trace from two 
> XOs exhibiting this behavior at 1CC.  The dumps are attached to ticket
> #6463.  What we saw is bizarre, but also consistent with the behavior in
> the UI.  The invitations are unicast, implemented using TCP.  When
> machine A sends an invitation to B, we see the following exchange:
>
> 1. A broadcasts an ARP request for B
> 2. B sees the ARP request and replies to A
> 3. A receives the ARP reply from B and sends a TCP SYN to B
> 4. B does not see the SYN packet (it does not appear in B's dump)
> 5. A retries a total of three times, but none of the SYN packets are
>    seen by B.
> 3b. In parallel, A broadcasts a presence-info update with mDNS, 
> indicating that it has shared the activity.
> 4b. B receives this broadcast, updates its presence-info cache, and 
> even assigns B's XO icon a new location in the mesh view
>
> This behavior is fairly frightening.  I have seen it occur in 
> low-noise network environments with a total of 3 XOs, so I suspect a 
> serious bug somewhere in the lowest levels of the network stack.  Once
> this failure occurs, it is extremely reproducible.  All subsequent
> invitations will continue to fail.  I therefore suspect that the bug
> involves the driver or firmware reaching an invalid state and becoming
> stuck there.
>   


You have to keep in mind that the driver/firmware may very well have
bugs, but:

1) the driver does not differentiate between different TCP/IP packets
(but may wrongly differentiate between unicast and broadcast/multicast).
Try establishing a separate TCP/IP connection when invitations
reproducibly don't work.

2) the firmware (in terms of a route existing or not) does not
differentiate between frames. Try pinging the other node when
invitations reproducibly don't work.
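
A rough sketch of how the two checks above could be scripted on the
sending XO, in case it helps when reproducing. Everything here is an
assumption for illustration: the function names, the result labels, and
the reading of each outcome are mine, not anything from the driver or
from ticket #6463. The ping call assumes the Linux iputils `ping` and
its `-c` / `-W` options.

```python
import socket
import subprocess


def tcp_probe(host, port, timeout=3.0):
    """Try one TCP connection and report what happened to the SYN.

    "connected" and "refused" both mean the SYN reached the peer
    (a refusal is the peer answering with a reset). "timeout" means
    nothing came back, which matches the reported symptom.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "connected"
    except ConnectionRefusedError:
        return "refused"
    except socket.timeout:
        return "timeout"
    except OSError as exc:
        return "error: %s" % exc


def ping_probe(host, count=3, timeout_s=2):
    """Return True if at least one echo reply arrives, False if none,
    or None if no `ping` binary is available."""
    try:
        result = subprocess.run(
            ["ping", "-c", str(count), "-W", str(timeout_s), host],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except FileNotFoundError:
        return None
    return result.returncode == 0


def interpret(ping_ok, tcp_result):
    """Map the two probe results to a hypothetical label (my own naming)."""
    if tcp_result in ("connected", "refused"):
        # Unicast TCP reaches the peer, so the loss is not in the TCP path.
        return "tcp-ok"
    if tcp_result == "timeout" and ping_ok is False:
        # No unicast traffic at all: the route is worth inspecting.
        return "no-unicast"
    if tcp_result == "timeout" and ping_ok is True:
        # ICMP passes but TCP does not: unexpected if all packets are
        # treated alike.
        return "tcp-only-loss"
    return "inconclusive"
```

Calling interpret(ping_probe(peer), tcp_probe(peer, port)) with the
other laptop's address and any port of your choosing gives one label per
attempt; "refused" is as useful as "connected" here, since either one
shows that the SYN arrived.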

> Given the variety of critical services that run over TCP, including 
> the much-emphasized Read activity, I hope that people familiar with 
> the driver and firmware will take a look at this bug.
>
> - --Ben Schwartz
>
> P.S. All this info is present at ticket #6463.  I am writing about it 
> here in an attempt to increase awareness.
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v2.0.9 (GNU/Linux)
> Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
>
> iEYEARECAAYFAkhOz+oACgkQUJT6e6HFtqSVBQCeKPWmqeoKOzVv55JS/HTAgf1r
> bUYAoKCG+z1bBA+isc7Mun0VlQNGDars
> =4w83
> -----END PGP SIGNATURE-----
> _______________________________________________
> Networking mailing list
> Networking at lists.laptop.org
> http://lists.laptop.org/listinfo/networking
>   

--
Polychronis Ypodimatopoulos
Graduate student
Viral Communications
MIT Media Lab
Tel: +1 (617) 459-6058
http://www.mit.edu/~ypod/

_______________________________________________
Devel mailing list
Devel at lists.laptop.org
http://lists.laptop.org/listinfo/devel

