[OLPC Networking] TCP is broken in mesh mode
Bill Mccormick
billmcc at nortel.com
Tue Jun 10 17:36:57 EDT 2008
If you have time to reproduce this, I would be interested in the
device's forwarding table.
If not, I think we'll be looking at the same area in the next few days,
although we're more in a test mode than a fix mode right now.
You can get the forwarding table using the iwpriv commands
documented here:
http://wiki.laptop.org/go/Wireless_Driver_README
It's all MAC-based at this level, so we'd also need the MAC addresses of
the devices involved.
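
As a rough sketch of gathering those MAC addresses: on Linux, each
network interface exposes its hardware address in
/sys/class/net/<interface>/address. The helper name interface_macs and
its sysfs_root parameter are illustrative only and are not part of any
OLPC tool.

```python
from pathlib import Path


def interface_macs(sysfs_root="/sys/class/net"):
    """Return {interface_name: mac_address} for interfaces under sysfs_root.

    sysfs_root is a parameter only so the function can be pointed at a
    test directory; on a real machine the default is the path to use.
    """
    root = Path(sysfs_root)
    if not root.is_dir():
        # Not a Linux system, or sysfs is not mounted here.
        return {}
    macs = {}
    for iface in sorted(root.iterdir()):
        addr_file = iface / "address"
        # Skip entries that are not interfaces (no "address" attribute).
        if addr_file.is_file():
            macs[iface.name] = addr_file.read_text().strip().lower()
    return macs
```

Running this on each laptop involved and printing the returned
dictionary gives one line of input per interface to send along with the
forwarding table.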
best regards,
Bill
-----Original Message-----
From: devel-bounces at lists.laptop.org
[mailto:devel-bounces at lists.laptop.org] On Behalf Of Polychronis
Ypodimatopoulos
Sent: Tuesday, June 10, 2008 3:20 PM
To: bens at alum.mit.edu
Cc: olpc at collabora.co.uk; networking at lists.laptop.org; OLPC Developer's
List
Subject: Re: [OLPC Networking] TCP is broken in mesh mode
Nice report.
Benjamin M. Schwartz wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Dear Networking experts,
>
> I have been fighting for several months with the fact that invitations
> often seem not to work, when running on a serverless mesh. The
> symptoms are quite strange. If an invitation works once between two
> laptops, it continues to work between them reliably. If it fails
> once, it continues to fail between them consistently. Sometimes, in
> the same place, invitations will work on one mesh channel and not on
> another. The same two XOs may be reliably successful in a particular
> high-noise environment, and consistently fail in an area of virtual
> radio silence, as well as the reverse.
>
> Even when invitations fail, other presence information continues to
> flow correctly. Even activity sharing continues to work beautifully.
>
> With some help from Daf, we managed to get a tcpdump trace from two
> XOs exhibiting this behavior at 1CC. The dumps are attached to ticket
> #6463. What we saw is bizarre, but also consistent with the behavior in
> the UI. The invitations are unicast, implemented using TCP. When
> machine A sends an invitation to B, we see the following exchange:
>
> 1. A broadcasts an ARP request for B.
> 2. B sees the ARP request and replies to A.
> 3. A receives the ARP reply from B and sends a TCP SYN to B.
> 4. B does not see the SYN packet (it does not appear in B's dump).
> 5. A retries a total of three times, but none of the SYN packets are
>    seen by B.
> 3b. In parallel, A broadcasts a presence-info update with mDNS,
>     indicating that it has shared the activity.
> 4b. B receives this broadcast, updates its presence-info cache, and
>     even assigns B's XO icon a new location in the mesh view.
>
> This behavior is fairly frightening. I have seen it occur in
> low-noise network environments with a total of 3 XOs, so I suspect a
> serious bug somewhere in the lowest levels of the network stack. Once
> this failure occurs, it is extremely reproducible. All subsequent
> invitations will continue to fail. I therefore suspect that the bug
> involves the driver or firmware reaching an invalid state and becoming
> stuck there.
>
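
The comparison behind steps 3 to 5 of the quoted exchange (SYNs present
in A's dump but absent from B's) can be sketched as below. This is only
an illustration: the Packet record and the missing_syns helper are
hypothetical, and the fields are assumed to have been extracted from the
two tcpdump traces by some other means.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    """Simplified TCP segment record (hypothetical, not a tcpdump format)."""
    src: str
    dst: str
    sport: int
    dport: int
    seq: int
    syn: bool
    ack: bool


def _syn_key(p):
    # A retransmitted SYN reuses the same sequence number, so every retry
    # of one connection attempt maps to the same key.
    return (p.src, p.dst, p.sport, p.dport, p.seq)


def missing_syns(capture_a, capture_b):
    """Return initial SYNs (SYN set, ACK clear) in A's capture that never
    show up in B's capture."""
    seen_by_b = {_syn_key(p) for p in capture_b if p.syn and not p.ack}
    return [p for p in capture_a
            if p.syn and not p.ack and _syn_key(p) not in seen_by_b]
```

A non-empty result for A's outgoing SYNs, combined with B's dump still
showing A's ARP and mDNS traffic, matches the pattern described in the
report.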
You have to keep in mind that the driver/firmware may very well have
bugs, but:
1) The driver does not differentiate between different TCP/IP packets
(but it may wrongly differentiate between unicast and
broadcast/multicast). Try establishing a separate TCP/IP connection
while invitations are reproducibly failing.
2) The firmware, as far as the existence of a route is concerned, does
not differentiate between frames. Try pinging the other node while
invitations are reproducibly failing.
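
One possible way to run the separate-connection test from point 1 is a
small probe like the following. It is a minimal sketch: probe_tcp is a
hypothetical helper name, and the host and port to probe are whatever
service is known to be listening on the other XO.

```python
import socket


def probe_tcp(host, port, timeout=5.0):
    """Attempt one TCP connection and classify the outcome.

    Returns "connected" if the handshake completed, "refused" if the peer
    answered with a reset (so the SYN did arrive), "timeout" if nothing
    came back within `timeout` seconds, or "error" for any other failure.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "connected"
    except ConnectionRefusedError:
        return "refused"
    except socket.timeout:
        return "timeout"
    except OSError:
        return "error"
```

Both "connected" and "refused" show that unicast TCP segments reach the
other node; a "timeout" while pings and mDNS updates still get through
would be consistent with the lost-SYN behavior in the report.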
> Given the variety of critical services that run over TCP, including
> the much-emphasized Read activity, I hope that people familiar with
> the driver and firmware will take a look at this bug.
>
> - --Ben Schwartz
>
> P.S. All this info is present at ticket #6463. I am writing about it
> here in an attempt to increase awareness.
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v2.0.9 (GNU/Linux)
> Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
>
> iEYEARECAAYFAkhOz+oACgkQUJT6e6HFtqSVBQCeKPWmqeoKOzVv55JS/HTAgf1r
> bUYAoKCG+z1bBA+isc7Mun0VlQNGDars
> =4w83
> -----END PGP SIGNATURE-----
> _______________________________________________
> Networking mailing list
> Networking at lists.laptop.org
> http://lists.laptop.org/listinfo/networking
>
--
Polychronis Ypodimatopoulos
Graduate student
Viral Communications
MIT Media Lab
Tel: +1 (617) 459-6058
http://www.mit.edu/~ypod/