TCP is broken in mesh mode

Erik Garrison erik at laptop.org
Tue Jun 10 15:18:59 EDT 2008


On Tue, Jun 10, 2008 at 03:03:06PM -0400, Benjamin M. Schwartz wrote:
> ...
> 
> With some help from Daf, we managed to get a tcpdump trace from two XOs
> exhibiting this behavior at 1CC.  The dumps are attached to ticket #6463.
> What we saw is bizarre, but also consistent with the behavior in the UI.
> The invitations are unicast, implemented using TCP.  When machine A sends
> an invitation to B, we see the following exchange:
> 
> 1. A broadcasts an ARP request for B
> 2. B sees the ARP request and replies to A
> 3. A receives the ARP reply from B and sends a TCP SYN to B
> 4. B does not see the SYN packet (it does not appear in B's dump)
> 5. A retries a total of three times, but none of the SYN packets are seen
> by B.
> 3b. In parallel, A broadcasts a presence-info update with mDNS, indicating
> that it has shared the activity.
> 4b. B receives this broadcast, updates its presence-info cache, and even
> assigns A's XO icon a new location in the mesh view
> 
> This behavior is fairly frightening.  I have seen it occur in low-noise
> network environments with a total of 3 XOs, so I suspect a serious bug
> somewhere in the lowest levels of the network stack.  Once this failure
> occurs, it is extremely reproducible.  All subsequent invitations will
> continue to fail.  I therefore suspect that the bug involves the driver or
> firmware reaching an invalid state and becoming stuck there.
> 

If this is the case, we could expect the symptom to appear elsewhere.  We
could write a test script that simply attempts the exchange you list
above and reports failure (we could detect failure automatically by
using an out-of-band link, e.g. the presence server?).
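A minimal sketch of such a test, in Python, under some assumptions.  It
only exercises the TCP handshake from A's side (steps 3 to 5 above).  The
target port has to be supplied by whoever runs it, because the port the
invitations use is not given here.  `report` is a hypothetical hook
standing in for the out-of-band channel; it just prints by default.  The
sketch separates "refused" (a RST came back, so B's stack did see the SYN)
from "timeout" (no answer at all, which is what the dumps show).

```python
import socket
import time
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


@dataclass
class ProbeResult:
    host: str
    port: int
    attempts: int          # attempts used before the outcome was decided
    outcome: str           # "connected", "refused", "timeout" or "error"
    detail: Optional[str] = None


def probe_tcp(host: str, port: int, attempts: int = 3,
              timeout: float = 3.0, delay: float = 1.0) -> ProbeResult:
    """Try to complete a TCP handshake with host:port, retrying on failure."""
    result = ProbeResult(host, port, 0, "error", "no attempts made")
    for attempt in range(1, attempts + 1):
        try:
            # create_connection() returns only once the SYN / SYN-ACK / ACK
            # handshake has completed, and raises otherwise.
            with socket.create_connection((host, port), timeout=timeout):
                return ProbeResult(host, port, attempt, "connected")
        except ConnectionRefusedError as exc:
            # A RST came back: the peer received our SYN, so the path works
            # even though nothing is listening on that port.
            return ProbeResult(host, port, attempt, "refused", str(exc))
        except socket.timeout as exc:
            # No reply at all: matches the trace, where A's SYNs never
            # appear in B's dump.
            result = ProbeResult(host, port, attempt, "timeout", str(exc))
        except OSError as exc:
            # Anything else, e.g. "host unreachable" after a failed ARP lookup.
            result = ProbeResult(host, port, attempt, "error", str(exc))
        if attempt < attempts:
            time.sleep(delay)
    return result


def run_probes(peers: Iterable[str], port: int,
               report: Callable[[ProbeResult], None] = print
               ) -> List[ProbeResult]:
    """Probe each peer and pass every failed result to `report`.

    `report` is a hypothetical placeholder: ideally it would send the result
    over a link other than the mesh (e.g. to the presence server), so that
    a stuck mesh cannot hide its own failure.
    """
    results = []
    for peer in peers:
        r = probe_tcp(peer, port)
        if r.outcome != "connected":
            report(r)
        results.append(r)
    return results
```

Run on A against B's address while capturing on B with tcpdump: a
`timeout` result on A, with no SYN in B's capture, would reproduce the
failure described above, and running it repeatedly would show whether the
failure persists once it has started.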

> Given the variety of critical services that run over TCP, including the
> much-emphasized Read activity, I hope that people familiar with the driver
> and firmware will take a look at this bug.
> 

Thank you for writing this email and bringing the problem back to the
attention of the list.

Collaboration is something users *expect* to work on these laptops, and
most appear very disappointed when they realize that it often does not.
As you note, it fails for reasons we don't completely understand.

I would like to devote more of my time to this problem.  How should we
proceed?

Erik



More information about the Devel mailing list