[OLPC Networking] TCP is broken in mesh mode
lathiat at bur.st
Wed Jun 11 01:01:50 EDT 2008
On 11/06/2008, at 3:20 AM, Polychronis Ypodimatopoulos wrote:
>> 1. A broadcasts an ARP request for B
>> 2. B sees the ARP request and replies to A
>> 3. A receives the ARP reply from B and sends a TCP SYN to B
>> 4. B does not see the SYN packet (it does not appear in B's dump)
>> 5. A retries a total of three times, but none of the SYN packets
>> are seen by B.
>> 3b. In parallel, A broadcasts a presence-info update with mDNS,
>> that it has shared the activity.
>> 4b. B receives this broadcast, updates its presence-info cache, and
>> assigns B's XO icon a new location in the mesh view
>> This behavior is fairly frightening. I have seen it occur in low-
>> network environments with a total of 3 XOs, so I suspect a serious
>> bug somewhere in the lowest levels of the network stack. Once this
>> occurs, it is extremely reproducible: all subsequent invitations
>> continue to fail. I therefore suspect that the bug involves the
>> driver or firmware reaching an invalid state and becoming stuck there.
> You have to keep in mind that the driver/firmware may very well have
> bugs, but:
> 1) the driver does not differentiate between different TCP/IP packets
> (but may wrongly differentiate between unicast and broadcast/multicast).
> Try establishing a separate TCP/IP connection when invitations
> reproducibly don't work.
> 2) the firmware (in terms of a route existing or not) does not
> differentiate between frames. Try pinging the other node when
> invitations reproducibly don't work.
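The separate-connection test suggested above can be scripted. Below is a minimal Python sketch, assuming Python 3 is available on the XO; `tcp_probe`, the peer address and the port are hypothetical placeholders, not part of any OLPC tool. Even a refused connection is informative: the RST coming back shows unicast frames reach the peer and return, whereas a timeout matches the lost-SYN symptom described above.

```python
import socket


def tcp_probe(host: str, port: int, timeout: float = 3.0) -> str:
    """Attempt one TCP handshake with host:port and classify the outcome.

    'open'    - the handshake completed (SYN, SYN-ACK and ACK all got through)
    'refused' - the peer answered with RST, so unicast delivery works both ways
    'timeout' - no answer at all, consistent with the SYNs being lost
    'error:…' - any other socket error (e.g. "No route to host" if ARP failed)
    """
    try:
        # create_connection resolves the address and performs the handshake,
        # giving up after `timeout` seconds.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"
    except socket.timeout:
        return "timeout"
    except OSError as exc:
        return f"error: {exc}"
```

Running it from A against B's mesh address right after a failed invitation, and then pinging B as in point 2, would show whether plain unicast is broken in general or only the invitation path.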
Keep in mind, however, that the other traffic, i.e. the packets used to
discover the other XO, is multicast and would therefore be routed on the
mesh somewhat differently AIUI (it's basically re-broadcast by every node).
Thus I suspect the issue is that the nodes simply have no direct
communication at all, given the lack of replies to the ARP packets,
and any "other" TCP connection or ping is not going to work if ARPs
simply aren't being replied to, as ARPs are in no way specific to the
connection type.
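To check whether the ARP exchange itself completes (if it does not, TCP and ping will fail alike, as argued above), one can read the kernel's ARP cache from /proc/net/arp on each XO. The sketch below assumes the standard Linux /proc/net/arp layout; `parse_arp_table`, `arp_resolved` and the example addresses are hypothetical, and the mesh interface name `msh0` used in the sample is an assumption.

```python
from pathlib import Path
from typing import Dict, Optional

# Flags column bit meaning "completed entry" (ATF_COM in <linux/if_arp.h>).
ATF_COM = 0x02


def parse_arp_table(text: str) -> Dict[str, dict]:
    """Parse /proc/net/arp text into {ip: {"mac", "device", "complete"}}."""
    entries = {}
    for line in text.splitlines()[1:]:  # first row is the column header
        fields = line.split()
        if len(fields) < 6:
            continue  # skip blank or malformed rows
        ip, _hw_type, flags, mac, _mask, device = fields[:6]
        entries[ip] = {
            "mac": mac,
            "device": device,
            # int(..., 16) accepts the "0x" prefix used in this file.
            "complete": bool(int(flags, 16) & ATF_COM),
        }
    return entries


def arp_resolved(ip: str, text: Optional[str] = None) -> bool:
    """True if `ip` has a completed ARP entry (reads /proc/net/arp by default)."""
    if text is None:
        text = Path("/proc/net/arp").read_text()
    entry = parse_arp_table(text).get(ip)
    return bool(entry and entry["complete"])
```

Run on A after a failed invitation with B's address: an entry that exists but never reaches the completed state would point at ARP replies not making it back over the direct node-to-node path.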
Thus I suspect the issue is that multicast forwarding is working (which
seems perfectly sane, given what I know about the difference between
how unicast and multicast work on the mesh) but the direct node-to-node
path is totally broken, and this would be a relatively
I guess getting the mesh neighbour tables per Bill's email would be
the most useful step forward.