[OLPC Networking] TCP is broken in mesh mode

Trent Lloyd lathiat at bur.st
Wed Jun 11 01:01:50 EDT 2008


Hi All,

On 11/06/2008, at 3:20 AM, Polychronis Ypodimatopoulos wrote:

>>

<snip>

>>
>> 1. A broadcasts an ARP request for B
>> 2. B sees the ARP request and replies to A
>> 3. A receives the ARP reply from B and sends a TCP SYN to B
>> 4. B does not see the SYN packet (it does not appear in B's dump)
>> 5. A retries a total of three times, but none of the SYN packets
>> are seen by B.
>> 3b. In parallel, A broadcasts a presence-info update with mDNS,
>> indicating that it has shared the activity.
>> 4b. B receives this broadcast, updates its presence-info cache, and
>> even assigns B's XO icon a new location in the mesh view
>>
>> This behavior is fairly frightening.  I have seen it occur in
>> low-noise network environments with a total of 3 XOs, so I suspect a
>> serious bug somewhere in the lowest levels of the network stack.
>> Once this failure occurs, it is extremely reproducible.  All
>> subsequent invitations will continue to fail.  I therefore suspect
>> that the bug involves the driver or firmware reaching an invalid
>> state and becoming stuck there.
>>
>
>
> You have to keep in mind that the driver/firmware may very well have
> bugs, but:
>
> 1) the driver does not differentiate between different TCP/IP packets
> (but may wrongly differentiate between unicast and
> broadcast/multicast). Try establishing a separate TCP/IP connection
> when invitations reproducibly don't work.
>
> 2) the firmware (in terms of a route existing or not) does not
> differentiate between frames. Try pinging the other node when
> invitations reproducibly don't work.

Keep in mind, however, that the other traffic (i.e. discovering the
other XO) is multicast and would therefore be routed on the mesh
somewhat differently, AIUI (it's basically rebroadcast by every
node?).

Thus I suspect the issue is that the nodes simply have no direct
communication at all, given the lack of replies to the ARP packets.
Any "other" TCP connection or ping is not going to work if ARPs simply
aren't being replied to, as ARP is in no way specific to the
connection type.
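
One quick way to confirm whether ARP actually resolved on A is to look
at the kernel's ARP cache. This is only a rough sketch: it assumes a
Linux system that exposes the cache at /proc/net/arp and that flag bit
0x2 marks a completed entry, and the function names are made up for
illustration.

```python
ATF_COM = 0x2  # assumed: Linux flag bit for a completed ARP entry


def parse_arp_table(text):
    """Map IP address -> (flags, hw_address) from /proc/net/arp-style text."""
    entries = {}
    for line in text.splitlines()[1:]:  # first line is the column header
        fields = line.split()
        if len(fields) < 6:
            continue  # skip blank or malformed rows
        ip, _hw_type, flags, hw_addr = fields[:4]
        entries[ip] = (int(flags, 16), hw_addr)
    return entries


def arp_resolved(entries, ip):
    """True if the cache holds a completed entry for ip."""
    flags, _hw_addr = entries.get(ip, (0, None))
    return bool(flags & ATF_COM)
```

Feeding it the contents of /proc/net/arp on A and asking about B's
address should show whether A ever got a usable reply.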

In other words, multicast forwarding is working (which seems
perfectly sane, given what I know about the difference between how
unicast and multicast work on the mesh) but the direct node-to-node
path is totally broken, so those tests would be a relatively
pointless exercise.
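
If someone does want to run the separate TCP test anyway, a minimal
sketch like the following would at least separate a dropped SYN from a
refused connection. The function name and default timeout are my own
assumptions, not anything from the XO software.

```python
import socket


def probe_unicast(host, port, timeout=3.0):
    """Classify a single TCP connection attempt to host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "connected"
    except ConnectionRefusedError:
        # A reset came back, so unicast frames travel in both directions.
        return "refused"
    except socket.timeout:
        # No answer at all within the timeout.
        return "timeout"
    except OSError as exc:
        # Anything else, e.g. no route to host or name resolution failure.
        return "error: %s" % exc
```

A "timeout" result would be consistent with the lost-SYN symptom
described above, whereas "refused" would suggest the unicast path works
and nothing is listening on that port.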

I guess getting the mesh neighbour tables per Bill's email would be  
the most useful step forward.

Regards,
Trent

