The test showed that the effect is not a result of a network failure.
It occurs naturally every time a new host arrives while, at the same time, another host appears dead.
"Dead" can also mean a host that simply disconnected from the channel by user intervention.
The best and simplest way to recreate the effect in any environment (noisy or not) is to:

1. Connect 3 XOs successfully to the same mesh.
2. Move XO1 and XO2 successfully to another channel, and verify that they show as "failed" when running "avahi-browse" on XO3.
3. Reconnect XO1 and XO2 to the initial channel at the same time.
4. While the XOs are trying to connect (~30 sec), check that they still show as "failed" when running "avahi-browse" on XO3 (a small cache-watcher sketch follows these steps).
5. Observe the screen on XO3: the icons of XO1 and XO2 will jump almost at the same time.
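For step 4, a minimal cache-watcher sketch follows. It is not part of the original test: it assumes the two roaming laptops announce host names containing "XO1" and "XO2", that the build's avahi-browse supports the -a/-c/-p options, and an arbitrary 2-second poll interval. It just dumps the local Avahi cache on XO3 in a loop and timestamps the matching entries, so you can see the exact moment the stale/"failed" entries vanish.

    #!/usr/bin/env python
    # Illustrative cache watcher (assumed helper, not from the original report):
    # repeatedly dump the local Avahi cache and print any entries mentioning the
    # roaming laptops, with a timestamp.
    import subprocess
    import time

    WATCHED = ("XO1", "XO2")   # assumed host names of the two roaming XOs
    POLL_SECONDS = 2           # arbitrary poll interval

    while True:
        # -a: all service types, -c: dump the cache and exit, -p: parsable output
        proc = subprocess.Popen(["avahi-browse", "-a", "-c", "-p"],
                                stdout=subprocess.PIPE, universal_newlines=True)
        output = proc.communicate()[0]
        hits = [line for line in output.splitlines()
                if any(name in line for name in WATCHED)]
        print("%s  %d matching cache entries" % (time.strftime("%H:%M:%S"), len(hits)))
        for line in hits:
            print("    " + line)
        time.sleep(POLL_SECONDS)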
To my best understanding:
It is not related to a noisy environment.
It does not require a large number of laptops.
It can be recreated 100% of the times you try the above.

I believe that if the emulator you operate uses the proper timeouts, you will see the effect.
yani

On Dec 14, 2007 4:31 AM, Sjoerd Simons <sjoerd@luon.net> wrote:
<div class="Ih2E3d">On Thu, Dec 13, 2007 at 11:18:01PM -0500, Giannis Galanis wrote:<br>> THE TEST:<br>> 6 XOs connected to channel 11, with forwarding tables blinded only to them<br>> selves, so no other element in the mesh can interfere.
<br>><br>> The cache list was scanned continuously on all XOs using a script<br>><br>> If all XOs remained idle, they all showed reliably to each other mesh view.<br>> Every 5-10 mins an XO showed as dead in some other XOs scns, but this was
<br>> shortly recovered, and there was no visual effect in the mesh view.<br><br></div>Could you provide a packet trace of one of these XO's in this test? (Install<br>tcpdump and run ``tcpdump -i msh0 -n -s 1500 -w <some filename>''.
I'm surprised that with only 6 laptops you hit this case so often. Of course the
RF environment at OLPC is quite crowded, which could trigger this.

Can you also run: http://people.collabora.co.uk/~sjoerd/mc-test.py
Run it as ``python mc-test.py server'' on one machine and just ``python
mc-test.py'' on the others. This should give you an indication of the amount of
multicast packet loss, which can help me to recreate a comparable setting
here using netem.
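For illustration only: the sketch below is not the mc-test.py linked above (its contents are not reproduced here), just a minimal example of the kind of multicast loss probe being described. The group address, port, packet count, and send rate are assumptions. One machine would run it with the argument "server" to transmit numbered packets; the others run it with no arguments and report how many packets they missed.

    #!/usr/bin/env python
    # Illustrative multicast loss probe (assumed stand-in, not the real mc-test.py).
    import socket
    import struct
    import sys
    import time

    GROUP, PORT = "239.255.42.42", 4242   # assumed multicast group and port
    COUNT = 1000                          # packets the server transmits

    def server():
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, 2)
        for seq in range(COUNT):
            sock.sendto(struct.pack("!I", seq), (GROUP, PORT))
            time.sleep(0.05)              # roughly 20 packets per second

    def client():
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        sock.bind(("", PORT))
        mreq = struct.pack("4sl", socket.inet_aton(GROUP), socket.INADDR_ANY)
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)
        sock.settimeout(5.0)              # stop after 5 idle seconds
        seen = set()
        try:
            while True:
                data, _ = sock.recvfrom(64)
                seen.add(struct.unpack("!I", data[:4])[0])
        except socket.timeout:
            pass
        if seen:
            expected = max(seen) + 1
            print("received %d of %d packets (%.1f%% loss)"
                  % (len(seen), expected, 100.0 * (expected - len(seen)) / expected))
        else:
            print("received no packets")

    if __name__ == "__main__":
        server() if sys.argv[1:] == ["server"] else client()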
> If you switched an XO manually to another channel, again it showed "dead" in
> all the others. If you reconnected to channel 11, there is again no effect in
> the mesh view.
> If you never reconnected, in about 10-15 minutes the entry is deleted, and
> the corresponding XO icon disappears from the view.
>
> Therefore, it is common and expected for XOs to show as "dead" in the Avahi
> cache for some time.
>
> THE BUG:
> IF a new XO appears (a message is received through Avahi),
> WHILE there are 1 or more XOs in the cache that are reported as "dead",
> THEN Avahi "crashes" temporarily and the cache CLEARS.
>
> At this point ALL XOs that are listed as dead instantly disappear from the
> mesh view.

Interesting. Could you file a trac bug with this info, with me cc'd?

  Sjoerd
--
Everything should be made as simple as possible, but not simpler.
   -- Albert Einstein