I ran several tests related to the "xmas tree effect" we see in the mesh view. <br><br>The effect is that XOs sometimes disappear and reappear in the same or a different position, or simply disappear. Usually it happens to many XOs simultaneously.
<br><br>My results clearly indicate that this is an issue in the Avahi daemon, which is used by the Salut Telepathy service. The Sugar interface displays the information it receives from Salut very reliably: when a host disappears from Avahi's host list, it vanishes instantly from the mesh view, and likewise a new host shows up as soon as it arrives.
<br><br>The Avahi daemon runs below Salut and continuously receives information from other hosts on the network that also run the Avahi daemon.<br>It keeps a local cache of recently seen hosts. <br>At regular intervals (every 1-2 minutes, I think), it checks whether the hosts in the cache are alive. If not, they are recorded as "failed".
<br>The same check can be triggered on demand by running "avahi-browse -t -r _presence._tcp" repeatedly (instead of waiting 1-2 minutes).<br>After a certain timeout, a failed entry (dead host) is removed from the cache, and it instantly disappears from the mesh view.
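As a rough mental model of the cache lifecycle just described, here is a toy Python sketch. It is not Avahi's implementation: `HostCache`, `announce`, `check` and `visible` are hypothetical names, and the eviction timeout is an assumed value (the tests put it at roughly 10-15 minutes).

```python
from typing import Dict, Optional, Set

# Hypothetical eviction timeout; the tests suggest roughly 10-15 minutes.
EVICT_AFTER = 12 * 60.0  # seconds


class HostCache:
    """Toy model of the host cache described above (not Avahi's code)."""

    def __init__(self, evict_after: float = EVICT_AFTER) -> None:
        self.evict_after = evict_after
        # host name -> time it was first seen as failed, or None if alive
        self.failed_since: Dict[str, Optional[float]] = {}

    def announce(self, host: str) -> None:
        """A packet from `host` arrived: add it, or mark it alive again."""
        self.failed_since[host] = None

    def check(self, reachable: Set[str], now: float) -> None:
        """Periodic liveness check (roughly every 1-2 minutes, per the estimate above)."""
        for host in list(self.failed_since):
            since = self.failed_since[host]
            if host in reachable:
                self.failed_since[host] = None   # recovered: no visible change
            elif since is None:
                self.failed_since[host] = now    # newly "failed", still shown
            elif now - since >= self.evict_after:
                del self.failed_since[host]      # timed out: icon disappears

    def visible(self) -> Set[str]:
        """Sugar mirrors the cache, so failed entries are still drawn."""
        return set(self.failed_since)
```

In this model a host that misses a check and then announces itself again before `evict_after` expires never leaves `visible()`, which is the "no effect on the mesh view" case.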
<br><br>This timeout is fairly long (several minutes), so a host (XO) has the chance to come back to life with no visible effect on the mesh view.<br>A host can temporarily look dead when:<br>a. the XO's Avahi packets don't get through because of high mesh traffic. In this case the other XOs may see it either as alive or as dead, depending on conditions.
<br>b. the XO has deliberately moved to another channel or is otherwise disconnected. In that case, all other XOs will see it as dead.<br>From a client's point of view, the two cases are treated almost the same.<br><br>THE TEST:
<br>6 XOs connected to channel 11, with their forwarding tables restricted to the six XOs themselves, so that no other node in the mesh could interfere.<br><br>The cache list was scanned continuously on all XOs using a script.<br><br>While all XOs remained idle, they all showed up reliably in each other's mesh view. Every 5-10 minutes an XO showed as dead in some other XOs' scans, but it recovered shortly afterwards and there was no visible effect in the mesh view.
<br><br>If an XO was manually switched to another channel, it again showed as "dead" on all the others. If it was reconnected to channel 11, there was again no effect in the mesh view.<br>If it was never reconnected, the entry was deleted after about 10-15 minutes, and the corresponding XO icon disappeared from the view.
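The scanning script itself is not included here. A minimal Python sketch of one is below. It assumes avahi-browse's `-p` (parsable) mode, where resolved services are printed as `=`-prefixed, semicolon-separated lines with the service name in the fourth field, and it assumes that failed resolves are reported on stderr as `Failed to resolve service '<name>' ...`; both formats should be checked against the installed Avahi version. `parse_scan` and `scan_forever` are hypothetical names, and names are taken verbatim from the output without any unescaping.

```python
import re
import subprocess
import time
from typing import Set, Tuple

# The flags quoted above plus -p (parsable, ';'-separated output).
SCAN_CMD = ["avahi-browse", "-t", "-r", "-p", "_presence._tcp"]

# Assumed wording of avahi-browse's resolve-failure message on stderr.
FAILED_RE = re.compile(r"Failed to resolve service '([^']*)'")


def parse_scan(stdout: str, stderr: str) -> Tuple[Set[str], Set[str]]:
    """Split one scan into (resolved, failed) service names."""
    resolved: Set[str] = set()
    for line in stdout.splitlines():
        fields = line.split(";")
        # '=' lines are resolved services; field 3 is the service name.
        if len(fields) > 3 and fields[0] == "=":
            resolved.add(fields[3])
    failed = set(FAILED_RE.findall(stderr))
    return resolved, failed


def scan_forever(interval: float = 5.0) -> None:
    """Log alive/failed hosts after every scan, one line per scan."""
    while True:
        proc = subprocess.run(SCAN_CMD, capture_output=True, text=True)
        resolved, failed = parse_scan(proc.stdout, proc.stderr)
        stamp = time.strftime("%H:%M:%S")
        print(f"{stamp} alive={sorted(resolved)} failed={sorted(failed)}")
        time.sleep(interval)
```

A host that appears under `failed` for a scan or two and then returns under `alive` corresponds to the brief "dead" periods seen every 5-10 minutes.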
<br><br>Therefore, it is common and expected for XOs to show as "dead" in the Avahi cache for some time.<br><br>THE BUG:<br>IF a new XO appears (a message is received through Avahi), <br>WHILE there are 1 or more XOs in the cache that are reported as "dead"
<br>THEN Avahi "crashes" temporarily and the cache is CLEARED.<br><br>At this point ALL XOs that are listed as dead instantly disappear from the mesh view.<br>But, of course, some of the "dead" XOs are expected to reappear shortly, especially those that are still on the same mesh channel but merely failed to transmit their Avahi packets because of traffic load.
<br><br>Note that if only 1 XO looks dead and then returns, everything is normal.<br>But if 2, 3 or more XOs look dead and 1 of them returns, then:<br>a. all of the dead ones disappear from the view;<br>b. the 1 that returned reappears right afterwards, probably in a different position,
i.e. it will "jump".<br><br>The avahi-browse command scans the network in real time (i.e. it sends requests for all hosts in its cache list) and runs for several seconds. If the above situation occurs, it freezes (this is what I meant by "crashes"). When it is restarted, the cache has been cleared of the previously dead hosts.
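Since a normal scan finishes within seconds, a freeze can be detected from a script instead of by watching the terminal. Below is a minimal Python sketch that runs the scan with a time limit and counts a timeout as a freeze. The 30-second limit and the names `run_scan`, `count_freezes` and `FREEZE_TIMEOUT` are assumptions for illustration, not part of Avahi.

```python
import subprocess
import sys
import time
from typing import List, Optional

# Flags as quoted in the post; the 30 s limit is an assumed value, chosen
# to be well above the "several seconds" a normal scan takes.
SCAN_CMD = ["avahi-browse", "-t", "-r", "_presence._tcp"]
FREEZE_TIMEOUT = 30.0


def run_scan(cmd: List[str], timeout: float) -> Optional[str]:
    """Run one scan and return its stdout, or None if it froze."""
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        # subprocess.run() has already killed and reaped the child here.
        return None
    return proc.stdout


def count_freezes(cmd: List[str], rounds: int, timeout: float = FREEZE_TIMEOUT,
                  pause: float = 1.0) -> int:
    """Repeat the scan `rounds` times, logging and counting the freezes."""
    freezes = 0
    for _ in range(rounds):
        if run_scan(cmd, timeout) is None:
            freezes += 1
            stamp = time.strftime("%H:%M:%S")
            print(f"{stamp} scan froze after {timeout:.0f}s; restarting", file=sys.stderr)
        time.sleep(pause)
    return freezes
```

On an XO this would be called as `count_freezes(SCAN_CMD, rounds=...)`. Because a timed-out child is killed, each round starts a fresh avahi-browse, which mirrors restarting it by hand.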
<br><br>A typical situation in which the "xmas tree effect" occurs:<br>20 XOs are running Salut on channel 1. This includes XOs connected to the medialab AP, to the school server, and via link-local.<br>XOs are constantly leaving the channel.
<br>At the same time, some connected XOs appear dead for a minute or so and reappear shortly afterwards.<br><br>Assume that at some point 5 XOs have either really left or merely "seem dead".<br><br>Then 2 of these XOs are reconnected to the mesh channel at the same time by someone in the office.
<br>The 2 XOs will "jump" to a different position, whereas the other 3 will simply vanish.<br><br>The way I see it, there is a very clear, narrow and specific bug in the way the Avahi daemon handles its cache<br>when new hosts and dead hosts coexist.
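To make the trigger condition concrete, here is a small Python model of the behaviour observed in these tests (not of Avahi's actual code): an announcement flushes every dead entry whenever at least one dead entry other than the announcing host exists. The cache is reduced to a host-to-alive map, and `announce` and `scenario` are hypothetical names.

```python
from typing import Dict, Set, Tuple


def announce(cache: Dict[str, bool], host: str) -> Tuple[Set[str], Set[str]]:
    """Apply one announcement to `cache` (host -> alive?) as observed in the tests.

    Returns (vanished, jumped): hosts dropped from the mesh view, and hosts
    that were dropped and immediately re-added (drawn at a new position).
    """
    others_dead = {h for h, alive in cache.items() if not alive and h != host}
    if not others_dead:
        # Normal path: the host is (re)marked alive and nothing is redrawn.
        cache[host] = True
        return set(), set()
    # Observed bug: every dead entry is flushed at once...
    flushed = {h for h, alive in cache.items() if not alive}
    for h in flushed:
        del cache[h]
    # ...and the announcing host is re-added as if it were brand new.
    cache[host] = True
    return flushed - {host}, {host} & flushed


def scenario(n_hosts: int = 20, dead: int = 5, returning: int = 2) -> Tuple[Set[str], Set[str]]:
    """The typical case above: `returning` of the `dead` XOs come back."""
    cache = {f"xo{i:02d}": True for i in range(n_hosts)}
    dead_hosts = sorted(cache)[:dead]
    for h in dead_hosts:
        cache[h] = False
    removed: Set[str] = set()
    for h in dead_hosts[:returning]:
        vanished, jumped = announce(cache, h)
        removed |= vanished | jumped
    jumped_total = {h for h in removed if h in cache}  # dropped, then back
    return jumped_total, removed - jumped_total
```

With the numbers above, `scenario()` yields 2 jumping XOs and 3 vanished ones: the second returning XO was already flushed by the first announcement, so it is simply re-added, which on screen is also a jump.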
<br><br>I hope these tests have clarified the picture a lot.<br><br>yani<br>