The reason we see icons flashing here and there in the mesh view.. i.e. "xmas tree effect"

John Watlington wad at laptop.org
Fri Dec 14 00:40:55 EST 2007


What worries me most about this is the revelation that we continue to
rely on mDNS when connected to Internet infrastructure. When in the
presence of a school server (or connected to a Jabber server), mDNS
should be shut down. Otherwise we risk a network meltdown....

wad

On Dec 13, 2007, at 11:18 PM, Giannis Galanis wrote:

> I ran several tests related to the xmas tree effect we see in the
> mesh view.
>
> The effect is that XOs sometimes disappear and reappear in the same
> or a different position, or simply disappear. More commonly, it
> happens to many XOs simultaneously.
>
> The results I have clearly indicate that this is an issue in the
> Avahi daemon, which is used by the Salut Telepathy service. The
> Sugar interface displays the information it receives from Salut
> very reliably. This means that when a host disappears from
> Avahi's host list, it vanishes instantly from the mesh view, and
> likewise a new host shows up as soon as it arrives.
>
> The Avahi daemon runs below Salut and keeps receiving information
> from other hosts in the network that also run the Avahi daemon.
> It keeps a local cache of recently seen hosts.
> At regular intervals (of 1-2 minutes, I think), it checks whether the
> hosts in the cache are alive. If not, they are recorded as "failed".
> Instead of waiting 1-2 minutes, the check can be invoked continuously
> by running "avahi-browse -t -r _presence._tcp".
> After a certain timeout, a failed entry (dead host) disappears from
> the cache, and it instantly disappears from the mesh view.
>
> This timeout is pretty long (several minutes), so a host (XO) has
> the chance to become alive again with no effect on the mesh view.
> A host can show as failed when:
> a. the XO's Avahi packets don't get through due to high mesh
> traffic. In this case the other XOs might see it as either alive
> or dead, depending on conditions.
> b. the XO is deliberately moved to another channel, or otherwise
> disconnected. In that case, all other XOs will see it as dead.
> From a client's point of view, the two cases are treated almost the
> same.
>
> THE TEST:
> 6 XOs connected to channel 11, with forwarding tables blinded only
> to themselves, so no other element in the mesh can interfere.
>
> The cache list was scanned continuously on all XOs using a script.
>
> If all XOs remained idle, they all showed reliably in each other's
> mesh view. Every 5-10 minutes an XO showed as dead in some other XOs'
> scans, but it recovered shortly, and there was no visual
> effect in the mesh view.
>
> If you manually switched an XO to another channel, it again showed
> as "dead" in all the others. If you reconnected it to channel 11,
> there was again no effect in the mesh view.
> If you never reconnected it, the entry was deleted in about 10-15
> minutes, and the corresponding XO icon disappeared from the view.
>
> Therefore, it is common and expected for XOs to show as "dead" in
> the Avahi cache for some time.
>
> THE BUG:
> IF a new XO appears (a message is received through Avahi),
> WHILE there are 1 or more XOs in the cache that are reported as "dead"
> THEN Avahi "crashes" temporarily and the cache CLEARS.
>
> At this point ALL XOs that are listed as dead instantly disappear
> from the mesh view.
> But, of course, some of the "dead" XOs are expected to reappear
> shortly, especially those that are still on the same mesh channel
> but merely failed to transmit their Avahi packets due to traffic load.
>
> Note that if there is only 1 XO that looks dead, and it returns,
> everything is normal.
> But if there are 2, 3, ... XOs that look dead, when 1 returns, then:
> a. all of the dead ones disappear from the view
> b. the 1 that returned reappears right after, probably in a
> different position, i.e. it will "jump"
>
> The avahi-browse command scans the network in real time (i.e. sends
> requests for all hosts in its cache list) and runs for several
> seconds. If the above situation occurs, it freezes (this is what I
> meant by "crashes"). When it is restarted, the previously dead hosts
> have been cleared from the cache.
>
> A typical situation in which the "xmas tree effect" occurs:
> 20 XOs are running Salut on channel 1. This includes XOs connected
> to the medialab AP, schoolserver, and linklocal.
> XOs leave the channel continuously.
> Concurrently, some connected XOs appear dead for 1 minute or so,
> and reappear after a short time.
>
> Assume that at some point 5 XOs have either really left, or "seem
> dead" anyway.
>
> At some point 2 of these XOs are reconnected at the same time to
> the mesh channel by someone in the office.
> The 2 XOs will "jump" to a different position, whereas the other 3
> will simply vanish.
>
> The way I see it, there is a very clear, narrow, and specific bug in
> the Avahi daemon's handling of the cache
> when new hosts and dead hosts coexist.
>
> I hope the tests have cleared up the picture a lot.
>
> yani
> _______________________________________________
> Devel mailing list
> Devel at lists.laptop.org
> http://lists.laptop.org/listinfo/devel
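
The cache behaviour described in the quoted report can be sketched as a
toy model. This is not Avahi's code or data structure; the class name
PresenceCache, the "buggy" flag, and the 600-second expiry are all
assumptions made for illustration (the report only says entries are
deleted after roughly 10-15 minutes). With buggy=False a failed host
that announces itself again simply becomes alive; with buggy=True an
arriving host causes every failed entry to be dropped whenever other
failed entries exist, which is the behaviour the report attributes to
the daemon.

```python
from typing import Dict, Iterable, Optional, Set


class PresenceCache:
    """Toy model of a presence cache; NOT Avahi's actual implementation."""

    def __init__(self, expiry: float = 600.0, buggy: bool = False) -> None:
        self.expiry = expiry  # assumed timeout, in seconds
        self.buggy = buggy    # hypothetical switch for the reported bug
        # host -> time it was first marked "failed", or None while alive
        self.failed_since: Dict[str, Optional[float]] = {}

    def announce(self, host: str) -> None:
        """A packet from `host` was received."""
        known = host in self.failed_since
        # A host is "arriving" if it is new or currently marked failed.
        arriving = (not known) or self.failed_since[host] is not None
        others_failed = [h for h, since in self.failed_since.items()
                         if since is not None and h != host]
        if self.buggy and arriving and others_failed:
            # Reported behaviour: all failed entries are dropped at once.
            for h in others_failed:
                del self.failed_since[h]
            # The returning host is re-added as a fresh entry ("jump").
            self.failed_since.pop(host, None)
        self.failed_since[host] = None

    def check(self, responders: Iterable[str], now: float) -> None:
        """Periodic liveness check; `responders` are hosts that answered."""
        alive = set(responders)
        for host, since in list(self.failed_since.items()):
            if host in alive:
                self.failed_since[host] = None
            elif since is None:
                self.failed_since[host] = now  # newly failed
            elif now - since >= self.expiry:
                del self.failed_since[host]    # expired: leaves the view

    def visible(self) -> Set[str]:
        """Hosts that would be shown in the mesh view (alive or failed)."""
        return set(self.failed_since)


def demo(buggy: bool) -> list:
    cache = PresenceCache(expiry=600.0, buggy=buggy)
    for xo in ("xo1", "xo2", "xo3", "xo4"):
        cache.announce(xo)
    cache.check(responders={"xo1"}, now=60.0)  # xo2..xo4 marked failed
    cache.announce("xo2")                      # xo2 returns
    return sorted(cache.visible())


if __name__ == "__main__":
    print(demo(buggy=False))  # ['xo1', 'xo2', 'xo3', 'xo4']
    print(demo(buggy=True))   # ['xo1', 'xo2']
```

In this sketch a single failed host returning on its own changes
nothing even with buggy=True, matching the observation that the effect
needs 2 or more hosts marked dead at the moment one of them comes back.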



