The reason we see icons flashing here and there in the mesh view.. i.e. "xmas tree effect"
John Watlington
wad at laptop.org
Fri Dec 14 00:40:55 EST 2007
What worries me most about this is the revelation that we continue to
rely on mDNS when connected to internet infrastructure. When in the
presence of a school server (or connected to a jabber server), mDNS
should be shut down. Otherwise we risk a network meltdown....
wad
On Dec 13, 2007, at 11:18 PM, Giannis Galanis wrote:
> I ran several tests related to the xmas tree effect we see in the
> mesh view.
>
> The effect is that sometimes XOs disappear and reappear at the same
> or a different position, or simply disappear. Usually this happens
> to many XOs simultaneously.
>
> The results I have clearly indicate that this is an issue in the
> Avahi daemon, which is used by the Salut Telepathy service. The
> Sugar interface displays the information it receives from Salut
> very reliably. This means that when a host disappears from
> Avahi's host list, it vanishes instantly from the mesh view, and
> likewise a new host appears as soon as it arrives.
>
> The Avahi daemon runs below Salut and keeps receiving information
> from other hosts in the network that also run the Avahi daemon.
> It keeps a local cache of recently seen hosts.
> At regular intervals (of 1-2 mins, I think), it checks whether the
> hosts in the cache are alive. If not, they are recorded as "failed".
> The above check can be invoked continuously with "avahi-browse -t -r
> _presence._tcp" (instead of waiting for 1-2 mins).
> After a certain timeout, a failed entry (dead host) will disappear
> from the cache, and it will instantly disappear from the mesh view.
>
> This timeout is pretty long (several minutes), so a host (XO) has
> the chance to become alive again with no effect on the mesh view.
> A host can appear dead when:
> a. the XO's Avahi packets don't get through due to high mesh
> traffic. In this case the other XOs might see it as either alive
> or dead, depending on conditions.
> b. the XO was deliberately moved to another channel, or otherwise
> disconnected. In that case, all other XOs will see it as dead.
> From a client's point of view, the two cases are treated almost the
> same.
>
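The lifecycle described above (alive, then "failed", then dropped after a
timeout) can be sketched as a toy model. This is only an illustration of the
description in this thread, not Avahi's actual code; the class name, the
method names and the 600 second expiry are assumptions chosen for the example.

```python
from dataclasses import dataclass, field

ALIVE, FAILED = "alive", "failed"


@dataclass
class HostCache:
    """Toy model of the host cache described in the post (not Avahi itself)."""

    expiry_s: float = 600.0  # assumed value; the post reports roughly 10-15 minutes
    entries: dict = field(default_factory=dict)  # name -> (state, failed_since)

    def heard_from(self, name):
        """A packet from `name` arrived: add it or mark it alive again."""
        self.entries[name] = (ALIVE, None)

    def check(self, responders, now):
        """Periodic liveness check over every cached host."""
        for name, (state, since) in list(self.entries.items()):
            if name in responders:
                self.entries[name] = (ALIVE, None)
            elif state == ALIVE:
                # First missed check: keep the entry, remember when it failed.
                self.entries[name] = (FAILED, now)
            elif now - since >= self.expiry_s:
                # Failed for too long: the icon leaves the mesh view here.
                del self.entries[name]

    def visible(self):
        """Hosts the mesh view would draw: alive or failed-but-not-expired."""
        return sorted(self.entries)
```

In this model a host that misses one check and answers the next never leaves
visible(), which matches the observation that a briefly "dead" XO causes no
visual change.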
> THE TEST:
> 6 XOs connected to channel 11, with forwarding tables blinded to
> everything but themselves, so no other element in the mesh can interfere.
>
> The cache list was scanned continuously on all XOs using a script.
>
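The script used in the test is not included in this thread. A minimal sketch
of one way to do such a scan follows, assuming the parsable output mode of
avahi-browse (the "-p" flag, which is not part of the command quoted above)
and assuming that resolved records are the lines starting with "=". The
function names are made up for the example.

```python
import subprocess

# The command from the post, plus -p for machine-readable output (assumption).
BROWSE_CMD = ["avahi-browse", "-t", "-r", "-p", "_presence._tcp"]


def parse_resolved(output):
    """Return the set of (service name, address) pairs found in `output`."""
    hosts = set()
    for line in output.splitlines():
        fields = line.split(";")
        # Resolved records look like:
        # =;iface;protocol;name;type;domain;hostname;address;port;txt
        if fields[0] == "=" and len(fields) >= 9:
            hosts.add((fields[3], fields[7]))
    return hosts


def scan_once(run=subprocess.run):
    """Run one browse pass and return the hosts it resolved.

    The timeout guards against avahi-browse hanging; subprocess raises
    TimeoutExpired in that case.
    """
    result = run(BROWSE_CMD, capture_output=True, text=True, timeout=30)
    return parse_resolved(result.stdout)


def diff(previous, current):
    """Return (vanished, appeared) host sets between two scans."""
    return previous - current, current - previous
```

Calling scan_once() in a loop and printing diff(previous, current) after each
pass would show which hosts dropped out of, or came back into, the list
between scans.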
> If all XOs remained idle, they all showed reliably in each other's
> mesh view. Every 5-10 mins an XO showed as dead in some other XOs'
> scans, but this shortly recovered, and there was no visual
> effect in the mesh view.
>
> If you switched an XO manually to another channel, again it showed
> as "dead" in all the others. If you reconnected it to channel 11,
> there was again no effect in the mesh view.
> If you never reconnected it, in about 10-15 minutes the entry was
> deleted, and the corresponding XO icon disappeared from the view.
>
> Therefore, it is common and expected for XOs to show as "dead" in
> the Avahi cache for some time.
>
> THE BUG:
> IF a new XO appears (a message is received through Avahi),
> WHILE there are 1 or more XOs in the cache that are reported as "dead"
> THEN Avahi "crashes" temporarily and the cache CLEARS.
>
> At this point ALL XOs that are listed as dead instantly disappear
> from the mesh view.
> But, of course, some of the "dead" XOs are expected to reappear
> shortly, especially those that are still on the same mesh channel
> but merely failed to transmit their Avahi packets due to traffic load.
>
> Note that if there is only 1 XO that looks dead, but returns,
> everything is normal.
> But, if there are 2, 3, ... XOs that look dead, when 1 returns, then:
> a. all (the dead ones) disappear from the view
> b. the 1 that returned will reappear right after, probably in a
> different position, i.e. it will "jump"
>
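One reading of the behaviour reported above, written out as a small model.
This is a hypothetical reconstruction from the observations in this thread
and not a description of what the Avahi source does; the function name, the
"alive"/"dead" labels and the "buggy" switch are assumptions for the example.

```python
def on_announce(cache, host, buggy=True):
    """Apply one incoming announcement from `host` to `cache`.

    `cache` maps a host name to "alive" or "dead". Returns the hosts whose
    icons are removed from the view as a side effect of this announcement.
    """
    other_dead = [h for h, s in cache.items() if s == "dead" and h != host]
    removed = []
    if buggy and other_dead:
        # Reported behaviour: all dead entries are flushed at once. That
        # includes the sender's own stale entry, so the sender is re-added
        # as a brand new icon, which would explain the "jump".
        removed = sorted(h for h, s in cache.items() if s == "dead")
        for h in removed:
            del cache[h]
    # Expected behaviour: only the sender's state changes.
    cache[host] = "alive"
    return removed
```

With a single dead host returning, nothing is removed. With several dead
hosts and one returning, or with a new host arriving while any host is dead,
every dead entry is dropped, which is consistent with both cases described
above.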
> The avahi-browse command scans the network in real time (i.e. sends
> requests for all hosts in its cache list) and runs for several
> seconds. If the above situation occurs, it freezes (this is what I
> meant by "crashes"). When it is restarted, the previously dead hosts
> have been cleared from the cache.
>
> A typical situation in which the "xmas tree effect" occurs:
> 20 XOs are running Salut on channel 1. This includes XOs connected
> to the medialab AP, the school server, and link-local.
> XOs leave the channel continuously.
> Concurrently, some connected XOs appear dead for 1 minute or so,
> and reappear after a short time.
>
> Assume that at some point 5 XOs have either really left, or "seem
> dead" anyway.
>
> At some point 2 of these XOs are reconnected at the same time to
> the mesh channel by someone in the office.
> The 2 XOs will "jump" to a different position, whereas the other 3
> will simply vanish.
>
> The way I see it, there is a very clear/narrow/specific bug in
> the Avahi daemon's handling of the cache
> when new hosts and dead hosts coexist.
>
> I hope the tests have cleared up the picture a lot.
>
> yani