The reason we see icons flashing here and there in the mesh view.. i.e. "xmas tree effect"

Giannis Galanis galanis at laptop.org
Thu Dec 13 23:18:01 EST 2007


I ran several tests related to the xmas tree effect we see in the mesh view.


The effect is that XOs sometimes disappear and reappear in the same or a
different position, or simply disappear. Usually it happens to many XOs
simultaneously.

The results I have clearly indicate that this is an issue in the Avahi
daemon, which is used by the Salut telepathy service. The sugar interface
displays the information it receives from Salut very reliably. This means
that when a host disappears from Avahi's host list, it vanishes instantly
from the mesh view, and likewise a newly arrived host shows up instantly.

The Avahi daemon runs below Salut and keeps receiving information from other
hosts in the network which also run the Avahi daemon.
It keeps a local cache of the recent hosts.
At regular intervals (of 1-2 mins, I think), it checks whether the hosts in
the cache are alive. If not, they are recorded as "failed".
The above check can be invoked continuously with
"avahi-browse -t -r _presence._tcp" (instead of waiting for 1-2 mins).
After a certain timeout, a failed entry (dead host) will disappear from the
cache, and it will instantly disappear from the mesh view.
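
To make the lifecycle above concrete, here is a minimal toy model in Python.
It is only a sketch of the behaviour as I described it, not Avahi code: the
class name HostCache, the host names and the 600-second expiry are all
assumptions made up for illustration.

```python
# Toy model of the cache lifecycle described above. NOT Avahi code:
# HostCache, EXPIRY_SECONDS and the host names are hypothetical.
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set

EXPIRY_SECONDS = 600.0  # assumed stand-in for "several minutes"


@dataclass
class HostCache:
    expiry: float = EXPIRY_SECONDS
    # host name -> time it was first marked failed (None while alive)
    failed_since: Dict[str, Optional[float]] = field(default_factory=dict)

    def check(self, responding: Set[str], now: float) -> List[str]:
        """Run one liveness check; return the hosts dropped from the cache."""
        removed = []
        for host in list(self.failed_since):
            if host in responding:
                self.failed_since[host] = None   # alive (again)
            elif self.failed_since[host] is None:
                self.failed_since[host] = now    # newly marked "failed"
            elif now - self.failed_since[host] >= self.expiry:
                del self.failed_since[host]      # timed out: icon vanishes
                removed.append(host)
        for host in responding - set(self.failed_since):
            self.failed_since[host] = None       # new host arrives
        return removed


cache = HostCache()
cache.check({"xo-a", "xo-b"}, now=0)
cache.check({"xo-a"}, now=90)            # xo-b is "failed" but still cached
cache.check({"xo-a", "xo-b"}, now=180)   # xo-b is back, nothing was removed
```

In this model a host that answers again before the expiry never leaves the
cache, which is why a short outage has no visible effect in the mesh view.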

This timeout is pretty long (several minutes), so a host (XO) has the chance
to become alive again with no effect on the mesh view.
A host can look dead when:
a. the XO's avahi packets don't get through due to high mesh traffic. In this
case the other XOs might see it as either alive or dead, depending on the
conditions.
b. the XO was deliberately moved to another channel, or otherwise
disconnected. In that case, all other XOs will see it as dead.
From a client's point of view, the two cases are treated almost the same.

THE TEST:
6 XOs connected to channel 11, with forwarding tables blinded so that they
only see each other, so no other element in the mesh can interfere.

The cache list was scanned continuously on all XOs using a script.
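
The script itself is not included in this mail. A rough Python sketch of
such a scan loop could look like the one below. It adds the -p (parsable)
flag to the avahi-browse command quoted earlier, and it assumes the
';'-separated field layout of that output (event;interface;protocol;name;...).
The function names and the 5-second interval are made up for illustration.

```python
# Sketch of a scan loop; the script actually used in the test is not shown.
# Function names and the polling interval are hypothetical.
import subprocess
import time

# Same command as in the text, plus -p for parsable output (an addition).
CMD = ["avahi-browse", "-t", "-r", "-p", "_presence._tcp"]


def run_browse():
    """Run one avahi-browse pass and return its stdout as text."""
    result = subprocess.run(CMD, capture_output=True, text=True, timeout=60)
    return result.stdout


def resolved_names(output):
    """Collect service names from resolved ('=') lines of parsable output.

    Assumes ';'-separated fields: event;interface;protocol;name;type;domain;...
    """
    names = set()
    for line in output.splitlines():
        fields = line.split(";")
        if len(fields) > 3 and fields[0] == "=":
            names.add(fields[3])
    return names


def scan_forever(browse=run_browse, interval=5.0):
    """Print which names appeared or vanished between consecutive scans."""
    previous = set()
    while True:
        current = resolved_names(browse())
        stamp = time.strftime("%H:%M:%S")
        for name in sorted(current - previous):
            print(stamp, "appeared:", name)
        for name in sorted(previous - current):
            print(stamp, "vanished:", name)
        previous = current
        time.sleep(interval)
```

Calling scan_forever() on each XO gives a timestamped log that can be lined
up against what the mesh view shows at the same moment.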

If all XOs remained idle, they all showed reliably in each other's mesh view.
Every 5-10 mins an XO showed as dead in some other XOs' scans, but it
recovered shortly, and there was no visual effect in the mesh view.

If you switched an XO manually to another channel, it again showed as "dead"
in all the others. If you reconnected it to channel 11, there was again no
effect in the mesh view.
If you never reconnected it, the entry was deleted in about 10-15 minutes,
and the corresponding XO icon disappeared from the view.

Therefore, it is common and expected for XOs to show as "dead" in the Avahi
cache for some time.

THE BUG:
IF a new XO appears (a message is received through Avahi),
WHILE there are 1 or more XOs in the cache that are reported as "dead"
THEN Avahi "crashes" temporarily and the cache CLEARS.

At this point ALL XOs that are listed as dead instantly disappear from the
mesh view.
But, of course, some of the "dead" XOs are expected to reappear shortly,
especially those that are still in the same mesh channel but merely failed
to transmit their avahi packets due to traffic load.

Note that if there is only 1 XO that looks dead, and it returns, everything
is normal.
But if there are 2, 3 or more XOs that look dead, then when 1 returns:
a. all of the dead ones disappear from the view
b. the 1 that returned will reappear right after, probably in a different
position, i.e. it will "jump"
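
The IF/WHILE/THEN rule and the two cases above can be written down as a
small toy model. This is a sketch of the OBSERVED behaviour only, not of
Avahi's actual code paths. The class name MeshViewModel is made up, and the
"slot" number is just an assumed stand-in for an icon's position in the view.

```python
# Toy model of the observed behaviour; NOT Avahi's real implementation.
# MeshViewModel and the slot counter are hypothetical.
class MeshViewModel:
    def __init__(self):
        self.slot = {}       # host -> slot number (stand-in for icon position)
        self.dead = set()    # hosts currently marked "dead" in the cache
        self._next_slot = 0

    def mark_dead(self, host):
        """A liveness check failed; the icon stays in the view for now."""
        if host in self.slot:
            self.dead.add(host)

    def announce(self, host):
        """A message from `host` is received (new or returning XO)."""
        others_dead = self.dead - {host}
        if others_dead:
            # Observed bug: every dead entry is flushed at once,
            # including `host` itself if it was one of them.
            for h in self.dead:
                del self.slot[h]
            self.dead.clear()
        else:
            # Normal case: the host is simply alive again.
            self.dead.discard(host)
        if host not in self.slot:
            # Fresh entry, so the icon may land in a different position.
            self.slot[host] = self._next_slot
            self._next_slot += 1


view = MeshViewModel()
for xo in ["xo1", "xo2", "xo3", "xo4"]:
    view.announce(xo)
for xo in ["xo1", "xo2", "xo3"]:
    view.mark_dead(xo)
view.announce("xo1")   # xo2 and xo3 vanish, xo1 gets a new slot ("jumps")
```

With a single dead XO the else branch is taken and nothing moves, which
matches the "everything is normal" case.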

The avahi-browse command scans the network in real time (i.e. sends requests
for all hosts in its cache list) and runs for several seconds. If the above
situation occurs, it freezes (this is what I meant by "crashes"). When it is
restarted, the previously dead hosts have been cleared from the cache.

A typical situation in which the "xmas tree effect" occurs:
20 XOs are running salut on channel 1. This includes XOs connected to the
medialab AP, schoolserver, linklocal.
XOs leave the channel continuously.
Concurrently, some connected XOs appear dead for 1 minute or so, and
reappear after a short time.

Assume that at some point 5 XOs have either really left, or "seem dead"
anyway.

Then 2 of these XOs are reconnected at the same time to the mesh channel by
someone in the office.
The 2 XOs will "jump" to a different position, whereas the other 3 will
simply vanish.

The way I see it, there is a very clear/narrow/specific bug in the avahi
daemon's handling of the cache when new hosts and dead hosts coexist.

I hope the tests have cleared up the picture a lot.

yani