Salut and Suspend/Resume issues

Giannis Galanis galanis at laptop.org
Tue Feb 19 15:18:40 EST 2008


On Feb 19, 2008 2:48 PM, Ricardo Carrano <carrano at ricardocarrano.com> wrote:

> Yanni,
>
> Timeout is a value, not a range. The effects brought by the timeout may
> manifest in a period (a range).


Did I use it otherwise? Because of the xmas tree effect, the timeout until a
failed XO's icon is removed is 10-30min.


> I believe everyone will agree that 30 minutes is a long time to wait and
> (like Polychronis added) defeats the whole idea of a presence service.
>
> But, what I want to stress is that we are dealing with different issues
> here.
>

> I don't believe this 30 minutes or the xmas tree effect is related to
> suspend/resume. Those seem like bugs somewhere in the software stack that
> supports presence, while the suspend/resume issues are clearly a side effect
> of the multicast traffic not being "heard" by a suspended XO.
>

There are way too many issues. These bugs (30min/xmas tree) amplify the
effects of suspend/resume on the mesh.
I believe that since we have the big test week coming, everyone must be
aware of them, or else no one will interpret the results properly.

The direct suspend/resume bugs are:

1. Why the mesh view empties after a long suspend, and how this affects the
mesh view
2. Why the avahi cache is sometimes cleared after resume.


Ricardo, do you have answers to the questions I posted before?
1. When an XO resumes, does it send any notification via avahi that it is
back? Because if it doesn't, then other XOs that have cleared it from their
lists will never search for it.

2. Every XO scans the network every 10min, in multicast packets, to check
whether its avahi peers are alive. Do these packets include the address of the
peers/targets? I think they do, unless I am very confused. Couldn't we
wake/resume the target XO when it receives these specific packets?
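
To make the timing easier to reason about before the test week, here is a rough sketch of the timeline quoted further down in this thread (00:00 left, 10:00 "failed", 30:00 icon removed, 60:00 cache cleared), plus the bug #5501 behaviour where a new XO arriving between states 2 and 3 clears all "failed" peers at once. The function and constant names are made up for illustration; this is not avahi, salut or sugar code, only a model of the observed numbers.

```python
# Observed thresholds, in minutes since the peer left the channel.
# The values come from the timeline in this thread; the names are hypothetical.
FAILED_AFTER = 10         # avahi reports the peer as "failed"
ICON_GONE_AFTER = 30      # icon disappears from the mesh view
CACHE_CLEARED_AFTER = 60  # avahi cache entry is cleared


def peer_state(minutes_since_left):
    """Map the time since a peer left to the state observed in the tests."""
    if minutes_since_left < FAILED_AFTER:
        return "present"
    if minutes_since_left < ICON_GONE_AFTER:
        return "failed"
    if minutes_since_left < CACHE_CLEARED_AFTER:
        return "icon_removed"
    return "cache_cleared"


def icon_visible(minutes_since_left, new_xo_arrived_at=None):
    """Return True if the peer's icon is still shown in the mesh view.

    new_xo_arrived_at is the time (minutes since the peer left) at which a
    NEW XO joined the mesh, or None if nobody joined. Per bug #5501, an
    arrival while the peer is "failed" removes its icon immediately.
    """
    if minutes_since_left >= ICON_GONE_AFTER:
        return False
    if (new_xo_arrived_at is not None
            and new_xo_arrived_at <= minutes_since_left
            and peer_state(new_xo_arrived_at) == "failed"):
        return False
    return True
```

For example, icon_visible(12) is True, but icon_visible(12, new_xo_arrived_at=11) is False. That is why the effective timeout sits anywhere in 10-30min and moves toward 10min when many XOs suspend and resume.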


On Feb 19, 2008 3:00 PM, Giannis Galanis <galanis at laptop.org> wrote:

> The list expires in 10min-30min.
>
> But we can't wait 30min before suspending, it is way too long.
>
>
> On Feb 19, 2008 11:37 AM, Ricardo Carrano <carrano at ricardocarrano.com>
> wrote:
>
> > Yanni,
> >
> > As I posted in the bug, I believe that you are observing the entries on
> > the avahi cache expiring.
> >
> > So, your first scenario would happen when the suspend time is longer
> > than the time it takes for all entries to expire.
> > The second scenario would happen when the suspend time is not long
> > enough for all cached entries to go away.
>
>
> Oh, I see what you mean. But I think both cases occur when the suspend time
> is longer than the time to expire.
> The first is a UI effect, and might have no relation to salut, but to the mesh
> view in general.
> The second is an avahi effect: the avahi cache is changed.
> Both happen after long suspends.
>
> >
> > And the third scenario seems related to previous reports you've made on
> > the Xmas tree effect, so not related to suspend/resume.
>
>
> The xmas tree effect appears when some XOs drop their connection while others
> return.
> Suspend/resume enhances this effect dramatically, because within 1-2min
> everyone goes away, and they return at random times depending on when they
> resume.
>
> In my suspend-salut tests, the xmas tree effect (although NOT related to
> suspend/resume) affects salut a lot more than the other 2 scenarios.
>
> My point is that we must fix it anyway. But especially now!!
>
>
> >
> > What do you think?
>
>
> I have 2 questions that will help (me) understand a lot about the
> situation:
>
> 1. When an XO resumes, does it send any notification via avahi that it is
> back? Because if it doesn't, then other XOs that have cleared it from their
> lists will never search for it.
>
> 2. Every XO scans the network every 10min, in multicast packets, to check
> whether its avahi peers are alive. Do these packets include the address of the
> peers/targets? I think they do, unless I am very confused. Couldn't we
> wake/resume the target XO when it receives these specific packets?
>
> We need to do some sniffing.
>
>
>
> >
> > On Feb 19, 2008 1:13 PM, Giannis Galanis <galanis at laptop.org> wrote:
> >
> > >
> > >
> > > On Feb 19, 2008 10:13 AM, Ricardo Carrano <carrano at ricardocarrano.com>
> > > wrote:
> > >
> > > >
> > > > > > I was asking whether it would help to have the wireless module wake
> > > > > > us on multicast packets instead of only unicast.  Are you saying that
> > > > > > it would?
> > > >
> > > >
> > > > It seems so, though it would, as John points out, make resumes far
> > > > more constant. It seems we have to find a creative way out of this tough
> > > > choice (automated suspend vs mesh) or face it.
> > > >
> > > >
> > > > >
> > > > >
> > > > > >   > Avahi entries will expire after some time. Suspend will prevent it
> > > > > >   > to update its cache.
> > > > >
> > > > > Yani's bug report (#6467) suggests that Avahi entries often expire
> > > > > immediately upon resume:
> > > > >
> > > > >   After the XO resumes (probably after beinng suspended for
> > > > > several
> > > > >   minutes) all the icons in the mesh view vanish, except the mesh
> > > > >   circles.
> > > >
> > > >
> > > > > I read this as the avahi cache expiring its entries. Yanni, can
> > > > > you put timeframes on this?
> > > > > You could check how long it takes for an entry to expire (TO) and then
> > > > > check if:
> > > > > Suspend time > TO -> all entries vanish
> > > > > Suspend time << TO -> no entries vanish
> > > > > Suspend time ~ TO -> some entries vanish
> > > >
> > >
> > > > There are 2 cases where icons vanish due to suspend.
> > > >
> > > > 1st: The moment you resume (it generally happens after long suspends),
> > > > all icons (APs/XOs) vanish instantly. This bug (#6467) suggests that sugar
> > > > has a problem with suspend/resume.
> > > > The icons slowly reappear. I assume that if the avahi peer list is
> > > > intact, all XOs return.
> > > >
> > > > 2nd: The avahi list sometimes loses some or all of the peers at resume.
> > > > This is also under #6467, but it seems technically different. One possible
> > > > explanation could be that during suspend the XO resumed several times, but I
> > > > didn't notice it! And within these time frames it realized that the other
> > > > suspended XOs were gone, so it cleared its cache. Now when I resumed it
> > > > myself, I observed that the cache was clean!!
> > >
> > > > Now, regarding the timeouts of avahi. This is a 3rd thing:
> > > > When an XO leaves the channel we have 4 states:
> > > >    mm:ss
> > > > 1. 00:00  XO leaves the channel (manually, or it suspended)
> > > > 2. 10:00  Avahi notices the XO left, and reports it as "failed"
> > > > 3. 30:00  Icon disappears in the mesh view
> > > > 4. 60:00  Avahi cache is cleared
> > > > Additionally there is a bug (#5501) according to which, if a NEW XO
> > > > arrives between states 2 and 3, then instantly ALL "failed" avahi peers are
> > > > cleared and the corresponding icons vanish.
> > >
> > > So, the 3rd case is the following:
> > >
> > > > Assume a mesh has e.g. 20 XOs, and I use my XO so it doesn't suspend,
> > > > but the other 19 are suspended.
> > > > If a new XO arrives after >10min, then all 19 XOs instantly vanish
> > > > from the mesh.
> > >
> > > So the TO time is between 10->30min... but closer to 10min if many XOs
> > > suspend/resume
> > > So if resume time << 10min everything is fine!!
> > >
> > >
> > >
> > > > What I don't know is whether an XO, when it resumes, sends any avahi
> > > > packet to notify its presence/return. Because if it doesn't, then the XO
> > > > won't exist in the others' cache lists, so the others won't search for it.
> > > > Sjoerd, can you answer this?
> > > >
> > > > This would explain why after resume some XOs take too long to see
> > > > each other again.
> > > > If you combine this with the "2nd" case, you will see that in the
> > > > natural case, where XOs are resumed by their users at random points in
> > > > time, they will all clear their caches, unless they resume concurrently.
> > > > So in the end, all will have empty caches!!
> > >
> > >
> > >
> > >
> > > >
> > > > >
> > > > > Thanks,
> > > > >
> > > > > - Chris.
> > > > > --
> > > > > Chris Ball   <cjb at laptop.org>
> > > > >
> > > >
> > > >
> > >
> >
>