<br><div class="gmail_quote"><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;"><div class="Ih2E3d">> 2. It takes up to 10min for avahi even to detect the inactivity of a peer.<br>

> i.e. If an XO switches channels, for up to 10min avahi won't even know (it<br>> used to be 1-2min).<br><br></div>Is this with or without the patch from bug #6162? If without, then the time it<br>takes avahi to discover it should still be 2 minutes. I'd like to know how you test<br>

this. Oh and please file a bug, so we can actually track these issues.<br><div class="Ih2E3d"></div></blockquote><div><br>The patch for 6162, as well as the patch for 5501, is included in the 689/690 that I am testing. So this indeed explains the 10 minutes (actually, I only just found out about this bug).<br>

<br></div><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;"><div class="Ih2E3d"><br>> 3. It will take a total of about 30min for the XO to vanish from the mesh<br>

> view(this is tooo long!)<br><br></div>Again, file a bug. Needed info here is if there is a time difference between<br>when avahi marks something as removed, when salut sends out the removed signal<br>and when it actually disappears from the mesh view.<br>

<div class="Ih2E3d"></div></blockquote><div><br>This is now filed as 6282, with all dbus-monitor/avahi-browse logs to compare. </div><div>This case is also an example of an avahi/mesh view inconsistency.<br>Icons disappear from the mesh view but remain for about 1h longer in the avahi cache.<br>

But these details should continue in trac anyway.<br> </div><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;"><div class="Ih2E3d"><br>> 4. Avahi/mesh view respond independently.<br>

> The situation used to be that when an entry disappeared in avahi, it<br>> disappeared in mesh view, and the same when new peers arrive.<br>> This relation was very consistent.<br>> However, now we have the following cases:<br>

> a) an XO will vanish from the mesh view, but remain "indefinitely" in the<br>> avahi cache as "failed to resolve"<br>> b) sometimes avahi shows a lot fewer peers than the mesh view. The extra peers<br>

> in the mesh view are definitely active since they properly respond to<br>> activity joining/sharing.<br>> c) sometimes avahi includes more active peers than the mesh view.<br>> does anyone know why this is happening?<br>

> Is it a bug?<br>> I have logs, if needed, that compare avahi-browse with timestamped<br>> dbus-monitor logs, that indicate the inconsistencies.<br><br></div>Well you all list them as undesired behaviour, so i would say they're bugs.<br>

<div class="Ih2E3d"></div> </blockquote><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;"><div class="Ih2E3d"><br>> 5. An important improvement is that peers will not generally fail alot on<br>

> their own.<br>> So, if many XOs join a mesh channel, and no one goes away, they will not start<br>> failing. This used to be a common effect after 4-5 XOs. However, i noticed<br>> once in 1cc, 61 active XOs in the mesh view!<br>

<br></div>When you say salut, you actually mean avahi. It would help if you could be<br>clear on what you mean :) This improvement is probably caused by the fix in<br>#5501.<br><div class="Ih2E3d"></div></blockquote><div>

<br>I mean avahi indeed. In the past these two were very tightly coupled.<br>And I believe that the only direct way to examine salut is by checking the buddy list in the Analyze activity.<br>I remember Ricardo had an interesting case where the buddy list included plenty of XOs, which were also properly sharing in the mesh view, but the avahi list was empty. Does this seem possible? (Unfortunately, no log at the moment.)<br>

 </div><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;"><br>Anyway for all the bugs you should have filed instead of sending this mail, i<br>

will need tcpdump logs, avahi logs, salut logs and if possible meshview logs<br>indicating when contacts are removed from the mesh from a machine where you see<br>the behaviour. Preferably with timestamps.</blockquote><div>

 </div><div>I updated the trac with logs/tcpdumps/dbusmon/screenshots...enjoy!<br><br>

The reason I sent this email first, before filing tons of bugs, is

that I thought it was necessary to describe the big picture and the

current status of salut, and also to avoid duplicate bugs, or bugs that are in

fact intentional mods.<br><br>This conversation was unfortunately diverted towards other issues (wireless difficulties are a sensitive subject at OLPC!), but in fact its purpose was to pin down some very specific bugs in salut that have nothing to do, at this point, with the scalability or robustness of the protocol. When these are resolved, we can proceed with scalability, about which I am very confident.<br>

<br>I believe our current salut/avahi issues are described in the following points:<br><br></div></div>1. I was under the impression that when a peer switches channels it sends a "goodbye" signal, and that only peers removed abnormally (after crashes, power-offs by pressing the button, etc.) would be slow to disappear from mesh views. The 10min TTL is not unreasonable, but it should only be used as a routine check. Peers that leave or arrive should inform the mesh instantly. In that case the 10min TTL will affect only the mesh points with noisy links, whose "goodbye" signals get lost, and these connections are lower priority anyway. We could also send 2-3 "goodbye" signals to "ensure" delivery.<br>

<br>2. We should definitely decrease the timeout window between a lost peer being detected and its actual disappearance from the mesh view. This used to be 10min, now it is 20min, but really, in my experience, if a peer is away for more than 1-2min it is not coming back.<br>

<br>3. Should we make the above TTL and timeout user-specific, or at least configurable? Will there be a problem if two XOs have different TTLs? I would assume not. The idea is that it is a waste of our resources to try to work out the ideal values of TTL and timeout by asking the Collabora team to fix, and fix again, when we can run the tests here in 1cc and find out ourselves what suits us best. Is it easy to implement such a patch?<br>

<br>4. The 5501 bug (xmas tree effect). This is a very specific bug in the protocol, and I believe it will be sorted soon.<br><br>5. Why are avahi, salut and the mesh view not communicating well? I hope we will have some answers on that as well.<br>
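<br>To make points 1-3 concrete, here is a rough sketch of the behaviour being asked for: an explicit "goodbye" removes a peer at once, the TTL only catches peers whose goodbye was lost, and the TTL is a parameter rather than a fixed value. This is a toy model only, not avahi or salut code; the class <code>PeerCache</code>, its method names and the peer names are all hypothetical, and the injectable clock is there just so the timing can be tried out without waiting.<br>

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class PeerCache:
    """Toy presence cache (hypothetical, not avahi/salut code)."""

    # Seconds a peer stays visible without a refresh; 600 s is the
    # 10min TTL discussed in this thread, but it is just a default here.
    ttl: float = 600.0
    # Injectable clock so the timing can be tested without sleeping.
    clock: Callable[[], float] = time.monotonic
    _expiry: Dict[str, float] = field(default_factory=dict)

    def announce(self, peer: str) -> None:
        """A peer arrived or refreshed its record: restart its TTL."""
        self._expiry[peer] = self.clock() + self.ttl

    def goodbye(self, peer: str) -> None:
        """Explicit goodbye: drop the peer immediately.

        Removing an unknown peer is a no-op, so a sender can repeat the
        goodbye 2-3 times on a noisy link without side effects.
        """
        self._expiry.pop(peer, None)

    def visible(self) -> List[str]:
        """Peers whose TTL has not run out yet (expired ones are purged)."""
        now = self.clock()
        self._expiry = {p: t for p, t in self._expiry.items() if t > now}
        return sorted(self._expiry)


# Usage with a fake clock and a 2min TTL instead of 10min.
now = [0.0]
cache = PeerCache(ttl=120.0, clock=lambda: now[0])
cache.announce("xo-a")
cache.announce("xo-b")
cache.goodbye("xo-a")      # xo-a switches channel and says goodbye
cache.goodbye("xo-a")      # a repeated goodbye is harmless
print(cache.visible())     # only xo-b is left
now[0] = 121.0             # xo-b crashed and never said goodbye
print(cache.visible())     # the TTL catches it after 2min
```

In this model two XOs with different <code>ttl</code> values simply forget a silent peer at different times, which is why I would expect mixed values to be harmless, but that is an assumption about the real daemons and would need testing.<br>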

<br>yianni