When an XO loses its connection, how long does it take to disappear from others' neighbor view?

Tue Nov 6 05:55:44 EST 2007

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

In reply to your previous mail, "iff" means "if and only if". It's often
used by mathematicians.

On Tue, 06 Nov 2007 at 03:23:39 -0500, Giannis Galanis wrote:
> What does proper notification mean? Which are the cases that it happens?

If Salut is explicitly asked to disconnect, it will tell Avahi to "delete"
all its mDNS records (this actually consists of re-sending all the
records it was advertising, with the Time To Live set to 0 seconds).
This is sometimes referred to as a "goodbye" packet. See
http://files.multicastdns.org/draft-cheshire-dnsext-multicastdns.txt
section 11.2 "Goodbye Packets".
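
As a rough illustration only (this is not Avahi's code), a goodbye is
just the same set of records re-sent with the TTL forced to 0. The
Record type, the goodbye_records helper and the example names and
addresses below are all made up for this sketch:

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Record:
    """Hypothetical, simplified stand-in for an mDNS resource record."""
    name: str
    rtype: str  # e.g. "SRV", "A", "TXT"
    data: str
    ttl: int    # seconds


def goodbye_records(advertised):
    """Return copies of the advertised records with the TTL set to 0."""
    return [replace(record, ttl=0) for record in advertised]


# Example records; the names and addresses are invented.
advertised = [
    Record("example-xo._presence._tcp.local", "SRV", "example-xo.local:5298", 120),
    Record("example-xo.local", "A", "169.254.1.2", 120),
]

for record in goodbye_records(advertised):
    print(record)
```

Anyone who receives these copies can drop the records from their cache
straight away instead of waiting for the original TTL to run out.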

The only time we'll currently do this is when switching off Salut because
Gabble has connected successfully.

> Probably this is not if an XO moves slowly to a place with poor
> connectivity.

This is never done in response to network conditions - we can't know that
we've lost network connectivity until it's too late.

If the Time To Live on our mDNS records expires, that should have the same
effect; however, as Sjoerd explained, we currently ignore that, because
the 1CC mesh network is apparently unstable enough that the TTL
sometimes expires even for laptops that are actually present.

> In the case of a temporary (short) disruption of connectivity, how much
> time does it generally take for it to return? You mentioned that in the past
> XOs were appearing and disappearing constantly. This implies that the
> common drop of connectivity is on the scale of a few seconds.

You tell me! :-) I don't have enough XOs to replicate the conditions of
a large mesh network like 1CC, so I can't comment on packet loss rates.
Perhaps Dan Williams (who used to maintain Presence Service) could help
you.

> If it is lost
> for more than a few minutes, then it is not bad for the XO to leave and
> return.  So I believe that 1h or even 10min are too long timeouts.

I believe we're currently using Avahi's default timeouts, which are
those recommended in the mDNS draft (linked above). If I'm right about
that, then we're using 120 second TTLs for the SRV and A records.

Assuming Salut and Avahi follow the draft's recommendations, this means
that for the records representing activities, buddies and laptops, if we
haven't seen an announcement of a particular record, we will:

- - re-query after 96 - 98.4 seconds;
- - if no reply, re-query after 102 - 104.4 seconds;
- - if no reply, re-query after 108 - 110.4 seconds;
- - if no reply, re-query after 114 - 116.4 seconds;
- - if no reply, assume the record has vanished after 120 seconds.

(In each of the ranges given for the re-queries, the exact time is
chosen at random, to avoid simultaneous queries from everyone in the
network.)
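
To show where those numbers come from, here is a small sketch of the
arithmetic. It assumes my reading of the mDNS refresh schedule:
re-queries at 80%, 85%, 90% and 95% of the TTL, each delayed by a
random amount of up to 2% of the TTL. The function names are made up,
and this is not how Avahi is actually structured:

```python
import random


def requery_windows(ttl, percents=(80, 85, 90, 95), jitter_percent=2):
    """Return (earliest, latest) re-query times in seconds for a record TTL."""
    windows = []
    for percent in percents:
        earliest = ttl * percent / 100
        latest = earliest + ttl * jitter_percent / 100
        windows.append((earliest, latest))
    return windows


def pick_requery_times(ttl, rng=random):
    """Pick one random time inside each window, as each host would."""
    return [rng.uniform(earliest, latest)
            for earliest, latest in requery_windows(ttl)]


for earliest, latest in requery_windows(120):
    print(f"re-query between {earliest:.1f} and {latest:.1f} seconds")
```

For a 120 second TTL the windows start at 96, 102, 108 and 114 seconds
and are each 2.4 seconds wide.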

The timeout is reset as soon as we see any announcement of a record.
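
A minimal sketch of that reset behaviour, assuming a 120 second TTL.
RecordCache and its methods are hypothetical names, not Avahi's real
cache:

```python
class RecordCache:
    """Toy cache that forgets a record once its TTL passes unrefreshed."""

    def __init__(self, ttl=120):
        self.ttl = ttl
        self._last_seen = {}  # record name -> time of last announcement

    def announce(self, name, now):
        """Any announcement of a record restarts its timeout."""
        self._last_seen[name] = now

    def vanished(self, now):
        """Drop and return the records whose TTL has run out."""
        gone = [name for name, seen in self._last_seen.items()
                if now - seen >= self.ttl]
        for name in gone:
            del self._last_seen[name]
        return gone


cache = RecordCache(ttl=120)
cache.announce("example-xo.local", now=0)
cache.announce("example-xo.local", now=100)  # seen again: timeout restarts
print(cache.vanished(now=150))  # only 50 seconds since the last announcement
print(cache.vanished(now=220))  # 120 seconds since the last announcement
```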

The only ones whose disappearance matters are the SRV and A records - if
a TXT record fails to disappear when it should, we don't really care.
TXT records have a substantially longer timeout (the draft recommends 75
minutes).

> There are a couple more things I would like to address:
> 
> 1. Is there a way to restart the presence service? In that way we can
> resolve a weird state. Will killing and restarting the process work?

Only if client code that accesses the PS is amended to cope with this
(I just filed #4681 to represent this). Until #4681 is closed, if the PS
was restarted, nothing would work - use Ctrl+Alt+Backspace to restart all of
Sugar. Please see the bug for more details or to reply.

> 2. At what point in the source code will the presence service
> i. try to connect to the jabber server?
> ii. run gabble?

I'll answer (ii.) first. Gabble is automatically run by the session bus
(dbus-daemon) via service activation, the first time the Presence Service
uses it, if it isn't already running. So there is no explicit code in the PS
to run Gabble.

OK, now (i.):

When Network Manager indicates that we have a valid IP address, we run
the _init_connection method of the ServerPlugin instance. If the Gabble
connection fails, we schedule a timer (currently 5 seconds) and retry
running _init_connection when the timer runs out. (classes
TelepathyPlugin and ServerPlugin, methods _init_connection,
_reconnect_cb, _could_connect, _handle_connection_status_change.)

What _init_connection does is: If there's already a Gabble connection and it's
connected, it'll be used. (class ServerPlugin, method
_find_existing_connection). Otherwise we make a new connection (method
_make_new_connection).
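
Very roughly, the control flow looks like the sketch below. This is a
simplification and not the real ServerPlugin: the class name, the
callbacks passed to the constructor and the fake timer queue are all
invented, and in the real code a failure arrives asynchronously as a
connection status change rather than as an exception.

```python
RECONNECT_TIMEOUT = 5  # seconds, the retry delay mentioned above


class ReconnectingPlugin:
    """Hypothetical stand-in for the connect-and-retry logic."""

    def __init__(self, find_existing, make_new, add_timer):
        self._find_existing = find_existing  # returns a connection or None
        self._make_new = make_new            # returns a connection or raises
        self._add_timer = add_timer          # add_timer(seconds, callback)
        self.connection = None
        self.attempts = 0

    def init_connection(self):
        self.attempts += 1
        conn = self._find_existing()
        if conn is None:
            try:
                conn = self._make_new()
            except ConnectionError:
                # Failed: try again when the timer runs out.
                self._add_timer(RECONNECT_TIMEOUT, self.init_connection)
                return
        self.connection = conn


# Simulate two failures followed by a success, with a fake timer queue
# that runs callbacks immediately instead of waiting.
outcomes = iter([ConnectionError("unreachable"),
                 ConnectionError("unreachable"),
                 "gabble-connection"])


def fake_make_new():
    result = next(outcomes)
    if isinstance(result, Exception):
        raise result
    return result


pending = []
plugin = ReconnectingPlugin(lambda: None, fake_make_new,
                            lambda seconds, cb: pending.append((seconds, cb)))
plugin.init_connection()
while pending:
    _delay, callback = pending.pop(0)
    callback()

print(plugin.attempts, plugin.connection)
```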

ServerPlugin (src/server_plugin.py) inherits from TelepathyPlugin
(src/telepathy_plugin.py) so some of the methods I mentioned are defined
in TelepathyPlugin, some in ServerPlugin, and some are defined in
TelepathyPlugin but overridden in ServerPlugin.

> iii. what type of communication is taking place between NM and PS

D-Bus messages, on the system bus.

> iv. the internet connectivity is detected by NM and sent to PS, or detected
> by PS

Internet connectivity isn't really detected, as such. The PS listens for
signals from Network Manager that tell it that the IP address has
changed. Whenever we have an IP address, we tell Gabble to connect to
the XMPP server; the nearest thing we have to "detecting Internet connectivity"
is that if we have it, Gabble will succeed.

In response to Gabble succeeding with a connection, PS calls the
Disconnect method on the Salut connection.
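
Put together, the hand-over from Salut to Gabble can be pictured like
this. Again this is only a sketch under simplifying assumptions:
FakeConnection, PresenceSketch and on_ip_address_changed are invented
names, and the real PS reacts to D-Bus signals rather than to return
values.

```python
class FakeConnection:
    """Invented stand-in for a Telepathy connection."""

    def __init__(self, name, will_succeed=True, connected=False):
        self.name = name
        self.will_succeed = will_succeed
        self.connected = connected

    def connect(self):
        self.connected = self.will_succeed
        return self.connected

    def disconnect(self):
        self.connected = False


class PresenceSketch:
    """Try Gabble whenever we have an address; drop Salut on success."""

    def __init__(self, gabble, salut):
        self.gabble = gabble
        self.salut = salut

    def on_ip_address_changed(self, address):
        if not address:
            return  # no address yet, nothing to try
        if self.gabble.connect():
            # Gabble reached the XMPP server, so Salut is switched off.
            self.salut.disconnect()


# Server reachable: Gabble connects and Salut is disconnected.
online = PresenceSketch(FakeConnection("gabble", will_succeed=True),
                        FakeConnection("salut", connected=True))
online.on_ip_address_changed("10.0.0.5")
print(online.gabble.connected, online.salut.connected)

# Server unreachable: Gabble fails and Salut stays up.
mesh_only = PresenceSketch(FakeConnection("gabble", will_succeed=False),
                           FakeConnection("salut", connected=True))
mesh_only.on_ip_address_changed("169.254.1.2")
print(mesh_only.gabble.connected, mesh_only.salut.connected)
```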
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: OpenPGP key: http://www.pseudorandom.co.uk/2003/contact/ or pgp.net

iD8DBQFHMEgwWSc8zVUw7HYRAoH2AKC71yprDPK/KPOyGAwez12odisbfQCgjMdY
1Fg4j1GS02m7HlnrhZBOe5Y=
=g6CY
-----END PGP SIGNATURE-----