Serious side effect of #6299 (silencing salut so gabble can connect)

Morgan Collett morgan.collett at collabora.co.uk
Fri Feb 15 05:52:45 EST 2008


Hi John

Your points are valid. If we weren't in the RC phase, fixing the
activities would have been a non-issue.

It turns out that all the activities (except Pippy) were coded
defensively enough not to crash on this particular failure, despite
there being an open bug about this failure being triggered in Record
(#4494). My apologies for raising the alarm based on grep and not actual
testing.

I'll try to explain how we got where we are now, though, for the benefit
of those who haven't followed the development of collaboration.

John Gilmore wrote:
> Is there any reason why utterly local activities, which the user has
> never shared, are hooking themselves up to a sharing API?

Blame it on Connect, the first ever shared activity, which was our
testbed when implementing collaboration. Everyone else simply copied the
code.

At the time, activities had to do much of the heavy lifting for setting
up Telepathy channels, bypassing the Presence Service which provided the
sharing and joining functionality but not the Tubes setup.

To make it possible for activity authors to use less cargo-culted
boilerplate, we improved Presence Service and the corresponding Sugar
interface code before the original feature freeze for Update.1, so that
it did the heavy lifting behind the curtain and just handed activities
the result.

However, the corresponding improvements in the activities - in Connect
and HelloMesh - landed either shortly before or after the feature
freeze, and weren't considered actual bug fixes. We didn't consider this
important enough to get the other activities reviewed and updated during
a period in which every change was reviewed and critiqued for intrusiveness.

So there was no compelling reason for people to update their code, until
we started considering a patch that changes the assumption that there is
(at least) one CM running at any given time - at a very late stage in
the release process.

> (Perhaps this is one of the reasons for slow activity startup?)

We have an open bug about importing Telepathy being slow: #5470. We are
part of the process to improve activity startup time: #5228.

Unfortunately we didn't have time to address this for Update.1, since we
hit scalability problems with both link local sharing (salut, avahi,
mesh saturation) and jabber server sharing (ejabberd falls over, and the
shared roster is O(n^2), which doesn't scale).
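
As a rough illustration of the O(n^2) point, here is a back-of-envelope
sketch. It assumes the simplest model, in which every user on a shared
roster is notified about every other user; the function name is made up
for this example and the real traffic pattern is more complicated.

```python
def presence_notifications(n_users):
    """Notifications needed when each of n users announces itself
    to every other user on a fully shared roster (assumed model)."""
    return n_users * (n_users - 1)


# Ten times as many users means roughly a hundred times the traffic.
small_school = presence_notifications(10)
large_school = presence_notifications(100)
```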

> I suggest that activities never think about, or talk to, or even
> import, the presence service or Tubes or Telepathy or anything --
> until the user actually hits "Share" on the activity.  That would fix
> this bug.  It's probably too intrusive a textual change for update.1,
> though.

Simply reducing the activity code so that it gets the Telepathy channels
from PS, instead of trying to set them up itself, would fix this bug.
(So would better exception handling.) Doing what you suggest would
improve performance too, but would be more intrusive. The appropriate
place to optimise would mostly be in Presence Service and Sugar.
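
A minimal sketch of the "better exception handling" option follows. It
is not taken from any actual activity: StubPresenceService and
NoConnectionError are hypothetical stand-ins, and the assumption that
the failure surfaces as an exception (rather than, say, an invalid
return value) is mine.

```python
class NoConnectionError(Exception):
    """Hypothetical error: no connection manager is running."""


class StubPresenceService:
    """Hypothetical stand-in for the Presence Service, not the Sugar API."""

    def __init__(self, connection_info=None):
        self._connection_info = connection_info

    def get_preferred_connection(self):
        # Assumed failure mode: raise when there is nothing to return.
        if self._connection_info is None:
            raise NoConnectionError("no connection manager running")
        return self._connection_info


class Activity:
    def __init__(self, presence_service):
        self.connection_info = None
        self.sharing_available = False
        try:
            self.connection_info = presence_service.get_preferred_connection()
            self.sharing_available = True
        except NoConnectionError:
            # Keep running as a purely local activity instead of crashing.
            pass
```

With this shape the activity still starts when no CM is running, and
only the sharing features are disabled.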

> A less intrusive change might be for get_preferred_connection to
> make itself a nice no-op if the activity has never been shared.  And
> push its work into the next likely call (which would happen after
> the activity DOES get shared).  This would let the activities only
> crash after someone tries to share them, which would be much better
> than crashing every time you run 'em.

Unfortunately get_preferred_connection returns the D-Bus info needed to
get a Telepathy connection, and the activity code that calls it assumes
that info to be valid. We can't work around that without touching
Telepathy, which would be more intrusive than touching the activity
code.
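
For comparison, deferring the lookup on the activity side could look
roughly like the sketch below. Everything except the name
get_preferred_connection is hypothetical: the stub class, the share()
method and the placeholder D-Bus names are invented for illustration,
and real activities would need more than this.

```python
class StubPresenceService:
    """Hypothetical stand-in for the Presence Service, not the Sugar API."""

    def __init__(self):
        self.lookups = 0

    def get_preferred_connection(self):
        self.lookups += 1
        # Placeholder values, invented for this example.
        return ("org.example.Connection", "/org/example/Connection")


class LazyActivity:
    def __init__(self, presence_service):
        self._ps = presence_service
        self._connection_info = None  # nothing is looked up at startup

    def share(self):
        # Ask for the connection only once the user actually shares.
        if self._connection_info is None:
            self._connection_info = self._ps.get_preferred_connection()
        return self._connection_info
```

An activity that is never shared never calls into the Presence Service
at all, so a missing CM can only matter once sharing is attempted.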

> PS:  Does this stuff all work with a WiFi access point?  Have we considered
> turning off the mesh for update.2, thus avoiding melting down the mesh?

Yes, salut (link local sharing) works well on an infrastructure AP. However:

We have very different scenarios to cater for. In Mongolia, schools are
larger than we can currently handle on the jabber server, and the mesh
isn't handling link local either. In Peru, there are 6000 one-room
schools with 10-30 kids each and no schoolserver, Internet or any other
infrastructure.

Also the "two kids sitting under a tree somewhere" scenario must Just Work.

Morgan


