A different proposal for XO upgrade.

Alexander Larsson alexl at redhat.com
Tue Jun 26 06:26:42 EDT 2007


On Mon, 2007-06-25 at 17:51 -0400, C. Scott Ananian wrote:
> As food for discussion, here's a counter proposal for an XO upgrade
> mechanism, focusing on the network side.
> ---------
> Design goals:
>  - minimize round-trips necessary for successful upgrade
>  - minimize size of upgrades

(I'm using this reply as a general reply for the whole update thread
instead of replying to each mail.)

First, I must say that the numbers you posted for bsdiff look pretty
nice for smaller updates like security fixes. (I'm not sure they will
help as much for larger updates, though.) It sounds interesting to try
to use this if possible.

The original discussion about diffs and updates talked about storing the
original "gold" image on all laptops, and then sending each update as a
diff from that image. This would mean that the diffs keep growing in
size, getting larger to send, and that we'd need more and more space for
original image data that is no longer used. Your proposal instead seems
to be that all diffs are incremental (and available as reverse diffs for
incremental rollback). This keeps each update small, but instead you
need to get all the updates and apply them in order.
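
To make this concrete, here is a rough client-side sketch of the
incremental scheme (all names here are made up: fetch_diff stands in
for however we get the next diff in the chain, apply_patch for whatever
actually runs the patch tool):

    import hashlib

    def upgrade(image, current, target, fetch_diff, apply_patch):
        """Walk the version chain from current to target, applying
        each incremental diff in order and verifying the result."""
        version = current
        while version != target:
            diff, next_version, want_sha1 = fetch_diff(version)
            image = apply_patch(image, diff)
            if hashlib.sha1(image).hexdigest() != want_sha1:
                raise IOError("bad diff %s -> %s" % (version, next_version))
            version = next_version
        return image

The "gold image" scheme would instead be a single diff fetch, at the
cost of carrying the unused gold data around forever.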

There is a general difference in how data is distributed in these two
proposals. In your proposal each machine polls and downloads from a
central server. This relies on the mesh (and ultimately a connection to
the internet) for downloading the data, which makes it very sensitive to
the performance of the mesh network. To mitigate this it uses multicast
to avoid sending the same data over the mesh too many times.

My distribution model is instead more distributed. The idea is that
while traffic across the full mesh is spotty and slow, the local wlan
links have no reason to be much worse than any other wlan connections
(including ours). So downloading a full file from a neighbour isn't all
that slow, nor does it affect the general performance of the mesh.

Instead of all machines downloading the update from the server via the
mesh (or multicast), upgrades happen in stages. In the first stage the
school server gets the new version and announces it. Then all machines
that are close to the server find the upgrade and update. Then machines
close to those laptops upgrade from them, and the update propagates
outwards one machine at a time. (Things are also dynamic, as the
machines move around.) The spread is "exponential" as the number of
machines with the update available increases, and at no time are we
sending data over the mesh.
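
In rough Python, each laptop would just run a loop like the sketch
below (announce, find_newer_neighbour and download_from are stand-ins
for whatever presence and discovery mechanism we end up with; none of
this is a real API):

    import time

    def update_loop(my_version, announce, find_newer_neighbour,
                    download_from):
        """Announce what we have; when a neighbour announces something
        newer, fetch it over the local wlan link and re-announce."""
        while True:
            announce(my_version)              # "I have version N"
            peer, version = find_newer_neighbour(my_version)
            if peer is not None:
                download_from(peer, version)  # one-hop wlan transfer
                my_version = version          # now we serve it too
            time.sleep(60)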

This distributed system makes it possible to upgrade laptops that don't
have a connection that reaches the school server, as long as another
updated laptop happens to be in the neighbourhood at some point. It also
makes it possible to update such a laptop to any version, without having
to store all the diffs between the two versions in the update (e.g. if
the laptop hasn't had a connection to the net in a while and doesn't
have a recent version).
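
This works because an update is expressed as a manifest of
content-addressed blobs rather than as a chain of diffs, so a client
can jump straight to any version. A sketch, assuming a manifest is a
list of (path, sha1) pairs (my assumption; the exact format isn't
specified in this thread):

    def sync_to(target_manifest, have_blobs, fetch_blob):
        """Fetch only the blobs named in the target manifest that we
        don't already have, keyed by sha1."""
        for path, sha1 in target_manifest:
            if sha1 not in have_blobs:
                have_blobs[sha1] = fetch_blob(sha1)  # full file, any peer
        return have_blobs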

I must say that the multicasting portion of your proposal sounds risky
to me. It seems pretty easy to end up with multiple Upgrade Leaders for
the same version, both over time and in parallel (for instance due to
missed packets, or laptops turned on at different times). Each of these
sending multicast messages on the mesh as it upgrades sounds like it
could easily fill the mesh with a lot of multicast traffic.

When you say "multicast", how far do you mean these packets would be
sent? I assume that on a mesh network multicast packets are limited by
some kind of hop limit, to avoid loops if nothing else? Some more
information on how multicast works on the mesh would be nice.

Also, I keep seeing references to vserver, but I still haven't seen how
it could be used to update the system image. In the case of your
proposal this doesn't change much, though. For instance, if we used
jffs2 filesystem transactions instead, the only difference is that
applying the diffs as they arrive might not be a good idea; instead we'd
keep them and apply them all in one transaction when we're ready.
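
As a sketch of what I mean (jffs2 exposes no such transaction API
today, so begin_txn/commit_txn/abort_txn are purely hypothetical):

    def staged_apply(diffs, begin_txn, apply_patch, commit_txn,
                     abort_txn):
        """Collect diffs as they arrive, then apply them all inside
        one (hypothetical) filesystem transaction."""
        txn = begin_txn()
        try:
            for diff in diffs:        # diffs queued up beforehand
                apply_patch(txn, diff)
            commit_txn(txn)           # all-or-nothing switchover
        except Exception:
            abort_txn(txn)            # image untouched on failure
            raise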

So, to sum it up, my approach certainly sends more bits over the network,
but I don't think that is a large issue, since all (most?) transfers are
local and should therefore get good performance. It is also less
dependent on direct connections to the internet (or the school server),
and it doesn't generate mesh traffic that eats bandwidth for other
people. It's also, IMHO, vastly simpler to implement, test, and deploy.

However, it certainly would be nice if we could use the binary diff
approach in a distributed system too. A simple extension would be to do
the diffs on the school server. We could easily store diffs between
"consecutive" blobs and manifests (say as
http://server/blobs/{first-sha1}-{second-sha1}), and the client could
detect that the server has both the version it already has and the
target version, and request a diff between the two files instead of a
full blob (falling back to the full file if the diff isn't available).
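
On the client side that is just a try-then-fall-back fetch. A minimal
sketch: the diff URL layout is the one suggested above, while the
plain http://server/blobs/{sha1} URL for full blobs is my assumption:

    import urllib.request, urllib.error

    def fetch_blob(server, have_sha1, want_sha1):
        """Prefer the {first-sha1}-{second-sha1} diff; fall back to
        the full blob if the server doesn't have that pair."""
        try:
            url = "http://%s/blobs/%s-%s" % (server, have_sha1, want_sha1)
            return ("diff", urllib.request.urlopen(url).read())
        except urllib.error.HTTPError:   # e.g. 404: no such diff
            url = "http://%s/blobs/%s" % (server, want_sha1)
            return ("full", urllib.request.urlopen(url).read())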

It's a bit harder to distribute this to the laptops, though: since we
don't want to store older data on them, they can't generate diffs
themselves. We could store the diffs from the previous version to the
current version alongside the current data, hoping that this doesn't
use a lot of storage. I guess that is a question of tradeoffs: local
bandwidth vs laptop storage.

Maybe there is a better way to use diffs. Lemme think on it for a bit.




