Ivan's XO Field Upgrade Proposal

Dan Williams dcbw at redhat.com
Tue Jun 26 12:13:59 EDT 2007

On Tue, 2007-06-26 at 11:48 -0400, C. Scott Ananian wrote:
> Ivan dropped by 1cc tonight, and I was able to squeeze the details of
> *his* field upgrade proposal out of him.  As I haven't yet seen him
> email this to the list, I'll try to state it for him.  Hopefully he
> can then give a diff against my version, which will save him time.
>
> The XO already needs to "call home" to <foobar>.laptop.org as part of
> the antitheft system.  In addition to the bits which tell it "you're
> not stolen", the response also includes the version number of the
> latest system software.  I assume the version number is a simple
> integer.
>
> If the laptop's version is not up to date, it tries to get the bits
> from the school server.  If it sees the school server, but the school
> server doesn't have the bits (yet), it backs off and retries later.  If
> it doesn't have a school server, or the retry on the school server
> fails, it gets the bits directly from Cambridge.
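
The fallback order described above can be sketched roughly as follows. This is only an illustration under assumptions: the function name, the callbacks standing in for the actual rsync runs, and the retry count and backoff delay are all hypothetical, since the proposal doesn't specify them.

```python
import time
from typing import Callable


def fetch_update(local_version: int, latest_version: int,
                 school_visible: bool,
                 sync_school: Callable[[], bool],
                 sync_cambridge: Callable[[], bool],
                 retries: int = 1,
                 backoff_seconds: float = 0.0) -> str:
    """Return the source that supplied the update:
    'none' (already current), 'school', 'cambridge', or 'failed'."""
    if local_version >= latest_version:
        return "none"  # nothing to do, the laptop is up to date

    if school_visible:
        # One initial attempt plus `retries` further attempts (assumed policy).
        for attempt in range(1 + retries):
            if sync_school():
                return "school"
            if attempt < retries:
                time.sleep(backoff_seconds)  # back off before retrying

    # No school server, or the school server never had the bits.
    if sync_cambridge():
        return "cambridge"
    return "failed"
```

Note that "school server doesn't have the bits" is represented here simply as the sync callback returning False, with no separate query step.
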
>
> The "get the bits" phase is as simple as practical: rsync.  The school
> server maintains a complete image of the XO filesystem, possibly in a
> small number of versions.  The XO just rsyncs with the school server
> to get the updated files.  This magically does the proper
> binary-differencing thing, and is robust against connection failure,
> data corruption, etc.  If it can't get the bits from the school
> server, it just rsyncs directly against a <foo>.laptop.org machine in
> Cambridge.
>
> We use vserver copy-on-write to do the atomic upgrade.  There is a
> 'fake root' context (which I'll call /fakeroot here) which has all the
> files in the filesystem.  Activity containers etc. are created out of
> /fakeroot.  The upgrade process starts out with a copy-on-write clone
> of /fakeroot, which it rsyncs to get the new filesystem.  We then
> either:
>   a) save this new tree as /upgraded-root (or some such) and on reboot
> swap /fakeroot and /upgraded-root, or...
>   b) do some sort of pivot_root to swap these trees without rebooting.
>  This latter approach has more technical risk, but is still A Simple
> Matter Of Software and permits live upgrades.
>
> Some notes:
>  a) rsync scales, as demonstrated by rsync.kernel.org.  We can use
> load-sharing, anycast addresses, etc if necessary (if it turns out
> that very many laptops are not getting updates from a school server).
> The important thing is that this complexity is on our side and is not
> propagated to the XO software.
>  b) This completely punts on XO-to-XO upgrades.  This complexity is
> not necessary for version 1.0, and (given the efficient rsync
> protocol) doesn't buy you all that much.  It can be added later,
> either via a different mechanism or by rsync between machines.
>  c) This proposal has no way to push upgrades.  Again, this can be
> added later (e.g., a signed broadcast packet which says, "upgrade now to
> version N".  The actual upgrade is then identical.)
>  d) The filesystem can (should) contain a manifest, as described in
> Alex's proposal, which is signed and can be used to
> authenticate/validate the upgrade.  The manifest is rsynced along with
> the rest of the files, and then checked.  We also use rsync-over-ssh
> with fixed keys to ensure that we're only rsyncing with 'real' update
> servers.
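
The "rsync, then check the manifest" step in note d) might look something like the sketch below. The manifest format here (a mapping from relative path to SHA-256 hex digest) is purely an assumption for illustration, not what Alex's proposal specifies, and verifying the signature on the manifest itself is left out because it needs a crypto library.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def verify_tree(root: Path, manifest: Dict[str, str]) -> List[str]:
    """Return the relative paths that are missing under `root` or whose
    SHA-256 digest differs from the one listed in `manifest`."""
    bad = []
    for rel_path, expected_digest in manifest.items():
        target = root / rel_path
        if not target.is_file():
            bad.append(rel_path)  # listed in the manifest but not present
            continue
        actual_digest = hashlib.sha256(target.read_bytes()).hexdigest()
        if actual_digest != expected_digest:
            bad.append(rel_path)  # contents do not match the manifest
    return bad
```

An empty result would mean the synced tree matches the manifest; anything else means the upgrade should not be swapped in.
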
>
> Scott's comments (Ivan hasn't heard all of these, so he might not agree):
>  a) I enthusiastically recommend this approach.  It seems to be the
> simplest thing with reasonable performance that will work.  It avoids
> reinventing the wheel, and it seems to have very few dependencies
> which might break it.  Improvements can be made to the rsync protocol
> if better efficiency is desired, and that work will help not only OLPC
> but also the (myriad) users of rsync.
>  b) For simplicity, I favor (re)using rsync in other places where we
> need synchronization and/or file distribution.  For example, I think
> that the school servers use rsync in order to get their copies of the
> XO filesystem.
>  c) No extra protocols or dependencies.  rsync should be statically
> linked.  "School server doesn't have version N" should be read as
> "rsync to school server fails", rather than involving some extra
> protocol or query.  I'd like to see the driver program written in a
> compiled language and statically linked as well, to provide robustness
> in case an upgrade breaks python (say).
>  e) The rsync protocol is interactive. There are more round-trips than
> in other proposals, but the process is robust: if it fails, it can
> just be restarted and it will magically continue where it left off.
>  f) We can "do better" than rsync, because we know what files are on
> the other side, and can use this to send better diffs.  This
> improvement could be added to rsync directly, rather than creating
> special XO-only code.  (Option to preseed rsync with a directory of
> files known to be on the remote machine.)
>  g) I believe that we can use "plain old" hard links when we do the
> rsync, instead of requiring any fancy vserver stuff.  rsync will break
> the link appropriately when it needs to modify a file (as long as the
> --inplace option isn't given).  This probably breaks a critical edge
> during development.
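
On point g): without --inplace, rsync writes the updated file to a temporary name and renames it over the destination, which is why a hard link into the old tree keeps the old contents. A small sketch of that write-then-rename pattern is below; the helper names and the file name are made up for the demonstration, and the directory names just echo the ones used in the proposal.

```python
import os
import tempfile


def replace_via_rename(path: str, new_bytes: bytes) -> None:
    """Write new_bytes to a temporary file, then rename it over `path`."""
    directory = os.path.dirname(path) or "."
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "wb") as tmp:
        tmp.write(new_bytes)
    os.replace(tmp_path, path)  # `path` now refers to a brand-new inode


def demo(workdir: str):
    """Hard-link one file into two trees, update one side, and return
    the contents seen through the old tree and the new tree."""
    old_tree = os.path.join(workdir, "fakeroot")
    new_tree = os.path.join(workdir, "upgraded-root")
    os.mkdir(old_tree)
    os.mkdir(new_tree)
    old_file = os.path.join(old_tree, "hello.txt")
    new_file = os.path.join(new_tree, "hello.txt")
    with open(old_file, "wb") as f:
        f.write(b"version 1\n")
    os.link(old_file, new_file)  # both names share one inode for now
    replace_via_rename(new_file, b"version 2\n")
    with open(old_file, "rb") as f:
        old_contents = f.read()
    with open(new_file, "rb") as f:
        new_contents = f.read()
    return old_contents, new_contents
```

Writing through the existing name instead (which is what --inplace amounts to) would change the file in both trees, since they share the inode.
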

Downside of this is, as Alex pointed out, it'll load the mesh a _lot_
more than XO->XO updates.

