Ivan's XO Field Upgrade Proposal

C. Scott Ananian cscott at laptop.org
Tue Jun 26 11:48:51 EDT 2007


Ivan dropped by 1cc tonight, and I was able to squeeze the details of
*his* field upgrade proposal out of him.  As I haven't yet seen him
email this to the list, I'll try to state it for him.  Hopefully he
can then give a diff against my version, which will save him time.

The XO already needs to "call home" to <foobar>.laptop.org as part of
the antitheft system.  In addition to the bits which tell it "you're
not stolen", the response also includes the version number of the
latest system software.  I assume the version number is a simple
integer.

If the laptop's version is not up to date, it tries to get the bits
from the school server.  If it sees the school server, but the school
server doesn't have the bits (yet), it backs off and retries later.  If
it doesn't have a school server, or the retry on the school server
fails, it gets the bits directly from Cambridge.
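
A minimal sketch of the fallback order just described.  The function
name, its parameters, and the returned strings are illustrative
assumptions, not part of Ivan's proposal; the three callables stand in
for "can I see the school server?" and the two sync attempts.  Python
is used here only to show the logic.

```python
import time
from typing import Callable


def fetch_update(
    local_version: int,
    latest_version: int,
    school_server_visible: Callable[[], bool],
    sync_from_school: Callable[[], bool],
    sync_from_cambridge: Callable[[], bool],
    retries: int = 1,
    backoff_seconds: float = 0.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Return which source supplied the update (hypothetical labels)."""
    # Versions are assumed to be simple integers, as in the proposal.
    if local_version >= latest_version:
        return "current"

    if school_server_visible():
        # One initial attempt plus `retries` retries, backing off between them.
        for attempt in range(retries + 1):
            if sync_from_school():
                return "school"
            if attempt < retries:
                sleep(backoff_seconds)

    # No school server, or the retry failed: go directly to Cambridge.
    if sync_from_cambridge():
        return "cambridge"
    return "failed"
```

A sync callable returning False covers both "server unreachable" and
"server does not have the bits yet", so no separate query is needed.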

The "get the bits" phase is as simple as practical: rsync.  The school
server maintains a complete image of the XO filesystem, possibly in a
small number of versions.  The XO just rsyncs with the school server
to get the updated files.  This magically does the proper
binary-differencing thing, and is robust against connection failure,
data corruption, etc.  If it can't get the bits from the school
server, it just rsyncs directly against a <foo>.laptop.org machine in
Cambridge.
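
As an illustration of the "get the bits" step, the sketch below builds
an rsync command line.  The helper name, the host, and the paths are
hypothetical (the proposal does not name the servers), and the choice
of options is an assumption: they are standard rsync options, not ones
the proposal specifies.

```python
from typing import List


def rsync_command(host: str, remote_root: str, local_root: str) -> List[str]:
    """Build an rsync argv that mirrors remote_root into local_root over ssh."""
    # Trailing slashes make rsync copy the contents of the source
    # directory into the destination directory.
    src = f"{host}:{remote_root.rstrip('/')}/"
    dst = f"{local_root.rstrip('/')}/"
    return [
        "rsync",
        "--archive",     # preserve permissions, times, symlinks, etc.
        "--hard-links",  # preserve hard links within the tree
        "--delete",      # remove files that no longer exist upstream
        "--rsh=ssh",     # transport over ssh
        src,
        dst,
    ]
```

The resulting list can be handed to subprocess.run(); a non-zero exit
status would be treated as "could not get the bits from this server".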

We use vserver copy-on-write to do the atomic upgrade.  There is a
'fake root' context (which I'll call /fakeroot here) which has all the
files in the filesystem.  Activity containers, etc. are created out of
/fakeroot.  The upgrade process starts out with a copy-on-write clone
of /fakeroot, which it rsyncs to get the new filesystem.  We then
either:
  a) save this new tree as /upgraded-root (or some such) and on reboot
swap /fakeroot and /upgraded-root, or...
  b) do some sort of pivot_root to swap these trees without rebooting.
 This latter approach has more technical risk, but is still A Simple
Matter Of Software and permits live upgrades.
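
Option (a) could be sketched as three renames performed early at boot.
The function name and the ".old" parking name are assumptions made for
illustration.  Each rename is atomic on its own, but the sequence is
not, so a real implementation would need to recover from a crash
between steps.

```python
import os


def swap_roots(current: str, upgraded: str) -> None:
    """Exchange two directory trees that live on the same filesystem."""
    parked = current + ".old"       # hypothetical temporary name
    os.rename(current, parked)      # step 1: move the running tree aside
    os.rename(upgraded, current)    # step 2: new tree takes the canonical name
    os.rename(parked, upgraded)     # step 3: keep the old tree for rollback
```

After the swap, the previous tree is still available under the
"upgraded" name, which makes rolling back a second call to the same
function.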

Some notes:
 a) rsync scales, as demonstrated by rsync.kernel.org.  We can use
load-sharing, anycast addresses, etc if necessary (if it turns out
that very many laptops are not getting updates from a school server).
The important thing is that this complexity is on our side and is not
propagated to the XO software.
 b) This completely punts on XO-to-XO upgrades.  This complexity is
not necessary for version 1.0, and (given the efficient rsync
protocol) doesn't buy you all that much.  It can be added later,
either via a different mechanism or by rsync between machines.
 c) This proposal has no way to push upgrades.  Again, this can be
added later (e.g., a signed broadcast packet which says, "upgrade now
to version N").  The actual upgrade is then identical.
 d) The filesystem can (should) contain a manifest, as described in
Alex's proposal, which is signed and can be used to
authenticate/validate the upgrade.  The manifest is rsynced along with
the rest of the files, and then checked.  We also use rsync-over-ssh
with fixed keys to ensure that we're only rsyncing with 'real' update
servers.
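
To illustrate note (d), here is a sketch of checking a synced tree
against a manifest.  The manifest format is defined in Alex's proposal
and is not reproduced here; the mapping of relative path to SHA-256
hex digest below is purely an assumption, and the signature on the
manifest is assumed to have been verified before this check runs.

```python
import hashlib
import os
from typing import Dict, List


def sha256_of(path: str) -> str:
    """Hash a file in chunks so large files do not need to fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def mismatched_files(root: str, manifest: Dict[str, str]) -> List[str]:
    """Return manifest entries that are missing or whose hash differs."""
    bad = []
    for rel, expected in sorted(manifest.items()):
        full = os.path.join(root, rel)
        if not os.path.isfile(full) or sha256_of(full) != expected:
            bad.append(rel)
    return bad
```

An empty result means every listed file is present with the expected
contents; anything else would cause the upgrade to be rejected before
the trees are swapped.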

Scott's comments (Ivan hasn't heard all of these, so he might not agree):
 a) I enthusiastically recommend this approach.  It seems to be the
simplest thing with reasonable performance that will work.  It avoids
reinventing the wheel, and it seems to have very few dependencies
which might break it.  Improvements can be made to the rsync protocol
if better efficiency is desired, and that work will help not only OLPC
but also the (myriad) users of rsync.
 b) For simplicity, I favor (re)using rsync in other places where we
need synchronization and/or file distribution.  For example, I think
that the school servers should use rsync in order to get their copies
of the XO filesystem.
 c) No extra protocols or dependencies.  rsync should be statically
linked.  "School server doesn't have version N" should be read as
"rsync to school server fails", rather than involving some extra
protocol or query.  I'd like to see the driver program written in a
compiled language and statically linked as well, to provide robustness
in case an upgrade breaks python (say).
 e) The rsync protocol is interactive. There are more round-trips than
in other proposals, but the process is robust: if it fails, it can
just be restarted and it will magically continue where it left off.
 f) We can "do better" than rsync, because we know what files are on
the other side, and can use this to send better diffs.  This
improvement could be added to rsync directly, rather than creating
special XO-only code.  (Option to preseed rsync with a directory of
files known to be on the remote machine.)
 g) I believe that we can use "plain old" hard links when we do the
rsync, instead of requiring any fancy vserver stuff.  rsync will break
the link appropriately when it needs to modify a file (as long as the
--inplace option isn't given).  This probably breaks a critical edge
during development.
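
The hard-link idea in (g) can be demonstrated with a small sketch.
The function names are hypothetical, and only regular files are
handled.  The clone shares every file with the original via hard
links; an update written to a temporary file and renamed over the
target (which is how rsync behaves when --inplace is not given)
replaces the clone's directory entry and leaves the original intact.

```python
import os


def clone_with_hard_links(src: str, dst: str) -> None:
    """Recreate the directory structure of src under dst, hard-linking files."""
    for dirpath, _dirnames, filenames in os.walk(src):
        rel = os.path.relpath(dirpath, src)
        target_dir = dst if rel == "." else os.path.join(dst, rel)
        os.makedirs(target_dir, exist_ok=True)
        for name in filenames:
            os.link(os.path.join(dirpath, name), os.path.join(target_dir, name))


def replace_file(path: str, data: bytes) -> None:
    """Write new contents to a temp file, then rename it over path."""
    tmp = path + ".tmp"
    with open(tmp, "wb") as f:
        f.write(data)
    # The rename swaps the directory entry; other hard links to the
    # old inode keep the old contents.
    os.replace(tmp, path)
```

Writing to the file in place instead would change the shared inode and
therefore modify both trees at once, which is why --inplace must be
avoided with this scheme.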

OK, that's all folks.  Discuss!
  --scott

-- 
                         ( http://cscott.net/ )


