Ivan's XO Field Upgrade Proposal
dcbw at redhat.com
Tue Jun 26 12:13:59 EDT 2007
On Tue, 2007-06-26 at 11:48 -0400, C. Scott Ananian wrote:
> Ivan dropped by 1cc tonight, and I was able to squeeze the details of
> *his* field upgrade proposal out of him. As I haven't yet seen him
> email this to the list, I'll try to state it for him. Hopefully he
> can then give a diff against my version, which will save him time.
> The XO already needs to "call home" to <foobar>.laptop.org as part of
> the antitheft system. In addition to the bits which tell it "you're
> not stolen", the response also includes the version number of the
> latest system software. I assume the version number is a simple
> If the laptop's version is not up to date, it tries to get the bits
> from the school server. If it sees the school server, but the school
> server doesn't have the bits (yet), it backs off and retries later. If
> it doesn't have a school server, or the retry on the school server
> fails, it gets the bits directly from Cambridge.
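The fallback order described above can be sketched in a few lines. This is only an illustration, not anyone's actual implementation: it assumes versions compare as plain integers, and the names pick_update_source, try_sync, and the host strings are hypothetical. The sync and sleep functions are passed in so the logic can be exercised without a network.

```python
from typing import Callable, Iterable, Optional


def pick_update_source(
    installed: int,
    latest: int,
    school_server: Optional[str],
    fallback: str,
    try_sync: Callable[[str], bool],
    sleep: Callable[[float], None],
    retry_delays: Iterable[float] = (60.0,),
) -> Optional[str]:
    """Return the host the update came from, or None if nothing was fetched.

    try_sync(host) is assumed to return True when the sync succeeded.
    A school server that lacks the new version simply shows up as a failed sync.
    """
    if installed >= latest:
        return None  # already up to date, nothing to do

    if school_server is not None:
        if try_sync(school_server):
            return school_server
        # The school server is there but does not have the bits yet:
        # back off and retry before giving up on it.
        for delay in retry_delays:
            sleep(delay)
            if try_sync(school_server):
                return school_server

    # No school server, or the retries failed: go to the central machine.
    return fallback if try_sync(fallback) else None
```

Treating "server lacks version N" as an ordinary sync failure keeps the client free of any extra query protocol.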
> The "get the bits" phase is as simple as practical: rsync. The school
> server maintains a complete image of the XO filesystem, possibly in a
> small number of versions. The XO just rsyncs with the school server
> to get the updated files. This magically does the proper
> binary-differencing thing, and is robust against connection failure,
> data corruption, etc. If it can't get the bits from the school
> server, it just rsyncs directly against a <foo>.laptop.org machine in
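As a rough sketch of what the "get the bits" step might invoke, the helper below only builds an rsync argument list; it does not run anything. The host name, remote path, destination, and the build_rsync_command name are all assumptions for illustration. The flags used (--archive, --delete, --partial, --timeout, -e) are standard rsync options.

```python
from typing import List


def build_rsync_command(host: str, remote_path: str, dest: str, ssh_key: str = "") -> List[str]:
    """Build an rsync command that mirrors host:remote_path/ into dest."""
    cmd = [
        "rsync",
        "--archive",     # preserve permissions, times, symlinks, etc.
        "--delete",      # remove local files that no longer exist upstream
        "--partial",     # keep partially transferred files so a restart can resume
        "--timeout=60",  # give up on a stalled connection after 60 seconds
    ]
    if ssh_key:
        # Use a fixed ssh identity; a key path containing spaces would need quoting.
        cmd += ["-e", "ssh -i " + ssh_key]
    # Trailing slash on the source means "the contents of this directory".
    return cmd + [host + ":" + remote_path.rstrip("/") + "/", dest]
```

A caller could hand the result to subprocess.run and treat any non-zero exit status as "try again later or try the next server".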
> We use vserver copy-on-write to do the atomic upgrade. There is a
> 'fake root' context (which I'll call /fakeroot here) which has all the
> files in the filesystem. Activity containers & etc are created out of
> /fakeroot. The upgrade process starts out with a copy-on-write clone
> of /fakeroot, which it rsyncs to get the new filesystem. We then
> a) save this new tree as /upgraded-root (or some such) and on reboot
> swap /fakeroot and /upgraded-root, or...
> b) do some sort of pivot_root to swap these trees without rebooting.
> This latter approach has more technical risk, but is still A Simple
> Matter Of Software and permits live upgrades.
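Option (a), swapping the two trees at reboot, could look roughly like the sketch below. It assumes both trees are ordinary directories on the same filesystem; swap_trees and the ".old" parking name are hypothetical. Note that three renames are not atomic as a group, which is one reason to do this early in boot before anything is using either tree.

```python
import os


def swap_trees(current: str, upgraded: str) -> None:
    """Exchange two directory trees that live on the same filesystem."""
    parking = current + ".old"
    os.rename(current, parking)        # step 1: move the live tree aside
    try:
        os.rename(upgraded, current)   # step 2: the new tree takes its place
    except OSError:
        os.rename(parking, current)    # roll back so the old tree is still usable
        raise
    os.rename(parking, upgraded)       # step 3: keep the old tree as the spare
```

After the swap the previous system is still available under the other name, so a bad upgrade can be undone by calling the same function again.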
> Some notes:
> a) rsync scales, as demonstrated by rsync.kernel.org. We can use
> load-sharing, anycast addresses, etc if necessary (if it turns out
> that very many laptops are not getting updates from a school server).
> The important thing is that this complexity is on our side and is not
> propagated to the XO software.
> b) This completely punts on XO-to-XO upgrades. This complexity is
> not necessary for version 1.0, and (given the efficient rsync
> protocol) doesn't buy you all that much. It can be added later,
> either via a different mechanism or by rsync between machines.
> c) This proposal has no way to push upgrades. Again, this can be
> added later (eg, a signed broadcast packet which says, "upgrade now to
> version N". The actual upgrade is then identical.)
> d) The filesystem can (should) contain a manifest, as described in
> Alex's proposal, which is signed and can be used to
> authenticate/validate the upgrade. The manifest is rsynced along with
> the rest of the files, and then checked. We also use rsync-over-ssh
> with fixed keys to ensure that we're only rsyncing with 'real' update
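For the manifest check in (d), here is a minimal sketch of the "then checked" part. The manifest format in Alex's proposal is not given here, so this assumes a simple mapping from relative path to SHA-256 hex digest; verify_tree and file_sha256 are hypothetical names, and verifying the signature on the manifest itself is left out.

```python
import hashlib
import os
from typing import Dict, List


def file_sha256(path: str) -> str:
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_tree(root: str, manifest: Dict[str, str]) -> List[str]:
    """Return the relative paths that are missing or whose hash differs."""
    bad = []
    for rel_path, expected in sorted(manifest.items()):
        full = os.path.join(root, rel_path)
        if not os.path.isfile(full) or file_sha256(full) != expected:
            bad.append(rel_path)
    return bad
```

An empty result would mean every file listed in the manifest is present and matches; anything else should block the swap to the new tree.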
> Scott's comments (Ivan's not heard all of these, he might not agree):
> a) I enthusiastically recommend this approach. It seems to be the
> simplest thing with reasonable performance that will work. It avoids
> reinventing the wheel, and it seems to have very few dependencies
> which might break it. Improvements can be made to the rsync protocol
> if better efficiency is desired, and that work will help not only OLPC
> but also the (myriad) users of rsync.
> b) For simplicity, I favor (re)using rsync in other places where we
> need synchronization and/or file distribution. For example, I think
> that the school servers should use rsync in order to get their copies
> of the XO filesystem.
> c) No extra protocols or dependencies. rsync should be statically
> linked. "School server doesn't have version N" should be read as
> "rsync to school server fails", rather than involving some extra
> protocol or query. I'd like to see the driver program written in a
> compiled language and statically linked as well, to provide robustness
> in case an upgrade breaks python (say).
> e) The rsync protocol is interactive. There are more round-trips than
> in other proposals, but the process is robust: if it fails, it can
> just be restarted and it will magically continue where it left off.
> f) We can "do better" than rsync, because we know what files are on
> the other side, and can use this to send better diffs. This
> improvement could be added to rsync directly, rather than creating
> special XO-only code. (Option to preseed rsync with a directory of
> files known to be on the remote machine.)
> g) I believe that we can use "plain old" hard links when we do the
> rsync, instead of requiring any fancy vserver stuff. rsync will break
> the link appropriately when it needs to modify a file (as long as the
> --inplace option isn't given). This probably breaks a critical edge
> during development.
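The hard-link idea in (g) can be demonstrated without rsync at all. The sketch below clones a tree using hard links, then replaces one file the way rsync does when --inplace is not given: write a new file and rename it over the old name. The function names are hypothetical, and symlinks and special files are ignored for brevity.

```python
import os
import tempfile


def hardlink_clone(src: str, dst: str) -> None:
    """Recreate the directory structure of src under dst, hard-linking every file."""
    for dirpath, _dirnames, filenames in os.walk(src):
        rel = os.path.relpath(dirpath, src)
        target_dir = os.path.normpath(os.path.join(dst, rel))
        os.makedirs(target_dir, exist_ok=True)
        for name in filenames:
            # Both names now point at the same inode; no file data is copied.
            os.link(os.path.join(dirpath, name), os.path.join(target_dir, name))


def replace_file(path: str, data: bytes) -> None:
    """Write data to a temporary file, then rename it over path.

    The rename gives path a fresh inode, so any other hard link to the
    old contents is left untouched.
    """
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path))
    with os.fdopen(fd, "wb") as f:
        f.write(data)
    os.replace(tmp, path)
```

Files that the upgrade does not touch stay shared between the two trees, so the clone costs very little space until something actually changes.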
The downside of this, as Alex pointed out, is that it'll load the mesh a
_lot_ more than XO->XO updates would.