System update spec proposal

Alexander Larsson alexl at redhat.com
Wed Jun 27 11:31:32 EDT 2007


On Tue, 2007-06-26 at 13:55 -0400, Ivan Krstić wrote:
> Software updates on the One Laptop per Child's XO laptop
> ========================================================

First some stray comments:

> 1.4. Design note: rsync scalability
> -----------------------------------
> 
> rsync is a known CPU hog on the server side. It would be absolutely
> infeasible to support a very large number of users from a single rsync
> server. This is far less of a problem in our scenario for three reasons:

What about CPU hogging on the school server? That seems likely to be far
less beefy than the centralized server.

> The most up-to-date bundle for each activity in the set is accessed, and
> the first several kilobytes downloaded. Since bundles are simple ZIP
> files, the downloaded data will contain the ZIP file index which stores
> byte offsets for the constituent compressed files. The updater then
> locates the bundle manifest in each index and makes a HTTP request with
> the respective byte range to each bundle origin. At the end of this
> process, the updater has cheaply obtained a set of manifests of the
> files in all available activity updates.

Zip files have the file index at the end of the file, so downloading the
first several kilobytes of a bundle won't get you the index.
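The practical consequence is that an updater has to request the *tail* of
each bundle, not the head. A minimal sketch (Python; the function name and
chunk size are illustrative, not actual updater code) of locating the ZIP
central directory from such a tail:

```python
import struct

EOCD_SIG = b"PK\x05\x06"  # End of Central Directory record signature

def find_central_directory(tail: bytes):
    """Given the last chunk of a ZIP file, locate the central directory.

    Returns (cd_offset, cd_size): byte offset and size of the central
    directory within the whole file. Raises ValueError if the EOCD
    record is not in the chunk (it sits within the last 22 + 65535
    bytes, since a variable-length comment may follow it).
    """
    pos = tail.rfind(EOCD_SIG)
    if pos == -1:
        raise ValueError("EOCD record not found; fetch a larger tail")
    # EOCD layout: sig(4) disk(2) cd_disk(2) n_disk(2) n_total(2)
    #              cd_size(4) cd_offset(4) comment_len(2)
    cd_size, cd_offset = struct.unpack_from("<II", tail, pos + 12)
    return cd_offset, cd_size
```

With the offset and size in hand, a second HTTP byte-range request can pull
just the central directory, and a third can pull just the manifest member.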

Now for comments on the general approach:

First of all, there seems to be an exceptional amount of confusion as to
exactly how some form of atomic updating of the system will happen. Some
people talk about overlays, others about vserver, and I myself have
thrown in the filesystem transaction idea. I must say this area seems
very uncertain, and I worry that this will result in the implementation
of none of these options...

But anyway, the exact way these updates are applied is quite orthogonal
to how you download the bits required for the update, or how to discover
new updates. So far I've been mainly working on this part in order to
avoid blocking on the confusion I mentioned above.

As for using rsync for the file transfers: this seems worse than the
trivial manifest + sha1-named files on http approach I've been working
on, especially with the optional use of bsdiff I just committed. We
already know (have to know, in fact, so we can strongly verify them) the
contents of both the laptop and the target image. To drop all this
knowledge and have rsync reconstruct it at runtime seems both a waste
and a possible performance problem (e.g. CPU and memory overload on the
school server, and rsync re-hashing files on the laptop using up
battery).
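To make the comparison concrete, here is a minimal sketch of the
manifest + sha1-named-files idea (the manifest format and function names
are hypothetical, not the actual updater code): because objects are
stored under their own hash, deciding what to download is a set
difference, with no per-file hashing at transfer time.

```python
import hashlib
import os

def parse_manifest(text):
    """Parse a manifest of 'sha1  path' lines into {path: sha1}."""
    entries = {}
    for line in text.splitlines():
        if line.strip():
            sha1, path = line.split(None, 1)
            entries[path] = sha1
    return entries

def files_to_fetch(manifest, store_dir):
    """Return the sha1s not already present in the local object store.

    Objects are stored under their own sha1, so possessing a hash
    implies possessing the (already verified) content -- nothing has
    to be re-hashed when an update is computed.
    """
    return sorted({h for h in manifest.values()
                   if not os.path.exists(os.path.join(store_dir, h))})
```

The missing objects can then be fetched over plain HTTP (or patched up
from a nearby version with bsdiff) and verified once against their names.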

You talk about the time it takes to implement another approach, but it's
really quite simple, and I have most of it done already. The only hard
part is applying the bits atomically. Also, there seems to be
development needed for the rsync approach too, as there is e.g. no
support for xattrs in the current protocol.

I've got the code for discovering local instances of upgrades and
downloading them already working. I'll try to make it do an actual
(non-atomic, unsafe) upgrade of an XO this week.

I have a general question about how this vserver/overlay/whatever system
is supposed to handle system files that are not part of the system image
but still exist in the root file system. Take, for
instance, /var/log/messages or /dev/log: where are they stored? Are they
mixed in with the other system files? If so, then rolling back to an
older version will give you e.g. your old log files back. This could
also complicate the use of rsync: if you use --delete, it would delete
these files (as they are not on the server).

Also, your document contains a lot of comments about what will and won't
be in FRS. Does this mean you're actually working on developing this
system for FRS?



