XO in-field upgrades

Christopher Blizzard blizzard at redhat.com
Mon Jun 25 15:57:23 EDT 2007


On Mon, 2007-06-25 at 15:35 -0400, C. Scott Ananian wrote:
> On 6/25/07, Christopher Blizzard <blizzard at redhat.com> wrote:
> > That's going to be interesting, yeah.  You would need to teach the
> > wireless firmware about it?  How about just checking on wakeup?  Some
> > kind of wake-on-lan signal?
> 
> Binding upgrade notifications to a multicast address as I previously
> proposed fixes this problem without any kind of firmware hacking.

Ahh, sorry, I thought you meant _really_ asleep - like not waking up on
network events at all.  Although does our independent firmware know
enough to wake us up on multicast traffic?  I thought that it only worked
on the lower-level protocols and that a packet had to be specifically
destined for our MAC address to trigger a wake-up event.  I'll show my
ignorance of multicast here: does it map to specific MAC addresses, or is
it a wide broadcast at that layer?  I always assumed the latter.
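
For anyone following along: my understanding of the standard IPv4
multicast-to-Ethernet mapping (a sketch from memory, nothing XO-specific)
is that each group gets its own derived MAC address rather than the
broadcast address, so the real question is whether the firmware's wakeup
filter can match a derived address like this:

    import socket, struct

    # Derive the Ethernet MAC for an IPv4 multicast group: the fixed
    # 01:00:5e prefix followed by the low 23 bits of the group address.
    def multicast_mac(group):
        ip = struct.unpack('!I', socket.inet_aton(group))[0]
        low23 = ip & 0x7fffff
        octets = (0x01, 0x00, 0x5e,
                  (low23 >> 16) & 0x7f,
                  (low23 >> 8) & 0xff,
                  low23 & 0xff)
        return ':'.join('%02x' % o for o in octets)

    print(multicast_mac('239.255.0.1'))   # -> 01:00:5e:7f:00:01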

> 
> > Can you explain how they are odd?  It sure would help everyone.
> 
> Caveat: I'm not an expert here.  I haven't read the code, just the
> documentation.  So we can all follow along, start here:
>    http://linux-vserver.org/Paper#Unification
>    http://linux-vserver.org/Frequently_Asked_Questions#What_is_vhashify.3F
> 
> Basically, copy-on-write works by running a tool ('vhashify') which
> looks for identical files in the different containers and hard links
> them together, then marks them immutable.  The copy-on-write mechanism
> works by intercepting writes to immutable files and cloning the file
> before making it writable by the container.
> 
> Quoting from their FAQ:
> (when running vhashify:) "The guest needs to be running because
> vhashify tries  to figure out what files not to hashify by calling the
> package manager of the guest via vserver enter.
> 
> In order for the OS cache to benefit from the hardlinking, you'll have
> to restart the vservers."

Holy crap, this sounds like a steaming _pile_ of complexity.  Are we
seriously going to try to deploy on this?
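
To be fair, the core hashify/hardlink idea itself looks simple enough;
it's the package-manager hooks and the restart requirement that worry me.
As I read that FAQ, the core operation boils down to roughly this (a toy
sketch, not vserver's actual code; it skips the immutable-bit handling
and the package-manager queries entirely):

    import hashlib, os

    # Find files with identical content across container roots and
    # hard-link them together.  Real vhashify also marks the shared
    # files immutable and asks each guest's package manager which
    # files to leave alone; none of that is shown here.
    def dedup(container_roots):
        seen = {}                       # content hash -> first path seen
        for root in container_roots:
            for dirpath, dirnames, filenames in os.walk(root):
                for name in filenames:
                    path = os.path.join(dirpath, name)
                    if os.path.islink(path) or not os.path.isfile(path):
                        continue
                    with open(path, 'rb') as f:
                        digest = hashlib.sha1(f.read()).hexdigest()
                    if digest in seen:
                        os.unlink(path)
                        os.link(seen[digest], path)
                    else:
                        seen[digest] = path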

> 
> Since vserver is doing file hashification anyway, it seems like it
> would be a much better idea to use this rather than reinvent the
> wheel.  Some other issues:
>   a) we may need to be aware of which files are hardlinked where in
> order to do proper updates.
>   b) not clear how/if we can make updates to an entire tree atomically
>   c) how/can we restart the vservers?  how important is this?
> 
> I think we need to bring in a vserver expert (ie, not me) to get these
> details right at the start.

Am happy to get more advice on this, for sure.  I suspect that all of
the vserver people we can call on are on this list.

Our current thinking is basically that we can do an update as part of an
update/shutdown procedure.  So you can apply the updates on the way down
and get a new environment on restart.  That would handle the vserver
restarts and also the "how to get a new kernel" issue that no one else
has mentioned.

I'm not sure whether hashing for updates and hashing for vserver are the
kinds of things we want to share.  I would love to hear more about how
vserver does its hashing and see if we can share it.  I still feel that
keeping the update system as simple and uncomplicated as possible is a
very good way to go - it lets us advance in parallel and come up with
something that works well.

It also sounds like it's pretty easy to do something like:

o Start root container
o Start guest container
o Apply update
o Start activities in that guest container

And the hardlink/CoW stuff will then give us an updated container.  That
still doesn't help with updates to the base system or the kernel bits,
but it's a start.
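
In pseudo-ish Python, that sequence might look like the sketch below
(assuming the root container is already up).  The update bundle layout,
apply_update(), and the "start-activities" command are all invented for
illustration, and the vserver invocations are from memory, so treat them
as placeholders rather than the real CLI:

    import os, shutil, subprocess

    def apply_update(guest_root, update_dir):
        # Copy each file from the update bundle into the guest root,
        # replacing (unlink + copy) rather than writing in place so a
        # hard-linked copy shared with other containers stays untouched.
        for dirpath, dirnames, filenames in os.walk(update_dir):
            for name in filenames:
                src = os.path.join(dirpath, name)
                rel = os.path.relpath(src, update_dir)
                dst = os.path.join(guest_root, rel)
                if not os.path.isdir(os.path.dirname(dst)):
                    os.makedirs(os.path.dirname(dst))
                if os.path.exists(dst):
                    os.unlink(dst)
                shutil.copy2(src, dst)

    def boot_guest(name, guest_root, update_dir):
        subprocess.check_call(['vserver', name, 'start'])
        apply_update(guest_root, update_dir)
        # "start-activities" is a made-up stand-in for however the
        # activities actually get launched inside the guest.
        subprocess.check_call(['vserver', name, 'exec', 'start-activities'])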

> 
> > Yeah, but you always need both sets of information to be able to
> > generate them.  So you have to host full file + diff data if you want to
> > host an update.
> 
> My proposal would be that XOs pass around binary diffs *only* among
> themselves, and that if someone needs an older version or to leapfrog
> versions, they ask the school server.  This allows XOs to cheaply host
> updates in the common case.

You could do that with Alex's system as well.  But in Alex's case the XO
doesn't have to carry both the system it's using and a diff, because the
system you're using is the update.
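
To make that concrete, my loose reading of it (not Alex's actual code or
manifest format) is that an XO can serve whatever blobs a peer asks for
straight out of its live tree, keyed by content hash, and only the school
server needs to keep older versions or diffs around:

    import hashlib, os

    # manifest lines assumed to look like "<sha1>  <path>"
    def build_index(manifest_path):
        index = {}
        with open(manifest_path) as f:
            for line in f:
                digest, path = line.split(None, 1)
                index[digest] = path.strip()
        return index

    def serve_blob(index, digest):
        path = index.get(digest)
        if path is None or not os.path.isfile(path):
            return None     # don't have it; fall back to the school server
        with open(path, 'rb') as f:
            data = f.read()
        # sanity check: the live file should still match its manifest hash
        if hashlib.sha1(data).hexdigest() != digest:
            return None
        return data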

> 
> > The nice thing about Alex's system is that you only
> > have to host the file data that you're using on your system instead of
> > file + diff data.  You end up using less space that way.
> 
> If you look at the numbers I just posted, file+diff is 1.3% larger
> than just files.
> 
> >  If you want to
> > downgrade, you have to get the files or use the vserver versions (maybe
> > you could just use the old files handled by the CoW stuff to drive the
> > update system - that might work pretty well!)
> 
> Now we're talking! ;-)
> 
> > Keep in mind that those "blobs" he's talking about are just files.  The
> > only place where we would add binary diffs would be for individual
> > files, not entire trees.  So what we're downloading today is only the
> > changed files, largely for the sake of expediency and what I describe
> > above for the space savings.
> 
> I have some issues with the fact that the manifest includes the entire
> tree as well.  Upgrades should only include entries *changed* in the
> manifest.

I'm open to either option on that, for sure.  I want Alex to chime in
here as well.
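
For what it's worth, the "changed entries only" option looks cheap to
compute either way.  A sketch, assuming the manifest is a path-to-hash
mapping (I don't know what Alex's actual format looks like):

    # Ship only the delta between two manifests (dicts of path -> hash).
    def manifest_delta(old, new):
        changed = {p: h for p, h in new.items() if old.get(p) != h}
        removed = [p for p in old if p not in new]
        return changed, removed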

> 
> >   But if I have to choose
> > between having something that works today with full files and saves some
> > space[...]
> 
> We're only talking about a week's worth of work here so far.  I agree
> that our schedule is aggressive, so let's get this right at design
> time, rather than trying to fix things after implementation.

We're talking about two weeks' worth of work in a schedule that only has
a few weeks. :)  And given that you're talking about using a system that
doesn't even work yet, I'm happy to keep investing in something that
works very well, doesn't contain direct dependencies (yet!), and can work
out of the box with some level of predictability.

--Chris



