System update spec proposal
Xavier Alvarez
xavi.alvarez at gmail.com
Tue Jun 26 16:23:45 EDT 2007
The wiki version of this document can be found at
http://wiki.laptop.org/go/XO_updater
enjoy
/X
On Tuesday 26 June 2007 14:55, Ivan Krstić wrote:
IK> Software updates on the One Laptop per Child's XO laptop
IK> ========================================================
IK>
IK>
IK>
IK>
IK> 0. Problem statement and scope
IK> ==============================
IK>
IK> This document aims to specify the mechanism for updating software on the
IK> XO-1 laptop. When we talk about updating software, we are referring both
IK> to system software such as the OS and the core services controlled by
IK> OLPC that are required for the laptop's basic operation, and to any
IK> installed user-facing applications ("activities"), both those provided
IK> by OLPC and those provided by third parties.
IK>
IK>
IK>
IK>
IK> 1. System updater
IK> =================
IK>
IK> 1.1. Core goals
IK> ---------------
IK>
IK> The three core goals of a software update tool (hereafter "updater")
IK> for the XO are as follows:
IK>
IK> * Security
IK>   Given the initial age group of our users, the only reasonable solution
IK>   is to default to automatic detection and installation of updates, both
IK>   to be able to apply security patches in a timely fashion, and to enable
IK>   users to benefit from rapid development and improvements in the
IK>   software they're using. Automatic updates, however, are a security
IK>   issue unto themselves: compromising the update system in any way can
IK>   provide an attacker with the ability to wreak havoc across entire
IK>   installed bases of laptops while bypassing -- by design -- all the
IK>   security measures on the machine. Therefore, the security of the
IK>   updater is paramount and must be its first design goal.
IK>
IK> * Uncompromising emphasis on fault-tolerance
IK>   Given the scale of our deployment, the relatively high complexity of
IK>   our network stack when compared to currently-common deployments, the
IK>   unreliability of Internet connectivity even when available, and
IK>   perhaps most importantly our desire for participating countries to
IK>   soon begin customizing the official OLPC OS images to best suit them,
IK>   it is clear that our updater must be fault-tolerant. This is both in
IK>   the simple sense -- cryptographic checksums need to be used to ensure
IK>   updates were received correctly -- and in the more complex sense that
IK>   the likelihood of a human error with regard to update preparation goes
IK>   up proportionally to the number of different base OS images at play.
IK>   A fault-tolerant updater will therefore allow _unconditional_ rollback
IK>   of the most recently applied update. "Unconditional" here means that,
IK>   barring the failure of other parts of the system which are
IK>   dependencies of the updater (e.g. the filesystem), the updater must
IK>   always know how to correctly unapply an applied update, even if the
IK>   update was malformed.
IK>
IK> * Low bandwidth
IK>   For much the same reasons (project scale, Internet access scarcity and
IK>   unreliability) that require fault-tolerance from the updater, the tool
IK>   must take maximum care to minimize data transfer requirements. This
IK>   means, concretely, that a delta-based approach must be utilized by the
IK>   updater, with a "keyframe" or "heavy" update being strictly a fallback
IK>   in the unlikely case an update path cannot be constructed from the
IK>   available or reachable delta sets.
IK>
IK>
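The delta-first rule in the last goal amounts to a path search over whatever deltas are reachable, with the full image as the fallback. A minimal Python sketch, assuming each delta is identified by a hypothetical (from_version, to_version) pair; `plan_update` and the build numbers used with it are illustrative, not part of the spec:

```python
from collections import deque


def plan_update(current, target, deltas):
    """Return the steps needed to go from `current` to `target`.

    `deltas` is an iterable of (from_version, to_version) pairs that can be
    fetched. Breadth-first search finds the chain with the fewest deltas;
    if no chain exists, fall back to one "keyframe" (full image) step.
    """
    graph = {}
    for src, dst in deltas:
        graph.setdefault(src, []).append(dst)

    previous = {current: None}
    queue = deque([current])
    while queue:
        version = queue.popleft()
        if version == target:
            # Walk the predecessor chain back to `current`.
            steps = []
            while previous[version] is not None:
                steps.append(("delta", previous[version], version))
                version = previous[version]
            return list(reversed(steps))
        for nxt in graph.get(version, []):
            if nxt not in previous:
                previous[nxt] = version
                queue.append(nxt)

    # No delta chain reaches the target: fetch the full image instead.
    return [("keyframe", None, target)]
```

Breadth-first search minimizes the number of deltas, not the number of bytes; weighting each edge by delta size and using Dijkstra's algorithm would be the natural refinement.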
IK>
IK> 1.2. Design
IK> -----------
IK>
IK> It is given, due to requirements imposed by the Bitfrost security
IK> platform, that a laptop will attempt to make daily contact with the
IK> OLPC anti-theft servers. During that interaction, the laptop will post
IK> its system software version, and the response provided by the
IK> anti-theft service will optionally contain a relative URL of a more
IK> recent OS image.
IK>
IK> If such a pointer has been received and the laptop is behind a known
IK> school server, it will probe the school server via rsync at the
IK> provided relative URL to determine whether the server has cached the
IK> update locally. If the update is not available locally, the laptop will
IK> wait up to 24 hours, checking approximately hourly whether the school
IK> server has obtained the update. If at the end of this wait period the
IK> school server still does not have a local copy of the update, it is
IK> assumed to be malfunctioning, and the laptop will contact an upstream
IK> master server directly by using the URL provided originally by the
IK> anti-theft service.
IK>
IK> In any of these three cases (school server has update immediately,
IK> school server has update after delay, upstream master has update), we
IK> say the laptop has 'found an update source'.
IK>
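The source-selection logic above can be sketched as follows. This is a minimal sketch that assumes the probe is supplied by the caller (for example, a listing of the URL with rsync) and that `sleep` can be swapped out for testing; the function and parameter names are hypothetical.

```python
import time

CHECK_INTERVAL = 60 * 60       # "approximately hourly"
MAX_WAIT = 24 * 60 * 60        # "up to 24 hours"


def find_update_source(relative_url, school_server, upstream_server,
                       school_has_update, sleep=time.sleep):
    """Return the URL the laptop should rsync the update from.

    `school_server` is None when the laptop is not behind a known school
    server. `school_has_update(url)` is a hypothetical probe that returns
    True once the school server has the update cached.
    """
    if school_server is not None:
        url = school_server + relative_url
        waited = 0
        while True:
            if school_has_update(url):
                return url              # cached now, or after a delay
            if waited >= MAX_WAIT:
                break                   # assume the school server is broken
            sleep(CHECK_INTERVAL)
            waited += CHECK_INTERVAL
    # Fall back to the upstream master, using the same relative URL.
    return upstream_server + relative_url
```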
IK> Once an update source has been found, the laptop will invoke the
IK> standard rsync tool over a plaintext (unsecured) connection via the
IK> rsync protocol -- not piped through a shell of any kind -- to bring its
IK> own files up to date with the more recent version of the system. rsync
IK> uses a network-efficient binary diff algorithm which satisfies goal 3.
IK>
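A sketch of the rsync invocation, assuming the update is served at an rsync:// (daemon protocol) URL. The specific flags are an assumption; the spec only requires rsync over its own protocol with no shell involved, which on the local side a list argument vector passed to `subprocess.run` (without `shell=True`) ensures.

```python
import subprocess


def rsync_command(source_url, target_dir):
    """Build the rsync argument vector for one update attempt.

    The flag choice is an assumption, not part of the spec:
      -a         archive mode (recurse, keep permissions, times, symlinks)
      -H         preserve hard links
      --delete   remove files that no longer exist in the new image
      --partial  keep partially transferred files so an interrupted
                 transfer can resume on the next attempt
    """
    if not source_url.startswith("rsync://"):
        raise ValueError("expected an rsync:// URL (daemon protocol, no shell)")
    # Trailing slashes make rsync synchronize directory contents.
    return ["rsync", "-a", "-H", "--delete", "--partial",
            source_url.rstrip("/") + "/", target_dir.rstrip("/") + "/"]


def run_update(source_url, target_dir):
    # A list argv with the default shell=False is never parsed by a shell.
    result = subprocess.run(rsync_command(source_url, target_dir))
    return result.returncode == 0
```

Because `--partial` keeps interrupted files, calling `run_update` again after connectivity returns continues the same update rather than starting over.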
IK>
IK>
IK> 1.3. Design note: peer-to-peer updates
IK> --------------------------------------
IK>
IK> It is desirable to provide "viral update" functionality at a later date,
IK> such that two laptops with different software versions (and without any
IK> notion of trust) can engage in an update to bring the laptop with the
IK> older software fully up to date.
IK>
IK> However, determining how to provide this functionality securely,
IK> efficiently and elegantly is not feasible on the Gen1 FRS timeline.
IK> Therefore, laptop-to-laptop updates will NOT be a part of the updater
IK> that ships with the FRS image, and are a candidate for release 2-3
IK> months after FRS.
IK>
IK>
IK>
IK> 1.4. Design note: rsync scalability
IK> -----------------------------------
IK>
IK> rsync is a known CPU hog on the server side. It would be absolutely
IK> infeasible to support a very large number of users from a single rsync
IK> server. This is far less of a problem in our scenario for three reasons:
IK>
IK> * High branching factor
IK>   In all normal circumstances, the vast majority of the rsync traffic to
IK>   our upstream servers will come from school servers, not individual
IK>   laptops. If school servers are unavailable or malfunctioning, it is not
IK>   the case that there will be a flood of requests from individual
IK>   laptops, because it's likely that the school servers are those
IK>   laptops' only gateway to the Internet.
IK>
IK> * Element of randomness in anti-theft requests
IK>   Instead of hitting the update servers every hour on the hour, the
IK>   laptops are already including an element of randomness in choosing
IK>   when to contact the anti-theft service. This random delay propagates
IK>   to the rsync requests as well.
IK>
IK> * In-depth stagger abilities on the server side
IK>   Because notification of new updates is performed by the anti-theft
IK>   service, which is aware of a laptop's locale, updates can be staggered
IK>   over several days by country, region, or any other metric such as
IK>   server load.
IK>
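The last two points can be illustrated with a small sketch: random jitter on the daily check-in, and a deterministic per-laptop rollout day computed on the server. The one-hour jitter, the per-region start days and the serial-number hashing are all hypothetical choices, not taken from the spec.

```python
import hashlib
import random

CHECK_IN_PERIOD = 24 * 60 * 60   # daily contact with the anti-theft service
MAX_JITTER = 60 * 60             # hypothetical: up to one hour either way


def next_check_in_delay(rng=random):
    """Seconds until the next anti-theft check-in.

    A random offset keeps laptops from calling in (and then starting
    rsync) at the same moment.
    """
    return CHECK_IN_PERIOD + rng.uniform(-MAX_JITTER, MAX_JITTER)


def rollout_day(serial_number, region, region_start_day, rollout_days):
    """Day, counted from release, on which the server starts advertising
    an update to this laptop.

    Each region has its own start day; within a region, laptops are spread
    over `rollout_days` days by hashing the serial number, so a laptop gets
    the same answer on every check-in.
    """
    digest = hashlib.sha256(serial_number.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % rollout_days
    return region_start_day[region] + bucket
```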
IK> Additionally, some optimizations can be added to rsync proper to aid
IK> with our use case, but such engineering will need to wait until after
IK> FRS.
IK>
IK>
IK>
IK> 1.5. Implementation
IK> -------------------
IK>
IK> In order to implement runtime file protection, Bitfrost relies on the
IK> COW (copy-on-write) functionality of the Linux-VServer patchset. The
IK> functionality imbues immutable hardlinks within a designated context
IK> with special meaning: when broken by some destructive file operation,
IK> VServer will replace these hardlinks with the content of the file they
IK> were pointing to and apply the desired operation on the resulting copy.
IK>
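The hardlink semantics relied on here can be emulated in user space to make the behaviour concrete. This is only an illustration, assuming a POSIX filesystem: in VServer the copy happens inside the kernel when a protected link is broken, whereas `break_link_and_write` (a hypothetical name) does it explicitly before writing.

```python
import os
import shutil
import tempfile


def break_link_and_write(path, data):
    """Write `data` to `path` without affecting other hard links to it.

    If the file's inode is shared with other links, first replace this
    directory entry with a private copy, then write to the copy. This
    mirrors the copy-on-write effect, not VServer's in-kernel mechanism.
    """
    if os.stat(path).st_nlink > 1:
        directory = os.path.dirname(os.path.abspath(path))
        fd, tmp = tempfile.mkstemp(dir=directory)
        os.close(fd)
        shutil.copy2(path, tmp)      # private copy with the same content
        os.replace(tmp, path)        # atomically swap the link for the copy
    with open(path, "wb") as f:
        f.write(data)
```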
IK> The XO updater will run in a special context to which the security
IK> service has exposed the entire underlying filesystem as a COW copy. The
IK> updater will update this COW copy in-place with rsync. This COW
IK> mechanism simply ensures no excess authority lies with the updater; any
IK> failures or vulnerabilities in it do not propagate to the rest of the
IK> system.
IK>
IK> One file contained within each OS image will be its cryptographically
IK> signed manifest; at the end of the rsync operation, the laptop will have
IK> obtained that file. At this point, the updater will request that the
IK> security service apply the update. Note that due to the nature of
IK> rsync, we can stop and restart the network phase of a single update
IK> several times as connectivity becomes available, until we've received
IK> the complete update.
IK>
IK> The security service will terminate the updater, then analyze the
IK> manifest and confirm that the modified files in the updater's context
IK> exactly match the expected OS image end-state. If any discrepancy is
IK> discovered, the updater context will be discarded and the update
IK> operation aborted.
IK>
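The completeness check can be sketched as a tree walk that hashes every file and compares the result with the manifest. This sketch assumes the manifest's signature has already been verified and that it maps relative paths to SHA-256 hex digests; neither the format nor the hash function is fixed by the spec, and `verify_tree` is a hypothetical name.

```python
import hashlib
import os


def sha256_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files need not fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            h.update(block)
    return h.hexdigest()


def verify_tree(root, manifest):
    """Compare the files under `root` with `manifest`.

    `manifest` maps relative paths (with "/" separators) to SHA-256 hex
    digests. Returns a list of discrepancies; an empty list means the tree
    matches the expected end state exactly.
    """
    problems = []
    seen = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            seen.add(rel)
            expected = manifest.get(rel)
            if expected is None:
                problems.append(("unexpected", rel))
            elif sha256_file(full) != expected:
                problems.append(("modified", rel))
    for rel in sorted(set(manifest) - seen):
        problems.append(("missing", rel))
    return problems
```

Any non-empty result corresponds to the "discrepancy" case above: discard the updater context and abort.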
IK> If the update is verified to be complete and correct, the security
IK> service will mark it as such, and designate the files within it to be
IK> the files exported into all newly-created containers. System service
IK> containers will be restarted gracefully. If the image manifest did not
IK> contain a header identifying that image as a high-priority update, the
IK> update process ends here. Restartable services have been restarted, and
IK> the rest of the system will be initialized from the update on reboot.
IK>
IK> If the update has been marked as high-priority, the user will be asked
IK> to close applications and reboot his machine immediately. A timer will
IK> run that will reboot the machine in 60 minutes if the user does not do
IK> so. The high-priority timer can be disabled in the security center; its
IK> purpose is merely to provide some extra protection to the youngest users
IK> who cannot necessarily be expected to understand or comply with the
IK> reboot request.
IK>
IK> On boot, the first initialization script to run will perform a
IK> pivot_root operation to the directory that currently holds the OS image
IK> marked bootable by the security service. With the example above, it
IK> would be the directory that belonged to the updater's context. If a key
IK> is depressed during boot, however, the pivot_root is performed to the
IK> _old_ bootable context, and the user is presented with a dialog asking
IK> whether she would like to make the rollback permanent.
IK>
IK> The kernel is the only special case to this handling: in the event that
IK> a verified update contains an updated kernel, that kernel will be placed
IK> into a predetermined place in the underlying filesystem by the security
IK> service. OpenFirmware will preferentially boot this newer kernel unless
IK> the rollback key combination is depressed during boot.
IK>
IK> Notice that the update operation has been reduced to a simple state
IK> toggle between (any) two OS images. In so doing, we have satisfied goals
IK> 1 and 2.
IK>
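The toggle can be modelled with two pointers, one to the bootable image and one to the image it replaced. A minimal sketch; `BootState` and the directory names are hypothetical, and on a real XO the choice is acted on by the first init script (via pivot_root) and, for the kernel, by OpenFirmware.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BootState:
    """Hypothetical record kept by the security service."""
    current: str                    # image directory marked bootable
    previous: Optional[str] = None  # prior image, kept for rollback

    def install(self, new_image: str) -> None:
        # A verified update becomes bootable; the old image stays on disk.
        self.previous, self.current = self.current, new_image

    def choose_root(self, rollback_key_held: bool) -> str:
        # Directory the first init script would pivot_root into.
        if rollback_key_held and self.previous is not None:
            return self.previous
        return self.current

    def make_rollback_permanent(self) -> None:
        # Rolling back is the same toggle as applying an update.
        if self.previous is None:
            raise RuntimeError("no previous image to roll back to")
        self.current, self.previous = self.previous, self.current
```

Because applying and undoing an update are the same swap, rollback needs no knowledge of what the update changed, which is what makes it unconditional.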
IK>
IK>
IK>
IK> 2. Application updater
IK> ======================
IK>
IK> 2.1. Design
IK> -----------
IK>
IK> The XO eschews traditional dependency-based approaches to package
IK> management, making application upgrades somewhat difficult. The problem
IK> is compounded by the fact that Bitfrost does not permit applications to
IK> update themselves in-place, which is a common update method on platforms
IK> such as Mac OS X and Windows.
IK>
IK> When it comes to application updates, we wish to stay true to our goals
IK> of security and low-bandwidth updates, but are willing to settle for
IK> less fault tolerance as necessitated by the fact that most activities
IK> won't be OLPC-written or maintained.
IK>
IK> The design should make it possible to have a single tool that can
IK> ascertain the existence of updated versions of any currently installed
IK> activities, and then fetch and install those updates. It should do so
IK> bandwidth-efficiently, such that files that are unchanged between
IK> activity versions aren't downloaded as part of the update, and also such
IK> that identical resource files packaged by multiple activities are never
IK> downloaded more than once, and not at all if they already exist on the
IK> system.
IK>
IK>
IK>
IK> 2.2. Implementation
IK> -------------------
IK>
IK> A manifest file is added to the bundle format specification. The
IK> manifest consists of the filename and strong cryptographic hash of every
IK> file in the bundle. Another file is added, called 'origin', that
IK> specifies a URL where updated activity bundles may be found, and a
IK> public key which will be used to sign such updated bundles.
IK>
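A sketch of generating the proposed manifest from an existing bundle. The one-line-per-file layout and the choice of SHA-256 are assumptions; the spec asks only for each file's name and a strong cryptographic hash. The 'origin' file is not shown.

```python
import hashlib
import zipfile


def bundle_manifest(bundle):
    """Return manifest text with one "<sha256 hex>  <filename>" line per file.

    `bundle` is a path or a binary file object holding a ZIP bundle.
    Entries are sorted by name so the output is reproducible.
    """
    lines = []
    with zipfile.ZipFile(bundle) as archive:
        for info in sorted(archive.infolist(), key=lambda i: i.filename):
            if info.is_dir():
                continue            # directories carry no content to hash
            digest = hashlib.sha256(archive.read(info)).hexdigest()
            lines.append(f"{digest}  {info.filename}\n")
    return "".join(lines)
```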
IK> When a global activity update is initiated, the updater enumerates the
IK> origins for all installed activities, then probes each one in turn to
IK> determine which activities have available updates. The resulting
IK> activity list is the 'available update set'.
IK>
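Probing can be grouped so that each origin is contacted once, even when it serves several installed activities. A sketch under stated assumptions: `probe` is a hypothetical callable returning the newest version of each activity an origin publishes, and versions are plain integers.

```python
from collections import defaultdict


def available_update_set(installed, probe):
    """Return {activity id: (origin, newer version)} for outdated activities.

    `installed` maps activity id -> (origin URL, installed version).
    `probe(origin)` returns {activity id: newest version} for that origin,
    or raises OSError if the origin cannot be reached.
    """
    by_origin = defaultdict(dict)
    for activity_id, (origin, version) in installed.items():
        by_origin[origin][activity_id] = version

    updates = {}
    for origin, activities in by_origin.items():
        try:
            published = probe(origin)
        except OSError:
            continue                # unreachable origin: try again next time
        for activity_id, version in activities.items():
            newest = published.get(activity_id)
            if newest is not None and newest > version:
                updates[activity_id] = (origin, newest)
    return updates
```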
IK> The most up-to-date bundle for each activity in the set is accessed, and
IK> the last several kilobytes downloaded. Since bundles are simple ZIP
IK> files, the downloaded data will contain the ZIP file index (the central
IK> directory, which the ZIP format places at the end of the archive) that
IK> stores byte offsets for the constituent compressed files. The updater
IK> then locates the bundle manifest in each index and makes an HTTP request
IK> with the respective byte range to each bundle origin. At the end of this
IK> process, the updater has cheaply obtained a set of manifests of the
IK> files in all available activity updates.
IK>
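The range trick depends on the layout of the ZIP format: the end-of-central-directory record and the central directory sit at the end of the archive, and each central-directory entry records the offset of its member's local header. The sketch below parses those structures from a downloaded tail and returns the byte range to request for one member (for example with an HTTP `Range: bytes=start-(end-1)` header). It assumes a non-ZIP64, single-disk archive whose central directory fits in the tail; the function names are hypothetical.

```python
import struct
import zlib

EOCD_SIG = b"PK\x05\x06"
CDIR_SIG = b"PK\x01\x02"
LOCAL_SIG = b"PK\x03\x04"


def member_byte_range(tail, tail_offset, name):
    """Return [start, end) of member `name`, as absolute archive offsets.

    `tail` holds the last bytes of the archive, beginning at absolute
    offset `tail_offset`, and must contain the whole central directory.
    """
    eocd = tail.rfind(EOCD_SIG)
    if eocd < 0:
        raise ValueError("end of central directory not in downloaded tail")
    (_sig, _disk, _cd_disk, _n_disk, n_entries, _cd_size,
     cd_offset, _comment_len) = struct.unpack_from("<4s4H2LH", tail, eocd)
    pos = cd_offset - tail_offset
    if pos < 0:
        raise ValueError("central directory starts before downloaded tail")

    offsets, wanted = [], None
    for _ in range(n_entries):
        fields = struct.unpack_from("<4s6H3L5H2L", tail, pos)
        if fields[0] != CDIR_SIG:
            raise ValueError("corrupt central directory")
        name_len, extra_len, comment_len = fields[10], fields[11], fields[12]
        local_offset = fields[16]
        # Assumes UTF-8 (or plain ASCII) member names.
        entry_name = tail[pos + 46:pos + 46 + name_len].decode("utf-8")
        offsets.append(local_offset)
        if entry_name == name:
            wanted = local_offset
        pos += 46 + name_len + extra_len + comment_len
    if wanted is None:
        raise KeyError(name)
    # A member's bytes run up to the next local header, or to the central
    # directory if it is the last member.
    later = [o for o in offsets if o > wanted]
    return wanted, (min(later) if later else cd_offset)


def extract_member(chunk):
    """Decode the member whose local header starts `chunk`."""
    (sig, _ver, _flags, method, _time, _date, _crc, csize, _usize,
     name_len, extra_len) = struct.unpack_from("<4s5H3L2H", chunk)
    if sig != LOCAL_SIG:
        raise ValueError("not a local file header")
    start = 30 + name_len + extra_len
    data = chunk[start:start + csize]
    if method == 0:                         # stored
        return data
    if method == 8:                         # deflated: raw deflate stream
        return zlib.decompress(data, -15)
    raise ValueError(f"unsupported compression method {method}")
```

The sketch reads sizes from the local header, so it does not handle members written with a trailing data descriptor (general-purpose flag bit 3), where those fields are zero.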
IK> A local database of manifests of all installed activities is kept,
IK> pruned to contain only records for files larger than a set size, e.g.
IK> 50 KB. The updater cross-references each manifest from the available
IK> update set with the installed database, and then with other manifests
IK> in the set. Files which exist locally and are also present in the
IK> available update set aren't downloaded; the updater simply "plants" the
IK> files in the right places. The same happens for identical files present
IK> in multiple bundles in the available update set; they are only
IK> downloaded once.
IK>
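The cross-referencing step can be sketched with plain dictionaries keyed by file digest. The 50 KB cutoff is read here as 50 KiB, and the manifest shape ({path: (digest, size)}) and the function names are hypothetical.

```python
LOCAL_DB_MIN_SIZE = 50 * 1024   # "e.g. 50 KB", assumed to mean KiB


def build_installed_db(installed_manifests):
    """Map digest -> local path for installed files above the size cutoff.

    `installed_manifests` maps an installed bundle directory to
    {relative path: (sha256 digest, size in bytes)}.
    """
    db = {}
    for bundle_dir, manifest in installed_manifests.items():
        for path, (digest, size) in manifest.items():
            if size > LOCAL_DB_MIN_SIZE:
                db.setdefault(digest, f"{bundle_dir}/{path}")
    return db


def plan_downloads(installed_db, update_manifests):
    """Split the files of the available update set into plant and fetch.

    Returns (plant, fetch):
      plant: list of (bundle, path, local file to copy from)
      fetch: {digest: [(bundle, path), ...]}; each digest is downloaded
             once and then written to every listed destination.
    """
    plant, fetch = [], {}
    for bundle, manifest in sorted(update_manifests.items()):
        for path, (digest, _size) in sorted(manifest.items()):
            if digest in installed_db:
                plant.append((bundle, path, installed_db[digest]))
            else:
                fetch.setdefault(digest, []).append((bundle, path))
    return plant, fetch
```

Small files fall outside the database and are always fetched, which keeps the database compact at the cost of occasionally re-downloading a few kilobytes.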
IK> After a bundle (minus any redundant files) has been downloaded, it is
IK> unpacked and reassembled (if it needs any of the files that haven't been
IK> downloaded because they already exist). Cryptographic signature
IK> verification is performed. If remaining disk space is larger than a
IK> particular margin, e.g. 20%, then the context containing the older
IK> version of the activity bundle is kept around, and the user is given the
IK> ability to perform rollback on the activity update. Otherwise, the old
IK> version bundle is destroyed.
IK>
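The rollback-retention rule reduces to a free-space check. This sketch reads "a particular margin, e.g. 20%" as free space exceeding 20% of the filesystem's total size, which is one possible interpretation; the function names are hypothetical.

```python
import shutil

ROLLBACK_MARGIN = 0.20  # "e.g. 20%", read as a fraction of total size


def should_keep_old_version(free_bytes, total_bytes, margin=ROLLBACK_MARGIN):
    """True if free space stays above `margin` of the filesystem's size."""
    return free_bytes > margin * total_bytes


def keep_old_version(activities_dir, margin=ROLLBACK_MARGIN):
    # Check the filesystem that holds the installed activities.
    usage = shutil.disk_usage(activities_dir)
    return should_keep_old_version(usage.free, usage.total, margin)
```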
IK>
IK>
IK>
IK>
IK> :Author
IK> Ivan Krstić
IK> ivan AT laptop.org
IK> One Laptop per Child
IK> http://laptop.org
IK>
IK> :Metadata
IK> Revision: Draft-14
IK> Timestamp: Tue Jun 26 17:51:45 UTC 2007
IK>
IK>
IK> END
IK>
IK>
IK>
IK> --
IK> Ivan Krstić <krstic at solarsail.hcs.harvard.edu> | GPG: 0x147C722D
IK>
IK> _______________________________________________
IK> Devel mailing list
IK> Devel at lists.laptop.org
IK> http://lists.laptop.org/listinfo/devel
IK>
--
XA
=========
Don't Panic! The Answer is 42