System update spec proposal

Xavier Alvarez xavi.alvarez at gmail.com
Tue Jun 26 16:23:45 EDT 2007


The wiki version of this document can be found at
http://wiki.laptop.org/go/XO_updater

enjoy
/X

On Tuesday 26 June 2007 14:55, Ivan Krstić wrote:
IK> Software updates on the One Laptop per Child's XO laptop
IK> ========================================================
IK>
IK>
IK>
IK>
IK> 0. Problem statement and scope
IK> ==============================
IK>
IK> This document aims to specify the mechanism for updating software on
IK> the XO-1 laptop. When we talk about updating software, we are
IK> referring both to system software, such as the OS and the core
IK> services controlled by OLPC that are required for the laptop's basic
IK> operation, and to any installed user-facing applications
IK> ("activities"), both those provided by OLPC and those provided by
IK> third parties.
IK>
IK>
IK>
IK>
IK> 1. System updater
IK> =================
IK>
IK> 1.1. Core goals
IK> ---------------
IK>
IK> The three core goals of a software update tool (hereafter "updater")
IK> for the XO are as follows:
IK>
IK>      * Security
IK>      Given the initial age group of our users, the only reasonable
IK>      solution is to default to automatic detection and installation
IK>      of updates, both to be able to apply security patches in a
IK>      timely fashion, and to enable users to benefit from rapid
IK>      development and improvements in the software they're using.
IK>      Automatic updates, however, are a security issue unto
IK>      themselves: compromising the update system in any way can
IK>      provide an attacker with the ability to wreak havoc across
IK>      entire installed bases of laptops while bypassing -- by design
IK>      -- all the security measures on the machine. Therefore, the
IK>      security of the updater is paramount and must be its first
IK>      design goal.
IK>
IK>      * Uncompromising emphasis on fault-tolerance
IK>      Given the scale of our deployment, the relatively high
IK>      complexity of our network stack when compared to
IK>      currently-common deployments, the unreliability of Internet
IK>      connectivity even when available, and perhaps most importantly
IK>      our desire for participating countries to soon begin
IK>      customizing the official OLPC OS images to best suit them, it
IK>      is clear that our updater must be fault-tolerant. This is both
IK>      in the simple sense -- cryptographic checksums need to be used
IK>      to ensure updates were received correctly -- and in the more
IK>      complex sense that the likelihood of a human error with regard
IK>      to update preparation goes up proportionally to the number of
IK>      different base OS images at play. A fault-tolerant updater will
IK>      therefore allow _unconditional_ rollback of the most recently
IK>      applied update. "Unconditional" here means that, barring the
IK>      failure of other parts of the system which are dependencies of
IK>      the updater (e.g. the filesystem), the updater must always know
IK>      how to correctly unapply an applied update, even if the update
IK>      was malformed.
IK>
IK>      * Low bandwidth
IK>      For much the same reasons (project scale, Internet access
IK>      scarcity and unreliability) that require fault-tolerance from
IK>      the updater, the tool must take maximum care to minimize data
IK>      transfer requirements. This means, concretely, that a
IK>      delta-based approach must be utilized by the updater, with a
IK>      "keyframe" or "heavy" update being strictly a fallback in the
IK>      unlikely case an update path cannot be constructed from the
IK>      available or reachable delta sets.
IK>
IK>
IK>
IK> 1.2. Design
IK> -----------
IK>
IK> It is given, due to requirements imposed by the Bitfrost security
IK> platform, that a laptop will attempt to make daily contact with the
IK> OLPC anti-theft servers. During that interaction, the laptop will
IK> post its system software version, and the response provided by the
IK> anti-theft service will optionally contain a relative URL of a more
IK> recent OS image.
IK>
IK> If such a pointer has been received and the laptop is behind a known
IK> school server, it will probe the school server via rsync at the
IK> provided relative URL to determine whether the server has cached the
IK> update locally. If the update is not available locally, the laptop
IK> will wait up to 24 hours, checking approximately hourly whether the
IK> school server has obtained the update. If at the end of this wait
IK> period the school server still does not have a local copy of the
IK> update, it is assumed to be malfunctioning, and the laptop will
IK> contact an upstream master server directly by using the URL provided
IK> originally by the anti-theft service.
IK>
IK> In any of these three cases (school server has update immediately,
IK> school server has update after delay, upstream master has update),
IK> we say the laptop has 'found an update source'.
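
The three cases above can be sketched as a small decision function.
This is only an illustration under stated assumptions, not the
updater's actual code: the names find_update_source, server_has_update
and sleep_hours are hypothetical, the upstream base URL is an
assumption (the text only says the anti-theft service supplies a
relative URL), and a laptop with no known school server is assumed to
go upstream directly.

```python
from typing import Callable, Optional

PROBE_INTERVAL_HOURS = 1   # "approximately hourly" in the text
MAX_WAIT_HOURS = 24        # wait period before falling back upstream


def find_update_source(
    relative_url: Optional[str],
    school_server: Optional[str],
    upstream_base: str,
    server_has_update: Callable[[str], bool],
    sleep_hours: Callable[[float], None],
) -> Optional[str]:
    """Return the URL to update from, or None if no update was announced."""
    if relative_url is None:
        return None  # the anti-theft response carried no pointer
    path = relative_url.lstrip("/")
    upstream_url = f"{upstream_base.rstrip('/')}/{path}"
    if school_server is None:
        # Assumption: without a known school server, go upstream directly.
        return upstream_url
    school_url = f"rsync://{school_server}/{path}"
    waited = 0
    while True:
        if server_has_update(school_url):
            return school_url  # cached immediately or after a delay
        if waited >= MAX_WAIT_HOURS:
            break  # school server presumed to be malfunctioning
        sleep_hours(PROBE_INTERVAL_HOURS)
        waited += PROBE_INTERVAL_HOURS
    return upstream_url
```

Passing the probe and the sleep function in as parameters keeps the
sketch testable without a network or a real 24-hour wait.
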
IK>
IK> Once an update source has been found, the laptop will invoke the
IK> standard rsync tool over a plaintext (unsecured) connection via the
IK> rsync protocol -- not piped through a shell of any kind -- to bring
IK> its own files up to date with the more recent version of the
IK> system. rsync uses a network-efficient binary diff algorithm which
IK> satisfies goal 3 (low bandwidth).
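
As a rough illustration of the rsync invocation described above, the
sketch below only builds an argument list; it does not run anything.
The function name build_rsync_command and the choice of options are
assumptions, not part of the spec. An rsync:// URL makes rsync talk to
an rsync daemon directly rather than through a remote shell.

```python
from typing import List


def build_rsync_command(source_url: str, target_dir: str) -> List[str]:
    """Build an rsync argument list for a daemon-protocol transfer."""
    if not source_url.startswith("rsync://"):
        # Anything else (e.g. host:path) would go through a remote shell.
        raise ValueError("expected an rsync:// URL")
    return [
        "rsync",
        "--archive",  # recurse; preserve permissions, times and symlinks
        "--delete",   # drop files that are absent from the newer image
        "--partial",  # keep partial files so an interrupted run can resume
        source_url.rstrip("/") + "/",
        target_dir.rstrip("/") + "/",
    ]
```

The resulting list could be handed to subprocess.run() without
shell=True, so no shell is involved on the laptop side either.
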
IK>
IK>
IK>
IK> 1.3. Design note: peer-to-peer updates
IK> --------------------------------------
IK>
IK> It is desirable to provide "viral update" functionality at a later
IK> date, such that two laptops with different software versions (and
IK> without any notion of trust) can engage in an update to bring the
IK> laptop with the older software fully up to date.
IK>
IK> However, determining how to provide this functionality securely,
IK> efficiently and elegantly is not feasible on the Gen1 FRS
IK> timeline. Therefore, laptop-to-laptop updates will NOT be a part of
IK> the updater that ships with the FRS image, and are a candidate for
IK> release 2-3 months after FRS.
IK>
IK>
IK>
IK> 1.4. Design note: rsync scalability
IK> -----------------------------------
IK>
IK> rsync is a known CPU hog on the server side. It would be absolutely
IK> infeasible to support a very large number of users from a single
IK> rsync server. This is far less of a problem in our scenario for
IK> three reasons:
IK>
IK>      * High branching factor
IK>        In all normal circumstances, the vast majority of the rsync
IK>        traffic to our upstream servers will come from school
IK>        servers, not individual laptops.  If school servers are
IK>        unavailable or malfunctioning, it is not the case that there
IK>        will be a flood of requests from individual laptops, because
IK>        it's likely that the school servers are those laptops' only
IK>        gateway to the Internet.
IK>
IK>      * Element of randomness in anti-theft requests
IK>        Instead of hitting the update servers every hour on the hour,
IK>        the laptops are already including an element of randomness in
IK>        choosing when to contact the anti-theft service. This random
IK>        delay propagates to the rsync requests, as well.
IK>
IK>      * In-depth stagger abilities on the server side
IK>        Because notification of new updates is performed by the
IK>        anti-theft service which is aware of a laptop's locale,
IK>        updates can be staggered over several days by country,
IK>        region, or any other metric such as server load.
IK> Additionally, some optimizations can be added to rsync proper to aid
IK> with our use case, but such engineering will need to wait until
IK> after FRS.
IK>
IK>
IK>
IK> 1.5. Implementation
IK> -------------------
IK>
IK> In order to implement runtime file protection, Bitfrost relies on
IK> the copy-on-write (COW) functionality of the Linux-VServer
IK> patchset. The functionality imbues immutable hardlinks within a
IK> designated context with special meaning: when broken by some
IK> destructive file operation, VServer will replace these hardlinks
IK> with the content of the file they were pointing to and apply the
IK> desired operation on the resulting copy.
IK>
IK> The XO updater will run in a special context to which the security
IK> service has exposed the entire underlying filesystem as a COW copy.
IK> The updater will update this COW copy in-place with rsync. This COW
IK> mechanism simply ensures no excess authority lies with the updater;
IK> any failures or vulnerabilities in it do not propagate to the rest
IK> of the system.
IK>
IK> One file contained within each OS image will be its
IK> cryptographically signed manifest; at the end of the rsync
IK> operation, the laptop will have obtained that file. At this point,
IK> the updater will request that the security service apply the
IK> update. Note that due to the nature of rsync, we can stop and
IK> restart the network phase of a single update several times as
IK> connectivity becomes available, until we've received the complete
IK> update.
IK>
IK> The security service will terminate the updater and then analyze the
IK> manifest and confirm the modified files in the updater's context
IK> exactly match the expected OS image end-state. If any discrepancy is
IK> discovered, the updater context will be discarded and the update
IK> operation aborted.
IK>
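
A minimal sketch of the manifest check, assuming the manifest has
already been parsed and its signature verified elsewhere. The manifest
layout (relative path mapped to a SHA-256 hex digest), the function
names, and the choice of SHA-256 are all assumptions; the spec does not
name a hash or a file format. The manifest file itself is assumed to
live outside the checked directory.

```python
import hashlib
import os
from typing import Dict, List


def sha256_file(path: str) -> str:
    """Hash a file in chunks so large files need not fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def find_discrepancies(root: str, manifest: Dict[str, str]) -> List[str]:
    """List every way the tree under root differs from the manifest."""
    problems = []
    seen = set()
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            seen.add(rel)
            expected = manifest.get(rel)
            if expected is None:
                problems.append(f"unexpected file: {rel}")
            elif sha256_file(full) != expected:
                problems.append(f"hash mismatch: {rel}")
    for rel in manifest:
        if rel not in seen:
            problems.append(f"missing file: {rel}")
    return sorted(problems)
```

An empty result would correspond to "complete and correct"; any entry
at all would mean discarding the updater context.
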
IK> If the update is verified to be complete and correct, the security
IK> service will mark it as such, and designate the files within it to
IK> be the files exported into all newly-created containers. System
IK> service containers will be restarted gracefully.  If the image
IK> manifest did not contain a header identifying that image as a
IK> high-priority update, the update process ends here. Restartable
IK> services have been restarted, and the rest of the system will be
IK> initialized from the update on reboot.
IK>
IK> If the update has been marked as high-priority, the user will be
IK> asked to close applications and reboot his machine immediately. A
IK> timer will run that will reboot the machine in 60 minutes if the
IK> user does not do so. The high-priority timer can be disabled in the
IK> security center; its purpose is merely to provide some extra
IK> protection to the youngest users who cannot necessarily be expected
IK> to understand or comply with the reboot request.
IK>
IK> On boot, the first initialization script to run will perform a
IK> pivot_root operation to the directory that currently holds the OS
IK> image marked bootable by the security service. With the example
IK> above, it would be the directory that belonged to the updater's
IK> context. If a key is depressed during boot, however, the pivot_root
IK> is performed to the _old_ bootable context, and the user is
IK> presented with a dialog asking whether she would like to make the
IK> rollback permanent.
IK>
IK> The kernel is the only exception to this handling: in the event
IK> that a verified update contains an updated kernel, that kernel will
IK> be placed into a predetermined place in the underlying filesystem by
IK> the security service.  OpenFirmware will preferentially boot this
IK> newer kernel unless the rollback key combination is depressed during
IK> boot.
IK>
IK> Notice that the update operation has been reduced to a simple state
IK> toggle between (any) two OS images. In so doing, we have satisfied
IK> goals 1 and 2 (security and fault-tolerance).
IK>
IK>
IK>
IK>
IK> 2. Application updater
IK> ======================
IK>
IK> 2.1. Design
IK> -----------
IK>
IK> The XO eschews traditional dependency-based approaches to package
IK> management, making application upgrades somewhat difficult. The
IK> problem is compounded by the fact that Bitfrost does not permit
IK> applications to update themselves in-place, which is a common update
IK> method on platforms such as Mac OS X and Windows.
IK>
IK> When it comes to application updates, we wish to stay true to our
IK> goals of security and low-bandwidth updates, but are willing to
IK> settle for less fault tolerance as necessitated by the fact that
IK> most activities won't be OLPC-written or maintained.
IK>
IK> The design should make it possible to have a single tool that can
IK> ascertain the existence of updated versions of any currently
IK> installed activities, and then fetch and install those updates. It
IK> should do so bandwidth-efficiently, such that files that are
IK> unchanged between activity versions aren't downloaded as part of the
IK> update, and also such that identical resource files packaged by
IK> multiple activities are never downloaded more than once, and not at
IK> all if they already exist on the system.
IK>
IK>
IK>
IK> 2.2. Implementation
IK> -------------------
IK>
IK> A manifest file is added to the bundle format specification. The
IK> manifest consists of the filename and strong cryptographic hash of
IK> every file in the bundle. Another file is added, called 'origin',
IK> that specifies a URL where updated activity bundles may be found,
IK> and a public key which will be used to sign such updated bundles.
IK>
IK> When a global activity update is initiated, the updater enumerates
IK> the origins for all installed activities, then probes each one in
IK> turn to determine which activities have available updates. The
IK> resulting activity list is the 'available update set'.
IK>
IK> The most up-to-date bundle for each activity in the set is accessed,
IK> and the first several kilobytes downloaded. Since bundles are simple
IK> ZIP files, the downloaded data will contain the ZIP file index which
IK> stores byte offsets for the constituent compressed files. The
IK> updater then locates the bundle manifest in each index and makes an
IK> HTTP request with the respective byte range to each bundle origin.
IK> At the end of this process, the updater has cheaply obtained a set
IK> of manifests of the files in all available activity updates.
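
To make the byte-range step concrete, here is a sketch that computes
the range of one member's compressed data from the ZIP index. It works
on an in-memory archive so that it stays runnable; a real updater
would fetch only the index and the member's local header over HTTP.
The function names are hypothetical. One caveat worth noting: in the
standard ZIP format the index (central directory) is stored at the end
of the archive, which is where Python's zipfile module reads it from.

```python
import io
import struct
import zipfile
from typing import Tuple

LOCAL_HEADER_FIXED_SIZE = 30  # fixed part of a ZIP local file header


def member_byte_range(archive: bytes, member: str) -> Tuple[int, int]:
    """Return (first_byte, last_byte) of the member's compressed data."""
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        info = zf.getinfo(member)  # entry from the ZIP index
    # The name and extra-field lengths are two little-endian 16-bit
    # values at offsets 26 and 28 of the member's local header.
    name_len, extra_len = struct.unpack_from(
        "<HH", archive, info.header_offset + 26
    )
    start = info.header_offset + LOCAL_HEADER_FIXED_SIZE + name_len + extra_len
    return start, start + info.compress_size - 1


def range_header(first: int, last: int) -> str:
    """Format an inclusive byte range as an HTTP Range header value."""
    return f"bytes={first}-{last}"
```

The bytes returned for that range are still compressed, so the updater
would have to inflate them before reading the manifest.
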
IK>
IK> A local database of manifests of all installed activities is kept,
IK> pruned to keep only records for files larger than a set size, e.g.
IK> 50 KB. The updater cross-references each manifest from the available
IK> update set with the installed database, and then with other
IK> manifests in the set. Files which exist locally and are also present
IK> in the available update set aren't downloaded; the updater simply
IK> "plants" the files in the right places. The same happens for
IK> identical files present in multiple bundles in the available update
IK> set; they are only downloaded once.
IK>
IK> After a bundle (minus any redundant files) has been downloaded, it
IK> is unpacked and reassembled (if it needs any of the files that
IK> haven't been downloaded because they already exist). Cryptographic
IK> signature verification is performed. If remaining disk space is
IK> larger than a particular margin, e.g. 20%, then the context
IK> containing the older version of the activity bundle is kept around,
IK> and the user given the ability to perform rollback on the activity
IK> update. Otherwise, the old version bundle is destroyed.
IK>
IK>
IK>
IK>
IK>
IK> :Author
IK>      Ivan Krstić
IK>      ivan AT laptop.org
IK>      One Laptop per Child
IK>      http://laptop.org
IK>
IK> :Metadata
IK>      Revision: Draft-14
IK>      Timestamp: Tue Jun  26 17:51:45 UTC 2007
IK>
IK>
IK> END
IK>
IK>
IK>
IK> --
IK> Ivan Krstić <krstic at solarsail.hcs.harvard.edu> | GPG: 0x147C722D
IK>

-- 
XA
=========
Don't Panic!  The Answer is 42


