XO offline content management strategy / tools

Tue Jul 24 06:36:02 EDT 2007

Hi all,

I am interning at Google over the summer in the offline content team
(Google Book Search).  I have been in discussions with SJ Klein about
ways that offline content can be more easily downloaded, managed and
shared on the XO, especially public domain books in PDF format or
similar.  I am seeking collaborators and feedback on the best ways to
accomplish this in the limited time before the upcoming code freeze.
Apologies in advance for the long email.

I should state up front that although I have been following the OLPC
project from a distance for some time, I am sure there is a lot going
on that I am not aware of and that is relevant to this discussion.
Please enlighten me :-)

Goals:
-- Make it easy for kids to find, download, and share public domain
   content and content bundles.
-- Allow school servers to provide a syllabus or reading list of
   cached offline content that the students can subscribe to, in the
   form of a locally-discoverable feed.  (This can probably use the
   same system for distribution as the students will use to share
   bundles with each other.)
-- Allow kids to manage what content is actually "swapped into" the
   XO, in order to work with the XO's limited storage resources.
-- [Later] Work with the distributed caching layer to pull resources
   from nearby machines, even if there is no Internet connection.
   (I believe someone is working on this.)

Ultimately I want to create a simple content manager application for
downloading and sharing offline content -- basically along the lines
of iTunes for content bundles, with the addition of a sharing
function.  A good starting point for this might be something like
Syncropated ( http://syncropated.garage.maemo.org/ ), combined with
an Atom/RSS parser and Avahi glue to discover and publish feeds.
This sort of approach would allow kids to choose which books or
content bundles they have installed at any given time, and to swap
bundles in and out as needed to stay within the constraints of
available storage.  How does this proposal fit in with the plans
that are already in place to enable kids to share content bundles
between XOs?
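
To make the content manager idea a little more concrete, here is a
rough Python sketch of its two core pieces: reading an Atom feed of
bundles, and deciding which installed bundles to swap out when space is
tight.  Everything specific here is an assumption for illustration
only: the sample feed layout, the schoolserver.local URLs, the function
names, and the "least recently opened goes first" policy.  Avahi
discovery and the actual download are left out.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"

# Hypothetical reading-list feed; a real school server feed may differ.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Reading list</title>
  <entry>
    <title>Alice in Wonderland</title>
    <link rel="enclosure" type="application/pdf"
          href="http://schoolserver.local/books/alice.pdf" length="4000000"/>
  </entry>
  <entry>
    <title>Treasure Island</title>
    <link rel="enclosure" type="application/pdf"
          href="http://schoolserver.local/books/treasure.pdf" length="6000000"/>
  </entry>
</feed>
"""


def parse_feed(xml_text):
    """Return a list of (title, url, size_in_bytes), one per enclosure link."""
    root = ET.fromstring(xml_text)
    bundles = []
    for entry in root.findall(ATOM + "entry"):
        title = entry.findtext(ATOM + "title", default="(untitled)")
        for link in entry.findall(ATOM + "link"):
            if link.get("rel") == "enclosure":
                size = int(link.get("length", "0"))
                bundles.append((title, link.get("href"), size))
    return bundles


def bundles_to_swap_out(installed, needed_bytes, free_bytes):
    """Pick installed bundles to remove so that needed_bytes will fit.

    installed is a list of (name, size_in_bytes, last_opened) tuples.
    Bundles are removed least recently opened first (an assumed policy).
    If removing everything is still not enough, all names are returned.
    """
    to_remove = []
    for name, size, _last_opened in sorted(installed, key=lambda b: b[2]):
        if free_bytes >= needed_bytes:
            break
        to_remove.append(name)
        free_bytes += size
    return to_remove


if __name__ == "__main__":
    for title, url, size in parse_feed(SAMPLE_FEED):
        print(title, url, size)
```

A real manager would probably let the kid confirm or override the
list of bundles to remove rather than deleting anything automatically.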

I think the offline content problem also needs to be attacked from
several additional directions at the same time as the above:

-- Google Gears needs to be ported to the Sugar browser to enable
offline storage and caching of web app resources.  (The browser does
not support extensions, so Gears would have to be ported to run
natively inside the browser.)  This would enable web technologies to
be used to create arbitrary offline XO applications, as well as online
applications that work with cached copies of online data when the
Internet connection is spotty.  Unless someone here wants to give this
a shot, I can see if I can find someone on the Gears team to do this
as a 20% time project :-)

-- It could be important to start finding volunteers to create Google
Gears applications that will provide an enhanced experience when
accessing cached content offline.  For example, a Gears app for
Wikipedia would allow the user to browse cross-linked pages that are
stored in the Gears cache, but if a link to an uncached page were
clicked, it would be queued for downloading the next time the user is
online.  A list of newly-available pages would be shown to the user
when they become available.  Edits made to documents would be synced
with the server when online.  It is likely that apps like this will
need server-side support, so this type of app will need to be developed
in collaboration with the content providers.  In some senses, these
offline apps are a stop-gap, because we want better offline behavior
implemented in the browser itself, but in reality offline apps can go
far beyond simple caching by allowing content creation while the user
is offline.
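
The "queue uncached links, then report what arrived" behavior can be
sketched as follows.  This is plain Python to show the logic only; it
is not the Gears API, and the class name, the method names, and the
fetch callable are all hypothetical.  Syncing of edits is not covered.

```python
class OfflinePageCache:
    """Cache pages and queue requests that are made while offline."""

    def __init__(self, fetch):
        self.fetch = fetch   # hypothetical callable: url -> page text
        self.pages = {}      # url -> cached page text
        self.pending = []    # urls requested while offline, in order

    def open(self, url, online):
        """Return the page text, or None if it had to be queued."""
        if url in self.pages:
            return self.pages[url]
        if online:
            self.pages[url] = self.fetch(url)
            return self.pages[url]
        if url not in self.pending:
            self.pending.append(url)
        return None

    def sync(self):
        """Fetch queued pages once online; return the newly available urls."""
        newly_available = []
        still_pending = []
        for url in self.pending:
            try:
                self.pages[url] = self.fetch(url)
                newly_available.append(url)
            except OSError:
                # Keep the request queued and retry on the next sync.
                still_pending.append(url)
        self.pending = still_pending
        return newly_available


if __name__ == "__main__":
    cache = OfflinePageCache(lambda url: "content of " + url)
    cache.open("http://example.org/wiki/Dog", online=False)
    print(cache.sync())
```

The list returned by sync() is what the app would show to the user as
newly available pages.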

-- We need a better way in general of caching entire web pages with
linked and embedded resources.  Ideally it should be possible to pull
a page or set of pages offline and wrap it up into a standalone content
bundle with a minimum of fuss, so that things like groups of Wikipedia
pages can be made into offline resources for access where there is no
Internet connection.  "Save page as HTML" doesn't really cut it right
now.  Being able to do a good job at this is tightly coupled with the
ability to do distributed caching of content, where XOs will request
pages from their neighbors if they can't get a connection to the Net.
Offline caching of online content is hard to get right, but it is
important given the network availability constraints that many
deployments will face.
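
As a starting point for the bundling step, here is a minimal Python
sketch that lists the resources a page depends on, split into embedded
resources (which must be saved along with the page) and linked pages
(candidates for the same bundle).  The class and function names are
hypothetical, and the set of tags handled is deliberately small; it
ignores things like CSS url() references, frames, and scripts that
load content dynamically, which is where most of the difficulty lies.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ResourceFinder(HTMLParser):
    """Collect absolute URLs of embedded resources and linked pages."""

    # Tag name -> attribute holding the URL (a deliberately small set).
    URL_ATTRS = {"img": "src", "script": "src", "link": "href", "a": "href"}

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.embedded = []   # images, stylesheets, scripts
        self.linked = []     # pages reachable through <a href>

    def handle_starttag(self, tag, attrs):
        attr_name = self.URL_ATTRS.get(tag)
        if attr_name is None:
            return
        value = dict(attrs).get(attr_name)
        if not value:
            return
        # Resolve relative references against the page's own URL.
        url = urljoin(self.base_url, value)
        target = self.linked if tag == "a" else self.embedded
        if url not in target:
            target.append(url)


def find_resources(html_text, base_url):
    """Return (embedded_urls, linked_urls) for one HTML page."""
    finder = ResourceFinder(base_url)
    finder.feed(html_text)
    finder.close()
    return finder.embedded, finder.linked


if __name__ == "__main__":
    page = '<img src="cat.png"><a href="Dog">Dog</a>'
    print(find_resources(page, "http://example.org/wiki/Cat"))
```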

-- We need to start encouraging digital libraries to provide
easy-to-parse feeds to their public domain resources, and simple APIs
for searching those repositories.  I will see what I can do about that
at my current employer, but we need the Internet Archive and the other
book scanning efforts to follow suit in making their public resources
programmatically accessible.

Please reply with comments/criticisms/feedback!  I am interested to
hear from anyone who is interested in working on (or is already
working on) one of these areas.

Thanks,
Luke Hutchison

PS Does anyone know why xbook doesn't support DjVu (via DjVuLibre),
when Evince does?  A lot of public domain scanned books are available
in DjVu format.

--

Disclaimer: these opinions are mine and not those of Google; none of
the above should be construed as a product announcement by Google; the
proposed work is my own and is not currently sponsored by Google, etc.
etc. etc.