e-Book reader

Ian Bicking ianb at colorstudy.com
Mon Aug 6 12:48:01 EDT 2007


C. Scott Ananian wrote:
> On 8/6/07, Yoric <Yoric at users.sf.net> wrote:
>> On Mon, 2007-08-06 at 03:12 -0400, C. Scott Ananian wrote:
>>> On 8/3/07, Yoric <Yoric at users.sf.net> wrote:
>>>> With this in mind, I intend to be able to reference
>>>> * the package itself (to be able to download it, from Firefox or from
>>>> anywhere else)
>>> http://canonical.source/alice.in.wonderland.zip

As an aside, but also because zip files seem to be complicating some 
other use cases, in my tests zip files aren't substantially smaller than 
JFFS2's normal zlib compression.  I listed some examples here:
   http://lists.laptop.org/pipermail/library/2007-July/000073.html
And a script to test it out for yourself here:
   http://svn.colorstudy.com/home/ianb/olpc/jffs2size.py
The zip files are a bit smaller, probably because zip can compress entire 
files while JFFS2 only compresses 4K chunks at a time.  Notably, tar.gz 
files are substantially smaller than either, but I don't believe tar files 
are appropriate for storing pages, since it's harder to access individual 
files inside them.
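
Roughly, the comparison looks something like this (just a sketch in modern 
Python, not the actual jffs2size.py; the per-4K zlib loop only approximates 
what JFFS2 does on flash, ignoring node overhead, and the book/ directory 
is a stand-in):

    import io
    import os
    import tarfile
    import zipfile
    import zlib

    def jffs2_style_size(path, chunk=4096):
        """Approximate JFFS2: zlib-compress each 4K chunk on its own."""
        total = 0
        with open(path, 'rb') as f:
            while True:
                data = f.read(chunk)
                if not data:
                    break
                total += len(zlib.compress(data))
        return total

    def zip_size(paths):
        buf = io.BytesIO()
        with zipfile.ZipFile(buf, 'w', zipfile.ZIP_DEFLATED) as z:
            for p in paths:
                z.write(p)
        return len(buf.getvalue())

    def targz_size(paths):
        buf = io.BytesIO()
        with tarfile.open(fileobj=buf, mode='w:gz') as t:
            for p in paths:
                t.add(p)
        return len(buf.getvalue())

    paths = [os.path.join(d, f) for d, _, fs in os.walk('book/') for f in fs]
    print('jffs2-ish:', sum(jffs2_style_size(p) for p in paths))
    print('zip:      ', zip_size(paths))
    print('tar.gz:   ', targz_size(paths))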

Zip files would be faster to download, but not much smaller than fetching 
the pages individually if the web server and client both know gzip 
compression (there will be some extra overhead from all the per-request 
headers).  If you have Keep-Alive working on both sides too, the added 
latency won't be too bad, I'd think.  (Though maybe HTTP pipelining would 
improve things even more?  Not many clients know how to do HTTP pipelining 
from what I understand, but I'm a little fuzzy on the whole thing.)
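
For what it's worth, here's a minimal sketch of what I mean by fetching 
several pages over one Keep-Alive connection with gzip (Python 3 stdlib; 
the host and paths are made up):

    import gzip
    import http.client

    # One connection, reused for every request (HTTP/1.1 Keep-Alive).
    conn = http.client.HTTPConnection('this.server')
    for path in ['/alice/page1.html', '/alice/page2.html']:
        conn.request('GET', path, headers={'Accept-Encoding': 'gzip'})
        resp = conn.getresponse()
        body = resp.read()          # must drain before the next request
        if resp.getheader('Content-Encoding') == 'gzip':
            body = gzip.decompress(body)
        # ... store body in the local cache ...
    conn.close()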

More of a problem is actually figuring out which pages you want or need. 
You can download them incrementally, which is nice (you can view the first 
couple of pages, for instance, while the rest is downloading).  But if you 
have to follow <link rel="next"> to get all the pages, that's going to be 
somewhat annoying to handle.  A complete enumeration would be nice to have 
up-front.  If the server and client could agree on passing around a 
tarball, that would be even nicer, but that's something we'd have to rig 
up on our own.  I can imagine a header:

   X-Tar-Fetcher: http://this.server/tarball.cgi

And, seeing that header, a client that knows it has a bunch of files to 
fetch would fetch:

   http://this.server/tarball.cgi?compress=gz&url={url1}&url={url2}&...

And the server would create a tar.gz file from all those URLs.  Anyway, 
just one possible solution to the possible problem of getting lots of 
files over HTTP.
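
A rough sketch of the client side of that idea (the X-Tar-Fetcher header 
and tarball.cgi are the made-up pieces from above, and how tar member 
names map back to URLs is hand-waved here):

    import io
    import tarfile
    import urllib.parse
    import urllib.request

    def fetch_pages(first_url, page_urls):
        """Fetch a set of pages, via the (hypothetical) tarball.cgi if offered."""
        resp = urllib.request.urlopen(first_url)
        fetcher = resp.headers.get('X-Tar-Fetcher')
        if not fetcher:
            # No tarball service advertised: fall back to one request per page.
            return dict((u, urllib.request.urlopen(u).read()) for u in page_urls)
        query = urllib.parse.urlencode(
            [('compress', 'gz')] + [('url', u) for u in page_urls])
        tar_bytes = urllib.request.urlopen(fetcher + '?' + query).read()
        pages = {}
        with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode='r:gz') as t:
            for member in t.getmembers():
                if member.isfile():
                    # Assumes the server names members after the URLs somehow.
                    pages[member.name] = t.extractfile(member).read()
        return pages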


>> Once the book has been downloaded locally or, say, added to a
>> hypothetical peer-to-peer library, do you refer to it with the same http
>> URL or with a file URL (respectively a peer-2-peer protocol URL) ?
> 
> The same URL.  That's the whole point of URLs!  The hypothetical
> peer-to-peer library is just a fancy type of web cache: resources
> which live canonically at (say) http://cscott.net can actually be
> obtained from my neighbor (say) or the schoolserver.  But this is done
> via the standard http proxying and caching mechanisms.  We *could*
> return (say) a special http header which indicates that this resource
> is peer-to-peer cachable, but I'd prefer not: I still don't see how
> "books" are fundamentally different from other web content.  It seems
> more likely that (for example) a schoolserver would be configured
> (server-side) to preferentially cache some content which it knows to
> be the textbooks for the class.

I wrote up some thoughts for this here:

   http://wiki.laptop.org/go/Content_repository#Proposed_implementation

When you have poor connectivity, you want the children to be able to get 
a complete, consistent set of resources while they do have connectivity. 
For instance, if they have material they are supposed to study, you want 
them to be able to "take it home", so to speak, by preloading their 
laptop with that material.

Definitely fetching it from peers is a possibility.  But that can't work 
predictably; you can never be sure what your peers will have, or if 
they'll be around.  I suspect this will annoy teachers most of all.

The idea I was thinking of is to have some way to enumerate a set of 
resources, and then use that as the basis for pre-fetching.  A book is 
an obvious set of resources (all the pages in the book), but you could 
also do it chapter-by-chapter, or include articles, pages, Javascript 
tutorials, or whatever.
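
As a sketch of that, assuming the enumeration were published as a 
plain-text manifest of URLs, one per line (the manifest format here is 
purely hypothetical), pre-fetching could be as simple as:

    import os
    import urllib.parse
    import urllib.request

    def prefetch(manifest_url, cache_dir='cache'):
        """Download every resource listed in the manifest into a flat cache."""
        manifest = urllib.request.urlopen(manifest_url).read().decode('utf-8')
        os.makedirs(cache_dir, exist_ok=True)
        for url in manifest.splitlines():
            url = url.strip()
            if not url:
                continue
            name = urllib.parse.quote(url, safe='')   # filesystem-safe name
            path = os.path.join(cache_dir, name)
            if os.path.exists(path):
                continue                              # already fetched
            with open(path, 'wb') as f:
                f.write(urllib.request.urlopen(url).read())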

I played around with a proxy that looked for this kind of markup, and 
added links to cache and uncache pages based on that.  The proxy itself 
was kind of lame, though.  Anyway, for the curious:

   http://svn.colorstudy.com/home/ianb/OLPCProxy



-- 
Ian Bicking : ianb at colorstudy.com : http://blog.ianbicking.org
             : Write code, do good : http://topp.openplans.org/careers


