e-Book reader

Thu Aug 2 20:02:40 EDT 2007

Yoric wrote:
> Okay, as my message seems to have bounced from devel, I'll try again.
> 
> I believe that hijacking an existing, standard protocol (here http) is
> a bad idea. Firstly, because turning standard protocols into hacked
> versions that sometimes work as per the standard and sometimes don't is
> usually opening a can of worms. Who knows what that is going to break?
> Who can tell exactly what depends on the correct behaviour of http? Not
> to mention that there's a whole bunch of things you need to reimplement
> yourselves if you want to use http. In practice, from my experience
> of Mozilla's source code, you'll have to reimplement caching,
> authentication, redirections, etc. That's quite the opposite of the
> expected benefits of hijacking.

Really, we're just talking about caching more aggressively and giving
the user more control over that cache.  So it's not that substantial a
change.

HTTP doesn't make any firm guarantees about how content is delivered.
Proxying is very common, and really we're talking about a proxy (though
one implemented directly in the browser, which I guess makes it an
actual client rather than a "proxy", but the same ideas apply).  We
could also implement it as a formal proxy hosted directly on the laptop.
Just in terms of implementation, I think it would be better to do it
directly in the browser, since the process isn't completely transparent,
maybe with a DBus API if we want to make the same caching system
available to other applications.

> Secondly, hacking http means that you rely only on http. That's good if
> you only want to download books from http servers. But what about the
> other important protocols such as file:? Are you also going to hack
> file:? What about ftp:?

file: doesn't have to be modified, since it's locally available and 
doesn't need to be cached.  ftp and other protocols don't really matter, 
IMHO.

> Thirdly, using a hacked http: (or file: or ftp:) means the subtle yet
> annoying problem of referencing resources (say particular pages or
> images) inside a book. In 
>   http://mydomain.org/a/b/c/d
> what part of a/b/c/d is the directory containing the book? What part is
> the identifier of the book? What part is the name of the resource?
> What if resource names involve directories? Etc. Sure, you can solve
> the problem by using smart conventions on URLs or by toying with
> exclamation marks, question marks and hash signs, but I suspect
> that you'll quickly end up having to hack the very notion of URL
> away from what's used in http:. And it gets worse if your books may be
> generated or delivered dynamically -- hence involving question
> marks for queries -- or if some resources inside the book may be
> generated or delivered dynamically or, even worse, if books may contain
> books.

I don't really understand the problem here at all.  Why is it hard to
know where the "book" is?  I put it in scare quotes because a book is
not a distinct thing online -- a book can easily be split or combined
into volumes online, so its distinct identity is unclear.

Anyway, a simple way to do referencing is relative links.  The book 
pages need to understand the internal structure of how they were 
generated -- e.g., if the book is a flat set of pages, just linking to 
"./" will get to the "root" of the book.
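
As a rough illustration of the flat-book case, here is a minimal sketch
using Python's standard urljoin, which resolves relative references the
way a browser does.  The page URL and the helper names book_root and
resolve are made up for the example.

```python
from urllib.parse import urljoin

# Hypothetical URL of one page in a flat book (domain and path are made up).
PAGE = "http://mydomain.org/a/b/chapter3.html"


def book_root(page_url):
    """Resolve "./" against a page URL, as a browser would for a link."""
    return urljoin(page_url, "./")


def resolve(page_url, relative_ref):
    """Resolve any relative reference (another page, an image) from a page."""
    return urljoin(page_url, relative_ref)


print(book_root(PAGE))
print(resolve(PAGE, "images/cover.png"))
```

The pages never need to know the absolute location of the book; only
their position relative to each other matters.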

If you need to tell clients the structure of the book, HTML already has 
standards for this, like <link rel="index" href="...">.  
Well-constructed books can use those existing conventions.
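
A client could pick up that structure with very little code.  The
following is only a sketch, assuming the book pages are plain HTML with
<link> elements in the head; the class name, the function name and the
sample markup are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkRelParser(HTMLParser):
    """Collect the href of each <link rel="..."> element in a page."""

    def __init__(self):
        super().__init__()
        self.rels = {}

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attrs = dict(attrs)
        href = attrs.get("href")
        if href is None:
            return
        # rel may hold several space-separated link types; keep the first
        # href seen for each type.
        for rel in (attrs.get("rel") or "").lower().split():
            self.rels.setdefault(rel, href)


def book_links(html, page_url):
    """Return a dict mapping link type to absolute URL for one page."""
    parser = LinkRelParser()
    parser.feed(html)
    return {rel: urljoin(page_url, href) for rel, href in parser.rels.items()}


# Made-up sample page, for illustration only.
SAMPLE = (
    '<html><head>'
    '<link rel="index" href="./index.html">'
    '<link rel="next" href="chapter4.html">'
    '</head><body></body></html>'
)

links = book_links(SAMPLE, "http://mydomain.org/a/b/chapter3.html")
print(links["index"])
print(links["next"])
```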

I guess the reason I'm confused is that all the problems you mention
are problems we are all very familiar with from creating content on
the web.  And there are lots of conventions, solutions and tools to
address them.

> By contrast, the library: protocol
> * doesn't break anything
> * can work together with any delivery protocol (we're using mostly http:
> and file:, but also jar: for decompression and we hope we'll be able to
> use some peer-to-peer protocol in the future for distributed libraries)
> * already takes advantage of Mozilla's caching
> * resolves ambiguities between book identifier / resources inside the
> book / book inside book / etc.
> 
> 
> The one downside I see about library: is the possibility of having two
> different books with the same "unique" identifier. And I'm confident
> there's a way to find workarounds. Perhaps by making the *identifier* --
> and only the identifier -- a nicely encoded URL.
> 
> Say, something like
>   library:mydomain.org!a!b/c/d
> being automatically turned into
>   http://mydomain.org/a/b
> for downloading/authentication purposes. Here, I assume that http is
> the default downloading mechanism. Other non-default protocols may be
> specified. Note that I'm avoiding %-based encodings only for readability
> purposes. If readability is not a problem, we can use that standard
> encoding directly.

This is where I have to start thinking -- if we're encoding an HTTP URI 
in the library URI, why not just stick with HTTP from the beginning?
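
To make the question concrete, here is a minimal sketch of the rewriting
in the quoted example.  It is my reading of the proposal, not an actual
implementation: I'm assuming the book identifier is everything before
the first "/", that "!" stands in for "/" inside it, and that the
resource is simply appended to the book URL.  The function names are
made up.

```python
def split_library_uri(uri, default_scheme="http"):
    """Split a library: URI into (book URL, resource path inside the book)."""
    prefix = "library:"
    if not uri.startswith(prefix):
        raise ValueError("not a library: URI: %r" % uri)
    rest = uri[len(prefix):]
    # Assumption: the identifier ends at the first "/"; the rest is the
    # name of the resource inside the book.
    identifier, _, resource = rest.partition("/")
    # Assumption: "!" is the readable stand-in for "/" in the identifier.
    book_url = default_scheme + "://" + identifier.replace("!", "/")
    return book_url, resource


def to_http(uri):
    """Rebuild a plain http URL for the resource (an assumed convention)."""
    book_url, resource = split_library_uri(uri)
    return book_url + "/" + resource if resource else book_url


print(split_library_uri("library:mydomain.org!a!b/c/d"))
print(to_http("library:mydomain.org!a!b/c/d"))
```

The mapping is purely mechanical in both directions, which is why the
extra scheme doesn't seem to buy much over the HTTP URL itself.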

-- 
Ian Bicking : ianb at colorstudy.com : http://blog.ianbicking.org
             : Write code, do good : http://topp.openplans.org/careers