e-Book reader

Tue Jul 31 07:15:19 EDT 2007

Okay, as my message seems to have bounced from devel, I'll try again.

I believe that hijacking an existing, standard protocol (here, http) is
a bad idea. Firstly, turning a standard protocol into a hacked version
that sometimes follows the standard and sometimes doesn't usually opens
a can of worms. Who knows what that is going to break? Who can tell
exactly what depends on the correct behaviour of http? Not to mention
that there's a whole bunch of things you would need to reimplement
yourselves if you want to use http. In practice, from my experience
of Mozilla's source code, you'll have to reimplement caching,
authentication, redirections, etc. That's quite the opposite of the
expected benefits of hijacking.

Secondly, hacking http means that you rely only on http. That's good if
you only want to download books from http servers. But what about the
other important protocols such as file:? Are you also going to hack
file:? What about ftp:?

Thirdly, using a hacked http: (or file: or ftp:) raises the subtle yet
annoying problem of referencing resources (say, particular pages or
images) inside a book. In
  http://mydomain.org/a/b/c/d
what part of a/b/c/d is the directory containing the book? What part is
the identifier of the book? What part is the name of the resource?
What if resource names involve directories? Etc. Sure, you can solve
the problem by using smart conventions on URLs or by toying with
exclamation marks, question marks and hash signs, but I suspect
that you'll quickly end up having to hack the very notion of URL
away from what's used in http:. And it gets worse if your books may be
generated or delivered dynamically -- hence involving question
marks for queries -- or if some resources inside the book may be
generated or delivered dynamically or, even worse, if books may contain
books.
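To make the ambiguity concrete, here is a small Python sketch (the function name is made up for illustration) that lists every way the path of http://mydomain.org/a/b/c/d could be read as a directory, a book identifier and a resource inside the book. Nothing in the URL itself rules any of these readings out; only an out-of-band convention could.

```python
from urllib.parse import urlsplit

def candidate_readings(url):
    """Enumerate every way to read a URL path as (directory, book id, resource).

    Illustration only: a plain http: URL does not mark where the directory
    ends, where the book identifier ends, or where the resource path inside
    the book begins, so every pair of '/' boundaries is a plausible reading.
    """
    segments = [s for s in urlsplit(url).path.split("/") if s]
    n = len(segments)
    readings = []
    for i in range(n):                 # book id starts at segment i
        for j in range(i + 1, n + 1):  # book id ends just before segment j
            directory = "/".join(segments[:i])
            book_id = "/".join(segments[i:j])
            resource = "/".join(segments[j:])  # "" means "the book itself"
            readings.append((directory, book_id, resource))
    return readings

readings = candidate_readings("http://mydomain.org/a/b/c/d")
print(len(readings))  # 10
for reading in readings:
    print(reading)
```

Four path segments already give ten readings, and that is before resource names with their own directories, query strings or nested books enter the picture.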

By contrast, the library: protocol
* doesn't break anything
* can work together with any delivery protocol (we're mostly using http:
  and file:, but also jar: for decompression, and we hope to be able to
  use some peer-to-peer protocol in the future for distributed libraries)
* already takes advantage of Mozilla's caching
* resolves ambiguities between book identifier / resources inside the
  book / book inside book / etc.

The one downside I see to library: is the possibility of having two
different books with the same "unique" identifier. I'm confident there
are workarounds. One would be to make the *identifier* -- and only the
identifier -- a nicely encoded URL.

Say, something like
  library:mydomain.org!a!b/c/d
being automatically turned into
  http://mydomain.org/a/b
for downloading/authentication purposes. Here, I assume that http is
the default downloading mechanism. Other, non-default protocols may be
specified. Note that I'm avoiding %-based encodings only for readability
purposes. If readability is not a problem, we can use that standard
encoding directly.
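A minimal Python sketch of that mapping, under a few assumptions of my own: the identifier runs from "library:" up to the first '/', with '!' standing in for '/'; whatever follows that '/' (c/d above) is taken to name the resource inside the book; and the non-default protocol is passed as a hypothetical scheme parameter rather than through any agreed syntax. All function names are hypothetical. The %-encoded variant is shown alongside; there, the whole download URL, scheme included, becomes the identifier.

```python
from urllib.parse import quote, unquote

PREFIX = "library:"

def _split(uri):
    """Split 'library:<identifier>/<resource>' at the first '/' after the prefix."""
    if not uri.startswith(PREFIX):
        raise ValueError(f"not a library: URI: {uri!r}")
    identifier, _, resource = uri[len(PREFIX):].partition("/")
    return identifier, resource

def parse_library_uri(uri, scheme="http"):
    """Readable form: '!' stands for '/' inside the identifier.

    'scheme' is a hypothetical knob for non-default delivery protocols.
    """
    identifier, resource = _split(uri)
    host, *dirs = identifier.split("!")
    return f"{scheme}://{host}/" + "/".join(dirs), resource

def make_encoded_library_uri(download_url, resource=""):
    """%-encoded form: the full download URL is encoded into the identifier."""
    uri = PREFIX + quote(download_url, safe="")  # ':' and '/' become %3A, %2F
    return uri + "/" + resource if resource else uri

def parse_encoded_library_uri(uri):
    identifier, resource = _split(uri)
    return unquote(identifier), resource

print(parse_library_uri("library:mydomain.org!a!b/c/d"))
# ('http://mydomain.org/a/b', 'c/d')
print(parse_library_uri("library:mydomain.org!a!b/c/d", scheme="ftp"))
# ('ftp://mydomain.org/a/b', 'c/d')
encoded = make_encoded_library_uri("http://mydomain.org/a/b", "c/d")
print(encoded)
# library:http%3A%2F%2Fmydomain.org%2Fa%2Fb/c/d
print(parse_encoded_library_uri(encoded))
# ('http://mydomain.org/a/b', 'c/d')
```

The '!' form is easier to read but misparses any host or directory name that itself contains '!'. The %-encoded form avoids that, and carries the delivery protocol inside the identifier for free.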

Cheers,
 David

On Mon, 2007-07-09 at 23:42 -0400, Samuel Klein wrote: 
> > > >>> I'd rather hijack http: for the same thing you are doing, but I get the
> > > >>> impression creating a new protocol is relatively simple in comparison.
> > > > With library: you are keying books off ids.  http: is just keying books off
> > > > the URI, which is a string just like the id is a string.  It's okay to just
> > > > treat it as a string.
> >
> > I would also rather see us use http:// as our protocol scheme.  Http
> > seems to answer three of the above questions:
> >   a) who owns the identifier 'http://cscott.net/ElectronicsTextbook'
> >        the people behind cscott.net, of course.  this prevents id duplication.
> >   b) what happens if I don't actually have the ElectronicsTextbook on my machine
> >       the URI gives you a location where you can download it.
> >       (although we have a lot of flexibility about what content gets
> >       served from that URL -- it could just be a redirection or
> >       metadata of some kind)
> >   c) how do I tell if my ElectronicsTextbook is the "real" ElectronicsTextbook
> >       I can always compare it to the canonical version, using (for
> >       example) http etags.
> >
> > I haven't heard the counter-argument for 'library:', so maybe I'm
> > missing some compelling reason to invent our own protocol, but this
> > seems like another case where we should be reusing rather than
> > reinventing.
> >  --scott
> 
> Using http:// does have great advantages.  I'm not sure ambiguity of
> identifiers is one of our top problems... URIs are good at being
> unique, as noted above; and you can never completely solve the problem
> of people cloning materials and making new copies with new names that
> are identical to the old materials.  It is useful to take advantage of
> caching already in place for http:// .  And it is useful to be able to
> view with a reader any accessible material with a URL, without special
> preprocessing or database seeding.
> 
> SJ
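Point (c) in the quoted message, checking a local copy against the canonical version with http etags, could look roughly like the Python sketch below. It relies on the standard HTTP conditional-request behaviour: send the stored ETag in If-None-Match, and a 304 Not Modified reply means the copy is still current. The function names are hypothetical, the comparison is a plain string match (weak "W/" ETags are not handled), and it assumes the server actually emits ETags and honours If-None-Match on HEAD requests.

```python
import urllib.error
import urllib.request

def build_conditional_request(url, local_etag):
    """HEAD request asking whether our copy's ETag is still the current one."""
    return urllib.request.Request(
        url, method="HEAD", headers={"If-None-Match": local_etag}
    )

def is_canonical(url, local_etag, timeout=10):
    """Return True if the server says our ETag matches its current version."""
    request = build_conditional_request(url, local_etag)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            # 2xx: the server ignored or failed the condition; compare ETags ourselves.
            return response.headers.get("ETag") == local_etag
    except urllib.error.HTTPError as err:
        # urllib reports 304 Not Modified as an HTTPError.
        if err.code == 304:
            return True
        raise

# Usage (needs network access and a server that sends ETags):
# is_canonical("http://cscott.net/ElectronicsTextbook", '"etag-of-my-copy"')
```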