C. Scott Ananian
cscott at cscott.net
Mon Aug 6 11:14:36 EDT 2007
On 8/6/07, Yoric <Yoric at users.sf.net> wrote:
> On Mon, 2007-08-06 at 03:12 -0400, C. Scott Ananian wrote:
> > On 8/3/07, Yoric <Yoric at users.sf.net> wrote:
> > > With this in mind, I intend to be able to reference
> > > * the package itself (to be able to download it, from Firefox or from
> > > anywhere else)
> > http://canonical.source/alice.in.wonderland.zip
> In what you write, is "canonical.source" a string constant that should
> be interpreted by the proxy (say a variant on "localhost") or are there
> a variety of different canonical sources?
No, it is exactly what the URL says: the canonical source of the book.
The "publisher". You should be able to use standard HTTP on that URL
and get the contents of the book.
> Once the book has been downloaded locally or, say, added to a
> hypothetical peer-to-peer library, do you refer to it with the same http
> URL or with a file URL (respectively a peer-to-peer protocol URL)?
The same URL. That's the whole point of URLs! The hypothetical
peer-to-peer library is just a fancy type of web cache: resources
which live canonically at (say) http://cscott.net can actually be
obtained from my neighbor (say) or the schoolserver. But this is done
via the standard http proxying and caching mechanisms. We *could*
return (say) a special http header which indicates that this resource
is peer-to-peer cacheable, but I'd prefer not: I still don't see how
"books" are fundamentally different from other web content. It seems
more likely that (for example) a schoolserver would be configured
(server-side) to preferentially cache some content which it knows to
be the textbooks for the class.
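That server-side preference needs nothing beyond ordinary HTTP caching. A sketch of what the canonical source might return (the header values here are purely illustrative, not anything OLPC specified):

```
HTTP/1.1 200 OK
Content-Type: application/zip
Cache-Control: public, max-age=86400
ETag: "alice-v1"
```

`Cache-Control: public` is the standard signal that any shared cache, including a schoolserver, may store and re-serve the resource; no peer-to-peer-specific header is required.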
> > Obtaining this URL from the URL of an individual "page" of the book is
> > done with the standard <link> tags.
> For this, you need a starting page. Do I take it that
> http://canonical.source/alice.in.wonderland.zip/index.html must be the
> "first" page of the book?
Why not http://canonical.source/alice.in.wonderland.zip/ ?
In any case, there are standard <link> tags to take you to the "first
page" or to the "table of contents".
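Those are the standard HTML link types. A sketch of what a chapter page might carry in its `<head>` (the member names `toc.html` and `chapter2` are hypothetical, following the example URL above):

```html
<!-- Illustrative only: "start" and "contents" are standard HTML link types -->
<link rel="start" href="http://canonical.source/alice.in.wonderland.zip/">
<link rel="contents" href="http://canonical.source/alice.in.wonderland.zip/toc.html">
<link rel="next" href="http://canonical.source/alice.in.wonderland.zip/chapter2">
```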
If your .zip files are some super-special book-archive format, then
they could contain a manifest which tells what the first page is.
But this point is moot. How do you find out about a book? Someone
gives you a URL, either via a link or some other means. That URL
points to a page in the book. That's how you find the "first page".
This is exactly how it works on the web today, and we don't need to
reinvent any of it.
> > > * resources internal to the package (to be able to display them, to
> > > bookmark them, to link to them)
> > http://canonical.source/alice.in.wonderland.zip/chapter1
> > Serving this URL is the job of the web server, which is allowed by the
> > http specification to parse the non-host part of the URL any way it
> > wants.
> Does this mean that in addition to hacking the proxy, you also need to
> hack the web server ?
I could unpack the files and it will work just fine: you just won't be
able to download the entire book at one go. That's a perfectly
reasonable fall-back position, and should be completely transparent to
the user (except that pages might take a bit longer to load, since
they're being fetched on demand). Or you can write a simple cgi
script to serve both the .zip and the individual pages. That's not
"hacking the web server", any more than installing mediawiki is.
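To make the "simple cgi script" idea concrete, here is a minimal sketch (not anything OLPC actually shipped; the book and member names are hypothetical) of one handler that serves both the whole .zip and its individual pages, exploiting the server's freedom to parse the non-host part of the URL any way it wants:

```python
import io
import zipfile

def resolve(zip_bytes, subpath):
    """Return the bytes to serve for a request path below the .zip URL.

    An empty subpath means "download the whole book in one go";
    anything else is looked up as a member of the archive, so
    .../alice.in.wonderland.zip/chapter1 serves one page on demand.
    """
    if subpath == "":
        return zip_bytes                      # whole-book download
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        return zf.read(subpath)               # individual page

# Build a tiny in-memory "book" to demonstrate both cases.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("chapter1", "Down the Rabbit-Hole")
book = buf.getvalue()

assert resolve(book, "") == book
assert resolve(book, "chapter1") == b"Down the Rabbit-Hole"
```

Unpacking the archive into a plain directory tree gives the same URLs with no script at all, which is exactly the incremental fall-back described above.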
Really, now.
The beauty of using standard HTTP is that the pieces work
incrementally. URLs work as they are without any fancy proxying or
serving. You can gradually make the individual pieces smarter to do
fancy caching or bulk downloading or whatnot, without having to build
the whole edifice at once. Further, the content generated is still
accessible to users *without* any of your fancy tools, avoiding the
creation of an OLPC content ghetto.
( http://cscott.net/ )