[Bookreader] gnubook/Read-activity update

Samuel Klein sj at laptop.org
Sat Feb 14 02:16:20 EST 2009

Here are some quick notes from a discussion this afternoon about
producing better downloadable collections, testing new Read features,
and gnubook support.  We'll meet again next week, probably on Friday
again (time TBA), and every two weeks after that.

Sayamindu is close to having gnubook support ready to test, and it's
looking great for short books with full-resolution page images
(roughly 5-10x the size of the equivalent PDF; there's more work to
be done here).  To show off and compare gnubook/pdf/html support, Raj
and I discussed a few next steps:

 1) start with a list of 5000 titles : draw from the Archive,
Gutenberg, children's collection, CK-12 (15 textbooks), & a mobile
classics list.  Make sure all are available as html, pdf, and flipbook
[there's some instance-to-work association needed here, and
potentially pdf-to-html conversion with image placement].

 2) define an AJAX html- and txt-reader (based on something very
simple) that provides a simple stylesheet with omnipresent links back
to metadata / to an online high-quality version [a flipbook] where
available, and styles text well for reading : margins & good font size
for longer reading spells in wide columns.

 3) write a bundling script that understands the Open Library (or
other) metadata api and can generate an .xol collection index for a
list of books. A single work's metadata should include a link to the
original, and a link to each of its formats online.

 4) related projects :
  a. define a file extension for flipbook books (since they take
their own reader), or find another way to auto-launch the reader when
a link to a book file is clicked
  b. make each of the formats smaller -- shrink Flipbook images, use
text-only pdf's, compress a shelf of html books and only unpack the
one being read (all in the reader)
  c. publish a toolchain for converting from each format to the others
  d. use wikisource to publish and correct OCR-html from pristine PDFs
for a wikibook version of each... rate limited and on demand, so as
not to flood ws.
  e. extend Read testing to epub and djvu formats
  f. figure out a flexible way to let Read as well as other tools read
multiple formats : txt, html, zip. [is it necessary to load all of
Browse to read a simple html page?]
  g. make 5-10 large bundles of HTML only, with links to the flipbooks
for each.  Provide a single-book-bundling link next to each
flipbook... or simply a download link in a unique format.
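
As a rough illustration of step 3, here is a minimal Python sketch of
the index-generating half of a bundling script.  It assumes the
metadata has already been fetched into plain dicts; the keys 'title',
'original' and 'formats' and the function name build_index are made up
for this example, and nothing here reflects the actual Open Library
api or the real layout of an .xol bundle.

```python
from html import escape


def build_index(title, books):
    """Return an HTML collection index for a list of book records.

    Each record is a dict with hypothetical keys:
      'title'    - display title of the work
      'original' - URL of the original record
      'formats'  - mapping of format name (html, pdf, ...) to URL
    """
    rows = []
    for book in books:
        # Every work links back to its original first.
        links = ['<a href="%s">original</a>'
                 % escape(book["original"], quote=True)]
        # Then one link per available format, in a stable order.
        for fmt, url in sorted(book["formats"].items()):
            links.append('<a href="%s">%s</a>'
                         % (escape(url, quote=True), escape(fmt)))
        rows.append("<li>%s: %s</li>"
                    % (escape(book["title"]), " | ".join(links)))
    return ("<html><head><title>%s</title></head>\n"
            "<body><h1>%s</h1>\n<ul>\n%s\n</ul></body></html>\n"
            % (escape(title), escape(title), "\n".join(rows)))
```

Calling build_index("Mobile classics", records) returns the page as a
string; writing it into a bundle, and fetching the records in the
first place, are left out on purpose.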


Now that people will have a choice of formats to use, let them give
real feedback.  Define a 30-min test suite for picking a collection
and a book in it and testing various reader options.  Find heavy
readers already addicted to reading longer works on their laptops or
mobiles, and get their input.

In terms of finding a better html viewer, some thoughts were tossed
around about design and layout options:
--> use the flipbook frame/design?  html is already chunked into pages...
--> which books have pagebreaks in their html?  add to metadata somehow.
--> correlate the OCR, which has pagebreaks, and insert based on statistics

On the Read side, new feature requests are welcome.  On the Open
Library side, Edward in London is working on some harder library
metadata problems, and we should get him on this list.


