[Wikireader] Wikireaders Update from Wikimania
tommi at tntnet.org
Mon Jul 21 17:00:43 EDT 2008
> There was a good deal of interest from attendees in unifying the goals and
> roadmaps for the different reader projects. I think we can separate out
> questions of
> * html vs. xml
> * zeno file vs. zip archive
> * all images vs. some images vs. no images
> * full articles vs. headers of articles
> and make these options in the selection of a full toolchain.
I would like to comment on the archive format.
Have you ever tried to zip 100000 files into one archive? And what about
seeking one specific file? As far as I know the directory of a zip archive is
neither indexed nor sorted, so finding one file in 100000 entries will take
some time.
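To make the lookup concern concrete, here is a minimal Python sketch. It is not
the code of any zip or zeno tool; the entry names, the offsets and the two
helper functions are made up for illustration. It only contrasts scanning an
unsorted directory with a binary search in a directory sorted by name.

```python
import bisect


def find_linear(directory, name):
    """Scan an unsorted list of (name, offset) entries: O(n) comparisons."""
    for entry_name, offset in directory:
        if entry_name == name:
            return offset
    return None


def find_sorted(names, offsets, name):
    """Binary search in a directory sorted by name: O(log n) comparisons."""
    i = bisect.bisect_left(names, name)
    if i < len(names) and names[i] == name:
        return offsets[i]
    return None


# Hypothetical directory with 100000 entries (names and offsets are invented).
directory = [("article%06d" % i, i * 512) for i in range(100000)]

# The zero-padded names are already in sorted order, so they can be searched
# with bisect directly.
names = [entry_name for entry_name, _ in directory]
offsets = [offset for _, offset in directory]

print(find_linear(directory, "article099999"))
print(find_sorted(names, offsets, "article099999"))
```

With 100000 entries the linear scan may need up to 100000 string comparisons
for one lookup, while the binary search needs about 17.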
Also the compression is not the best, since every file is compressed with zlib
on its own. The current zeno format does the same, but as Emmanuel mentioned,
we are working on compressing multiple files together and on using a different
compression algorithm. We get significantly better compression: an example
file with 2220 articles is 13MB with zip, and only 5MB with my current best
setup (bzip2 compressed with 512k clusters).
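The effect of compressing several files together can be sketched in Python as
follows. This is only an illustration under assumptions: the articles are
synthetic, the function names are invented, and the clustering here simply
concatenates articles until a cluster reaches 512k, which is not necessarily
how the zeno tools lay out their clusters. It will not reproduce the 13MB and
5MB figures above.

```python
import bz2
import zlib


def per_file_size(articles):
    """Total size when every article is compressed with zlib on its own."""
    return sum(len(zlib.compress(a)) for a in articles)


def clustered_size(articles, cluster_bytes=512 * 1024):
    """Total size when articles are grouped into clusters and each cluster
    is compressed with bzip2 as one block."""
    total = 0
    cluster = []
    size = 0
    for a in articles:
        cluster.append(a)
        size += len(a)
        if size >= cluster_bytes:
            total += len(bz2.compress(b"".join(cluster)))
            cluster, size = [], 0
    if cluster:
        # Compress the last, partially filled cluster.
        total += len(bz2.compress(b"".join(cluster)))
    return total


# Synthetic stand-ins for articles: small HTML pages sharing a lot of markup.
articles = [
    ("<html><head><title>Article %d</title></head><body>"
     "<p>This is the text of article number %d.</p></body></html>"
     % (i, i)).encode("utf-8")
    for i in range(2000)
]

print("per file, zlib:  ", per_file_size(articles))
print("clustered, bzip2:", clustered_size(articles))
```

The gain comes from the shared markup and vocabulary: a per-file compressor
has to learn it again for every article, while a cluster lets it be reused
across articles. The price is that reading one article means decompressing
its whole cluster.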