[Wikireader] Wikireaders Update from Wikimania

Tommi Mäkitalo tommi at tntnet.org
Mon Jul 21 17:00:43 EDT 2008


>
> There was a good deal of interest from attendees in unifying the goals and
> roadmaps for the different reader projects.  I think we can separate out
> questions of
>  * html vs. xml
>  * zeno file vs. zip archive
>  * all images vs. some images vs. no images
>  * full articles vs. headers of articles
>
> and make these options in the selection of a full toolchain.
>
I would like to comment on the archive format.

Have you ever tried to zip 100000 files into one archive? And what about 
seeking one specific file? As far as I know, the directory of a zip archive is 
neither indexed nor sorted, so finding one file among 100000 entries means 
scanning the directory entry by entry, which takes some time.
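
To illustrate the lookup cost, here is a minimal Python sketch. It assumes a
directory is simply a list of (name, offset) pairs held in memory; this is not
the real zip or zeno on-disk layout, and the names find_linear, find_sorted and
the article_NNNNNN.html entries are made up for the example.

```python
import bisect


def find_linear(directory, name):
    """Scan an unsorted directory of (name, offset) pairs: O(n) comparisons."""
    for entry_name, offset in directory:
        if entry_name == name:
            return offset
    return None


def find_sorted(names, offsets, name):
    """Binary search in a directory sorted by name: O(log n) comparisons."""
    i = bisect.bisect_left(names, name)
    if i < len(names) and names[i] == name:
        return offsets[i]
    return None


# Hypothetical directory with 100000 entries (assumed data, not a real archive).
directory = [(f"article_{i:06d}.html", i * 4096) for i in range(100000)]
names = [n for n, _ in directory]    # sorted, thanks to the zero padding
offsets = [o for _, o in directory]
```

With a sorted directory, a lookup among 100000 entries needs about 17
comparisons instead of up to 100000.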

Also, the compression is not the best, since every file is compressed with zlib 
on its own. The current zeno format does the same, but as Emmanuel mentioned, we 
are working on compressing multiple files together and on using another 
compression algorithm. We get significantly better compression: an example file 
with 2220 articles is 13MB with zip, but only 5MB with my currently best 
setting (bzip2 compressed with 512k clusters).
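
The effect can be reproduced roughly with the Python sketch below. It is only
an illustration under assumptions: the articles are synthetic, the function
names size_per_file and size_clustered are invented here, and the clustering is
a simple "fill up to 512k, then compress" loop, not the actual zeno writer.

```python
import bz2
import zlib


def size_per_file(articles):
    """Total size when every article is compressed with zlib on its own."""
    return sum(len(zlib.compress(a)) for a in articles)


def size_clustered(articles, cluster_size=512 * 1024):
    """Total size when articles are grouped into clusters of roughly
    cluster_size bytes and each cluster is compressed with bzip2."""
    total = 0
    cluster = bytearray()
    for a in articles:
        cluster += a
        if len(cluster) >= cluster_size:
            total += len(bz2.compress(bytes(cluster)))
            cluster.clear()
    if cluster:  # compress the last, partially filled cluster
        total += len(bz2.compress(bytes(cluster)))
    return total


# Synthetic stand-ins for 2220 articles (assumed data, not real Wikipedia text).
articles = [
    (f"<html><head><title>Article {i}</title></head>"
     f"<body><p>This is the text of article number {i}.</p></body></html>"
     ).encode("utf-8")
    for i in range(2220)
]

print("per file :", size_per_file(articles), "bytes")
print("clustered:", size_clustered(articles), "bytes")
```

Compressing articles together lets the compressor exploit the markup and
wording that the articles share. The price is that reading one article means
decompressing its whole cluster, so the cluster size is a trade-off between
compression ratio and access time.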

Tommi

