[Wikireader] Welcome new members Tabitha Roder and Håkon Lie; Wiki-to-html conversion and compression

Chris Ball cjb at laptop.org
Mon Feb 9 17:23:36 EST 2009


Hi,

   > 7z and bz compression were quite similar (whereas if we were
   > storing many revisions of each article it would be a huge
   > difference).  I believe we're using a variant on standard bzip that
   > allows the per-article decompression we need; Chris can explain in
   > more detail.

The file is standard bzip2, but the tools aren't.  In short, we create
an index mapping article titles to individual bzip2 blocks, and then
use a tool (based on a modified bzip2recover program) that can
decompress any individual block of a bzip2 archive.  So when we ask
for an article title, it is translated into a block number <n>, and
only that block is decompressed and returned.
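To make the idea concrete, here is a minimal Python sketch of
block-level access.  It is not our actual tool.  The names
find_blocks, decompress_block and build_index are hypothetical, and
the sketch assumes a single-stream .bz2 file.  Like bzip2recover, it
finds blocks by scanning the bitstream for the 48-bit block magic
(blocks are not byte-aligned).  It wraps the chosen block in a minimal
one-block stream and hands that to Python's bz2 module.  The magic
could in principle also occur by chance inside compressed data;
bzip2recover has the same weakness.

```python
import bisect
import bz2

# 48-bit magic numbers from the bzip2 stream format. Blocks are bit-aligned,
# so we search for them in a string of '0'/'1' characters.
BLOCK_MAGIC = format(0x314159265359, "048b")  # starts every compressed block
EOS_MAGIC = format(0x177245385090, "048b")    # end-of-stream marker


def to_bits(data: bytes) -> str:
    """Render bytes as '0'/'1' characters, most significant bit first."""
    return bin(int.from_bytes(data, "big"))[2:].zfill(8 * len(data))


def from_bits(bits: str) -> bytes:
    """Inverse of to_bits, zero-padding the tail to a whole byte."""
    bits += "0" * (-len(bits) % 8)
    return int(bits, 2).to_bytes(len(bits) // 8, "big")


def find_blocks(bits: str) -> list[tuple[int, int]]:
    """(start, end) bit offsets of every block in a single-stream file."""
    starts = []
    pos = bits.find(BLOCK_MAGIC, 32)  # skip the 4-byte "BZh<level>" header
    while pos != -1:
        starts.append(pos)
        pos = bits.find(BLOCK_MAGIC, pos + 48)
    if not starts:
        return []
    eos = bits.find(EOS_MAGIC, starts[-1] + 48)
    return list(zip(starts, starts[1:] + [eos]))


def decompress_block(stream: bytes, n: int) -> bytes:
    """Decompress only block n by wrapping it in a one-block stream,
    much as bzip2recover writes each recovered block to its own file."""
    bits = to_bits(stream)
    start, end = find_blocks(bits)[n]
    block = bits[start:end]      # magic + 32-bit block CRC + Huffman data
    block_crc = block[48:80]
    # With a single block, the stream's combined CRC equals the block CRC.
    single = to_bits(stream[:4]) + block + EOS_MAGIC + block_crc
    return bz2.decompress(from_bits(single))


def build_index(stream: bytes, article_starts: dict[str, int]) -> dict[str, int]:
    """Map each title to the block holding the first byte of its article.
    An article that crosses a block boundary also needs the next block."""
    ends, offset = [], 0
    for n in range(len(find_blocks(to_bits(stream)))):
        offset += len(decompress_block(stream, n))
        ends.append(offset)  # uncompressed offset just past block n
    return {title: bisect.bisect_right(ends, start)
            for title, start in article_starts.items()}
```

The index is built once, up front.  A lookup is then just a dictionary
access followed by decompress_block(stream, index[title]).  This
touches one block (roughly 100k to 900k of text, depending on the
compression level) rather than the whole archive.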

If we were to move to a different compression scheme, we would need
a similar tool for that scheme.

Thanks,

- Chris.
-- 
Chris Ball   <cjb at laptop.org>

