[Wikireader] Welcome new members Tabitha Roder and Håkon Lie; Wiki-to-html conversion and compression

Samuel Klein sj at laptop.org
Mon Feb 9 17:13:21 EST 2009


Welcome to our newest list members: Tabitha Roder and Håkon Lie.
Tabitha has spent a goodly amount of time lately with Mel (thanks for
returning her in one piece :), and Håkon is working on making
dead-tree Wikipedia dead sexy and tackling many of the same interface
and selection problems.  I'll let them both say more about their
wikireader interests.


Håkon writes:

> Do you have a canonical/representative Wikipedia document that
> you perform your research on, e.g., to measure compression?

Chris Ball used a corpus of 30k Spanish articles for tests --
single-article compression would be significantly different.
   http://dev.laptop.org/~cjb/eswiki/es_PE.xml.bz2

7z and bzip2 compression gave quite similar results on this corpus
(whereas if we were storing many revisions of each article, the
difference would be huge).  I believe we're using a variant on standard
bzip2 that allows the per-article decompression we need; Chris can
explain in more detail.
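
For illustration only, here is a minimal sketch of one way per-article
decompression could work, assuming each article is compressed as its own
independent bz2 stream and found through an offset index. This is not
necessarily the variant Chris is using; the function names and the index
layout are made up for the example, and Python's lzma module (xz) stands
in for 7z as a compressor from the same LZMA family.

```python
import bz2
import lzma


def build_archive(articles):
    """Compress each article as its own bz2 stream.

    Returns (blob, index), where index maps title -> (offset, length)
    so a single article can be decompressed without touching the rest.
    The layout is hypothetical, chosen only for this sketch.
    """
    blob = bytearray()
    index = {}
    for title, text in articles.items():
        chunk = bz2.compress(text.encode("utf-8"))
        index[title] = (len(blob), len(chunk))
        blob += chunk
    return bytes(blob), index


def read_article(blob, index, title):
    """Decompress exactly one article using its (offset, length) entry."""
    offset, length = index[title]
    return bz2.decompress(blob[offset:offset + length]).decode("utf-8")


def compare_sizes(articles):
    """Report compressed sizes in bytes for three storage strategies."""
    joined = "\n".join(articles.values()).encode("utf-8")
    blob, _ = build_archive(articles)
    return {
        "bz2_whole": len(bz2.compress(joined)),   # one stream, no random access
        "xz_whole": len(lzma.compress(joined)),   # LZMA family, stand-in for 7z
        "bz2_per_article": len(blob),             # random access, some overhead
    }
```

Running compare_sizes() over the test corpus would show how much the
per-article layout costs relative to compressing everything as one stream;
the exact numbers depend on the corpus.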


> I'd like to do a case study where I convert wiki markup to HTML markup
> in the best way possible. The work is somewhat tedious and I'd like to
> do it on a document other people know.

I see.  Tim Starling may be interested in this -- he's done similar
work when making/optimizing static.wikipedia.org.   I reckon there is
prior art to share.

I've seen the [[Dog]] article in English used for examples, if you
want a single one.
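
To make the kind of conversion concrete, here is a toy sketch that maps a
small subset of wiki markup (headings, bold, italics, internal links) to
HTML. It is only an assumption about what a first pass might look like,
not how MediaWiki or static.wikipedia.org does it; wiki_to_html and the
link URL scheme are invented for the example, and templates, tables,
lists and references are ignored entirely.

```python
import html
import re


def _link(match):
    """Render [[Target]] or [[Target|label]] as an anchor (hypothetical URL scheme)."""
    target = match.group(1)
    label = match.group(2) or target
    return '<a href="{}">{}</a>'.format(target.replace(" ", "_"), label)


def wiki_to_html(markup):
    """Convert a tiny subset of wiki markup to HTML, one line at a time."""
    out = []
    for line in markup.splitlines():
        # Escape <, > and & but keep apostrophes, which carry bold/italic markup.
        line = html.escape(line, quote=False)

        # == Heading == through ====== Heading ======
        heading = re.fullmatch(r"(={2,6})\s*(.+?)\s*\1", line)
        if heading:
            level = len(heading.group(1))
            out.append("<h{0}>{1}</h{0}>".format(level, heading.group(2)))
            continue

        line = re.sub(r"'''(.+?)'''", r"<b>\1</b>", line)   # bold before italics
        line = re.sub(r"''(.+?)''", r"<i>\1</i>", line)
        line = re.sub(r"\[\[([^\]|]+)(?:\|([^\]]+))?\]\]", _link, line)

        if line.strip():
            out.append("<p>{}</p>".format(line))
    return "\n".join(out)
```

The interesting part of a real case study is everything this leaves out,
which is where a well-known article helps for comparing approaches.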

SJ

> Cheers,
>
> -h&kon
>              Håkon Wium Lie                          CTO °þe®ª
> howcome at opera.com                  http://people.opera.com/howcome

