[Wikireader] Welcome new members Tabitha Roder and Håkon Lie; Wiki-to-html conversion and compression
Samuel Klein
sj at laptop.org
Mon Feb 9 17:13:21 EST 2009
Welcome to our newest list members: Tabitha Roder and Håkon Lie.
Tabitha has spent a goodly amount of time lately with Mel (thanks for
returning her in one piece :), and Håkon is working on making
dead-tree Wikipedia dead sexy and tackling many of the same interface
and selection problems. I'll let them both say more about their
wikireader interests.
Håkon writes:
> do you have a canonical/representative Wikipedia document that
> you perform your research on? E.g., to measure compression?
Chris Ball used a corpus of 30k Spanish articles for tests;
compressing a single article on its own would give significantly different results.
http://dev.laptop.org/~cjb/eswiki/es_PE.xml.bz2
7z and bzip2 compression were quite similar (whereas if we were storing
many revisions of each article, the difference would be huge). I
believe we're using a variant of standard bzip that allows the
per-article decompression we need; Chris can explain in more detail.
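For anyone curious how per-article decompression can work at all, here is a minimal sketch in Python. It assumes the simplest possible scheme: each article is compressed as its own independent bz2 stream, and an index records each stream's byte offset and length. This is NOT necessarily what Chris's bzip variant does (he may group articles into shared blocks); `build_archive` and `read_article` are hypothetical names for illustration only.

```python
import bz2


def build_archive(articles):
    """Compress each article as its own bz2 stream.

    Returns (blob, index), where index maps title -> (offset, length)
    within blob. Hypothetical layout, not the real wikireader format.
    """
    blob = bytearray()
    index = {}
    for title, text in articles.items():
        stream = bz2.compress(text.encode("utf-8"))
        index[title] = (len(blob), len(stream))
        blob += stream
    return bytes(blob), index


def read_article(blob, index, title):
    """Decompress only the stream holding one article."""
    offset, length = index[title]
    return bz2.decompress(blob[offset:offset + length]).decode("utf-8")


if __name__ == "__main__":
    articles = {
        "Perro": "El perro es un mamífero.",
        "Gato": "El gato es un felino.",
    }
    blob, index = build_archive(articles)
    print(read_article(blob, index, "Gato"))
```

The trade-off is the one mentioned above: every independent stream starts with no shared history, so compressing articles one at a time usually yields a larger total than compressing the whole corpus as one stream.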
> I'd like to do a case study where I convert wiki markup to HTML markup
> in the best way possible. The work is somewhat tedious and I'd like to
> do it on a document other people know.
I see. Tim Starling may be interested in this -- he's done similar
work when making/optimizing static.wikipedia.org. I reckon there is
prior art to share.
I've seen the [[Dog]] article in English used for examples, if you
want a single one.
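To make the scope of such a case study concrete, here is a toy Python converter for a handful of wiki markup constructs (headings, '''bold''', ''italic'', and [[Target]] / [[Target|label]] links such as [[Dog]]). It is a sketch under stated assumptions, not MediaWiki's parser: templates, tables, references and paragraph handling are all ignored, and the `/wiki/Title_with_underscores` link scheme is an assumption.

```python
import html
import re


def _link(m):
    """Turn [[Target]] or [[Target|label]] into an <a> element."""
    target = m.group(1).strip()
    label = m.group(2) or target
    # Assumed URL scheme: /wiki/ plus the title with spaces as underscores.
    href = "/wiki/" + target.replace(" ", "_").replace('"', "&quot;")
    return f'<a href="{href}">{label}</a>'


def wikitext_to_html(text: str) -> str:
    """Convert a tiny subset of wiki markup to HTML (sketch only)."""
    out = []
    # Escape &, < and > first; quote=False keeps apostrophes intact so
    # the ''italic'' and '''bold''' markers still match below.
    for line in html.escape(text, quote=False).splitlines():
        heading = re.fullmatch(r"(={2,6})\s*(.*?)\s*\1", line)
        if heading:
            level = len(heading.group(1))
            line = f"<h{level}>{heading.group(2)}</h{level}>"
        # Bold must be handled before italic, since ''' contains ''.
        line = re.sub(r"'''(.+?)'''", r"<b>\1</b>", line)
        line = re.sub(r"''(.+?)''", r"<i>\1</i>", line)
        line = re.sub(r"\[\[([^\]|]+)(?:\|([^\]]+))?\]\]", _link, line)
        out.append(line)
    return "\n".join(out)


if __name__ == "__main__":
    print(wikitext_to_html("The '''dog''' is a [[mammal]]."))
```

Even this toy version shows why the work is tedious: ordering matters (escaping first, bold before italic), and every construct added interacts with the ones already handled.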
SJ
> Cheers,
>
> -h&kon
> Håkon Wium Lie CTO °þe®ª
> howcome at opera.com http://people.opera.com/howcome