Cutting a slice of wikipedia - CDPedia

Martin Langhoff martin.langhoff at
Wed Apr 9 17:52:24 EDT 2008

On Wed, Apr 9, 2008 at 2:53 PM, Samuel Klein < at> wrote:
> It's nice to see a python toolchain for this (though I don't see any code at
> that url?)  They exist in other languages as well.  We've been working with
> Linterweb's Kiwix ( and the Schools-Wikipedia, which use their own
> toolchains.

[fixed up cc list]


I suspected that there would be something out there - Alecu's
implementation has some interesting smarts in that it does an
auto-selection of the pages to include. I'll let him explain that. The
wikislice page talks about the user providing the list of urls, which
means you need to auto-generate that somehow.

Maybe we can integrate CDPedia's scoring scheme?
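
To make the idea concrete, here is a rough sketch of how a scoring pass
could generate the url list that wikislice expects. This is not
CDPedia's actual scheme (Alecu can describe that); the scoring rule
(inbound link count), the size budget, the function names and the base
url are all assumptions made up for illustration.

```python
# Hypothetical sketch: rank pages by a score and fill a size budget.
# The score used here (inbound link count) is an assumption for
# illustration only, not CDPedia's real scoring scheme.

def select_pages(pages, budget_bytes):
    """Return titles of the best-scoring pages that fit in the budget.

    pages: iterable of (title, inbound_links, size_bytes) tuples.
    """
    # Highest-scoring pages first.
    ranked = sorted(pages, key=lambda p: p[1], reverse=True)
    chosen, used = [], 0
    for title, _links, size in ranked:
        # Greedy fill: skip anything that would overflow the budget.
        if used + size <= budget_bytes:
            chosen.append(title)
            used += size
    return chosen


def to_urls(titles, base="http://simple.wikipedia.org/wiki/"):
    """Turn page titles into urls (the base url is an assumed default)."""
    return [base + t.replace(" ", "_") for t in titles]
```

Something like to_urls(select_pages(pages, budget)) would then produce
the list to hand to the slicing tool. The greedy fill is just the
simplest thing that could work; a real scorer would probably weigh more
than link counts.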

[I did an svn checkout of kiwix; this thing has an embedded gecko.]

> ps - I don't see code at the google-code url... and "cdpedia" is a name used
> by a few existing projects, some commercial; you might want to choose
> another name.

Go to the code page, and click on the svn browse thingy...

> pps - Martin: simple: is nice, but not of uniform quality

Good to know! -- I wasn't expecting too much uniform-ness out of
wikipedia anyway ;-)


 martin.langhoff at
 martin at -- School Server Architect
 - ask interesting questions
 - don't get distracted with shiny stuff - working code first

More information about the Devel mailing list