Cutting a slice of wikipedia - CDPedia

Alejandro J. Cura alecu at
Thu Apr 10 09:42:53 EDT 2008

On Tue, Apr 8, 2008 at 9:21 PM, Martin Langhoff
<martin.langhoff at> wrote:
> Yesterday we had a mini-sprint with Argentinian pythonistas and we
>  discussed Alecu's CDPedia, which is a Python toolchain that does a
>  good job of cutting a slice of wikipedia and cutting off the least
>  interesting parts to make it fit. His project is here
>  and it would be great if Alecu could explain a bit more what it does
>  -- I am sure I didn't do it any justice above ;-)
>  So - Alecu, meet the list, list, say hi to Alecu ;-)

Hi all,

First of all, let me stress that CDPedia is a project of the local
Python Users group, and not just a project of mine ;-)
What we are looking for is one CD with a reasonable subset of
Wikipedia, aimed at schools that already have some computers but no
internet access, or with dial-up modems that are not connected all the
time.

What we have developed is a set of scripts that take a static HTML
Wikipedia dump and, through some slicing and dicing, end up producing an
ISO image with some bzip2-compressed blocks optimized for reading from
a CD, and a small program that sets up a web server that serves up
articles from these blocks for a local browser. No Wikipedia images are
included yet in this process, but we have a few ideas on how to make
them fit.
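
To make the block idea concrete, here is a minimal sketch. It is not the
actual CDPedia code. The block layout, the file names, the index format
and the function names (build_blocks, read_article, make_handler) are all
assumptions made up for illustration. It packs a few articles into
bzip2-compressed block files, keeps an index of where each article lives,
and defines a handler that could serve articles to a local browser.

```python
import bz2
import os
from http.server import BaseHTTPRequestHandler
from urllib.parse import unquote


def build_blocks(articles, out_dir, per_block=2):
    """Pack {name: html} into bzip2 block files.

    Returns a hypothetical index: {name: (block_file, offset, length)},
    where offset and length refer to the uncompressed block payload.
    """
    index = {}
    names = sorted(articles)
    for block_no, start in enumerate(range(0, len(names), per_block)):
        block_file = "block%04d.bz2" % block_no
        payload = b""
        for name in names[start:start + per_block]:
            data = articles[name].encode("utf-8")
            index[name] = (block_file, len(payload), len(data))
            payload += data
        # One compressed file per block, so reading an article costs
        # one file read plus one decompression.
        with open(os.path.join(out_dir, block_file), "wb") as fh:
            fh.write(bz2.compress(payload))
    return index


def read_article(index, out_dir, name):
    """Decompress the block that holds `name` and slice the article out."""
    block_file, offset, length = index[name]
    with open(os.path.join(out_dir, block_file), "rb") as fh:
        payload = bz2.decompress(fh.read())
    return payload[offset:offset + length].decode("utf-8")


def make_handler(index, out_dir):
    """Build a request handler class that serves articles from the blocks."""

    class ArticleHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            name = unquote(self.path.lstrip("/"))
            if name not in index:
                self.send_error(404, "No such article")
                return
            body = read_article(index, out_dir, name).encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "text/html; charset=utf-8")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    return ArticleHandler
```

With an index built by build_blocks, the handler could be plugged into
the standard library server, for example
HTTPServer(("127.0.0.1", 8000), make_handler(index, out_dir)).serve_forever(),
and a browser pointed at http://127.0.0.1:8000/<article name>. Larger
blocks usually compress better but take longer to decompress per request,
so the block size is a trade-off a real build would have to tune.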

Right now we have released an alpha 0.1 ISO image that allows you to
burn a CD that will start automatically on Windows XP (and manually on
Linux distros). We need some more docs on how to make it work, though.

