Cutting a slice of wikipedia - CDPedia

Samuel Klein meta.sj at gmail.com
Wed Apr 9 18:31:29 EDT 2008


I'd like to see the auto-selection code; I can't find it in trunk at the moment.
I do see hints of using mwlib, which is good; it is well-maintained.
  http://groups.google.com/group/mwlib

For live slices, using MediaWiki's API rather than a dump, there's mwclient.

http://fisheye.ts.wikimedia.org/browse/bryan/mwclient/trunk/README.txt?r=HEAD

More scoring schemes are welcome.  See also wikiosity's simple
relevance-scoring code, which takes in a few keywords and considers 1st &
2nd-order links.
  http://dev.laptop.org/git?p=projects/wikiosity;a=tree
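
To make the idea of keyword seeding plus 1st & 2nd-order links concrete, here
is a minimal sketch in Python. It is not wikiosity's actual code. The function
name score_pages, the weights, and the in-memory link graph are all assumptions
made up for illustration; a real run would build the graph from a dump or the
MediaWiki API.

```python
from collections import defaultdict


def score_pages(links, keywords, first_order=0.5, second_order=0.25):
    """Score pages by keyword match plus 1st- and 2nd-order links.

    links    -- dict mapping a page title to the titles it links to
                (hypothetical input format, assumed for this sketch)
    keywords -- words matched case-insensitively against page titles
    The weights are illustrative defaults, not tuned values.
    """
    lowered = [k.lower() for k in keywords]
    scores = defaultdict(float)

    # Seed pages: any title containing one of the keywords.
    seeds = [t for t in links if any(k in t.lower() for k in lowered)]

    for seed in seeds:
        scores[seed] += 1.0
        # 1st-order: pages linked directly from a seed.
        for neighbour in links.get(seed, ()):
            scores[neighbour] += first_order
            # 2nd-order: pages linked from those neighbours.
            for second in links.get(neighbour, ()):
                if second != seed:
                    scores[second] += second_order

    return dict(scores)


def top_pages(scores, n):
    """Return the n highest-scoring titles, best first (ties by title)."""
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [title for title, _ in ranked[:n]]
```

Pages never reached from a seed get no entry at all, so a slice could be cut
by taking top_pages(scores, n) for whatever n fits on the disc.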

SJ


On Wed, Apr 9, 2008 at 5:48 PM, Martin Langhoff <martin.langhoff at gmail.com> wrote:

> On Wed, Apr 9, 2008 at 2:53 PM, Samuel Klein <meta.sj at gmail.com> wrote:
> > It's nice to see a python toolchain for this (though I don't see any code
> > at that url?)  They exist in other languages as well.  We've been working
> > with Linterweb's Kiwix (kiwix.org) and the Schools-Wikipedia, which use
> > their own toolchains.
>
> Hi SJ
>
> I suspected that there would be something out there - Alecu's
> implementation has some interesting smarts in that it does an
> auto-selection of the pages to include. I'll let him explain that. The
> wikislice page talks about the user providing the list of urls, which
> means you need to auto-generate that somehow.
>
> Maybe we can integrate CDPedia's scoring scheme?
>
> [I did an svn checkout of kiwix, this thing has an embedded gecko.]
>
> > ps - I don't see code at the google-code url... and "cdpedia" is a name
> > used by a few existing projects, some commercial; you might want to choose
> > another name.
>
> Go to the code page, and click on the svn browse thingy...
>
> > pps - Martin: simple: is nice, but not of uniform quality
>
> Good to know! -- I wasn't expecting too much uniformity out of
> wikipedia anyway ;-)


A pity (-:   The Wikipedia 1.0 and schools-wikipedia projects are good at
uniformity, and can use support in new languages.

SJ