Cutting a slice of wikipedia - CDPedia
Samuel Klein
meta.sj at gmail.com
Wed Apr 9 18:31:29 EDT 2008
I'd like to see the auto-selection code; I don't find it in the trunk atm.
I do see hints of using mwlib, which is good; it is well-maintained.
http://groups.google.com/group/mwlib
For live slices, using MediaWiki's API rather than a dump, there's mwclient.
http://fisheye.ts.wikimedia.org/browse/bryan/mwclient/trunk/README.txt?r=HEAD
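A rough sketch of what a live slice could look like with mwclient, assuming a current mwclient release where a site exposes `site.pages[title]` and each page has an `exists` attribute and a `text()` method. The helper name `fetch_slice` is made up for illustration and is not part of mwclient or CDPedia.

```python
def fetch_slice(site, titles):
    """Return {title: wikitext} for each requested title that exists.

    `site` is assumed to behave like an mwclient Site object:
    site.pages[title] yields a page with `.exists` and `.text()`.
    """
    texts = {}
    for title in titles:
        page = site.pages[title]  # look the page up by title
        if page.exists:           # skip titles with no page on the wiki
            texts[title] = page.text()
    return texts
```

With mwclient installed, this would be called along the lines of `fetch_slice(mwclient.Site("en.wikipedia.org"), ["Moon", "Sun"])`; taking the site as a parameter keeps the helper usable against any MediaWiki install.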
More scoring schemes are welcome. See also wikiosity's simple
relevance-scoring code, which takes in a few keywords and considers 1st &
2nd-order links.
http://dev.laptop.org/git?p=projects/wikiosity;a=tree
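To make the keyword plus 1st/2nd-order link idea concrete, here is a minimal sketch. It is not wikiosity's actual code: the function names, the in-memory link map, the title-substring matching, and the weights 1.0 / 0.5 / 0.25 are all assumptions chosen for illustration.

```python
from collections import defaultdict


def score_pages(links, keywords, first_weight=0.5, second_weight=0.25):
    """Score pages by link distance from keyword-matching seed pages.

    links:    dict mapping a page title to an iterable of titles it links to
    keywords: strings matched case-insensitively against page titles
    The weights are illustrative assumptions, not taken from any project.
    """
    lowered = [k.lower() for k in keywords]
    # Seed pages: titles in the link map that contain any keyword.
    seeds = {t for t in links if any(k in t.lower() for k in lowered)}

    scores = defaultdict(float)
    for seed in seeds:
        scores[seed] += 1.0
        # 1st-order: pages the seed links to directly.
        first = set(links.get(seed, ())) - {seed}
        for page in first:
            scores[page] += first_weight
        # 2nd-order: pages linked from the 1st-order pages,
        # excluding anything already counted for this seed.
        second = set()
        for page in first:
            second |= set(links.get(page, ()))
        second -= first | {seed}
        for page in second:
            scores[page] += second_weight
    return dict(scores)


def top_pages(scores, n):
    """Return the n highest-scoring titles, ties broken alphabetically."""
    return sorted(scores, key=lambda t: (-scores[t], t))[:n]
```

A list produced by `top_pages` could then serve as the auto-generated page list that the wikislice tooling expects from the user.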
SJ
On Wed, Apr 9, 2008 at 5:48 PM, Martin Langhoff <martin.langhoff at gmail.com>
wrote:
> On Wed, Apr 9, 2008 at 2:53 PM, Samuel Klein <meta.sj at gmail.com> wrote:
> > It's nice to see a python toolchain for this (though I don't see any code at
> > that url?) They exist in other languages as well. We've been working with
> > Linterweb's Kiwix (kiwix.org) and the Schools-Wikipedia, which use their own
> > toolchains.
>
> Hi SJ
>
> I suspected that there would be something out there - Alecu's
> implementation has some interesting smarts in that it does an
> auto-selection of the pages to include. I'll let him explain that. The
> wikislice page talks about the user providing the list of urls, which
> means you need to auto-generate that somehow.
>
> Maybe we can integrate CDPedia's scoring scheme?
>
> [I did an svn checkout of kiwix, this thing has an embedded gecko.]
>
> > ps - I don't see code at the google-code url... and "cdpedia" is a name used
> > by a few existing projects, some commercial; you might want to choose
> > another name.
>
> Go to the code page, and click on the svn browse thingy...
>
> > pps - Martin: simple: is nice, but not of uniform quality
>
> Good to know! -- I wasn't expecting too much uniform-ness out of
> wikipedia anyway ;-)
A pity (-: The Wikipedia 1.0 and schools-wikipedia projects are good at
uniformity, and can use support in new languages.
SJ