[Wikireader] english wikireaders and 0.7
Martin Walker
walkerma at potsdam.edu
Wed Sep 3 15:24:47 EDT 2008
CBM has once again worked wonders and we now have an updated selection of
27,000 articles for Version 0.7. For Version 0.7 purposes, this is only
a preliminary selection, because some details need to be resolved.
There are one or two typos, and I suspect one project's "principal
article" (on the Roma people) has been renamed in the last week, and the
resultant redirect causes that project to receive a lower score than it
should. The listing can be found online here:
http://toolserver.org/~cbm/release-data/2008-9-3/HTML/index.html
However, the selection should be more than 95% "there" now, and probably
more than 99%.
The scores are computed based on importance (major) and quality
(minor). Most FAs will make it into the collection unless they are VERY
obscure; a Start-Class article, on the other hand, needs to be on a
pretty important topic to make it in. Redirects are not listed, but
they are included in article counts.
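
A minimal Python sketch of the "importance major, quality minor" idea,
assuming the two parts are simply added together. The point values and
the names QUALITY_POINTS and overall_score are hypothetical and chosen
only for illustration; they are not the bot's actual numbers.

```python
# Hypothetical quality points per assessment class (illustrative only).
QUALITY_POINTS = {"FA": 500, "GA": 400, "B": 300, "Start": 150, "Stub": 50}


def overall_score(importance_points, quality_class):
    """Add a (typically larger) importance term to a smaller quality term."""
    return importance_points + QUALITY_POINTS[quality_class]


# An FA on a fairly obscure topic versus a Start-Class article on an
# important topic: under these made-up numbers, importance can outweigh
# the quality gap.
obscure_fa = overall_score(700, "FA")
important_start = overall_score(1100, "Start")
```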
Importance is based on a logarithmic formula from no. of hits, no. of
interwikis, and no. of links in, as well as WikiProject assessments, as
described here:
http://en.wikipedia.org/wiki/Wikipedia:Version_1.0_Editorial_Team/SelectionBot
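
As a rough illustration of what a logarithmic formula of this kind can
look like, here is a minimal Python sketch. The weights and the function
name external_interest are assumptions made for the example; the actual
formula and constants are the ones documented on the SelectionBot page
above.

```python
import math

# Hypothetical weights, chosen only for illustration.
HIT_WEIGHT = 50.0
INTERWIKI_WEIGHT = 250.0
LINKS_IN_WEIGHT = 100.0


def external_interest(hits, interwikis, links_in):
    """Combine three popularity signals on a log scale.

    max(..., 1) guards against log10(0) for articles with no data.
    """
    return (
        HIT_WEIGHT * math.log10(max(hits, 1))
        + INTERWIKI_WEIGHT * math.log10(max(interwikis, 1))
        + LINKS_IN_WEIGHT * math.log10(max(links_in, 1))
    )
```

Because each signal is taken on a log scale, a tenfold increase in hits
adds a fixed amount to the result rather than multiplying it.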
A new feature in this selection is that the WikiProject importance
assessment is adjusted to allow for the scope and importance of the
WikiProject itself; for example, a "Top Importance" rating from the
project on The KLF would rank lower than from The Beatles, which would
in turn rank lower than if it came from a project on Popular Music.
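
The project-scope adjustment might be sketched as follows, assuming the
rating is scaled by a per-project factor. The tables RATING_POINTS and
PROJECT_SCOPE, their values, and the choice of multiplication are all
hypothetical; only the ordering of the three projects comes from the
example above.

```python
# Hypothetical base points per importance rating (illustrative only).
RATING_POINTS = {"Top": 400, "High": 300, "Mid": 200, "Low": 100}

# Hypothetical scope factors: broader projects get larger values.
PROJECT_SCOPE = {"Popular Music": 1.0, "The Beatles": 0.8, "The KLF": 0.5}


def adjusted_importance(rating, project):
    """Scale a WikiProject's rating by the scope of the project itself."""
    return RATING_POINTS[rating] * PROJECT_SCOPE[project]
```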
CBM provided me with four files (Madeleine now has these); I can send
these to anyone who wants them:
project-scores.xls       - computed scores.
project-subprojects.txt  - child->parent mapping. If a project is
                           listed here, the parent project's score is
                           used as the child's score.
Selected.xls             - spreadsheet of data for all selected
                           articles. Each article can occur more than
                           once.
Selected-unique.txt      - list of all selected articles, sorted by
                           score, each article appears at most once.
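
The relationship between these files can be sketched in Python. This is
only an illustration under stated assumptions: the function names are
hypothetical, the parent lookup is a single level deep, and keeping the
highest score when an article occurs more than once is a guess, since
the file descriptions do not say which occurrence is kept.

```python
def effective_project_score(project, scores, parent_of):
    """Return the parent's score if the project is listed as a child."""
    parent = parent_of.get(project)
    return scores[parent] if parent is not None else scores[project]


def unique_by_best_score(rows):
    """Collapse (article, score) rows to one row per article, best first."""
    best = {}
    for article, score in rows:
        # Assumption: keep the highest score seen for each article.
        if article not in best or score > best[article]:
            best[article] = score
    return sorted(best.items(), key=lambda item: item[1], reverse=True)


# Made-up data in the spirit of the files described above.
scores = {"Popular Music": 1000, "The KLF": 300}
parent_of = {"The KLF": "Popular Music"}
klf_score = effective_project_score("The KLF", scores, parent_of)
unique = unique_by_best_score([("A", 5), ("B", 9), ("A", 7)])
```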
Martin
Walkerma on Wikipedia