[Wikireader] english wikireaders and 0.7

Martin Walker walkerma at potsdam.edu
Wed Sep 3 15:24:47 EDT 2008


CBM has once again worked wonders, and we now have an updated selection of 
27,000 articles for Version 0.7.  For Version 0.7 purposes, this is only 
a preliminary selection, because some details need to be resolved.  
There are one or two typos, and I suspect one project's "principal 
article" (on the Roma people) has been renamed in the last week, and the 
resultant redirect causes that project to receive a lower score than it 
should.  The listing can be found online here:
http://toolserver.org/~cbm/release-data/2008-9-3/HTML/index.html

However, the selection should now be more than 95% "there", and probably 
more than 99%.  The scores are computed mainly from importance, with 
quality as a minor factor.  Most FAs will make it into the collection 
unless they are VERY obscure; a Start-Class article, on the other hand, 
needs to be on a pretty important topic to make it in.  Redirects are 
not listed, but they are included in article counts.
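To make the "importance major, quality minor" trade-off concrete, here is 
a minimal sketch.  It assumes a simple additive model: an importance 
score plus a down-weighted quality bonus, compared against a cutoff.  The 
point values, QUALITY_WEIGHT and CUTOFF are all hypothetical numbers 
chosen for illustration; they are not SelectionBot's actual values.

```python
# Hypothetical point values and weights, chosen only to illustrate the
# "importance major, quality minor" idea; they are NOT SelectionBot's.
QUALITY_POINTS = {"FA": 500, "A": 425, "GA": 400, "B": 300, "Start": 150, "Stub": 50}
QUALITY_WEIGHT = 0.5   # quality counts for less than importance (weight 1.0)
CUTOFF = 1000.0        # hypothetical score needed to enter the release


def overall_score(importance: float, quality_class: str) -> float:
    """Combine an importance score with a down-weighted quality bonus."""
    return importance + QUALITY_WEIGHT * QUALITY_POINTS[quality_class]


def is_selected(importance: float, quality_class: str) -> bool:
    """True if the combined score clears the (hypothetical) cutoff."""
    return overall_score(importance, quality_class) >= CUTOFF


def min_importance(quality_class: str) -> float:
    """Lowest importance score that still clears the cutoff for this class."""
    return CUTOFF - QUALITY_WEIGHT * QUALITY_POINTS[quality_class]


if __name__ == "__main__":
    for cls in ("FA", "B", "Start"):
        print(cls, min_importance(cls))
```

With these made-up numbers an FA needs an importance of only 750 to get 
in, while a Start-Class article needs 925, which mirrors the behaviour 
described above.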

Importance is based on a logarithmic formula using the number of page 
hits, the number of interwiki links, and the number of incoming links, 
together with WikiProject importance assessments, as described here:
http://en.wikipedia.org/wiki/Wikipedia:Version_1.0_Editorial_Team/SelectionBot
A new feature in this selection is that each WikiProject's importance 
assessment is adjusted to allow for the scope and importance of the 
WikiProject itself.  For example, a "Top Importance" rating from the 
project on The KLF would count for less than the same rating from the 
project on The Beatles, which in turn would count for less than one from 
a project on Popular Music.
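A rough sketch of how such an importance score could be put together, 
assuming log10(1 + n) terms for hits, interwikis and incoming links, plus 
a WikiProject rating scaled by a per-project weight.  The weights 
(W_HITS, W_INTERWIKIS, W_LINKS), RATING_POINTS, PROJECT_WEIGHTS, and the 
choice to count only the best adjusted rating when several projects rate 
an article are all hypothetical; the SelectionBot page above describes 
the real formula, which may differ.

```python
import math

# Hypothetical weights; the real SelectionBot formula may differ.
W_HITS, W_INTERWIKIS, W_LINKS = 50.0, 100.0, 100.0
RATING_POINTS = {"Top": 400, "High": 300, "Mid": 200, "Low": 100}

# Hypothetical project weights reflecting scope: a narrow project's
# "Top" rating is worth less than a broad project's.
PROJECT_WEIGHTS = {"The KLF": 0.3, "The Beatles": 0.6, "Popular music": 1.0}


def external_interest(hits: int, interwikis: int, links_in: int) -> float:
    """Logarithmic score: log10(1 + n) is 0 for n = 0 and grows slowly."""
    return (W_HITS * math.log10(1 + hits)
            + W_INTERWIKIS * math.log10(1 + interwikis)
            + W_LINKS * math.log10(1 + links_in))


def project_assessment(rating: str, project: str) -> float:
    """Project rating scaled by the (hypothetical) weight of the project."""
    return RATING_POINTS[rating] * PROJECT_WEIGHTS[project]


def importance(hits, interwikis, links_in, ratings):
    """ratings: list of (rating, project) pairs.

    Assumption: only the best adjusted rating counts; 0 if unrated.
    """
    best = max((project_assessment(r, p) for r, p in ratings), default=0.0)
    return external_interest(hits, interwikis, links_in) + best
```

The logarithms keep a single very popular page from swamping everything 
else, and the project weight is what lets the same "Top" label mean 
different things coming from The KLF versus Popular Music.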

CBM provided me with four files (Madeleine now has these); I can send 
them to anyone who wants them:

  project-scores.xls       - computed scores.

  project-subprojects.txt  - child->parent mapping. If a project is 
                             listed here, the parent project's score is 
                             used as the child's score.

  Selected.xls             - spreadsheet of data for all selected 
                             articles. Each article can occur more than
                             once.

  Selected-unique.txt      - list of all selected articles, sorted by 
                             score, each article appears at most once. 
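For anyone working with these files, a minimal sketch of the two 
post-processing rules described above: a child project listed in 
project-subprojects.txt takes its parent's score, and the repeated rows 
of Selected.xls collapse into a unique list sorted by score.  The exact 
file formats are not described here, so the code works on plain Python 
dicts and (article, score) pairs.  Following the chain up to the top 
project, and keeping each article's highest score, are assumptions.

```python
from typing import Dict, Iterable, List, Tuple


def root_project(project: str, child_to_parent: Dict[str, str]) -> str:
    """Follow child->parent links until reaching a project with no parent.

    The `seen` set stops the loop if the mapping ever contains a cycle.
    """
    seen = set()
    while project in child_to_parent and project not in seen:
        seen.add(project)
        project = child_to_parent[project]
    return project


def effective_score(project: str, project_scores: Dict[str, float],
                    child_to_parent: Dict[str, str]) -> float:
    """A listed child project uses its parent's score instead of its own."""
    return project_scores[root_project(project, child_to_parent)]


def unique_selection(rows: Iterable[Tuple[str, float]]) -> List[Tuple[str, float]]:
    """Collapse repeated articles to one entry each (keeping the highest
    score), then sort by score, highest first, ties broken by title."""
    best: Dict[str, float] = {}
    for article, score in rows:
        if article not in best or score > best[article]:
            best[article] = score
    return sorted(best.items(), key=lambda item: (-item[1], item[0]))
```

For example, unique_selection([("X", 5.0), ("Y", 7.0), ("X", 9.0)]) 
gives [("X", 9.0), ("Y", 7.0)].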


Martin
Walkerma on Wikipedia
