[Wikireader] english wikireaders and 0.7

Chris Ball cjb at laptop.org
Sun Sep 7 19:02:49 EDT 2008


Hi SJ,

   > To Andrew -- thank you.  The 2% vandalism stat is very valuable!
   > CJB, would it be possible to grab revision ids from this page,
   > wherever there is a simple newline/title/oldid= ?

Possible, yeah, but I'm not sure it'll be the best use of the time I
have remaining to work on this once the work-week starts up again and I
get back to blockers for the release.  We'd have to switch over from the
"current versions" archive to the "all versions" archive, and then write
scripts to create a new archive with the versions we want.
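For illustration, the first step (pulling revision ids off the article-list page) might look roughly like the sketch below. It is only a sketch, and it assumes the page lists standard MediaWiki permalinks of the form `index.php?title=Some_Title&oldid=123456`; the function name `extract_revisions` is hypothetical. Building the new archive from the "all versions" dump is not covered here.

```python
import re
from urllib.parse import urlparse, parse_qs

# Assumption: the list page contains MediaWiki permalinks like
#   http://en.wikipedia.org/w/index.php?title=Some_Title&oldid=123456
# The match stops at whitespace, "]" or "|" so wiki-link markup is excluded.
PERMALINK_RE = re.compile(r"index\.php\?[^\s\]|]+")

def extract_revisions(text):
    """Return a dict mapping article title -> revision id (oldid)."""
    revisions = {}
    for match in PERMALINK_RE.finditer(text):
        # urlparse handles the relative "index.php?..." form; parse_qs
        # also decodes percent-escapes in the title.
        query = parse_qs(urlparse(match.group(0)).query)
        titles = query.get("title")
        oldids = query.get("oldid")
        if titles and oldids and oldids[0].isdigit():
            # MediaWiki uses underscores for spaces in URLs.
            revisions[titles[0].replace("_", " ")] = int(oldids[0])
    return revisions
```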

   > Other replies inline: I am working on an article list here:

   >   http://en.wikipedia.org/wiki/User:Sj/en-g1g1#D

   > Pulling out articles I'd like to exclude at the start of each
   > section.

Cool, thanks.

   > Agreed.  It seems that removing extraneous references to Harry
   > Potter frees up another thousand articles or so...

Can't tell whether this was humor.  ;-)

   > en:wp articles tend to grow without shrinking.  Like you, I'm
   > worried about not having enough articles to make a valuable
   > reference work, especially in the sense of having a solid network
   > of internal links.  I also see in this snapshot a lot of articles
   > that are interesting but don't need to be nearly so detailed for
   > our audience (and may simply bore).

   > Can we try 6000 articles + 21000 ledes, to include every article in
   > Martin's list?

In principle, yeah, but like the revisions work it requires new work
for detecting leads and putting them into their own articles.  My
gut feeling is that, given our time constraints, this work just isn't
important enough for this particular snapshot, since our users have
access to the net if they need it.
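As a rough idea of what "detecting leads" involves: in wikitext the lead is conventionally everything before the first section heading. The sketch below (hypothetical `extract_lead`) assumes that convention and ignores headings hidden inside templates or comments; infoboxes and other leading templates would still need separate handling.

```python
import re

# A section heading is a line such as "== History ==" (any level, 2+ "=").
HEADING_RE = re.compile(r"^==.*==[ \t]*$", re.MULTILINE)

def extract_lead(wikitext):
    """Return the text before the first section heading, stripped."""
    match = HEADING_RE.search(wikitext)
    lead = wikitext[:match.start()] if match else wikitext
    return lead.strip()
```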

   > I'm also happy with making this larger than 100MB for g1g1, perhaps
   > even 150MB.  In the future our goal can be to expand coverage while
   > reducing size... with less time pressure.

Absolutely.

   > We definitely need a template blacklist again.  How about the top
   > 5000, excluding certain template categories?

Another 5000 (small) articles is going to have a big impact on disk
space, I think.  We'll see how it looks.

Oh, Mad reminded me that you wanted to see a list of the 2k articles
that are in the 10k slice and not the 8k slice.  Here it is:

   http://dev.laptop.org/~cjb/enwiki/8k-10k-diff
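A list like that is just a set difference between the two slices. A minimal sketch, assuming each slice is a plain-text file with one article title per line (the file names in the usage line are hypothetical):

```python
import sys

def read_titles(path):
    """Read one title per line, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]

def slice_diff(larger, smaller):
    """Titles in `larger` but not in `smaller`, keeping `larger`'s order."""
    exclude = set(smaller)
    return [title for title in larger if title not in exclude]

if __name__ == "__main__":
    # Usage: python slice_diff.py 10k.txt 8k.txt
    for title in slice_diff(read_titles(sys.argv[1]), read_titles(sys.argv[2])):
        print(title)
```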

- Chris.
-- 
Chris Ball   <cjb at laptop.org>

