[Server-devel] Wikipedia on XS

Chris Ball cjb at laptop.org
Tue Jul 27 13:29:03 EDT 2010


Hi,

I think Martin's answers are all correct.  Just to be clear:

   > Strange -- it does have images for me. Just tested -- from the
   > homepage, I click on the Sociology link and on that page I get
   > the picture of Auguste Comte. Both Wikipedia English and Spanish
   > carry it.
   >
   > Hmm. Or do they? The images are actually being referenced from
   > the internet. This is a regression from earlier releases of WA.

Yep.  We didn't make an image selection for the English snapshot; we
were already very article-starved trying to fit in 100MB, because the
English WP articles are on average longer than the Spanish ones.  (And
of course there are many times more of them.)

So, no images in the English version.  The Spanish version contains
some images (3000), and will fetch the rest from the net when a
connection is available.

   > When I compare WA English with WA Spanish I see the Spanish one
   > contains a good number of images, where the English one doesn't.

Right.

   >> * The Wikipedia Activity links to articles not included in the
   >> activity.  The schools Wikipedia does not include any links to
   >> articles not included in the Schools Wikipedia

   > Correct -- they are usually marked with a different colour
   > however...

Right.  The links that aren't present in the archive are still shown,
so that you could click on them if you happen to be connected, and are
a different colour, so that you know which ones to avoid if you aren't
connected.
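
As a rough illustration of that idea (a sketch only, not the activity's
actual code): when rendering a page, each wiki link can be checked
against the set of titles bundled in the archive and given a different
CSS class if it is missing.  The names `ARCHIVE_TITLES`, `render_links`,
`local` and `online-only` are all hypothetical.

```python
import re

# Hypothetical: the set of article titles actually bundled in the archive.
ARCHIVE_TITLES = {"Sociology", "Auguste Comte"}

# Matches [[Target]] and [[Target|label]] wiki links (simplified).
LINK_RE = re.compile(r"\[\[([^\]|]+)(?:\|([^\]]+))?\]\]")


def render_links(wikitext, titles=ARCHIVE_TITLES):
    """Turn wiki links into HTML anchors, marking the ones not in the archive."""
    def replace(match):
        target = match.group(1).strip()
        label = match.group(2) or target
        # Links to bundled articles get one class, the rest another,
        # so a stylesheet can give them a different colour.
        css = "local" if target in titles else "online-only"
        href = "/wiki/" + target.replace(" ", "_")
        return '<a class="%s" href="%s">%s</a>' % (css, href, label)

    return LINK_RE.sub(replace, wikitext)


print(render_links("See [[Sociology]] and [[Anthropology|anthropology]]."))
```

A stylesheet rule on the assumed `online-only` class would then do the
colouring.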

   >> * The Wikipedia Activity is very slow on an XO-1. The Schools
   >> Wikipedia is very fast with good wifi to an idle schoolserver

   > Wikipedia Activity is highly compressed, so it's unzipping things
   > behind your back.

Right.  (Actually, the slow parts are mainly converting from wiki markup
into HTML, and performing template expansion, which involves recursive
references to other articles.  A single article might end up pulling
fifty "articles" from the archive, forty-nine of them being templates.)
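
To make the cost concrete, here is a minimal sketch of recursive
template expansion, assuming a toy archive stored as a dict and a
simplified `{{Name}}` syntax with no parameters.  It is not the
activity's real parser; `expand`, `fetched` and the `Template:` key
prefix are illustrative assumptions.  The point is only that one
article fetch fans out into many archive lookups.

```python
import re

# Simplified: matches {{Name}} with no parameters or nesting inside braces.
TEMPLATE_RE = re.compile(r"\{\{([^{}|]+)\}\}")


def expand(title, archive, fetched, depth=0, max_depth=10):
    """Return the text of `title` with {{Template}} references expanded.

    Every archive lookup is recorded in `fetched`, so the caller can see
    how many entries a single article ends up pulling in.
    """
    fetched.append(title)
    text = archive[title]
    if depth >= max_depth:
        # Guard against templates that include each other in a loop.
        return text

    def replace(match):
        name = "Template:" + match.group(1).strip()
        if name not in archive:
            return ""  # missing template: drop it
        return expand(name, archive, fetched, depth + 1, max_depth)

    return TEMPLATE_RE.sub(replace, text)


archive = {
    "Sociology": "{{Infobox}} Sociology is the study of society. {{Footer}}",
    "Template:Infobox": "[box {{Flag}}]",
    "Template:Flag": "*",
    "Template:Footer": "(end)",
}
fetched = []
print(expand("Sociology", archive, fetched))
print(len(fetched), "archive lookups for one article")
```

In this toy case one article costs four lookups, three of them
templates, and each lookup in the real archive would also involve
decompression.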

The Schools Wikipedia is already in HTML, so it doesn't have the
decompression stage, the template expansion stage, or the converting
to HTML stage.  The trade-off is that it uses much, much more disk space.

   >> * The Wikipedia Activity has a lot of articles that might be
   >> considered inappropriate for some or all age groups by some
   >> cultures. I haven't found anything in the Schools Wikipedia that
   >> I think might be considered inappropriate, but its search
   >> features are poor and I haven't done a particularly thorough
   >> search.

   > There's been some editing in WA but I do believe you might find
   > tricky topics

Yes, we only removed articles on (specific) pornography and some sex
acts.  A deployment would have to decide for itself whether to reduce
the article set further; we weren't comfortable limiting other types of
knowledge for everyone.  It's all available on the main Wikipedia
site in any case.

   > Given that your planned XS is fairly powerful for the task, and
   > has abundant storage, you have another alternative:
   > http://static.wikipedia.org/ -- it lacks search, images and
   > content curation, but it definitely has coverage.

Yes, there are definitely two different use cases here: shipping
gigabytes of HTML on the school server, and shipping a small standalone
copy of the most "interesting" articles on Wikipedia on every laptop.
The main goal of the Wikipedia activities is the latter, which explains
the tradeoffs they make.

- Chris.
-- 
Chris Ball   <cjb at laptop.org>
One Laptop Per Child

