[Wikireader] [Techteam] Fw: Reducir tamaño Wikipedia en XS

Samuel Klein sj at laptop.org
Tue Feb 10 19:46:28 EST 2009


I seem to remember liking this technical decision of zeno.  I'd see a
fallback working along three dimensions:
 -- according to host machine (links look for local, then server, then
global pages)
 -- according to language (a cross-language link to ay: or qu:
should fall back to es:, perhaps with a JS warning)
 -- according to topic (this could be a very large global redirect
list; first check whether the page exists, then check the redirect
table)
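
A minimal sketch of how the three fallback dimensions listed above might
compose. Everything named here is hypothetical and not part of zeno or the
wikiserver code: resolve_link, Resolution, the page_exists callback, the
(lang, title) redirect table, and the nesting order (language, then
redirect, then host) are all assumptions made for illustration.

```python
from typing import Callable, Dict, NamedTuple, Optional, Tuple

# Assumed lookup order across hosts: local machine, school server, global site.
HOST_ORDER = ("local", "server", "global")

# Assumed language fallbacks: Aymara (ay) and Quechua (qu) fall back to Spanish (es).
LANGUAGE_FALLBACK = {"ay": "es", "qu": "es"}


class Resolution(NamedTuple):
    host: str
    lang: str
    title: str
    warn: bool  # True when the reader lands in a fallback language


def resolve_link(
    lang: str,
    title: str,
    page_exists: Callable[[str, str, str], bool],
    redirects: Dict[Tuple[str, str], str],
) -> Optional[Resolution]:
    """Return the first place a link can be satisfied, or None if nowhere."""
    languages = [lang]
    if lang in LANGUAGE_FALLBACK:
        languages.append(LANGUAGE_FALLBACK[lang])

    for candidate_lang in languages:
        # Topic fallback: try the page itself first, then the redirect table.
        titles = [title]
        target = redirects.get((candidate_lang, title))
        if target is not None:
            titles.append(target)

        for candidate_title in titles:
            # Host fallback: local, then server, then global.
            for host in HOST_ORDER:
                if page_exists(host, candidate_lang, candidate_title):
                    return Resolution(
                        host, candidate_lang, candidate_title,
                        warn=(candidate_lang != lang),
                    )
    return None


# Toy data standing in for real corpora and a real redirect list.
pages = {("server", "es", "Lima"), ("local", "es", "Perú")}
redirects = {("es", "Peru"): "Perú"}


def exists(host: str, lang: str, title: str) -> bool:
    return (host, lang, title) in pages
```

With the toy data, resolve_link("qu", "Lima", exists, redirects) finds
nothing in the qu: corpus and falls back to the es: page on the server with
warn set, which is where a JS warning could be shown. A link that resolves
nowhere returns None instead of producing a blank page.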

The last two dimensions of fallback are information that could be hosted
globally on a Wikimedia site...

SJ

On Tue, Feb 10, 2009 at 8:13 AM, Fabien Coulon <fabien.coulon at gmail.com> wrote:
> Thanks, we will work on a test version.
> Samuel, about cross-linking between corpora: the zeno:// URLs make it
> possible, though there is no fallback for a missing corpus yet. A page in a
> missing corpus is a blank page :) It's something to work on.
>
> On Mon, Feb 9, 2009 at 9:12 PM, Chris Ball <cjb at laptop.org> wrote:
>>
>> Hi Fabien, SJ,
>>
>> > Ok. But the zeno with all articles from wikipedia 'es' takes about 1GB,
>> > just for texts. Does the Peruvian selection consist of all articles
>> > except those in http://dev.laptop.org/~cjb/eswiki/blacklist3 ?
>>
>> The status of our eswiki builds is:
>>
>> build 1: for XO, 80M for most popular 30k articles, plus 20M for 3000
>> images
>> build 2: for server, 622M for all 1M articles, plus 230M for 220k images
>>
>> (the images in both cases are downsampled to lower quality so that we
>> can include more of them)
>>
>> Fabien, if you'd like to try out the full build 2, here are instructions
>> that should work on a 32-bit x86 Linux machine:
>>
>> * wget http://dev.laptop.org/~cjb/spanish_wikiserver_full.tgz
>> * tar zxf spanish_wikiserver_full.tgz
>> * cd wikiserver/es_PE
>> * wget http://dev.laptop.org/~cjb/eswiki/images.tar.gz
>> * tar zxf images.tar.gz
>> * cd ..
>> * python server.py es_PE/eswiki-20090124-pages-articles.xml.bz2 8000
>> * browse to http://<IP address>:8000/
>>
>> Thanks,
>>
>> - Chris.
>> --
>> Chris Ball   <cjb at laptop.org>
>
>

