Hi All --<div><br></div><div>I'd like to introduce Evren, who is thinking about adding a Gears patch/plugin to MediaWiki. Are any of you aware of an existing project doing this?</div><div><br></div><div>Thanks,</div><div>
Ben<br><br><div class="gmail_quote">On Mon, Feb 23, 2009 at 5:53 PM, Erik Moeller <span dir="ltr"><<a href="mailto:erik@wikimedia.org">erik@wikimedia.org</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">
I'm not really aware of the ad hoc mirroring / torrenting that's been<br>
happening; I would suggest checking in with Brion & Gregory, who should<br>
have the relevant pointers.<br>
<div><div></div><div class="Wj3C7c"><br>
Erik<br>
<br>
2009/2/23 Samuel Klein <<a href="mailto:meta.sj@gmail.com">meta.sj@gmail.com</a>>:<br>
> Erik,<br>
><br>
> Thank you for the update. Once video (if not audio) becomes a bigger<br>
> deal, it may be helpful to separate media dumps by media type as well.<br>
><br>
> If you know of any older torrents out there, that would be handy.<br>
><br>
> SJ<br>
><br>
> On Mon, Feb 23, 2009 at 8:39 PM, Erik Moeller <<a href="mailto:erik@wikimedia.org">erik@wikimedia.org</a>> wrote:<br>
>> Not particularly; we've already got it on our radar and will make it<br>
>> happen, and it's not something that can be easily "volunteerized"<br>
>> (except by looking at the dumper code that already exists and<br>
>> improving it). Full history dumps are first priority, but full Commons<br>
>> and other media dumps (we've talked to Archive.org about setting those<br>
>> up) are definitely targeted as well.<br>
>><br>
>> Erik<br>
>><br>
>> 2009/2/23 Samuel Klein <<a href="mailto:meta.sj@gmail.com">meta.sj@gmail.com</a>>:<br>
>>> We're talking about dumps again on foundation-l -- and I really would<br>
>>> like to see better dumps available, of commons in particular.<br>
>>><br>
>>> Erik, any advice you can offer in this regard -- can we help move<br>
>>> things along in some way?<br>
>>><br>
>>> Sj<br>
>>><br>
>>> On Tue, Feb 26, 2008 at 3:01 PM, Erik Zachte <<a href="mailto:erikzachte@infodisiac.com">erikzachte@infodisiac.com</a>> wrote:<br>
>>>>> Something that can process Wikimedia's XML dumps, and then crawl<br>
>>>>> Wikipedia for the associated pictures. The picture dumps are tarred<br>
>>>>> (why aren't they rsync'able?) and completely (for me) unmanageable.<br>
>>>><br>
>>>> I may have some Perl code that could be helpful.<br>
>>>> I won't have much time for this project right now, but I can upload and<br>
>>>> document some Perl.<br>
>>>><br>
>>>> I have code that harvests all images for a given Wikipedia dump by<br>
>>>> downloading them one by one.<br>
>>>> With a one-second interval between downloads (to avoid red flags at<br>
>>>> Wikimedia) this admittedly takes many weeks, but images are preserved for<br>
>>>> the next run (on a huge disk).<br>
>>>><br>
>>>> It is part of the WikiToTome(Raider) Perl scripts, but it can be isolated.<br>
>>>><br>
>>>> The script determines the URL from the filename the same way the MediaWiki<br>
>>>> parser does (subfolder names are derived from the first hex digits of the<br>
>>>> MD5 hash of the filename).<br>
>>>> It then tries to download the image from the Wikipedia site; if that fails,<br>
>>>> it tries again at Commons (if both have it, the local version takes<br>
>>>> precedence).<br>
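[Editor's note: the path scheme and local-then-Commons fallback described above can be sketched as follows. Erik's scripts are Perl; this is an illustrative Python sketch, and the `upload.wikimedia.org` URL layout, the `fetch` callable, and the helper names are assumptions for illustration, not his actual code.]

```python
import hashlib
import time
from urllib.parse import quote

def image_path(filename):
    """Derive the nested path MediaWiki uses for an uploaded file:
    spaces become underscores, and the first one and two hex digits
    of the MD5 hash of the name become the subfolder names."""
    name = filename.replace(" ", "_")
    digest = hashlib.md5(name.encode("utf-8")).hexdigest()
    return f"{digest[0]}/{digest[:2]}/{quote(name)}"

def harvest(filename, fetch, projects=("en", "commons"), pause=1.0):
    """Try the local project first, then Commons, so a local version
    takes precedence as in the script described above.  `fetch` is
    any callable returning the image bytes or None; the pause between
    requests keeps the crawl polite (Erik uses one second)."""
    for project in projects:
        url = (f"https://upload.wikimedia.org/wikipedia/"
               f"{project}/{image_path(filename)}")
        data = fetch(url)
        if data is not None:
            return data
        time.sleep(pause)
    return None
```

For example, `image_path("My File.jpg")` yields a path of the form `h/hh/My_File.jpg`, where `h`/`hh` are the leading MD5 hex digits.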
>>>><br>
>>>> Images are stored in a nested folder structure similar to the one on the<br>
>>>> Wikipedia server.<br>
>>>> On subsequent runs only missing images are downloaded (updated images are<br>
>>>> missed).<br>
>>>> Metadata in the images is removed (my target platform is a Palm/Pocket PC<br>
>>>> with only 4 GB).<br>
>>>> Images are resized as smartly as possible:<br>
>>>> JPEGs above a certain size are scaled down and their compression rate<br>
>>>> adjusted;<br>
>>>> PNGs are treated similarly, except when their compression ratio is better<br>
>>>> than some factor x, in which case they are probably charts or diagrams with<br>
>>>> text that would become unreadable when downsized.<br>
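[Editor's note: the resize heuristic above can be captured in a small decision function. This is a hedged Python sketch, not Erik's Perl logic; both thresholds (`max_dim`, `bytes_per_pixel_cutoff`) are made-up illustrative values standing in for his unspecified "certain size" and "factor x".]

```python
def should_downsize(fmt, width, height, file_bytes,
                    max_dim=240, bytes_per_pixel_cutoff=0.1):
    """Decide whether an image should be downscaled.

    Mirrors the heuristic described above: images already within the
    target dimensions are kept as-is, and PNGs that compress very well
    (few bytes per pixel) are probably charts or diagrams with text,
    so they are kept at original size rather than blurred.
    """
    if max(width, height) <= max_dim:
        return False  # already small enough for the target device
    if fmt == "png" and file_bytes / (width * height) < bytes_per_pixel_cutoff:
        return False  # likely a chart/diagram; text would become unreadable
    return True
```

With these sample thresholds, a 1000x800 photo-like PNG of 2 MB would be downscaled, while a 20 KB PNG of the same dimensions (a line chart, say) would be left alone.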
>>>><br>
>>>> In 2005 I generated the last English Wikipedia for handhelds with images:<br>
>>>> about 2 GB was needed for 317,000 images of at most 240 pixels<br>
>>>> height/width, without ugly compression artifacts, plus quite a few larger<br>
>>>> PNGs at original size.<br>
>>>><br>
>>>> For offline usage 10 MB images are rather over the top and waste a lot of<br>
>>>> bandwidth.<br>
>>>> Actually, I would favor a solution where images are collected and resized<br>
>>>> on a Wikimedia server and then put into a tar of only a few GB.<br>
>>>> Technically I can do this; time-wise is another matter just now, but that<br>
>>>> might change.<br>
>>>><br>
>>>> I also have code to generate PNGs from <math>..</math><br>
>>>><br>
>>>> Erik Zachte<br>
>>>><br>
>>> _______________________________________________<br>
>>> Wikireader mailing list<br>
>>> <a href="mailto:Wikireader@lists.laptop.org">Wikireader@lists.laptop.org</a><br>
>>> <a href="http://lists.laptop.org/listinfo/wikireader" target="_blank">http://lists.laptop.org/listinfo/wikireader</a><br>
>>><br>
>><br>
>><br>
>><br>
>> --<br>
>> Erik Möller<br>
>> Deputy Director, Wikimedia Foundation<br>
>><br>
>> Support Free Knowledge: <a href="http://wikimediafoundation.org/wiki/Donate" target="_blank">http://wikimediafoundation.org/wiki/Donate</a><br>
>><br>
><br>
<br>
<br>
<br>
</div></div>--<br>
<div><div></div><div class="Wj3C7c">Erik Möller<br>
Deputy Director, Wikimedia Foundation<br>
<br>
Support Free Knowledge: <a href="http://wikimediafoundation.org/wiki/Donate" target="_blank">http://wikimediafoundation.org/wiki/Donate</a><br>
_______________________________________________<br>
Wikireader mailing list<br>
<a href="mailto:Wikireader@lists.laptop.org">Wikireader@lists.laptop.org</a><br>
<a href="http://lists.laptop.org/listinfo/wikireader" target="_blank">http://lists.laptop.org/listinfo/wikireader</a><br>
</div></div></blockquote></div><br></div>