Hi All --<div><br></div><div>I&#39;d like to introduce Evren, who is thinking about adding a Gears patch/plugin to MediaWiki.  Are any of you aware of an existing project doing this?</div><div><br></div><div>Thanks,</div><div>

Ben<br><br><div class="gmail_quote">On Mon, Feb 23, 2009 at 5:53 PM, Erik Moeller <span dir="ltr">&lt;<a href="mailto:erik@wikimedia.org">erik@wikimedia.org</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">

I&#39;m not really aware of the ad hoc mirroring / torrenting that&#39;s been<br>

happening; I would suggest checking in with Brion &amp; Gregory, who should<br>

have the relevant pointers.<br>

<div><div></div><div class="Wj3C7c"><br>

Erik<br>

<br>

2009/2/23 Samuel Klein &lt;<a href="mailto:meta.sj@gmail.com">meta.sj@gmail.com</a>&gt;:<br>

&gt; Erik,<br>

&gt;<br>

&gt; Thank you for the update.  Once video (if not audio) becomes a bigger<br>

&gt; deal, it may be helpful to separate media dumps by media type as well.<br>

&gt;<br>

&gt; If you know of any older torrents out there, that would be handy.<br>

&gt;<br>

&gt; SJ<br>

&gt;<br>

&gt; On Mon, Feb 23, 2009 at 8:39 PM, Erik Moeller &lt;<a href="mailto:erik@wikimedia.org">erik@wikimedia.org</a>&gt; wrote:<br>

&gt;&gt; Not particularly; we&#39;ve already got it on our radar and will make it<br>

&gt;&gt; happen, and it&#39;s not something that can be easily &quot;volunteerized&quot;<br>

&gt;&gt; (except by looking at the dumper code that already exists and<br>

&gt;&gt; improving it). Full history dumps are first priority, but full Commons<br>

&gt;&gt; and other media dumps (we&#39;ve talked to Archive.org about setting those<br>

&gt;&gt; up) are definitely targeted as well.<br>

&gt;&gt;<br>

&gt;&gt; Erik<br>

&gt;&gt;<br>

&gt;&gt; 2009/2/23 Samuel Klein &lt;<a href="mailto:meta.sj@gmail.com">meta.sj@gmail.com</a>&gt;:<br>

&gt;&gt;&gt; We&#39;re talking about dumps again on foundation-l -- and I really would<br>

&gt;&gt;&gt; like to see better dumps available, of commons in particular.<br>

&gt;&gt;&gt;<br>

&gt;&gt;&gt; Erik, any advice you can offer in this regard -- can we help move<br>

&gt;&gt;&gt; things along in some way?<br>

&gt;&gt;&gt;<br>

&gt;&gt;&gt; Sj<br>

&gt;&gt;&gt;<br>

&gt;&gt;&gt; On Tue, Feb 26, 2008 at 3:01 PM, Erik Zachte &lt;<a href="mailto:erikzachte@infodisiac.com">erikzachte@infodisiac.com</a>&gt; wrote:<br>

&gt;&gt;&gt;&gt;&gt; Something that can process Wikimedia&#39;s XML dumps and then crawl<br>

&gt;&gt;&gt;&gt;&gt; Wikipedia for the associated pictures. The picture dumps are tarred<br>

&gt;&gt;&gt;&gt;&gt; (why aren&#39;t they rsync&#39;able?) and completely (for me) unmanageable.<br>

&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt; I may have some Perl code that can be helpful.<br>

&gt;&gt;&gt;&gt; I won&#39;t have much time for this project right now, but uploading and<br>

&gt;&gt;&gt;&gt; documenting some Perl, I can do that.<br>

&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt; I have code that harvests all images for a given Wikipedia dump by<br>

&gt;&gt;&gt;&gt; downloading them one by one.<br>

&gt;&gt;&gt;&gt; With a one-second interval between downloads (to avoid raising red flags at<br>

&gt;&gt;&gt;&gt; Wikimedia), this takes many weeks, to be sure, but images are preserved for<br>

&gt;&gt;&gt;&gt; the next run (on a huge disk).<br>

&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt; It is part of the WikiToTome(Raider) Perl scripts, but it can be isolated.<br>

&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt; The script determines the URL from the filename the way the Wikimedia parser<br>

&gt;&gt;&gt;&gt; does (subfolder names are derived from the first characters of the MD5 hash<br>

&gt;&gt;&gt;&gt; of the filename).<br>

&gt;&gt;&gt;&gt; It then tries to download the image from the Wikipedia site and, if it is not<br>

&gt;&gt;&gt;&gt; found there, tries again at Commons (if both have it, the local version takes precedence).<br>
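A minimal Python sketch of the URL derivation described above, assuming MediaWiki&#39;s standard hashed upload layout (first hex digit, then first two hex digits of the MD5 of the filename with spaces turned into underscores) and assuming http://upload.wikimedia.org/wikipedia as the upload host. The names hashed_path and candidate_urls are hypothetical, not taken from the WikiToTome scripts, and the filename is assumed to already be in canonical form (e.g. first letter capitalized).<br>

```python
import hashlib
from urllib.parse import quote

# Assumption: the upload host used by Wikimedia wikis at the time.
UPLOAD_BASE = "http://upload.wikimedia.org/wikipedia"


def hashed_path(filename: str) -> str:
    """Return the hashed upload path 'a/ab/Name' for a file name."""
    # Stored file names use underscores instead of spaces.
    name = filename.replace(" ", "_")
    digest = hashlib.md5(name.encode("utf-8")).hexdigest()
    # First hex digit, then first two hex digits, then the (URL-quoted) name.
    return f"{digest[0]}/{digest[:2]}/{quote(name)}"


def candidate_urls(filename: str, wiki: str = "en") -> list:
    """URLs to try in order: the local wiki first, then Commons."""
    path = hashed_path(filename)
    return [f"{UPLOAD_BASE}/{wiki}/{path}", f"{UPLOAD_BASE}/commons/{path}"]
```

The list order encodes the precedence rule: a downloader would try the local wiki first and fall back to Commons only on a miss.<br>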

&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt; Images are stored in a nested folder structure similar to the one on the<br>

&gt;&gt;&gt;&gt; Wikipedia server.<br>

&gt;&gt;&gt;&gt; On subsequent runs, only missing images are downloaded (updated images are<br>

&gt;&gt;&gt;&gt; missed).<br>

&gt;&gt;&gt;&gt; Metadata in images is removed (my target platform is a Palm/Pocket PC with<br>

&gt;&gt;&gt;&gt; only 4 GB).<br>
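The incremental, rate-limited harvest could be sketched as follows. This is an illustration, not Erik Zachte&#39;s Perl code: harvest, local_path and the caller-supplied fetch function (assumed to return the image bytes, or None if neither the local wiki nor Commons has the file) are hypothetical. The one-second delay and the skip-if-already-on-disk rule come from the description above; like the original, it does not notice images that were updated on the server.<br>

```python
import hashlib
import os
import time
from typing import Callable, Iterable, Optional


def local_path(root: str, filename: str) -> str:
    """Mirror the server's a/ab/Name folder layout under a local root."""
    name = filename.replace(" ", "_")
    digest = hashlib.md5(name.encode("utf-8")).hexdigest()
    return os.path.join(root, digest[0], digest[:2], name)


def harvest(filenames: Iterable[str], root: str,
            fetch: Callable[[str], Optional[bytes]],
            delay: float = 1.0) -> int:
    """Download only images not yet on disk; return how many were saved."""
    saved = 0
    for filename in filenames:
        path = local_path(root, filename)
        if os.path.exists(path):
            # Already harvested on an earlier run (server-side updates are missed).
            continue
        data = fetch(filename)  # hypothetical: None means "not found anywhere"
        time.sleep(delay)       # pause after every request to stay polite
        if data is None:
            continue
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "wb") as out:
            out.write(data)
        saved += 1
    return saved
```

Passing delay=0 is only sensible for local testing; against the live servers the one-second pause is the point.<br>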

&gt;&gt;&gt;&gt; Images are resized as smartly as possible:<br>

&gt;&gt;&gt;&gt; JPEGs above a certain size are resized to a set size and their compression<br>

&gt;&gt;&gt;&gt; rate is adjusted;<br>

&gt;&gt;&gt;&gt; PNGs are treated similarly, except when their compression ratio is better than<br>

&gt;&gt;&gt;&gt; some factor x, in which case they are probably charts or diagrams with text<br>

&gt;&gt;&gt;&gt; that would become unreadable when downsized.<br>
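The resize decision can be sketched as a small Python function. Assumptions, since the mail leaves them open: "size" is read as pixel dimensions, the 240-pixel limit is the one mentioned for the 2005 build, the PNG threshold PNG_RATIO_LIMIT (the "factor x") is a made-up value of 10, and uncompressed size is estimated at 3 bytes per pixel. Only the decision is shown; the actual resampling and JPEG recompression would be done by an image library such as Pillow.<br>

```python
from typing import Optional, Tuple

MAX_SIDE = 240          # max height/width, as in the 2005 handheld build
PNG_RATIO_LIMIT = 10.0  # hypothetical value for the mail's "factor x"


def fit_within(width: int, height: int,
               max_side: int = MAX_SIDE) -> Tuple[int, int]:
    """Scale down, keeping the aspect ratio, so the longer side is max_side."""
    scale = max_side / max(width, height)
    return max(1, round(width * scale)), max(1, round(height * scale))


def target_size(fmt: str, width: int, height: int,
                file_bytes: int) -> Optional[Tuple[int, int]]:
    """Return new (width, height), or None to keep the original size."""
    if MAX_SIDE >= max(width, height):
        return None  # already small enough for the handheld
    if fmt == "png":
        # Assumes 24-bit RGB, i.e. 3 bytes per pixel before compression.
        ratio = (width * height * 3) / file_bytes
        if ratio > PNG_RATIO_LIMIT:
            # Compresses very well: probably a chart or diagram with text,
            # which would become unreadable if downsized.
            return None
    return fit_within(width, height)
```

Returning None for such PNGs mirrors the rule above: charts stay at their original size rather than being shrunk into illegibility.<br>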

&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt; In 2005 I generated the last English Wikipedia for handhelds with images;<br>

&gt;&gt;&gt;&gt; about 2 GB was needed for 317,000 images of at most 240 pixels in height/width,<br>

&gt;&gt;&gt;&gt; without ugly compression artifacts, and quite a few larger PNGs at original<br>

&gt;&gt;&gt;&gt; size.<br>

&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt; For offline use, 10 MB images are rather over the top and waste a lot of<br>

&gt;&gt;&gt;&gt; bandwidth.<br>

&gt;&gt;&gt;&gt; Actually, I would favor a solution where images are collected and resized on<br>

&gt;&gt;&gt;&gt; a Wikimedia server, then put in a tar of only a few GB.<br>

&gt;&gt;&gt;&gt; Technically I can do this. Time-wise, it is another matter just now, but<br>

&gt;&gt;&gt;&gt; that might change.<br>

&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt; I also have code to generate PNGs from &lt;math&gt;..&lt;/math&gt;.<br>

&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt; Erik Zachte<br>


&gt;&gt;&gt; _______________________________________________<br>

&gt;&gt;&gt; Wikireader mailing list<br>

&gt;&gt;&gt; <a href="mailto:Wikireader@lists.laptop.org">Wikireader@lists.laptop.org</a><br>

&gt;&gt;&gt; <a href="http://lists.laptop.org/listinfo/wikireader" target="_blank">http://lists.laptop.org/listinfo/wikireader</a><br>

&gt;&gt;<br>

&gt;&gt; --<br>

&gt;&gt; Erik Möller<br>

&gt;&gt; Deputy Director, Wikimedia Foundation<br>

&gt;&gt;<br>

&gt;&gt; Support Free Knowledge: <a href="http://wikimediafoundation.org/wiki/Donate" target="_blank">http://wikimediafoundation.org/wiki/Donate</a><br>

</div></div>

<div><div></div><div class="Wj3C7c">

</div></div></blockquote></div><br></div>