[Wikireader] offline Wikipedia, Google Gears, & XO

Ben Lisbakken lisbakke at google.com
Tue Feb 24 14:28:23 EST 2009


Hi All --
I'd like to introduce Evren, who is thinking about adding a Gears
patch/plugin to MediaWiki.  Are any of you aware of an existing project
doing this?

Thanks,
Ben

On Mon, Feb 23, 2009 at 5:53 PM, Erik Moeller <erik at wikimedia.org> wrote:

> I'm not really aware of the ad hoc mirroring / torrenting that's been
> happening; I would suggest checking in with Brion & Gregory, who should
> have the relevant pointers.
>
> Erik
>
> 2009/2/23 Samuel Klein <meta.sj at gmail.com>:
> > Erik,
> >
> > Thank you for the update.  Once video (if not audio) becomes a bigger
> > deal, it may be helpful to separate media dumps by media type as well.
> >
> > If you know of any older torrents out there, that would be handy.
> >
> > SJ
> >
> > On Mon, Feb 23, 2009 at 8:39 PM, Erik Moeller <erik at wikimedia.org> wrote:
> >> Not particularly; we've already got it on our radar and will make it
> >> happen, and it's not something that can be easily "volunteerized"
> >> (except by looking at the dumper code that already exists and
> >> improving it). Full history dumps are first priority, but full Commons
> >> and other media dumps (we've talked to Archive.org about setting those
> >> up) are definitely targeted as well.
> >>
> >> Erik
> >>
> >> 2009/2/23 Samuel Klein <meta.sj at gmail.com>:
> >>> We're talking about dumps again on foundation-l -- and I really would
> >>> like to see better dumps available, of Commons in particular.
> >>>
> >>> Erik, any advice you can offer in this regard -- can we help move
> >>> things along in some way?
> >>>
> >>> Sj
> >>>
> >>> On Tue, Feb 26, 2008 at 3:01 PM, Erik Zachte <erikzachte at infodisiac.com> wrote:
> >>>>> Something that can process Wikimedia's XML dumps, and then crawl
> >>>>> Wikipedia for the associated pictures. The picture dumps are tarred
> >>>>> (why aren't they rsync'able?) and completely (for me) unmanageable.
> >>>>
> >>>> I may have some Perl code that can be helpful.
> >>>> I won't have much time for this project right now, but uploading and
> >>>> documenting some Perl is something I can do.
> >>>>
> >>>> I have code that harvests all images for a given Wikipedia dump by
> >>>> downloading them one by one.
> >>>> With a one-second interval between downloads (to avoid red flags at
> >>>> Wikimedia) this takes many weeks, to be sure, but images are preserved
> >>>> for the next run (on a huge disk).
> >>>>
> >>>> It is part of the WikiToTome(Raider) Perl scripts, but it can be isolated.
> >>>>
> >>>> The script determines the URL from the filename the same way the
> >>>> Wikimedia parser does (subfolder names are derived from the first
> >>>> characters of the md5 hash of the filename).
> >>>> It then tries to download the image from the Wikipedia site; if it is
> >>>> not found there, it tries again at Commons (if both have it, the local
> >>>> version takes precedence).
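A minimal Perl sketch of that lookup-and-download scheme, for illustration
only (hypothetical code, not the WikiToTome script; it assumes the
Digest::MD5, LWP::Simple and URI::Escape modules and the standard
upload.wikimedia.org directory layout):

    use Digest::MD5 qw(md5_hex);
    use LWP::Simple qw(getstore is_success);
    use URI::Escape qw(uri_escape);

    # MediaWiki derives the image path from the md5 of the filename (spaces
    # replaced by underscores): <first hex digit>/<first two hex digits>/<name>.
    # Assumes an ASCII filename for simplicity.
    sub image_path {
        my ($name) = @_;
        $name =~ s/ /_/g;
        my $md5 = md5_hex($name);
        return substr($md5, 0, 1) . '/' . substr($md5, 0, 2) . '/' . uri_escape($name);
    }

    # Try the local wiki first, then fall back to Commons.  Skips files that
    # were harvested on a previous run and sleeps one second between requests.
    sub fetch_image {
        my ($name, $dest) = @_;
        return if -e $dest;
        my $path = image_path($name);
        for my $base ('http://upload.wikimedia.org/wikipedia/en/',
                      'http://upload.wikimedia.org/wikipedia/commons/') {
            sleep 1;
            return if is_success(getstore($base . $path, $dest));
        }
    }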
> >>>>
> >>>> Images are stored in a nested folder structure similar to the one on
> >>>> the Wikipedia server.
> >>>> On subsequent runs only missing images are downloaded (updated images
> >>>> are missed).
> >>>> Metadata in the images is removed (my target platform is a Palm/Pocket
> >>>> PC with only 4 GB).
> >>>> Images are resized as smartly as possible:
> >>>> JPEGs above a certain size are scaled down to a target size and their
> >>>> compression rate is adjusted;
> >>>> PNGs are treated the same, except when their compression ratio is
> >>>> better than a factor x, in which case they are probably charts or
> >>>> diagrams with text that would become unreadable when downsized.
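As a rough sketch of that sizing rule (hypothetical thresholds and plain
PerlMagick calls, not the actual WikiToTome code):

    use Image::Magick;

    # Shrink one harvested image for handheld use.  Images above $max pixels
    # are scaled down and JPEGs recompressed; PNGs that already compress very
    # well are kept at original size, since they are probably charts or
    # diagrams whose text would become unreadable when downsized.
    sub shrink_image {
        my ($in, $out, $max) = @_;                # e.g. $max = 240
        my $img = Image::Magick->new;
        $img->Read($in);
        my ($w, $h) = $img->Get('width', 'height');
        my $is_png  = $in =~ /\.png$/i;
        my $ratio   = ($w * $h * 3) / (-s $in);   # raw RGB bytes vs. file size

        if ($is_png && $ratio > 8) {              # "factor x" picked arbitrarily here
            $img->Write($out);                    # keep diagram-like PNGs untouched
            return;
        }
        $img->Resize(geometry => "${max}x${max}") if $w > $max || $h > $max;
        $img->Set(quality => 70) unless $is_png;  # adjust JPEG compression rate
        $img->Write($out);
    }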
> >>>>
> >>>> In 2005 I generated the last English Wikipedia for handhelds with
> >>>> images: about 2 GB was needed for 317,000 images of 240 pixels max
> >>>> height/width, without ugly compression artifacts, plus quite a few
> >>>> larger PNGs at original size.
> >>>>
> >>>> For offline usage, 10 MB images are rather over the top and waste a
> >>>> lot of bandwidth.
> >>>> I would actually favor a solution where images are collected and
> >>>> resized on a Wikimedia server, then put in a tar of only a few GB.
> >>>> Technically I can do this; time-wise it is another matter just now,
> >>>> but that might change.
> >>>>
> >>>> I also have code to generate PNGs from <math>..</math>.
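One rough way to do that is to write the TeX to a temporary file and shell
out to latex and dvipng, roughly as sketched below (both tools assumed to be
installed; this is illustrative, not the code referred to above):

    # Render the TeX taken from between <math>..</math> tags to a PNG.
    sub math_to_png {
        my ($tex, $png) = @_;
        open my $fh, '>', 'eq.tex' or die "cannot write eq.tex: $!";
        print $fh "\\documentclass{article}\\pagestyle{empty}\n",
                  "\\begin{document}\\( $tex \\)\\end{document}\n";
        close $fh;
        system('latex', '-interaction=nonstopmode', 'eq.tex') == 0
            or die 'latex failed';
        system('dvipng', '-T', 'tight', '-D', '120', '-o', $png, 'eq.dvi') == 0
            or die 'dvipng failed';
    }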
> >>>>
> >>>> Erik Zachte
> >>>>
> >>> _______________________________________________
> >>> Wikireader mailing list
> >>> Wikireader at lists.laptop.org
> >>> http://lists.laptop.org/listinfo/wikireader
> >>>
> >>
> >>
> >>
> >> --
> >> Erik Möller
> >> Deputy Director, Wikimedia Foundation
> >>
> >> Support Free Knowledge: http://wikimediafoundation.org/wiki/Donate
> >>
> >
>
>
>
> --
> Erik Möller
> Deputy Director, Wikimedia Foundation
>
> Support Free Knowledge: http://wikimediafoundation.org/wiki/Donate
> _______________________________________________
> Wikireader mailing list
> Wikireader at lists.laptop.org
> http://lists.laptop.org/listinfo/wikireader
>