[Wikireader] english wikireaders and 0.7

Samuel Klein sj at laptop.org
Sun Sep 7 16:05:33 EDT 2008


Andrew, what is that list for precisely?  Is it a list of good
revisions, or a list of the revisionid checked at the time and any
comments made?  We won't be able to apply comments by hand, but might
be able to use a specific revision rather than the latest one.

I checked a handful of the revs listed, and in general the latest
revision seems just as appropriate and usually more detailed or better
sourced.

@cjb and mad -- I'm compiling a list of mods; can you post the 2000
articles between 8k and 10k?

I'm both creating a tiny blacklist and adding some articles clearly
needed to complete various sets (which must have fallen a bit
above/below the cutoff). Note for the future: we need to have
categories/groups with aggregate priorities, and quantized priorities
within each category, so that a selection can be fit into a given size
without including 88 of 90 elements or 8 of 10 World Cups.
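One way to get that behaviour is sketched below, under stated assumptions. The group names, tier layout and byte sizes are all hypothetical, and the greedy loop is a simple heuristic, not an optimal knapsack solution. Groups are visited in order of aggregate priority, and each group is added one whole tier at a time, so a set is never cut off partway through a tier.

```python
from dataclasses import dataclass


@dataclass
class Group:
    name: str
    priority: float  # aggregate priority of the whole group (hypothetical scale)
    # Quantized tiers, most important first; each tier is a list of (title, size_in_bytes).
    tiers: list


def select(groups, budget):
    """Greedily add whole tiers, highest-priority group first, within a byte budget."""
    chosen, used = [], 0
    for group in sorted(groups, key=lambda g: g.priority, reverse=True):
        for tier in group.tiers:
            tier_size = sum(size for _, size in tier)
            if used + tier_size > budget:
                break  # skip this tier and all lower tiers of this group
            chosen.extend(title for title, _ in tier)
            used += tier_size
    return chosen, used


# Illustrative data only: 90 elements of 10 bytes each, 10 World Cups of 20 bytes each.
elements = Group("Chemical elements", 9.0, [[(f"Element {i}", 10) for i in range(1, 91)]])
cups = Group("FIFA World Cups", 4.0, [[(f"World Cup {i}", 20) for i in range(1, 11)]])

titles, used = select([elements, cups], budget=1000)
print(len(titles), used)  # 90 900: the World Cup tier (200 bytes) does not fit, so none are added
```

With a 1100-byte budget both groups fit completely; with 1000 bytes the World Cups are left out as a whole rather than trimmed to 8 of 10.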

SJ



On Sun, Sep 7, 2008 at 2:26 AM, Andrew Cates <Andrew at soschildren.org> wrote:
> Samuel,
>
> Some of the volunteers listed their comments online like this:
> http://en.wikipedia.org/w/index.php?title=Wikipedia%3AWikipedia_CD_Selection%2Fadditions_and_updates&diff=212207029&oldid=211039948
>
> But although we took out vandalism when it was still visible in the
> current versions, we did not generally remove adult material or make other
> editorial changes to the main WP. The database contains a "section
> delete" and a "string delete" specific to each article, as well as the
> generic ones (so the births section on the 1800 page is not taken out,
> but the one on the 1985 page is).
> In a few cases where the change was hard to script, I actually did make
> the changes and then self-reverted, so there is a child-friendly version
> in the edit history, mainly for year pages and serial killers (e.g.
> http://en.wikipedia.org/w/index.php?title=1998&diff=219094725&oldid=219094618),
> but for things like systematic section removal it was not done on WP.
> The births section is not taken out in this revision because the script
> will catch it.
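Here is a minimal sketch of how generic and per-article rules like these might combine. The rule tables, their layout and the "Example" entry are hypothetical, because the actual SOS database schema isn't described in this thread. Only the post-1980 births rule comes from the discussion. Sections are split on level-2 wikitext headings, so a "=== subsection ===" stays with its parent.

```python
import re

# Matches a level-2 heading such as "== Births ==" but not "=== Sub ===".
LEVEL2 = re.compile(r"^==([^=].*?)==\s*$")

# Hypothetical rule tables, for illustration only.
GENERIC_SECTION_DELETES = {"Trivia"}
PER_ARTICLE = {
    "Example": {"sections": {"Personal life"}, "strings": ["[citation needed]"]},
}


def split_sections(text):
    """Split wikitext into [heading, lines] pairs; the lead has heading None."""
    sections = [[None, []]]
    for line in text.splitlines():
        m = LEVEL2.match(line)
        if m:
            sections.append([m.group(1).strip(), [line]])
        else:
            sections[-1][1].append(line)
    return sections


def sections_to_drop(title):
    drop = set(GENERIC_SECTION_DELETES)
    # Year pages after 1980 lose their "Births" section; earlier years keep it.
    if title.isdigit() and int(title) > 1980:
        drop.add("Births")
    drop |= PER_ARTICLE.get(title, {}).get("sections", set())
    return drop


def clean(title, text):
    drop = sections_to_drop(title)
    kept = [line for heading, lines in split_sections(text)
            if heading not in drop for line in lines]
    out = "\n".join(kept)
    # Per-article "string delete" rules are applied after section removal.
    for s in PER_ARTICLE.get(title, {}).get("strings", []):
        out = out.replace(s, "")
    return out


year_page = "Lead.\n== Events ==\nStuff.\n== Births ==\nSomeone.\n== Deaths ==\nX."
print(clean("1985", year_page))               # Births section removed
print(clean("1800", year_page) == year_page)  # True: 1800 keeps its Births section
```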
>
> This, needless to say, is a lot of work...
>
> Andrew
> =============================
>
> On Sat, Sep 6, 2008 at 9:21 PM, Samuel Klein <sj at laptop.org> wrote:
>> That's great, thank you Andrew. Do you post these changes back to WP
>> proper? I'd like for every article revision we include in our
>> bundle to have a permalink online. (And it makes sense to me that
>> some other people who currently only read WP might like your versions
>> as well...)
>>
>> I will certainly support you in running an SOS-bot that publishes its
>> preferred cleaner revisions to articles, with an edit summary
>> indicating that it is posting the version from the latest
>> children's Wikipedia, and a bot option to self-revert and leave a
>> message on the talk page (if editors start to get annoyed with it).
>> That way the regulars on any given article can choose whether or not
>> to include its changes, but the bot doesn't change the current
>> version or feed edit wars that may already be ongoing.
>>
>> SJ
>>
>> (You know the content review is overseen by a Wikipedian when... it
>> includes cleaning out 'births' since 1980 and 'trivia' sections in
>> bios. :-)
>>
>> On Sat, Sep 6, 2008 at 2:22 AM, Andrew Cates <Andrew at soschildren.org> wrote:
>>> Hi Samuel
>>>
>>> Just to be clear, we have finished checking our 5400 articles for
>>> vandalism etc. and have this list. But as well as choosing versions, we
>>> have a cleanup script which removes unsuitable paragraphs within
>>> articles, and editorial notices (e.g. empty sections, "see also" links
>>> to articles not on the list, the sections labelled "personal life" in
>>> biographies, which tend to be full of speculation about sexual
>>> orientation, the "births" section in year pages after 1980, which is
>>> full of rubbish, topic boxes where most of the listed articles are not
>>> included, category lists from portal pages, editorial notices where the
>>> issue is minor, etc.). The remaining two weeks' work is on the script,
>>> not on finding the versions.
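The "see also" and empty-section steps can be sketched like this. The sketch assumes plain wikitext input, bullet lines of the form "* [[Target]]" or "* [[Target|label]]", and a set of selected titles. The function names are hypothetical. The exact-match title comparison also ignores MediaWiki's first-letter case folding and redirects.

```python
import re

LEVEL2 = re.compile(r"^==([^=].*?)==\s*$")  # "== Heading ==" but not "=== Sub ==="
LINK = re.compile(r"\[\[([^\]|#]+)")        # target of the first [[wikilink]] on a line


def prune_see_also(lines, selected):
    """Drop bullet lines whose first link points outside the selection."""
    out = []
    for line in lines:
        m = LINK.search(line)
        if line.lstrip().startswith("*") and m and m.group(1).strip() not in selected:
            continue
        out.append(line)
    return out


def clean_sections(text, selected):
    # Each entry is [heading name, original heading line, body lines].
    sections = [[None, None, []]]
    for line in text.splitlines():
        m = LEVEL2.match(line)
        if m:
            sections.append([m.group(1).strip(), line, []])
        else:
            sections[-1][2].append(line)

    out = []
    for name, heading_line, body in sections:
        if name == "See also":
            body = prune_see_also(body, selected)
        if name is not None and not any(ln.strip() for ln in body):
            continue  # empty section, including one emptied by the pruning above
        if heading_line is not None:
            out.append(heading_line)
        out.extend(body)
    return "\n".join(out)


page = "Lead.\n== History ==\n\n== See also ==\n* [[Kept]]\n* [[Dropped|label]]"
print(clean_sections(page, {"Kept"}))
```

Pruning runs before the emptiness check, so a "see also" section whose links all fall outside the selection disappears entirely.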
>>>
>>> The "near current" state of play is at
>>> http://schools-wikipedia-test.soschildren.org/wp/index/subject.htm
>>> which is only a week old.
>>>
>>> Andrew
>>>
>>> On Fri, Sep 5, 2008 at 6:46 PM, Samuel Klein <sj at laptop.org> wrote:
>>>> Thanks for the update.  bozmo, it's great to hear your group is
>>>> working on assessments as well... we won't be able to wait another two
>>>> weeks for a revised version list, but may be able to recompile once
>>>> next week. However, I think for OLPC's coming release we want a final
>>>> draft bundle this weekend.
>>>>
>>>> Warmly,
>>>> SJ
>>>>
>>>> On Thu, Sep 4, 2008 at 5:01 PM, Martin Walker <walkerma at potsdam.edu> wrote:
>>>>> We found a bug in the SelectionBot script that was affecting some unassessed
>>>>> articles.  That has now been fixed, and there is now an updated set of
>>>>> results, with about 28,000 articles selected.
>>>>>
>>>>> http://toolserver.org/~cbm/release-data/2008-9-4/HTML/index.html
>>>>>
>>>>>
>>>>> As for the small detailed fixes, we'll have to work on those at the weekend.
>>>>>
>>>>> Martin
>>>>> Walkerma on Wikipedia
>>>>>
>>>>> Samuel Klein wrote:
>>>>>>
>>>>>> OK, let's meet Friday at 1500 EST on #kiwix on Freenode,
>>>>>> for those who can make it, to discuss making a main page for an English
>>>>>> 0.7 Wikipedia bundle.
>>>>>>
>>>>>> SJ
>>>>>>
>>>>>> On Thu, Aug 28, 2008 at 12:20 PM, Martin Pascal <pmartin at linterweb.com> wrote:
>>>>>>
>>>>>>    Yes SJ,
>>>>>>
>>>>>>    you could join #kiwix on irc.freenode.net
>>>>>>    Best regards,
>>>>>>    Martin Pascal
>>>>>>    ----- Original Message -----
>>>>>>    From: "Martin Walker" <walkerma at potsdam.edu>
>>>>>>    To: "Samuel Klein" <sj at laptop.org>
>>>>>>    Cc: "Madeleine Ball" <mad at printf.net>; "Offline Wikireaders" <wikireader at lists.laptop.org>
>>>>>>    Sent: Thursday, August 28, 2008 6:16 PM
>>>>>>    Subject: Re: [Wikireader] english wikireaders and 0.7
>>>>>>
>>>>>>
>>>>>>        SJ,
>>>>>>
>>>>>>        I can manage an IRC meeting on Friday, say at 3pm EDT (1900h
>>>>>>        UTC)? If this is difficult for others, I will be around next
>>>>>>        week. We have the #wikipedia-1.0 channel
>>>>>>        (irc://irc.freenode.net/#wikipedia-1.0) if you wish, but
>>>>>>        perhaps you have a wikireader channel that may be more
>>>>>>        appropriate?
>>>>>>
>>>>>>        Martin
>>>>>>
>>>>>>
>>>>>>        Samuel Klein wrote:
>>>>>>
>>>>>>            @martin -- How about having a Friday afternoon wikireader
>>>>>>            meeting? For this week, whether or not we meet, a pressing
>>>>>>            question is generating the main page. For the Spanish WP,
>>>>>>            Madeleine did most of the main page by hand with a bit of
>>>>>>            help. We may have to do the same here until better scripts
>>>>>>            are set up.
>>>>>>
>>>>>>            A couple of people built the main page for our
>>>>>>            Spanish-language bundle more or less by hand from a portal
>>>>>>            template.
>>>>>>
>>>>>>            Metadata:
>>>>>>
>>>>>>            1. Metadata that is currently particularly useful for us:
>>>>>>             - a blacklist of article titles and a blacklist of images,
>>>>>>               for the very few that we explicitly leave out despite
>>>>>>               other metadata
>>>>>>             - a whitelist of both, to ensure inclusion.
>>>>>>
>>>>>>            2. In a general system, I'd like to see these lists tagged
>>>>>>            with the name of the associated group, say
>>>>>>            olpc-peru-blacklist and olpc-peru-whitelist.
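Applied to a selection, group-tagged lists of this kind might look like the sketch below. Two parts of it are assumptions, not part of the proposal: the dictionary layout, with one entry per tag holding separate article and image sets, and the rule that a blacklist entry overrides a whitelist entry.

```python
def apply_lists(selection, lists, group, kind):
    """Add whitelisted and remove blacklisted items of one kind ("articles" or "images").

    Assumption: if an item is on both lists, the blacklist wins.
    """
    white = lists.get(f"{group}-whitelist", {}).get(kind, set())
    black = lists.get(f"{group}-blacklist", {}).get(kind, set())
    return (set(selection) | white) - black


# Placeholder titles; only the tag names come from the proposal above.
lists = {
    "olpc-peru-whitelist": {"articles": {"Article C"}},
    "olpc-peru-blacklist": {"articles": {"Article B"}, "images": {"Image 2.jpg"}},
}

articles = apply_lists({"Article A", "Article B"}, lists, "olpc-peru", "articles")
images = apply_lists({"Image 1.jpg", "Image 2.jpg"}, lists, "olpc-peru", "images")
print(sorted(articles), sorted(images))
```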
>>>>>>
>>>>>>            @cfabian -- testing this on bee units sounds like a fun test
>>>>>>            of the metadata slimming!
>>>>>>
>>>>>>            SJ
>>>>>>
>>>>>>            PS: any news from the offline Spanish WP project that got
>>>>>>            started a while back?
>>>>>>
>>>>>>
>>>>>>            On Sun, Aug 24, 2008 at 6:12 PM, Martin Walker
>>>>>>            <walkerma at potsdam.edu> wrote:
>>>>>>
>>>>>>               Things are looking very promising for the Version 0.7
>>>>>>               selection: we should have a complete article list within
>>>>>>               a week or so, containing about 30,000 articles organized
>>>>>>               by a combination of quality and importance. With our basic
>>>>>>               system of compression (probably, I think, the Zeno format),
>>>>>>               I believe we should be able to include 30,000 long-ish
>>>>>>               articles with thumbnails on one DVD, along with Kiwix and
>>>>>>               some index pages. I'd be interested to see how it would
>>>>>>               work with your compression system; we could get a few
>>>>>>               people to test that, I think.
>>>>>>
>>>>>>               I know how you love metadata, SJ, and we now have loads of
>>>>>>               it (from 1.4 million articles), so we can customize the
>>>>>>               selection for you at will using quality, WikiProject, or
>>>>>>               the four importance parameters. Since this is for kids in
>>>>>>               specific places, we can emphasize dinosaurs or birds,
>>>>>>               exclude serial killers, or include all articles from (say)
>>>>>>               Uganda, all as requested. Let me know if this feature is
>>>>>>               useful. We don't have an equivalent ranking for images,
>>>>>>               I'm afraid; for V0.7 we just include all legal images (as
>>>>>>               thumbnails). As for a "main page", the plan is to have a
>>>>>>               set of index pages generated by bot and then corrected by
>>>>>>               a manual "reality check", but that will take another month
>>>>>>               or two.
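A sketch of that kind of customization follows. It assumes each article carries numeric quality and importance scores and a set of WikiProject tags. The additive score, the boost weights and the example data are hypothetical; the real 0.7 selection formula isn't given here.

```python
from dataclasses import dataclass


@dataclass
class Article:
    title: str
    quality: int         # hypothetical numeric quality score
    importance: int      # hypothetical numeric importance score
    projects: frozenset  # WikiProject tags (hypothetical examples below)


def customize(articles, limit, boost=None, exclude=frozenset(), include_all=frozenset()):
    """Rank articles by quality + importance (+ per-project boosts) under a count limit.

    Articles tagged with an `exclude` project are dropped; articles tagged with an
    `include_all` project are always kept, even if that exceeds `limit`.
    """
    boost = boost or {}
    kept = [a for a in articles if not (a.projects & exclude)]
    forced = [a for a in kept if a.projects & include_all]

    def score(a):
        return a.quality + a.importance + sum(boost.get(p, 0) for p in a.projects)

    rest = sorted((a for a in kept if not (a.projects & include_all)), key=score, reverse=True)
    room = max(0, limit - len(forced))
    return [a.title for a in forced] + [a.title for a in rest[:room]]


arts = [
    Article("Triceratops", 2, 2, frozenset({"Dinosaurs"})),
    Article("Placeholder biography", 9, 9, frozenset({"Serial killers"})),
    Article("Kampala", 1, 1, frozenset({"Uganda"})),
    Article("Physics", 5, 5, frozenset({"Physics"})),
]
picked = customize(arts, limit=3, boost={"Dinosaurs": 10},
                   exclude={"Serial killers"}, include_all={"Uganda"})
print(picked)  # ['Kampala', 'Triceratops', 'Physics']
```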
>>>>>>
>>>>>>               I'd really like to make sure we work together in the
>>>>>>               coming months, because I think we can avoid a lot of
>>>>>>               duplicate work if we share our best resources, scripts,
>>>>>>               etc. Once the selection is done (~ 1st Sept), should we
>>>>>>               hold an IRC discussion on how we can best collaborate?
>>>>>>
>>>>>>               Martin
>>>>>>
>>>>>>
>>>>>>               Samuel Klein wrote:
>>>>>>
>>>>>>                   There's lots of motivation to get an English wikireader,
>>>>>>                   say, taking advantage of the article selection and
>>>>>>                   processing of 0.7. OLPC could include this in the
>>>>>>                   upcoming G1G1 machines this winter / early next year.
>>>>>>                   Other users could test wikireaders that read this
>>>>>>                   zipped format on their own machines, which would flesh
>>>>>>                   out the reader code.
>>>>>>
>>>>>>                   Martin -- what's the status on the 0.7 article list? Do
>>>>>>                   you have a similar image list that ranks images by
>>>>>>                   importance to that set of articles? How is work on a 0.7
>>>>>>                   main page going? I'd love to see how large a snapshot is
>>>>>>                   with our current wikireader code (without even moving to
>>>>>>                   7z, or trimming the list).
>>>>>>
>>>>>>                   SJ
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>        _______________________________________________
>>>>>>        Wikireader mailing list
>>>>>>        Wikireader at lists.laptop.org
>>>>>>        http://lists.laptop.org/listinfo/wikireader
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>

