[Wikireader] english wikireaders and 0.7

Samuel Klein sj at laptop.org
Sat Sep 6 16:21:23 EDT 2008


That's great, thank you Andrew.  Do you post these changes back to wp
proper?  I'd like for every article revision we include in our
bundle to have a permalink online.  (and it makes sense to me that
some other people who currently only read wp might like your versions
as well...)

I will certainly support you in running an SOS-bot that publishes its
preferred cleaner revisions to articles, with an edit summary
indicating it is posting the version from the latest
childrens-wikipedia, and a bot-option to self-revert and leave a
message on the talk page if editors start to get annoyed with it.
That way the regulars on any given article can choose whether to
include its changes, but the bot doesn't change the latest current
version or feed edit wars that may already be ongoing.

SJ

(You know the content-review is overseen by a Wikipedian when... it
includes cleaning out 'births' since 1980 and 'trivia' sections in
bios.  :-)

On Sat, Sep 6, 2008 at 2:22 AM, Andrew Cates <Andrew at soschildren.org> wrote:
> Hi Samuel
>
> Just to be clear, we have finished checking our 5400 articles for
> vandalism etc and have this list. But as well as choosing versions we
> have a cleanup script which removes unsuitable paragraphs within
> articles, and editorial notices (e.g. empty sections, "see also" to
> articles not on the list, the sections labelled "personal life" in
> biographies which tend to be full of speculation about sexual
> orientation, the "births" section in years post 1980 which is full of
> rubbish, topic boxes where most of them are not included, category
> lists from portal pages, editorial notices where the issue is minor
> etc.). The remaining two weeks' work is on the script not on finding
> the versions.
>
> The "near current" state of play is at
> http://schools-wikipedia-test.soschildren.org/wp/index/subject.htm
> which is only a week old.
>
> Andrew
>
> On Fri, Sep 5, 2008 at 6:46 PM, Samuel Klein <sj at laptop.org> wrote:
>> Thanks for the update.  bozmo, it's great to hear your group is
>> working on assessments as well... we won't be able to wait another two
>> weeks for a revised version list, but may be able to recompile once
>> next week.  However, I think for olpc's coming release we want a final
>> draft bundle this weekend.
>>
>> Warmly,
>> SJ
>>
>> On Thu, Sep 4, 2008 at 5:01 PM, Martin Walker <walkerma at potsdam.edu> wrote:
>>> We found a bug in the SelectionBot script that was affecting some unassessed
>>> articles.  That has now been fixed, and there is now an updated set of
>>> results, with about 28,000 articles selected.
>>>
>>> http://toolserver.org/~cbm/release-data/2008-9-4/HTML/index.html
>>>
>>>
>>> As for the small detailed fixes, we'll have to work on those at the weekend.
>>>
>>> Martin
>>> Walkerma on Wikipedia
>>>
>>> Samuel Klein wrote:
>>>>
>>>> ok, let's meet friday at 1500 EST  on #kiwix on freenode,
>>>> for those who can make it, to discuss making a main page for an english
>>>> 0.7 wikipedia bundle.
>>>>
>>>> SJ
>>>>
>>>> On Thu, Aug 28, 2008 at 12:20 PM, Martin Pascal <pmartin at linterweb.com
>>>> <mailto:pmartin at linterweb.com>> wrote:
>>>>
>>>>    Yes Sj ,
>>>>
>>>>    you could join #kiwix on irc.freenode.net <http://irc.freenode.net>
>>>>    Cordialement
>>>>    Martin Pascal
>>>>    tel : 02 32 40 23 69, fax : 02 32 61 45 26
>>>>    gsm : 06 13 89 77 32
>>>>    ----- Original Message ----- From: "Martin Walker"
>>>>    <walkerma at potsdam.edu <mailto:walkerma at potsdam.edu>>
>>>>
>>>>    To: "Samuel Klein" <sj at laptop.org <mailto:sj at laptop.org>>
>>>>    Cc: "Madeleine Ball" <mad at printf.net <mailto:mad at printf.net>>;
>>>>    "Offline Wikireaders" <wikireader at lists.laptop.org
>>>>    <mailto:wikireader at lists.laptop.org>>
>>>>    Sent: Thursday, August 28, 2008 6:16 PM
>>>>
>>>>    Subject: Re: [Wikireader] english wikireaders and 0.7
>>>>
>>>>
>>>>        SJ,
>>>>
>>>>        I can manage an IRC meeting on Friday - say at 3pm EDT (1900h
>>>>        UTC)?  If
>>>>        this is difficult for others, I will be around next week.  We
>>>>        have the
>>>>        #wikipedia-1.0 channel ( irc://irc.freenode.net/#wikipedia-1.0
>>>>        <http://irc.freenode.net/#wikipedia-1.0> ) if you
>>>>        wish, but perhaps you have a wikireader channel that may be more
>>>>        appropriate?
>>>>
>>>>        Martin
>>>>
>>>>
>>>>        Samuel Klein wrote:
>>>>
>>>>            @martin -- How about having a Friday afternoon wikireader
>>>>            meeting?
>>>>            For this week, whether or not we meet, a pressing question
>>>>            is :
>>>>            Generating the main page.  For the spanish WP, Madeleine
>>>>            did most of
>>>>            the main page by hand with a bit of help.  We may have to
>>>>            do the same
>>>>            here until better scripts are set up.
>>>>
>>>>            A couple people built the main page for our
>>>>            spanish-language bundle
>>>>            more or less by hand from a portal template.
>>>>
>>>>            Metadata :
>>>>
>>>>            1. metadata that is currently particularly useful for us is:
>>>>             - a blacklist of article titles, and a blacklist of
>>>>            images, for the
>>>>            very few that we explicitly leave out despite other metadata
>>>>             - a whitelist of both, again to ensure inclusion.
>>>>
>>>>            2. In a general system, I'd like to see this tagged with
>>>>            the name of
>>>>            the group associated; say olpc-peru-blacklist and
>>>>            olpc-peru-whitelist.
>>>>
>>>>            @cfabian -- testing this on bee units sounds like a fun
>>>>            test of the
>>>>            metadata slimming!
>>>>
>>>>            SJ
>>>>
>>>>            ps - any news from the offline spanish wp project that got
>>>>            started a
>>>>            while back?
>>>>
>>>>
>>>>            On Sun, Aug 24, 2008 at 6:12 PM, Martin Walker
>>>>            <walkerma at potsdam.edu <mailto:walkerma at potsdam.edu>
>>>>            <mailto:walkerma at potsdam.edu
>>>>            <mailto:walkerma at potsdam.edu>>> wrote:
>>>>
>>>>               Things are looking very promising for the Version 0.7
>>>>            selection -
>>>>               we should have a complete article list within a week or so,
>>>>               containing about 30,000 articles organized by a
>>>>            combination of
>>>>               quality and importance.  With our basic system of
>>>>            compression
>>>>               (using, I think, probably Zeno format), I believe we
>>>>            should be able
>>>>               to include 30,000 long-ish articles with thumbnails on
>>>>            one DVD,
>>>>               along with Kiwix and some index pages.  I'd be
>>>>            interested to see
>>>>               how it would work with your compression system - we
>>>>            could get a
>>>>               few people to test that, I think.
>>>>
>>>>               I know how you love metadata, SJ, and we now have loads
>>>>            of it
>>>>               (from 1.4 million articles) - so we can customize the
>>>>            selection
>>>>               for you at will using quality, wikiproject, or the four
>>>>            importance
>>>>               parameters.  Since this is for kids in specific places,
>>>>            we can
>>>>               emphasize dinosaurs or birds, exclude serial killers,
>>>>            or include
>>>>               all articles from (say) Uganda, all as requested.  Let
>>>>            me know if
>>>>               this feature is useful.  We don't have an equivalent
>>>>            ranking for
>>>>               images, I'm afraid - for V0.7 we just include all legal
>>>>            images (as
>>>>               thumbnails).  As for a "main page", the plan is to have
>>>>            a set of
>>>>               index pages generated by bot and then corrected by a manual
>>>>               "reality check", but that will take another month or two.
>>>>
>>>>               I'd really like to make sure that we work
>>>>            together in
>>>>               the coming months, because I think we can avoid a lot
>>>>            of duplicate
>>>>               work if we share our best resources, scripts, etc.
>>>>             Once the
>>>>               selection is done (~ 1st Sept), should we hold an IRC
>>>>            discussion
>>>>               on how we can best collaborate?
>>>>
>>>>               Martin
>>>>
>>>>
>>>>               Samuel Klein wrote:
>>>>
>>>>                   There's lots of motivation to get an english
>>>>            wikireader, say,
>>>>                   taking advantage of the article selection and
>>>>            processing of 0.7 .
>>>>                   OLPC could include this in the upcoming G1G1
>>>>            machines this
>>>>                   winter / early next year.  Other users could test
>>>>            wikireaders
>>>>                   that read this zipped format on their own machines,
>>>>            which
>>>>                   would flesh out the reader code.
>>>>
>>>>                   Martin -- what's the status on the 0.7 articlelist?
>>>>             Do you
>>>>                   have a similar imagelist that ranks images by
>>>>            importance to
>>>>                   that set of articles?
>>>>                   How is work on a 0.7 main page?  I'd love to see
>>>>            how large a
>>>>                   snapshot is with our current wikireader code
>>>>            (without even
>>>>                   moving to 7z, or trimming the list).
>>>>
>>>>                   SJ
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>        _______________________________________________
>>>>        Wikireader mailing list
>>>>        Wikireader at lists.laptop.org <mailto:Wikireader at lists.laptop.org>
>>>>        http://lists.laptop.org/listinfo/wikireader
>>>>
>>>>
>>>>
>>>
>>>
>>>
>>>
>>
>

