OLPC Software Code Localization - A Few Things I've Noticed

Ed Trager ed.trager at gmail.com
Fri Oct 26 15:54:53 EDT 2007


Hi, everyone,

In response to Xavier Alvarez' request on 10/25 for translators and
coordinators, I decided to get off the sidelines and take a look at
OLPC's new Pootle-based L10N infrastructure.

Here are a few things I noticed which I think will be of general
interest and concern:

(0) CASING/NAMING OF PO FILES PROBLEM:

      The (upper/lower) casing of PO file names is inconsistent. For
      example, in Core there is "journal-activity.Journal.po", with an
      upper case "J" for the second occurrence of "Journal", but then
      why isn't "write.write.po" written "write.Write.po"?

      This is a small point, but consistent and intuitive naming of
      these PO files will help everyone. Or am I just failing to
      understand or intuit what the pattern is supposed to be here?

(1)  INCONSISTENT NUMBER OF MSGIDs ACROSS DIFFERENT LANGUAGES:

       The other day when I looked at write.write.po for French, there
       were only 10 messages in the catalog.  Today I see that there
       are 36 messages, which looks a lot closer to what I myself get
       from "xgettext toolbar.py" on the latest code.  However, when I
       checked write.write.po for Thai today, it still had only 10
       messages.

       Solution (or at least a question posing as a possible solution):

       Does everyone agree that there needs to be a way for all of the
       ".po" files for all languages to get updated with the latest
       messages extracted via "xgettext" from the latest codebase
       (toolbar.py, etc.)?

       What appears to be happening right now is that someone decided
       to work on the French, so maybe they ran "xgettext" against the
       Write Activity's latest "toolbar.py", and so for French we now
       have 36 messages.  (Not all are translated: in fact, Pootle says
       only 9 out of 58 are translated, and I have no clue where that
       "58" is coming from, because I only find 30 or so when I run
       "xgettext toolbar.py" myself.)  BUT, as nobody has yet worked on
       Thai, there are only 10 messages present for Thai.

       I suspect that it is overly optimistic to believe that the best
       and most willing translators out in the community will always
       double-check by running "xgettext" themselves against the latest
       code to make sure that messages are not missing.  So
       computer-assisted updating of the PO files to contain the very
       latest set of msgids sounds like a necessary step.
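
       To make the idea concrete, here is a minimal sketch of such an
       update step.  It is not how Pootle does it; the function name,
       the dictionary layout and the placeholder strings are all
       assumptions for illustration.  For real PO files, GNU gettext's
       "msgmerge" does this job against a template produced by
       "xgettext".

```python
# Hypothetical sketch: bring one language's catalog up to date with the
# msgids currently extracted from the source code.

def update_catalog(template_msgids, translations):
    """Return (updated, obsolete) for one language.

    template_msgids: msgids in the order they were extracted.
    translations:    existing msgid -> msgstr mapping for the language.
    """
    # Keep existing translations; new msgids start out untranslated ("").
    updated = {msgid: translations.get(msgid, "") for msgid in template_msgids}
    # Translations whose msgid no longer appears in the code.
    current = set(template_msgids)
    obsolete = {m: s for m, s in translations.items() if m not in current}
    return updated, obsolete


# Example data: msgids as extracted today, and a stale catalog.
template = ["Numbered List", "Lower Case List", "Upper Case List"]
thai = {"Numbered List": "TH-1", "Old Message": "TH-0"}  # placeholders, not real Thai

updated, obsolete = update_catalog(template, thai)
untranslated = [m for m, s in updated.items() if not s]
print(len(updated), "messages,", len(untranslated), "untranslated")
```

       Run for every language whenever the code changes, a step like
       this would give every catalog the same set of msgids, with the
       untranslated ones clearly visible to translators.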

(2)  SOFTWARE I18N/L10N REVIEW PROCESS:

       While beginning to translate write.write.po for Thai earlier
       today, I got to this set of msgids:

          #: toolbar.py:543
          msgid "Lower Case List"

          #: toolbar.py:544
          msgid "Upper Case List"

       These are two msgids from a dropdown list which also includes
       "Numbered List" and "Bulleted List".  The first item thus refers
       to a list enumerated with lower case Latin letters:

            a. Item One
            b. Item Two
            c. Item Three

       ... while the second obviously refers to enumerating a list
       with A, B, C, etc.

       Do we all recognize what the problem is here?  OK, I'm waiting
       for your answers :-).  Yes, it took me half a second to
       recognize the problem too!

       Using Thailand as an example, it is true that Thais will
       sometimes enumerate lists using Latin upper or lower case
       letters.  But the norm in a Thai document (when one's keyboard
       is in any case already set to Thai) is to enumerate lists using
       Thai letters: ก, ข, ค, etc.

       And of course in the Arabic-speaking world it is common to
       enumerate lists using ا, ب, ت, ... (aleph, be, te, etc.)

       And when not enumerating by letters, it is even more common to
       enumerate by digits, which of course must include native Thai,
       native Arabic, and a host of other native numbering systems for
       other languages and scripts.

       So, in addition to:

           msgid "Numbered List"
           msgid "Lower Case List"
           msgid "Upper Case List"

       ... we really need to add:

           msgid "Arabic Numbered List"
           ...
           msgid "Devanagari Numbered List"
           ...
           msgid "Thai Numbered List"
           ... etc ...

       If memory serves me, there may be on the order of 17 or so
       different native digit sets currently in Unicode.

       And of course, there need to be msgids for enumerating lists
       using different alphabets:

               msgid "Arabic List"
               msgid "Devanagari List"
               msgid "Thai List"
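
       For the numbered styles at least, the rendering side is cheap.
       The following is only a sketch under my own assumptions (the
       table and function names are made up, and it handles
       non-negative integers only).  It relies on the fact that each
       of these Unicode decimal digit sets occupies ten consecutive
       code points starting at its zero:

```python
# Zero code points of a few Unicode decimal digit sets.
DIGIT_ZERO = {
    "western": ord("0"),
    "arabic": 0x0660,      # ARABIC-INDIC DIGIT ZERO
    "devanagari": 0x0966,  # DEVANAGARI DIGIT ZERO
    "thai": 0x0E50,        # THAI DIGIT ZERO
}


def list_marker(n, script="western"):
    """Format the non-negative integer n as a list marker in the given digit set."""
    zero = DIGIT_ZERO[script]
    # Map each ASCII digit of n onto the corresponding native digit.
    digits = "".join(chr(zero + int(d)) for d in str(n))
    return digits + "."


for script in DIGIT_ZERO:
    print(script, [list_marker(i, script) for i in (1, 2, 10)])
```

       Alphabetic styles are harder, since each script needs its own
       agreed-upon letter sequence rather than a simple offset.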

       Of course we cannot have a drop-down list with hundreds of
       different list styles.  That would be completely inappropriate
       for school children.

       So my initial thought is that the Python code for the Write
       application should include all of the different list styles
       (getting those needed for the "green countries" would be an
       excellent place to start).  Since any application that requires
       a drop-down list of list styles needs the same thing, the whole
       thing should probably be broken out into a separate reusable
       class.

Such a Python class should, by default at least, dynamically display
only those lists appropriate for the current locale, with the locale's
native styles first in the dropdown.  That is, in an Arabic script
locale, we want to put "Arabic Numeric List" and "Arabic Alphabetical
List" ahead of "Numbered List" (using "Western" European digits),
"Lower Case List", and "Upper Case List" (the latter two being Latin
letters).
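
A rough sketch of that ordering logic follows.  None of this is
existing Write code: the function name and both tables are
hypothetical, the style names reuse the msgids proposed above, and the
language-to-script mapping covers only a few of the "green country"
languages.

```python
# Styles every locale gets (Latin letters and "Western" digits).
LATIN_STYLES = ["Numbered List", "Lower Case List", "Upper Case List"]

# Illustrative native styles per script, named after the proposed msgids.
NATIVE_STYLES = {
    "Arabic": ["Arabic Numbered List", "Arabic List"],
    "Devanagari": ["Devanagari Numbered List", "Devanagari List"],
    "Thai": ["Thai Numbered List", "Thai List"],
}

# Assumed mapping from language code to script; deliberately incomplete.
SCRIPT_FOR_LANGUAGE = {
    "ar": "Arabic",
    "hi": "Devanagari",
    "ne": "Devanagari",
    "th": "Thai",
}


def list_styles_for_locale(locale_name):
    """Return list style names, native styles first, for e.g. 'th_TH.UTF-8'."""
    # Take the language part of names like 'th', 'th_TH' or 'th-TH.UTF-8'.
    language = locale_name.replace("-", "_").split("_")[0].lower()
    script = SCRIPT_FOR_LANGUAGE.get(language)
    native = NATIVE_STYLES.get(script, [])
    return native + LATIN_STYLES


for name in ("ar_EG.UTF-8", "th_TH", "fr_FR"):
    print(name, list_styles_for_locale(name))
```

With something like this, a French or English locale sees only the
three existing styles, while a Thai or Arabic locale gets its own
styles at the top without the dropdown growing to hundreds of entries.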

Just my 2 cents ...

Best - Ed

On 10/25/07, Xavier Alvarez <xavi.alvarez at gmail.com> wrote:
> Hi!
>
> We seem to be getting the L10n effort under way (in a new server),
> and the subject pretty much sums up the situation:
>
> We need
> - translators (obviously),
> - coordinators (that can actually manage each language) and
> - volunteers (the universal glue?)
>
> All languages are welcome, but it should be noted that there's a
> need for those languages used in the 'green countries', which
> are: Amharic, Arabic, English, Spanish, French, Hausa, Hindi,
> Igbo, Nepali, Portuguese, Romanian, Russian, Kinyarwanda, Thai,
> Urdu, & Yoruba.
>
> I've updated the [[Localization]] page, noting that the previous
> workflow for submitting translations using tickets is being
> dropped (translations are going to be primarily on-line), making
> reference to the [[Pootle]] page (that still needs some
> club^H^H^H^H polishing) about its use. For future administrators
> the [[Pootle/Administration]] could be of interest.
>
>
> One of the nice features of Pootle is the ability to have
> glossaries that are used to propose translated terms dynamically
> in the web-gui (thus helping keep a homogeneous terminology). So
> I made a rude first approach in [[Pootle/Glossary]] that could also
> do with some reviewing...
>
> If you are interested in participating, at the bottom of the
> [[Pootle]] page there's a table for signing up.
> Note: From previous mails, I took the liberty of signing up Khaled
> Hosny [ar], Simos Xenitellis [el] and Maxim Osipov [ru].
>
> Questions, suggestions, ideas, etc. are all welcome!
>
>
> Cheers,
> Xavier
>
> [[Localization]] http://wiki.laptop.org/go/Localization
> [[Pootle]] http://wiki.laptop.org/go/Pootle
> [[Pootle/Administration]]
> http://wiki.laptop.org/go/Pootle/Admininstration
> [[Pootle/Glossary]] http://wiki.laptop.org/go/Pootle/Glossary
>
> --
> XA
> =========
> Don't Panic!  The Answer is 42
> _______________________________________________
> Localization mailing list
> Localization at lists.laptop.org
> http://lists.laptop.org/listinfo/localization
>

