OLPC Software Code Localization - A Few Things I've Noticed

Xavier Alvarez xavi.alvarez at gmail.com
Fri Oct 26 20:50:41 EDT 2007


On Friday 26 October 2007 16:54, you wrote:
ET> Hi, everyone,
ET>
ET> In response to Xavier Alvarez' request on 10/25 for
ET> translators and coordinators, I decided to get off the
ET> sidelines and take a look at OLPC's new Pootle-based L10N
ET> infrastructure.
ET>
ET> Here are a few things I noticed which I think will be of
ET> general interest and concern:
ET>
ET> (0) CASING/NAMING OF PO FILES PROBLEM:

The 'rule' is quite simple (but not necessarily as intuitive as 
may be expected): given that we are bundling several d.l.o 
projects into pootle-projects, we need to avoid (or at least 
minimize the possibility of) having 2 POT files with the same 
name.

Solution? We prefix whatever filename is used for the POT in 
d.l.o with the name of its project...

journal-activity.Journal.po
<--dlo-project->.<filename>

Thus, any 'inconsistencies' are really the product of other 
inconsistencies... they just happen to be more evident (and ugly) 
within Pootle.
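
As a rough sketch of that rule (not the actual injection script; the 
function name is made up for illustration, and "Journal.pot" is 
inferred from the PO name above rather than checked in git):

```python
def pootle_po_name(dlo_project: str, pot_filename: str) -> str:
    """Prefix the d.l.o POT filename with its project name.

    The case of the POT filename is kept exactly as the project wrote it,
    which is where the 'Journal' vs 'write' difference comes from.
    """
    stem = pot_filename.rsplit(".", 1)[0]  # drop the ".pot" extension
    return f"{dlo_project}.{stem}.po"


print(pootle_po_name("journal-activity", "Journal.pot"))
print(pootle_po_name("write", "write.pot"))
```

So the casing in Pootle simply mirrors whatever each project chose 
for its own POT.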

ET>
ET>       (Upper/Lower) Casing of names of po files is
ET> inconsistent: For example, in Core there is
ET> "journal-activity.Journal.po" with upper case "J" for
ET> the 2nd occurrence of "Journal" but then why isn't
ET> "write.write.po" written "write.Write.po"? 
ET>
ET>       This is a small point, but consistent and intuitive
ET> naming of these PO files will help everyone. Or am I just
ET> failing to understand or intuit what the pattern is supposed
ET> to be here?
ET>
ET> (1)  INCONSISTENT NUMBER OF MSGIDs ACROSS DIFFERENT
ET> LANGUAGES: 

Yes and no.

The numbers shown in the statistics do not represent quantity of 
MSGIDs but WORDS in the file. So I presume that for untranslated 
strings it takes the MSGID words, and for translated strings, the 
MSGSTR. Thus two languages with everything translated and up to 
date may still show different numbers (although conceptually 
they are the same). BTW, it does show the number of strings in 
other 'statistic levels'.

Yes, I was quite baffled too... translators are more worried about 
the word-count than 'lines of code'... ;)
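
To illustrate what I presume is happening (this is only my guess at 
the counting, not Pootle's actual code; the function name and the 
sample strings are invented, not taken from the XO catalogs):

```python
def word_stats(entries):
    """entries: list of (msgid, msgstr) pairs; empty msgstr = untranslated.

    Returns (translated_words, total_words, translated_strings, total_strings),
    counting MSGSTR words for translated entries and MSGID words otherwise.
    """
    translated_words = total_words = translated_strings = 0
    for msgid, msgstr in entries:
        if msgstr:
            n = len(msgstr.split())
            translated_words += n
            translated_strings += 1
        else:
            n = len(msgid.split())
        total_words += n
    return translated_words, total_words, translated_strings, len(entries)


source = ["Save", "Open file", "Quit"]
es = list(zip(source, ["Guardar", "Abrir archivo", "Salir"]))
fr = list(zip(source, ["Enregistrer", "Ouvrir le fichier", "Quitter"]))

print(word_stats(es))  # (4, 4, 3, 3)
print(word_stats(fr))  # (5, 5, 3, 3)
```

Both catalogs are fully translated with the same 3 strings, yet the 
word totals differ.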

In http://solar.laptop.org:5080/projects/xo_core/
Language              Trans.      Fuzzy     Untrans.    Total
Portuguese (Brazil)   162  42%    4   1%    213  56%    379
Spanish               219  62%    0   0%    132  37%    351

While in each language+project
[pt_BR]	8 files, 162/379 words (42%) translated [118/247 strings]
[es]		8 files, 219/351 words (62%) translated [157/234 strings]

Note that, even so, there's still a difference in the number of 
strings... see below.


ET>
ET>        The other day when I looked at write.write.po for
ET> French, there were only 10 messages in the catalog.  Today, I
ET> see that there are 36 messages which looks a lot closer to
ET> what I myself get from "xgettext toolbar.py" on the latest
ET> code.
ET>        However, when I checked write.write.po for Thai today,
ET> I see that it still has only 10 messages.
ET>
ET>        Solution (Or at least  A Question Posing As A Possible
ET> Solution): 
ET>
ET>        Does everyone agree that there needs to be a way that
ET> all of the ".po" files for all languages get updated with the 
ET> latest messages extracted via "xgettext" from the latest
ET> codebase (toolbar.py, etc.)?

Yes, there's a problem. Reviewing what you've noted, the problem 
appears to be a mix of things. Just for the record, we are 
sticking to the POT files found in d.l.o git (not Fedora).

1) the POT in dlo only has 9 strings
http://dev.laptop.org/git?p=projects/write;a=blob_plain;f=po/write.pot;hb=HEAD

2) the POT creation dates have probably been tampered with 
externally so it's impossible to determine which one makes sense 
without going into the source code:
FR.PO   "POT-Creation-Date: 2007-06-21 17:33+0200\n"
DLO POT "POT-Creation-Date: 2007-06-21 17:33+0200\n"

I personally believe that developers should generate the POT file 
and make sure that it's in d.l.o git. 


Overall, I find these inconsistencies a direct result of the messy 
flow we've had with t.fp.o. As a matter of fact, I've been trying 
to process the tickets in d.l.o holding PO submissions and things 
haven't been very nice. The current situation is:

0) only some projects have been injected into Pootle
   (core and bundled activities, with few exceptions like Etoys)
1) d.l.o POT files are being considered the standard
2) d.l.o PO files have been injected but not fully verified
2.1) many have lost their (UTF-8) encoding
2.2) many PO files seem not to correspond to their POT (1)
3) tickets (submitting PO files) seem to have the issues noted in (2)
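
The two checks in (2.1) and (2.2) could be sketched roughly like this. 
It is a deliberately naive parser (no escape handling, no plural 
forms), the function names are hypothetical, and the sample strings 
are invented:

```python
def msgids(po_bytes: bytes) -> set:
    """Return the msgids of a PO/POT file.

    Raises UnicodeDecodeError if the bytes are not valid UTF-8 (2.1).
    """
    text = po_bytes.decode("utf-8")
    ids, current = set(), None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("msgid "):
            current = line[len("msgid "):].strip()[1:-1]
        elif line.startswith('"') and current is not None:
            current += line[1:-1]  # continuation line of a multi-line msgid
        else:
            if current:  # the empty header msgid is skipped
                ids.add(current)
            current = None
    if current:
        ids.add(current)
    return ids


def compare(pot_bytes: bytes, po_bytes: bytes):
    """Return (msgids missing from the PO, msgids in the PO but not the POT)."""
    pot, po = msgids(pot_bytes), msgids(po_bytes)
    return sorted(pot - po), sorted(po - pot)


pot = b'msgid ""\nmsgstr ""\n\nmsgid "Bold"\nmsgstr ""\n\nmsgid "Italic"\nmsgstr ""\n'
po = b'msgid ""\nmsgstr ""\n\nmsgid "Bold"\nmsgstr "Gras"\n\nmsgid "Font"\nmsgstr "Police"\n'

print(compare(pot, po))  # (['Italic'], ['Font'])
```

Note that the decode step only catches bytes that are not valid UTF-8; 
a file that was mis-converted and then saved again as UTF-8 would 
still pass and needs a human look.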

On top of that, some of the quirks and particularities of the tools do 
seem to get in the way, but I think that most stem from the fact 
that we don't have a 'base' POT population.



Still working on it,
Xavier

PS: The issue regarding lists is an interesting one that I think 
may be much broader than the XO... :)

...snip...
ET> >
ET> > Questions, suggestions, ideas, etc. are all welcome!
ET> >
ET> >
ET> > Cheers,
ET> > Xavier
ET> >
ET> > [[Localization]] http://wiki.laptop.org/go/Localization
ET> > [[Pootle]] http://wiki.laptop.org/go/Pootle
ET> > [[Pootle/Administration]]
ET> > http://wiki.laptop.org/go/Pootle/Admininstration
ET> > [[Pootle/Glossary]]
ET> > http://wiki.laptop.org/go/Pootle/Glossary

-- 
XA
=========
Don't Panic!  The Answer is 42


