[Localization] Revamping the glossary

Sayamindu Dasgupta sayamindu at gmail.com
Mon Mar 3 12:32:25 EST 2008


On Fri, Feb 29, 2008 at 2:29 PM, Alexander Dupuy <alex.dupuy at mac.com> wrote:
> Sayamindu Dasgupta wrote:
>  > Thanks for the pointer to the translate.sourceforge.net page. I think
>  > that the translate project folks have already managed to achieve what
>  > I was trying to do.
>  > I have followed the steps outlined in that page - and I'm attaching
>  > the output. Does the attached file look more comprehensive?
>
>  It's certainly more comprehensive, but it goes a bit far the other way,
>  including a lot of terms that only appear once, and stuff that not only
>  doesn't need to be in the Pootle Terminology, but that probably
>  shouldn't even be translated ("acos" to "tanh" function names, "1000" to
>  "10" numbers).  There's also a lot of stuff with different numbers
>  ("Preset 1" "Preset 2") which could easily be collapsed.
>

I agree that it goes a bit overboard :-).
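
To make the pruning concrete, here is a rough Python sketch of the filtering described above: drop bare numbers and function names, collapse numbered variants like "Preset 1" / "Preset 2", and discard terms that appear only once. The function name, the regular expressions and the UNTRANSLATABLE set are all my own assumptions for illustration, not anything from translate-toolkit, and the input is assumed to be a plain list of msgid occurrences collected from the PO files.

```python
import re
from collections import Counter

# Hypothetical helpers for illustration; not part of translate-toolkit.
NUMBER_RE = re.compile(r"^\d+([.,]\d+)*$")           # "1000", "10", "1.5"
TRAILING_NUMBER_RE = re.compile(r"^(.*\S)\s+\d+$")   # "Preset 1" -> "Preset"

# Assumed list of identifiers that should stay untranslated.
UNTRANSLATABLE = {"acos", "asin", "atan", "cos", "sin", "tan",
                  "cosh", "sinh", "tanh"}


def prune_glossary(msgids, min_count=2):
    """Return (term, count) pairs, most frequent first.

    msgids is assumed to hold one entry per occurrence, so a term
    used in three PO files shows up three times.
    """
    counts = Counter()
    for msgid in msgids:
        term = msgid.strip()
        # Skip empty strings, bare numbers and known function names.
        if not term or NUMBER_RE.match(term) or term.lower() in UNTRANSLATABLE:
            continue
        # Collapse "Preset 1", "Preset 2", ... into a single "Preset".
        match = TRAILING_NUMBER_RE.match(term)
        if match:
            term = match.group(1)
        counts[term] += 1
    # Drop terms that occur fewer than min_count times.
    return [(term, n) for term, n in counts.most_common() if n >= min_count]
```

Raising min_count trims the list further, at the risk of dropping rare but important terms, so the threshold would probably need tuning per language.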

>  Did you use the poglossary bash script in translate-toolkit-1.1.0?  It
>  seems to automate the steps in their documentation, so that you can run
>  it as a single script rather than a whole sequence of commands.
>

Somehow the bash script was giving errors - but I ran the same set of
commands it would run.

>  # poglossary - takes a directory of PO files and creates a glossary
>  # out of them.  The glossary will only contain short phrases and is output
>  # in PO, TMX and CSV formats.
>
>  if [ $# -ne 3 ]; then
>         echo "Usage: `basename $0` language-iso-code input-dir project-name"
>         echo "  e.g. `basename $0` zu zu-po firefox"
>         exit 1
>  fi
>
>  However, this would appear to suffer from most of the same problems as
>  noted above, and a few others.  Running it on the Spanish PO files
>  downloaded from Pootle as packaging-es.zip pootle-es.zip update1-es.zip
>  xo_bundled-es.zip gives 787 msgids in 3621 lines (your version has 605
>  msgids in 2467 lines).  So it's not really a panacea, but does provide a
>  convenient starting point for simple modifications.
>
>  The word counting approach I described earlier is more complex, but
>  should provide better results (although it will require significantly
>  more coding).
>

Yeah - I'm thinking of filtering out the stopwords and then comparing
the two outputs to see whether I get any sensible results.
Also - do you think putting the glossary (the one with 605 msgids) on
the wiki and asking people to point out unnecessary msgids would be of
any use?
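
For reference, a minimal sketch of the stopword-filtered word count, assuming the msgids have already been extracted from the PO files into a list of strings. The stopword list here is a tiny made-up sample, and count_terms is a hypothetical name rather than an existing tool; a real run would need a proper English stopword list.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption); a real one is much longer.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "is",
             "for", "on", "this", "that"}
WORD_RE = re.compile(r"[A-Za-z]+")


def count_terms(msgids, stopwords=STOPWORDS):
    """Count non-stopword words across all msgids, case-insensitively."""
    counts = Counter()
    for msgid in msgids:
        for word in WORD_RE.findall(msgid.lower()):
            if word not in stopwords:
                counts[word] += 1
    return counts


if __name__ == "__main__":
    sample = ["Save the file", "Open a file", "Close this activity"]
    for word, n in count_terms(sample).most_common(3):
        print(word, n)
```

Words that score high here but are missing from the 605-msgid glossary (or the other way round) would be the interesting ones to look at.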


Thanks,
Sayamindu

-- 
Sayamindu Dasgupta
[http://sayamindu.randomink.org/ramblings]

