[Localization] Revamping the glossary
Sayamindu Dasgupta
sayamindu at gmail.com
Mon Mar 3 12:32:25 EST 2008
On Fri, Feb 29, 2008 at 2:29 PM, Alexander Dupuy <alex.dupuy at mac.com> wrote:
> Sayamindu Dasgupta wrote:
> > Thanks for the pointer to the translate.sourceforge.net page. I think
> > that the translate project folks have already managed to achieve what
> > I was trying to do.
> > I have followed the steps outlined on that page, and I'm attaching
> > the output. Does the attached file look more comprehensive?
>
> It's certainly more comprehensive, but it goes a bit far the other way,
> including a lot of terms that only appear once, and stuff that not only
> doesn't need to be in the Pootle Terminology, but that probably
> shouldn't even be translated ("acos" to "tanh" function names, "1000" to
> "10" numbers). There's also a lot of stuff with different numbers
> ("Preset 1" "Preset 2") which could easily be collapsed.
>
I agree that it goes a bit overboard :-).
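A minimal sketch of the cleanup described above, assuming the candidate terms have already been extracted to one msgid per line. The function name filter_terms and the function-name pattern are illustrative assumptions, not part of translate-toolkit:

```shell
# Hypothetical helper: reads one candidate term per line on stdin.
# The pattern for math function names is an illustrative guess; adjust it
# to whatever actually shows up in the PO files.
filter_terms() {
    grep -Ev '^[0-9]+$' |              # drop purely numeric entries ("1000", "10")
    grep -Evx 'a?(cos|sin|tan)h?' |    # drop names such as "acos" ... "tanh"
    sed -E 's/ [0-9]+$//' |            # "Preset 1", "Preset 2" -> "Preset"
    LC_ALL=C sort -u                   # merge the duplicates created by collapsing
}
```

It could be run as "filter_terms < terms.txt", where terms.txt is a hypothetical file holding the extracted msgids.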
> Did you use the poglossary bash script in translate-toolkit-1.1.0? It
> seems to automate the steps in their documentation, so that you can run
> it as a single script rather than a whole sequence of commands.
>
For some reason the bash script was giving me errors, so I ran by hand
the same set of commands that it runs.
> # poglossary - takes a directory of PO files and creates a glossary
> # out of them. The glossary will only contain short phrases and is output
> # in PO, TMX and CSV formats.
>
> if [ $# -ne 3 ]; then
>     echo "Usage: `basename $0` language-iso-code input-dir project-name"
>     echo " e.g. `basename $0` zu zu-po firefox"
>     exit 1
> fi
>
> However, this would appear to suffer from most of the same problems as
> noted above, and a few others. Running it on the Spanish PO files
> downloaded from Pootle as packaging-es.zip pootle-es.zip update1-es.zip
> xo_bundled-es.zip gives 787 msgids in 3621 lines (your version has 605
> msgids in 2467 lines). So it's not really a panacea, but does provide a
> convenient starting point for simple modifications.
>
> The word counting approach I described earlier is more complex, but
> should provide better results (although it will require significantly
> more coding).
>
Yeah - I'm thinking of filtering out the stopwords first, and then doing
a comparison if that gives any sensible results.
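A rough sketch of what the stopword filtering could look like, assuming the msgids are available one per line on stdin. The function name count_words and the tiny stopword list are placeholders for illustration only; a real run would need a much fuller list:

```shell
# Illustrative sample only; not a real stopword list.
STOPWORDS='a an and as the of to in is for on'

# Hypothetical helper: split msgids into lowercase words, drop stopwords,
# and print "count word" pairs, most frequent first.
count_words() {
    tr '[:upper:]' '[:lower:]' |
    tr -cs '[:alpha:]' '\n' |          # one word per line, punctuation dropped
    awk -v stop="$STOPWORDS" '
        BEGIN { n = split(stop, w, " "); for (i = 1; i <= n; i++) is_stop[w[i]] = 1 }
        $0 != "" && !($0 in is_stop) { count[$0]++ }
        END { for (word in count) print count[word], word }' |
    LC_ALL=C sort -k1,1nr -k2,2        # by count (descending), then alphabetically
}
```

The resulting word list could then be compared against the words in the poglossary output, for instance with comm(1) on two sorted lists.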
Also - do you think putting the glossary (the 605-string one) on the
wiki and asking people to point out unnecessary msgids would be of any
use?
Thanks,
Sayamindu
--
Sayamindu Dasgupta
[http://sayamindu.randomink.org/ramblings]