[Localization] Revamping the glossary

Sayamindu Dasgupta sayamindu at gmail.com
Fri Feb 29 01:56:43 EST 2008


Hi Alexander,
Thanks for the pointer to the translate.sourceforge.net page. I think
that the translate project folks have already managed to achieve what
I was trying to do.
I have followed the steps outlined on that page and am attaching
the output. Does the attached file look more comprehensive?
Thanks,
Sayamindu


On Wed, Feb 27, 2008 at 4:30 AM, Alexander Dupuy <alex.dupuy at mac.com> wrote:
> I'm finally getting back to this after a week (and some).
>
>  Re-reviewing the glossary.pot that Sayamindu attached, I think I
>  understand now why both I and Edward Cherlin thought that it was the
>  Record activity POT.  Without having seen the script used to generate
>  this file (you had said you would upload it after some refinement but we
>  haven't seen it) I would guess that it is extracting all messages that
>  appear in more than one POT file (or more than once in any file?),
>  possibly with some filtering for "short phrases" along the lines
>  suggested in
>  http://translate.sourceforge.net/wiki/toolkit/creating_a_terminology_list_from_your_existing_translations
>  and implemented in translate-toolkit/tools/poglossary.
>  Since all the messages at the top appear in the Record activity (and
>  only a few messages, towards the end, are not present in the Record
>  activity) it seemed to us that this was all there was (but we were
>  mistaken).
>
>  By using only complete phrases as they appear in other POT files, this
>  approach provides as much context as possible, but it doesn't fully
>  capture the common terminology present in the existing POT files. Terms
>  such as "Activity" or "Mesh Network", which appear in multiple messages
>  in multiple POT files, are not present in your attached glossary.pot,
>  because there is no message which consists of just these words.
>
>  There are cases where a single word isn't sufficient. In ticket #6439,
>  which you mention (http://dev.laptop.org/ticket/6439), having "zoom" in
>  the glossary isn't helpful for Spanish, since completely different terms
>  are used for "zoom in" ("acercarse") and "zoom out" ("alejarse"). Even
>  so, I think that there is a real need for "mining" of terms that are
>  common substrings of multiple messages.  This is clearly a lot more
>  work, and needs things like stop lists to exclude words like "a", "the",
>  "this", etc., but I think it is worth the effort. I'm willing to do
>  some of the work on this; the translate-toolkit utilities seem to
>  provide a useful base, since pocount.py already does the word-break
>  analysis and the like.
>
>  @alex
>
>  --
>  mailto:alex.dupuy at mac.com
>
>  _______________________________________________
>  Localization mailing list
>  Localization at lists.laptop.org
>  http://lists.laptop.org/listinfo/localization
>
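
The term "mining" described in the quoted message (finding words or
short phrases that recur across several messages, minus a stop list)
could be sketched roughly as below. This is only an illustration under
assumptions: it uses plain Python rather than the translate-toolkit
code, the function name candidate_terms, the tiny stop list and the
sample messages are all made up for the example, and the tokenizer only
handles ASCII letters.

```python
import re
from collections import Counter

# Hypothetical stop list for illustration; a real one would be much longer.
STOP_WORDS = {"a", "an", "the", "this", "to", "of", "in", "is"}


def candidate_terms(messages, min_messages=2, max_words=2):
    """Return {term: number of messages containing it} for recurring terms.

    A term is a run of 1..max_words consecutive words that neither
    starts nor ends with a stop word.
    """
    counts = Counter()
    for message in messages:
        # Simplistic word splitting: lowercase ASCII letters only.
        words = re.findall(r"[a-z]+", message.lower())
        seen = set()  # count each term at most once per message
        for size in range(1, max_words + 1):
            for start in range(len(words) - size + 1):
                chunk = words[start:start + size]
                if chunk[0] in STOP_WORDS or chunk[-1] in STOP_WORDS:
                    continue
                seen.add(" ".join(chunk))
        counts.update(seen)
    return {term: n for term, n in counts.items() if n >= min_messages}


# Made-up sample msgids, not taken from any real POT file.
messages = [
    "Start the activity",
    "Stop this activity",
    "Join the mesh network",
    "Mesh network not found",
]
terms = candidate_terms(messages)
```

With these sample messages, "activity" and "mesh network" come out as
candidates even though no message consists of just those words, which
is the gap in the whole-message approach that the quoted message points
out.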



-- 
Sayamindu Dasgupta
[http://sayamindu.randomink.org/ramblings]
-------------- next part --------------
A non-text attachment was scrubbed...
Name: newglossary.pot
Type: application/vnd.ms-powerpoint
Size: 39792 bytes
Desc: not available
Url : http://lists.laptop.org/pipermail/localization/attachments/20080229/7945ca86/attachment-0001.pwz 

