[Localization] Revamping the glossary
Sayamindu Dasgupta
sayamindu at gmail.com
Fri Feb 29 01:56:43 EST 2008
Hi Alexander,
Thanks for the pointer to the translate.sourceforge.net page. I think
that the translate project folks have already managed to achieve what
I was trying to do.
I have followed the steps outlined on that page, and I'm attaching
the output. Does the attached file look more comprehensive?
Thanks,
Sayamindu
On Wed, Feb 27, 2008 at 4:30 AM, Alexander Dupuy <alex.dupuy at mac.com> wrote:
> I'm finally getting back to this after a week (and some).
>
> Re-reviewing the glossary.pot that Sayamindu attached, I think I
> understand now why both Edward Cherlin and I thought that it was the
> Record activity POT. Without having seen the script used to generate
> this file (you had said you would upload it after some refinement, but
> we haven't seen it), I would guess that it is extracting all messages
> that appear in more than one POT file (or more than once in any file?),
> possibly with some filtering for "short phrases" along the lines
> suggested in
> http://translate.sourceforge.net/wiki/toolkit/creating_a_terminology_list_from_your_existing_translations
> and implemented in translate-toolkit/tools/poglossary.
> Since all the messages at the top appear in the Record activity (and
> only a few messages, towards the end, are not present in the Record
> activity) it seemed to us that this was all there was (but we were
> mistaken).
>
> By using only complete phrases as they appear in other POT files, this
> approach provides as much context as possible, but it doesn't fully
> capture the common terminology present in the existing POT files. Terms
> such as "Activity" or "Mesh Network", which appear in multiple messages
> in multiple POT files, are not present in your attached glossary.pot,
> because there is no message that consists of just these words.
>
> A single word isn't always sufficient. In the case of ticket #6439 that
> you mention (http://dev.laptop.org/ticket/6439), having "zoom" in the
> glossary isn't helpful for Spanish, since completely different terms are
> used for "zoom in" ("acercarse") and "zoom out" ("alejarse"). Even so, I
> think that there is a real need for "mining" of terms that are common
> substrings of multiple messages. This is clearly a lot more work, and
> needs things like stop lists to exclude words like "a", "the", "this",
> etc., but I think it is worth the effort. I'm actually willing to do
> some of the work on this: the translate-toolkit utilities seem to
> provide a useful base, since pocount.py is already doing the word-break
> analysis and so on.
>
> @alex
>
> --
> mailto:alex.dupuy at mac.com
>
> _______________________________________________
> Localization mailing list
> Localization at lists.laptop.org
> http://lists.laptop.org/listinfo/localization
>
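The term mining that Alexander describes in the quoted message could be
sketched roughly as follows. This is only a minimal illustration, not the
translate-toolkit implementation: the function name candidate_terms, the
stop list, and the sample messages are all made up for the example, and it
works on plain msgid strings rather than parsing POT files.

```python
import re
from collections import Counter

# Hypothetical stop list; a real one would be much longer.
STOP_WORDS = {"a", "an", "the", "this", "to", "of", "in", "on", "is"}


def candidate_terms(msgids, max_words=2, min_messages=2):
    """Return {term: count} for word sequences found in at least
    min_messages different msgids (assumed English, ASCII letters only)."""
    counts = Counter()
    for msgid in msgids:
        words = re.findall(r"[a-z]+", msgid.lower())
        seen = set()
        for n in range(1, max_words + 1):
            for i in range(len(words) - n + 1):
                gram = words[i:i + n]
                # Skip phrases that start or end with a stop word.
                if gram[0] in STOP_WORDS or gram[-1] in STOP_WORDS:
                    continue
                seen.add(" ".join(gram))
        counts.update(seen)  # count each term once per message
    return {t: c for t, c in counts.items() if c >= min_messages}


# Made-up sample messages, not taken from any real POT file.
sample = [
    "Join the Mesh Network",
    "Mesh Network unavailable",
    "Start this Activity",
    "Stop Activity",
]
print(sorted(candidate_terms(sample).items()))
```

With these sample messages, "mesh network" and "activity" come out as
candidates even though no message consists of just those words. The
single words "mesh" and "network" come out too, so a real tool would
probably want to drop terms that only ever occur inside a longer
candidate.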
--
Sayamindu Dasgupta
[http://sayamindu.randomink.org/ramblings]
-------------- next part --------------
A non-text attachment was scrubbed...
Name: newglossary.pot
Type: application/vnd.ms-powerpoint
Size: 39792 bytes
Desc: not available
Url : http://lists.laptop.org/pipermail/localization/attachments/20080229/7945ca86/attachment-0001.pwz