[Localization] Trying to get big picture

Edward Cherlin echerlin at gmail.com
Sun May 4 04:20:17 EDT 2008


On Thu, May 1, 2008 at 9:47 PM, Chris Leonard <cjlhomeaddress at gmail.com> wrote:
>
> Hello,
>
> I'm trying to mentally work through mechanics of internationalization of
> textual content (or HTML content) and I have many questions.  I've read much
> of the wiki on the Pootle, i18n, l10n, translating, etc. topics, but they
> are all oriented towards code, not textual content, so much is still not
> clear to me.
>
> Let's assume that I have some great text in English, starting from
> plain-vanilla ASCII, but there may be some words that are going to be better
> if they are represented in italics or bold (for emphasis). Let's say I take
> care of that by using HTML mark-up, so I now have an English HTML text that I
> want to internationalize for Pootle submission.
>
> Most of the HTML is doing stuff behind the scenes (links, font-size, etc.)
> which poses no special i18n issues, but some HTML mark up has the effect of
> modifying the presentation of text in a way that actually does have an
> impact on the text's meaning.
>
> (see pseudo-HTML below).
>
> sentence1 is phrase1a + phrase1b + phrase 1c
>
> sentence2 is phrase2a + <bold>phrase2b</bold> + phrase2c
>
> sentence3 is <italics>phrase3a</italics> + phrase3b + phrase3c
>
>
>
> sentence1 is no real challenge, it goes straight into a .pot file as a
> single string.
>
> But what about sentences 2 or 3?
>
> Do italics and/or bold tags translate into non-latin alphabets?  (especially
> say Nepali)

Not always.

It is possible to bold and slant type algorithmically in writing
systems for which Bold and Italic fonts are not available, but it is
not always culturally correct. I'm trying to visualize Italic Greek,
and failing, but it turns out that such fonts exist. Emphasis in
Japanese is often represented by changing hiragana to katakana. Italic
Arabic seems even more unlikely, and indeed a quick search fails to
turn up any, except for a pseudo-italic slanted LED font. (Bold is not
a problem to find.) I don't find any Indic italic either.

> How do you parse this for presentation in Pootle?
>
> Do you put full sentence with HTML tag in it, hoping translator mentally
> interprets and adjusts HTML tags as needed to preserve emphasis desired
> within context of sentence?

You are not allowing for the fact that word order and emphasis are
different in different languages. You must allow the localizer to
represent the meaning of your emphasis in a manner appropriate to the
language, culture, and writing system, which means at a minimum
changing your markup. Standards for such things would have to be
worked out to prevent a ransom note effect in the UI.

> or do you break it into sub-phrases (broken at the HTML tag junctions in
> English)?

Sometimes, but not with full generality.

> I can envision the .po files (pseudo versions below are ASCII only),
>
> sentence1
> Where is the library?
> Donde esta la biblioteca?
¿Dónde...?, por favor. (You can type ¿ as compose-?-? on international
keyboards in Linux.) In Japanese:
Toshokan-ga doko-ni aru ka
Library (subject) where at is ?

Where is the *library*?
Toshokan-wa doko-ni aru ka
The particle wa (written ha) gives topic emphasis.

Where *is* the library?
Toshokan-ga doko-ni arun deshoo ka nee
When you can grok compound verbs with three terminating particles in
full generality, you will begin to understand the Japanese mentality.
I know enough to appreciate somewhat how much I don't know.

> sentence2
> Let's go to the beach!
> Vamos a la playa!
¡Vamos...!, por favor.
Bichi-e ikimasyoo
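
To make the point concrete, here is a minimal sketch in Python. It is not a description of how Pootle or gettext store anything. It assumes each sentence goes into the catalog whole, with its inline markup, so that the localizer is free to move the markup or to express the emphasis some other way. The CATALOGS dictionary, the translate() helper, the <b> tag, and the "ja-Latn" language code are hypothetical stand-ins; the strings are the ones from the examples above.

```python
# Hypothetical in-memory stand-in for translated catalogs. Each key is a
# whole English sentence with its inline markup left in place.
CATALOGS = {
    "es": {
        # Same emphasis, same markup, applied to the translated word.
        "Where is the <b>library</b>?": "¿Dónde está la <b>biblioteca</b>?",
    },
    "ja-Latn": {
        # Emphasis carried by the topic particle "wa", so no markup at all.
        "Where is the <b>library</b>?": "Toshokan-wa doko-ni aru ka",
    },
}


def translate(msgid, lang):
    """Return the localized sentence, or the English msgid if none exists."""
    return CATALOGS.get(lang, {}).get(msgid, msgid)


if __name__ == "__main__":
    for lang in ("es", "ja-Latn", "fr"):
        print(lang, "->", translate("Where is the <b>library</b>?", lang))
```

Splitting the sentence at the tag boundaries would have forced every language to keep three fragments in the English order, which is why the whole sentence is the unit here.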

> does the internationalized backbone html look like this?
>
> <a href="sentence1">www.laptop.org</a>
> <br>
> <a href="sentence2">www.google.com</a>

That's backwards. <a href="sentence1">www.laptop.org</a>
should be <a href="http://www.laptop.org">sentence1</a>: the URL goes in
the href attribute and the translatable sentence becomes the link text.
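
As a rough illustration of that ordering, the following Python sketch builds the anchor from a fixed URL and a translated label. The link() helper and the LABELS table are hypothetical names invented for this example, and the Spanish string is the corrected sentence1 from earlier in this message.

```python
from html import escape

# Hypothetical table of translated link labels, keyed by language code.
LABELS = {
    "en": "Where is the library?",
    "es": "¿Dónde está la biblioteca?",
}


def link(url, label):
    """Build an anchor: the URL is the href, the sentence is the visible text."""
    return '<a href="{}">{}</a>'.format(
        escape(url, quote=True),     # attribute value: escape quotes too
        escape(label, quote=False),  # element text: only &, <, > need escaping
    )


if __name__ == "__main__":
    for lang, label in LABELS.items():
        print(lang, link("http://www.laptop.org", label))
```

Only the label changes from language to language in this sketch; the backbone HTML and the URL stay the same.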

> and if so what takes it and the .po file to produce a localized version?

This has no general solution. At Wikipedia, for example, _some_ pages
have rough equivalents in _some_ languages. But you have to look up
each one, and then you might still have to decide whether to translate
the page title.

<a href="http://en.wikipedia.org/wiki/One_Laptop_per_Child">One Laptop
Per Child</a>
<a href="http://fr.wikipedia.org/wiki/One_Laptop_per_Child">un portable
par enfant</a>
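
Since each link target has to be looked up by hand, one possible way to keep the results is a per-language table with a fallback, sketched below in Python. The LOCALIZED_LINKS table and the localized_link() helper are hypothetical names; the two entries are the Wikipedia pair above, and falling back to English is an assumption, not a recommendation from any tool.

```python
# Hypothetical hand-maintained table: language code -> (URL, link text).
LOCALIZED_LINKS = {
    "en": ("http://en.wikipedia.org/wiki/One_Laptop_per_Child",
           "One Laptop Per Child"),
    "fr": ("http://fr.wikipedia.org/wiki/One_Laptop_per_Child",
           "un portable par enfant"),
}


def localized_link(lang, default="en"):
    """Return the anchor for lang, or the default language's anchor if missing."""
    url, title = LOCALIZED_LINKS.get(lang, LOCALIZED_LINKS[default])
    return '<a href="{}">{}</a>'.format(url, title)


if __name__ == "__main__":
    print(localized_link("fr"))
    print(localized_link("ne"))  # no entry, so this falls back to English
```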

> Is this done with gettext ignoring the fact that it is not code, just text?
>
> Do gettext tools understand HTML?
> I know that is a lot of questions, but I am really hoping to get some health
> content prepared/bundled and it would help me very much to understand the
> later phases of i18n and l10n so that the entire process can be done in the
> most efficient fashion.  I more-or-less grasp the process used for code, but
> I'm not sure I understand how or if the same tools work for plain-text or
> HTML.

What do you want to emphasize in your texts? Real examples will make
your question much clearer to the rest of us.

> Any guidance (more wiki links, whatever) would be appreciated.

"Good judgment comes from experience. Experience comes from bad
judgment." Or sometimes from directed discovery.

> cjl
>
>
> http://wiki.laptop.org/go/User:Cjl
>
> cjlhomeaddress at gmail.com
>
> _______________________________________________
>  Localization mailing list
>  Localization at lists.laptop.org
>  http://lists.laptop.org/listinfo/localization

-- 
Edward Cherlin
End Poverty at a Profit by teaching children business
http://www.EarthTreasury.org/
"The best way to predict the future is to invent it."--Alan Kay

