[Localization] [Etoys] Fwd: Pootle now fully open for translation

Tue Nov 27 11:51:33 EST 2007

Hi, everyone,

Halpern's article at http://www.cjk.org/cjk/c2c/c2cbasis.htm as pointed out by
Yuan Chao is well worth reading.

However, I have some reservations about certain ideas which Halpern
presents, and I think these are worth pointing out.  (I also disagree
with some of his terminology, but I won't get into that.)  For readers
of this list who may not be so familiar with Chinese, I think it is
important to make some of these things clear.

First, Halpern states the case for lexemic conversion ("Level 3
Conversion") too simply: he does not qualify the circumstances in
which lexemic conversion is appropriate, and he fails to discuss the
important question of when lexemic conversion is *not* appropriate.

Halpern states:

     " The mapping tables must map one lexeme to
       another on a semantic level, if appropriate. For
       example, SC 计算机 must map to its TC lexemic
       equivalent 電腦, not to its orthographic equivalent 計算機. "

In this presentation of his ideas, Halpern has now included the
conditional, "if appropriate."  I recall reading what was probably an
earlier draft of these ideas in which Halpern failed to qualify Level
3 Conversion at all.  The question which needs to be asked is, "When
is lexemic conversion appropriate?"  Let's look at a few cases.  Note
that a number of ideas discussed here in the context of Chinese may be
relevant to other languages as well.
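
To make the distinction in Halpern's example concrete, here is a
minimal Python sketch.  The two tiny tables and the function names are
my own assumptions for illustration only; they are not Halpern's data
or anyone's actual implementation.

```python
# Hypothetical, deliberately tiny mapping tables (illustration only).

# Character-level (orthographic) SC -> TC table.
# The character 算 is identical in both orthographies, so it is omitted.
SC_TO_TC_CHARS = {"计": "計", "机": "機"}

# Word-level (lexemic) SC -> TC table.
SC_TO_TC_LEXEMES = {"计算机": "電腦"}


def orthographic_convert(text):
    """Swap each character for its TC form; word choice is untouched."""
    return "".join(SC_TO_TC_CHARS.get(ch, ch) for ch in text)


def lexemic_convert(text):
    """Replace whole words with the regional equivalent first, then
    convert whatever characters remain."""
    for sc_word, tc_word in SC_TO_TC_LEXEMES.items():
        text = text.replace(sc_word, tc_word)
    return orthographic_convert(text)
```

With these tables, orthographic_convert("计算机") gives 計算機, while
lexemic_convert("计算机") gives 電腦.  The rest of this message is about
when the second behaviour is actually wanted.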

CASE 1: MESSAGE CATALOG TRANSLATION WORK

If you or I are translating a message catalog for a software program --
such as for OLPC localization -- then, yes, it is appropriate to
consider lexemic conversion.  In this case, this means we want to
ensure that "计算机" is used for mainland Mandarin locales, while "電腦" is
used for Taiwanese Mandarin locales.  Normally the message catalog "keys"
will be in English, so we would want to translate the English word
"computer" to "计算机" for mainland Mandarin Chinese locales, and to "電腦"
for Taiwanese Mandarin locales.
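
As a rough sketch of what this looks like in practice, here are
per-locale catalogs keyed by the English string.  The dictionary
layout, the locale codes zh_CN and zh_TW, and the translate() helper
are assumptions made for this example; a real project would normally
keep these in gettext-style catalog files rather than in code.

```python
# Hypothetical in-memory message catalogs, keyed by the English source
# string.  Each locale gets the word its own readers expect.
CATALOGS = {
    "zh_CN": {"computer": "计算机"},
    "zh_TW": {"computer": "電腦"},
}


def translate(key, locale):
    """Return the translation for this locale, or the English key
    itself when no translation exists."""
    return CATALOGS.get(locale, {}).get(key, key)
```

Here the "lexemic conversion" is simply two human translators each
choosing the right word for their own locale; no SC/TC mapping table
is involved at all.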

If there existed a translation assistance software package
sophisticated enough to automatically provide suggestions for such
lexemic conversions, that would be nice.  But it is hardly a
necessity.  We must hope that our translators and translation
reviewers will be fully versed in the dialectal differences between
mainland Mandarin and Taiwanese Mandarin.  In simple terms, we hope to
have people from Taiwan do the localization for Taiwanese locales,
people from the Mainland do localization for Mainland locales, people
from Singapore do Singapore locales, and so on.

A software tool may speed the process slightly.  But I would always
want to have an experienced human translator make the final decision.

Furthermore, if a software assistant program versed in the subtleties
of lexemic conversion did exist, I would want that software system to
also be able to remind me that "bonnet" is the British word for the
part of an automobile that Americans call the "hood".  In
other words, if someone thinks lexemic conversion is important enough
to encode into mapping tables for Chinese, then I hope they plan to
extend their work to English, Portuguese, Spanish, French, and so on.
This is not just a problem for Chinese.

CASE 2: A CHINESE WEB PAGE

Suppose that you or I write a Firefox extension which will, at the
press of a button, convert the currently-viewed Mandarin Chinese web
page from Simplified (SC) to Traditional (TC) orthography, or
vice-versa.  Wouldn't that be a nice extension to have? I certainly
think it would be nice.  Currently, only the largest international
companies or institutions in China/HK/TW/SG/etc. bother to provide
versions of their web pages in both SC and TC.  But the reality is
that most of the really interesting stuff on the World Wide Web is not
written by some big company.  So, if the writer is from Mainland
China, it will be written in 简体字 (SC).  If the writer is from Taiwan,
it will be written in 正體字 (TC).

Now, should this Firefox extension do lexemic conversion?  I think not.

If I am reading something written by an author from Taiwan, I want to
be able to read the original document just as the author intended,
using his own dialect preferences.  I don't want some computer program
mucking around and changing words in the text for me, regardless of
whether it is "電腦" or a proper noun like "甘迺迪" (gan1 nai3 di2, Kennedy
-- another example used by Halpern in discussing conversion of proper
nouns).  It is no different if I am reading an Irish author like James
Joyce or an American author like Upton Sinclair.  Knowledgeable
readers can deal with lexemic nuances in text -- it is not quite so
large a problem as a reading of Halpern may lead you to believe.

CASE 3: WIKIPEDIA

Yuan Chao mentioned Wikipedia.  This is a really difficult case!

Lexemic conversion does seem appropriate for encyclopedic works which
are intended to present factual information to readers.  But what if
an encyclopedia article contains excerpts or quotations from another
work?  It seems that there should be a way of tagging a span of text
with a tag that says, "don't do lexemic conversion on this span of
text".  Does Wikipedia have such a tagging system yet?  I don't know.

(According to http://en.wikipedia.org/wiki/Zh.wikipedia.org, Wikipedia
does have a system of special wiki markup to prevent use of
Wikipedia's conversion tables for specific articles or specific words,
and so perhaps this is sufficient).
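
To illustrate the idea of a "don't convert this span" tag, here is a
small Python sketch.  The -{ ... }- marker syntax, the one-entry table
and the function names are assumptions of this sketch; I am not
claiming this is how Wikipedia's converter is implemented.  To keep it
short, only a lexemic table is applied.

```python
import re

# Hypothetical one-entry lexemic table (illustration only).
SC_TO_TC_LEXEMES = {"计算机": "電腦"}

# Assumed marker syntax for a protected span: -{ ... }-
PROTECTED = re.compile(r"-\{(.*?)\}-", re.DOTALL)


def convert_lexemes(text):
    """Apply the lexemic table to an unprotected stretch of text."""
    for sc_word, tc_word in SC_TO_TC_LEXEMES.items():
        text = text.replace(sc_word, tc_word)
    return text


def convert_with_protection(text):
    """Convert everything except the contents of -{ ... }- spans,
    which are copied through unchanged with the markers removed."""
    parts = []
    pos = 0
    for match in PROTECTED.finditer(text):
        parts.append(convert_lexemes(text[pos:match.start()]))
        parts.append(match.group(1))  # quoted text stays as written
        pos = match.end()
    parts.append(convert_lexemes(text[pos:]))
    return "".join(parts)
```

A quotation wrapped in the markers keeps the original author's wording,
while the surrounding encyclopedic text is still converted.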

Authors of Chinese-language Wikipedia articles naturally span both
sides of the Straits of Taiwan.  This means that Wikipedia's "source"
documents can be a complete mixture of Simplified and Traditional
orthography.  As Mainland China continues to block access to the
Chinese-language version of Wikipedia, contributions from mainland
wikipedians are more restricted than they would be otherwise.  Access
through proxy servers is possible, but a mainland competitor, 百度百科
(Baidu Baike), from the creators of the Baidu search engine, now makes
it easier for many to just contribute to Baidu Baike instead of to
Wikipedia.

-- Ed Trager

> > Taking wikipedia for example, originally there's only "traditional
> > Chinese" and "simplified Chinese" version with an auto-conversion
> > table. (though not perfect)
>
> There's an understatement. The CJK Dictionary Institute has a database
> for converting between Traditional Chinese and Simplified Chinese with
> millions of items in it. Character substitution is simply not enough.
> It is also not 1-1 in many cases. Details at
> http://www.cjk.org/cjk/c2c/c2cbasis.htm.
>

>
> > --
> > Best regards,
> > Yuan Chao
> > _______________________________________________
> > Localization mailing list
> > Localization at lists.laptop.org
> > http://lists.laptop.org/listinfo/localization
> >
>
>
>
> --
> Edward Cherlin
> Earth Treasury: End Poverty at a Profit
> http://wiki.laptop.org/go/Earth_Treasury
> "The best way to predict the future is to invent it."--Alan Kay
>