documentation-translation workflow

Mon Oct 15 18:17:44 EDT 2007

Wonderful!

I am helping to put together volunteers; 3 technical writers have made
themselves available to assist with documentation. Anyone who is interested
in helping work out the "collaborative tools/localization tools", please
touch base. The writers are interested in serving the OLPC project, and I am
suggesting that a strategy be developed for channeling more volunteers, since
I expect there will be more. We are discussing assigning a writer to each
activity if we are so lucky.

Comments:

*reading level*
- 100% agree on simplified technical english. great if developers/others can
use simplified english in original note-taking. if you have ms word you can
run a test that calculates age level - 6th grade is probably good. if you
don't have ms word you can send an open office doc or wiki url where you are
taking notes.
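
for anyone without ms word, here is a rough sketch of the kind of grade-level
test meant above. it assumes the Flesch-Kincaid grade level formula (the grade
statistic that word shows in its readability statistics) and a crude
vowel-group syllable counter. the function names are made up for illustration,
and the score is only an approximation of what word would report.

```python
import re


def count_syllables(word):
    """Rough heuristic (an assumption): one syllable per group of vowels."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # A trailing "e" is usually silent, so do not count it as a syllable.
    if word.endswith("e") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_kincaid_grade(text):
    """Approximate US school grade level needed to read the text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence")
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words)
            - 15.59)


if __name__ == "__main__":
    notes = "Press the button. The screen turns on."
    print(round(flesch_kincaid_grade(notes), 1))
```

a score near 6 would correspond to the 6th grade target; short sentences and
short words pull the score down.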

*screenshots*
- visuals are important. beyond providing reference material on "how" --
helpful if developers can think of a step by step scenario that someone can
walk through to "try" the activity. when writing books I found it helpful to
walk through a process/task, and take screenshots, drop them into a
document, then go back and write captions -- then as necessary filling in
with background info
- good to think of whether you can provide a concept, task or reference. all
three categories are helpful
- suggest taking screenshots in as small a format as possible; if you take
the entire screen and can crop to the relevant section, great -- otherwise
in caption you could include notes to crop to a certain area
- for localization of screenshots, suggest not worrying about in-language
text for now, but a localization file-naming convention should be established
for screenshots. immediate term: blah-en or blah-sp. not clear on whether
OLPC has decided on ISO language codes.
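
a minimal sketch of what such a naming convention could look like, assuming a
"name-language.png" pattern. the helper names and the .png extension are
assumptions, not anything OLPC has decided. the codes are the two-letter ones
listed further down in this message, which match ISO 639-1; under that
standard spanish is "es" rather than "sp", so the sketch uses "es".

```python
from pathlib import Path

# Two-letter codes taken from the priority lists later in this message.
LANGUAGE_CODES = {"es", "pt", "en", "hi", "am", "ar", "th", "he", "fr", "ru"}


def localized_name(base, lang, ext=".png"):
    """Build a screenshot file name such as 'journal-view-es.png'."""
    if lang not in LANGUAGE_CODES:
        raise ValueError(f"unknown language code: {lang}")
    return f"{base}-{lang}{ext}"


def split_localized_name(filename):
    """Split 'journal-view-es.png' back into ('journal-view', 'es')."""
    stem = Path(filename).stem
    base, sep, lang = stem.rpartition("-")
    if not sep or not base or lang not in LANGUAGE_CODES:
        raise ValueError(f"not a localized screenshot name: {filename}")
    return base, lang


if __name__ == "__main__":
    print(localized_name("journal-view", "es"))
    print(split_localized_name("journal-view-pt.png"))
```

keeping the language code as the last hyphen-separated part means the base
name can itself contain hyphens.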

*tools*
- have been working on looking at various tools for
documentation/localization; if interested/working on such things, please get
in touch. I will subscribe to localization at lists.laptop.org (btw Idiom has
donated an instance of Idiom WorldServer that can be studied as a method of
incubating an open source alternative; I am trying to find configuration help.)

*portal*
- started google group recently as repository for documentation strategy,
files and information, and as temporary method of corralling/versioning
content, and for having an email list. could be precursor to
doc at lists.laptop.org -- if interested, touch base.

*localization*
Lingo Systems stepped up to the plate with the build 542 notes and helped to
get them translated into 7 different languages; not sure about this time
around. probably the most scalable approach will be to do things in stages,
using a "human content management system" until more automation can occur.
Suggesting 3 stages: alpha - beta (draft) - publish

> alpha: have a dynamic wiki-based alpha TOC, from which a "TOC code freeze"
can be made by tech writers working back from translation deadline, pulling
TOC into editing, then pulling content in from the live wiki pages where
notes are and editing
> beta: dropping TOC and pages into beta wiki pages and pursuing a 3-channel
localization strategy (until something better can be found):
- channel 1: translation can occur directly within the wiki page (not ideal,
no translation memory, can be time consuming and prone to human error), or
the content can be copied and pasted into word, along with a note that it is
being translated
- channel 2: as content is dropped into the beta wiki page, it is also placed
on wordpress; the advantage here is simply that we can connect to world wide
lexicon, which has a distributed translation roundtrip solution
- channel 3: putting content into word processing documents and circulating
them to professional translators if/as available, who often prefer RTF format.
> publishing: based on what translations are available, printing out
chapters/modules from TOC code freeze out to PDF and HTML.

Practical world -- for upcoming release -- my understanding is we have a
little under two weeks, and that the "translation freeze" should be by
midnight this friday EST -- this would be for any documentation that is
being handled already -- we've already been working on aggregating notes,
links to existing wiki pages, and massaging previous notes. so the goal at
the moment for this release is simply to get something better and more
cohesive than build 542 notes -- problem is translation. not sure if lingo
will come through, at this point need volunteers; ideally professional
bilingual technical writers but any translation will help. it's wasteful not
to have all of this in a cms so we could just send the "new" material for
translation but it would be too time consuming to go through and calculate
localization based on x-diffs from wiki pages (unless someone knows of an
existing, working, wiki-based translation management plug-in), so we
restarted documentation from scratch.
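
on sending only the "new" material: a rough sketch of what a paragraph-level
diff between two saved revisions of a page could look like, using python's
standard difflib module. it assumes the page text has already been exported
as plain text with blank lines between paragraphs. the function names are
invented for illustration, and this is not an existing wiki plug-in.

```python
import difflib


def split_paragraphs(text):
    """Split plain text into paragraphs separated by blank lines."""
    return [p.strip() for p in text.split("\n\n") if p.strip()]


def paragraphs_needing_translation(old_text, new_text):
    """Return paragraphs of new_text that were added or changed."""
    old = split_paragraphs(old_text)
    new = split_paragraphs(new_text)
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    changed = []
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        # "replace" and "insert" cover paragraphs that are new on the b side;
        # "delete" and "equal" need no new translation work.
        if tag in ("replace", "insert"):
            changed.extend(new[j1:j2])
    return changed


if __name__ == "__main__":
    old = "Open the lid.\n\nPress the power button.\n\nWait for the screen."
    new = ("Open the lid.\n\nPress the power button once.\n\n"
           "Wait for the screen.\n\nChoose an activity.")
    for paragraph in paragraphs_needing_translation(old, new):
        print(paragraph)
```

this compares whole paragraphs, so a one-word edit marks the entire paragraph
for retranslation; that is coarse, but it keeps the unit of work the same for
writers and translators.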

so the following languages are needed -- if anyone knows anyone who speaks
english and any of the languages below (other than english), please get in
touch:

1st priority: es, pt, en, hi, am
es = spanish | pt = portuguese | en = english | hi = hindi | am = amharic
(ethiopia)

2nd: ar, th, he, fr, ru
ar = arabic | th = thai | he = hebrew | fr = french | ru = russian

On 10/15/07, Micheal Cooper <cooper.me at gmail.com> wrote:
>
> I sent these ideas to Jim Gettys, who suggested that I send them to
> the development and localization mailing lists.
>
> ------
> Summary:
>
>     * Write/ Edit primary documentation according to an explicit set
> of writing conventions designed to minimize ambiguity and complexity
> in order to facilitate translation.
>     * Treat this English documentation as source code which is meant
> to be translated/compiled into user languages.
>     * Use/Create collaboration tools to make translation,
> distribution, and maintenance of docs more efficient.
>
>
> ------
> Assumptions:
>
> Some of those doing translation will not be professional translators
> fully bilingual in English and the target language. They might be any
> of the following:
>
>     * a village teacher who speaks the target language as her first
> language (L1) and English as a weak second language (L2);
>     * a missionary who speaks English as L1 or L2 (in the case of a
> French missionary in Africa, for example) and the target language as a
> weak L3;
>     * a professional translator who speaks a non-English L1, reads and
> writes the target language as L2, and knows English as just a subject
> that he or she studied in school and uses for travel;
>     * a native L1 speaker of the target language who has immigrated to
> a foreign country in which English is spoken as a primary or secondary
> language.
>
> Many of the translators are not going to be career translators, so
> rather than having the translator accommodate the source text, the
> source text should accommodate the translator.
>
> Documentation translation is particularly difficult because of how
> documentation is usually created. Often docs are written grudgingly at
> the end of the project, and docs are rarely written to a uniform
> format or set of conventions. There is little reflection on what kind
> of docs are needed, and docs are usually not edited before they are
> sent off for translation and publishing. The conventional approach to
> translation is that, when a novel or academic article is translated,
> it is the burden of the translator to accommodate the original, and if
> the original is unclear, this lack of clarity is translated into the
> target texts because the target text must be a mirror of the original.
> I know this from direct experience, having been the translator for
> many doc jobs from Japanese companies. The originals are often
> incomprehensible because of ambiguity and inconsistency, as in the
> following examples:
>
>     * different sections of the docs are written by different people
> using different terminology for the same processes and entities;
>     * unconfident writers are too brief, assuming background info and
> context to which the translator does not have access;
>     * more confident writers use too many idioms and colorful
> expressions, rambling on and on in extended and poorly-organized
> complex sentences;
>     * section divisions and overall organization are inconsistent,
> forcing the translator to restructure the original before beginning
> the translation;
>     * ambiguities inherent in the language itself (like the absence of
> gendered pronouns and explicit sentence subjects in Japanese) also
> complicate the translation, forcing the translator to contact the
> writer of the original, thus slowing the process and degrading
> translator motivation and confidence.
>
>
> Ambiguity is the biggest obstacle to translation. If it is a rush job
> (and it always is), and especially if the translation is being handled
> by a middleman like a publisher or web design firm (and these days it
> almost always is), the translator usually retreats to literal
> translation in the face of ambiguity because there is no way to
> contact the author (middlemen don't want the translator to know how
> much the client is being billed for translation) or no time to wait
> for the reply. When the text is unclear, the translator has no choice
> but to translate the ambiguity itself. In the case of OLPC
> documentation, ambiguity should be avoided at all costs. Anything that
> interferes with teachers and students using the notebooks should be
> avoided, and bad docs would certainly be frustrating and demotivating
> for the educators and pupils. In order to have translations that are
> as clear as possible, we must have source-docs that are as clear as
> possible.
>
> ------
> Reconception of documentation/ translation as parallel to computer
> programming:
>
> The OLPC team uses English as a common working language, but the users
> will be using translations, so the English documentation can be seen
> as not a product in and of itself but as the source for all
> translations. The English-language "source docs" should be written to
> a set of conventions meant to reduce ambiguity and ensure consistency,
> even when doing so necessitates violating conventional English writing
> style. The set of documentation standards I am proposing is similar to
> the set of coding conventions a programmer follows. The "source docs"
> (though written in English) should be seen as source code which is
> then compiled (or translated) into the many languages needed to
> support the users. Likewise, the source-docs should include explicit
> comments and extra-textual blocks to clarify ambiguity introduced by
> the writing style or inherent in the language itself, much in the same
> way that a good programmer includes comments in source code to
> compensate for the lack of explanatory devices in the code itself.
> Looping through a multi-array doesn't tell you WHY you need to do so
> or how it plays into the next code block, just as being told that the
> subject of a sentence is "Suzuki-san" does not tell you if Suzuki is a
> "she" or a "he". Most techs have had the experience of having to
> maintain a code base which did not include sufficient comments: while
> "read the friendly code" or "use the source" might be good ways to
> learn to program, this kind of detective work is not an efficient use
> of time and effort.
>
> ------
> Doc writing conventions:
>
> Some linguistic research has been done on "simplified English" as a
> subset of English to use for low-level learners, and I think that it
> might be a good place to look for ways to simplify the source-docs.
> But just thinking intuitively, I have cooked up the following
> suggestions in order to generate discussion:
>
>     * Pronouns.
>           o Use the first-person singular pronoun "I" to represent the
> author of the docs,
>           o the second-person singular pronoun "you" to represent the
> reader of the docs, and
>           o the first-person plural pronoun "we" to represent the OLPC
> project.
>           o Examples. "We have designed a screen that switches to
> black-and-white to conserve energy. I will explain how to switch your
> screen to black-and-white. First, you press the X button on your
> keyboard...." Because we want the docs to be easily translated and
> easily understood, the tone should be personal, using "I" for the
> voice of the writer. This will be easier for amateur translators to
> translate and easier for younger readers to understand. This will also
> help the writer avoid the passive construction, which is very
> difficult for some non-native English speakers to understand.
>     * Lists.
>           o Use tables to explain parallel relationships, comparisons,
> the composition of an entity, and categorical relationships.
>           o Use numbered lists to explain the stages of a process, the
> steps in a sequence, or anything that has an inherent spatial or
> temporal order or expresses precedence. Do not use numbered lists if
> the numbers do not relate to some inherent property of the items. A
> grocery list should not be numbered, unless the order in which the
> items are purchased is important.
>           o Use bulleted lists for lists that do not have inherent
> order or precedence. The grocery list would be bulleted.
>     * All comma sequences should have a comma before the last
> conjunction, i.e. "I like to read books, eat shrimp, and run
> marathons," rather than, "I like to read books, eat shrimp and run
> marathons." It is fashionable right now to leave out the last comma,
> but doing so puts the onus of comprehension on the reader. While this
> is a nit-picky detail, OLPC source-docs should do as much of the work
> as possible so that translation and comprehension are as easy as
> possible.
>     * Use parentheses to include supplemental information like the
> gender of human agents, steps in a sequence, the target of a pronoun,
> etc. when there is any ambiguity.
>     * Many languages, including Japanese, represent non-native names
> in a native writing system. In Japanese, foreign names are written in
> a phonetic script called katakana, and my name is pronounced Kuupaa
> Maikeru. The result is that there is a loss of data; the orthography
> of my name (the spelling in English) is lost to any
> Japanese-to-English translator, as is the proper pronunciation. I
> suggest that all source-docs have personal names written in the
> alphabet and followed by the pronunciation written in IPA
> (International Phonetic Alphabet) in parentheses behind it. Then
> translators should be told to always put the original orthography in
> parentheses after the name that they are using, so that my name would
> be "<katakana>Kuupaa Maikeru</katakana> (<alpha>Micheal
> Cooper</alpha>)" in a Japanese translation.
>     * Insert a table that acts as a glossary of terms and their
> definitions at the beginning of each text. These would be the key
> nouns and verbs used in the text, terms that need to have clear
> meanings and consistent translations. The translators would be
> required to keep cumulative lists in OO Calc or such of these key
> terms so that, in the case that the translator changes or a group of
> translators is doing the job, the key terms can be kept consistent. If
> we know ahead of time that there will be translator teams, this could
> be covered by a webapp or by Google spreadsheets.
>     * Idioms and culture-specific metaphors and references should be
> avoided or used sparingly. Of course, terminology that originated in
> cultural metaphor, like "kill a process" and "reboot the server" would
> be treated as key terms and added to the glossary to be translated
> consistently, but more creative and expressive language ("you can type
> like a banshee", "students will be on it like white on rice",
> "resulting in a Mickey Mouse, vanilla solution to the problem") should
> be curtailed.
>     * Use words, mathematical symbols, and visuals to reinforce and
> enhance purely verbal explanations with conceptual representations of
> information (I am thinking Edward Tufte here), i.e. (poor example, but
> here goes) "I will show you how to teach your students to create
> multimedia presentations. <in box> Sound + Pictures = Multimedia </in
> box>." I think you get the idea, though.
>     * The source-docs should be organized so that each section and each
> paragraph is identified by a number, and the translators should be
> required to maintain this organization so that paragraph 61 in the
> Yoruba translation is paragraph 61 in the source-docs. By doing so, it
> will be easier to modify the translations when changes are made to the
> source-docs. This would imply some kind of web-based app to store and
> manage the docs. I am looking at the way we translate in my
> organization and thinking about what would be a good online tool to
> coordinate translations. There are many proprietary tools with vast
> hordes of features and complications which cost 1-2 thousand dollars
> per user, but they are not suitable for OLPC. I think OLPC docs-trans
> would do well with a lighter, simpler application. If the list doesn't
> mind, I would like to post the resulting thoughts at a later date so
> that there can be an exchange of ideas.
>
>
> I apologize for the length, and I hope these ideas can be of help.
>
> Micheal Cooper, Japan
> _______________________________________________
> Devel mailing list
> Devel at lists.laptop.org
> http://lists.laptop.org/listinfo/devel
>
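
on the paragraph-numbering idea in the quoted message: a small sketch of how
numbered paragraphs could make it easy to see which translated paragraphs
need attention after the source-docs change. it assumes each translation
keeps, per paragraph number, a fingerprint of the source text it was made
from. the data layout and function names are hypothetical and do not describe
any existing tool.

```python
import hashlib


def fingerprint(text):
    """Short, stable identifier for the exact wording of a paragraph."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def stale_paragraphs(source, translated_from):
    """List paragraph numbers whose translation is missing or out of date.

    source: {paragraph number: current source text}
    translated_from: {paragraph number: fingerprint of the source text
                      that the existing translation was made from}
    """
    stale = []
    for number in sorted(source):
        if translated_from.get(number) != fingerprint(source[number]):
            stale.append(number)
    return stale


if __name__ == "__main__":
    source = {60: "Open the lid.", 61: "Press the power button once."}
    yoruba_record = {
        60: fingerprint("Open the lid."),
        61: fingerprint("Press the power button."),
    }
    print(stale_paragraphs(source, yoruba_record))
```

because the numbers stay the same across languages, the same check works for
every translation without comparing the translated text itself.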

-- 
Todd Kelsey
630.808.6444