[Localization] Translating the manuals?

Clytie Siddall clytie at riverland.net.au
Fri Mar 26 03:58:31 EDT 2010


To: OLPC Localization list
Cc: Deb18n and Translate Toolkit lists, hoping to pick their brains

Background: the XO laptop software is available in Vietnamese. An OLPC project in Vietnam asked me for translated manuals.

Chris, thank you for your detailed reply. My comments are interleaved below.

On 26/03/2010, at 2:57 AM, Chris Leonard wrote:

> On Wed, Mar 24, 2010 at 2:56 AM, Clytie Siddall <clytie at riverland.net.au> wrote:
> 
> 1. Has anyone translated any part of these [OLPC] manuals into Vietnamese?
> 
> 2. If not, can we get this text on Pootle (e.g. using po4a [1] which converts many doc formats into PO files)?
> 
> So, Clytie, this is a more expansive answer to your question about localizing the manuals into Vietnamese and the tools available to do so.
> 
> For long form texts (like manuals and web-sites) it isn't quite as simple as just scraping the translatable strings with a tool like po4a and posting the PO file on Pootle.  The eco-system of tools is simply not as complete or mature as it is for code internationalization (i18n) and localization (L10n) and perhaps even more importantly, the familiarity and experience with support of po4a doesn't exist within the OLPC community at present.

Taking on any new process is definitely a barrier, and doc-l10n is definitely a complex task.
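
For anyone on these lists who hasn't seen the doc-to-PO idea in practice, here is a minimal Python sketch of its core: split a plain-text page into paragraphs and emit one untranslated PO entry per paragraph. This is only an illustration, not how po4a itself is implemented. The function names and the "manual.txt" file name are made up for the example, the PO header entry is omitted, and a real converter also has to parse each doc format and rebuild the translated document afterwards.

```python
def po_quote(text):
    """Escape a string for use inside a PO msgid/msgstr line."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def paragraphs(text):
    """Yield (line_number, paragraph) pairs; blank lines separate paragraphs."""
    block, start = [], None
    for number, line in enumerate(text.splitlines(), start=1):
        if line.strip():
            if not block:
                start = number
            block.append(line.strip())
        elif block:
            yield start, " ".join(block)
            block = []
    if block:
        yield start, " ".join(block)


def text_to_po(text, source_name="manual.txt"):
    """Return PO entries (header omitted) for each distinct paragraph."""
    seen = set()
    entries = []
    for number, para in paragraphs(text):
        if para in seen:
            # gettext expects each msgid to appear only once per file
            continue
        seen.add(para)
        entries.append(
            f"#: {source_name}:{number}\n"
            f"msgid {po_quote(para)}\n"
            'msgstr ""\n'
        )
    return "\n".join(entries)


if __name__ == "__main__":
    sample = 'Getting started\n\nPress the "Home" key.\nThen wait.\n'
    print(text_to_po(sample))
```

The "#:" reference comment is what lets a translator (or a tool) find the paragraph again in the source page.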
> 
> For code, there is a fairly mature eco-system of tools that make a coordinated / distributed publication process simpler for localizers and developers alike.  Tools like gettext assist in the first phase of i18n (preparing PO files) and there are also methods for connecting Pootle up to software repositories (like git) that can help keep the repo and the PO files concurrent and synchronized.  After L10n is completed, there are further mechanisms for committing the completed PO file back to the repository where additional i18n steps occur in the build-creation phase of release publication (e.g. generation of MO files and language packs). 
> 
> Even so, it takes a fair amount of manual intervention behind the scenes to make all of this work.  By-and-large, Sayamindu carries most of that burden by himself, which is why I've chosen to work on various administrivia aspects of the Sugar Labs / OLPC localization effort, so that he can focus on the absolutely critical stuff that he is uniquely qualified to do.  We are fortunate that for the most part our developers are committed to doing their part with respect to i18n, particularly the core Sugar developers and a good handful of Activity developers, but there is still room for improvement in terms of a number of individually contributed Activities and getting them set up for inclusion in Pootle. This is mostly an ongoing education challenge and not a technical issue.

Indeed. Even long-established i18n projects are still evolving at this level. The whole distributed nature of FLOSS, and its voluntary contributions, can make it difficult to establish standards and change practice. However, a few enthusiasts can achieve a great deal. Sayamindu is a valiant example: without him investing the time to learn about Pootle, set it up and support it, many OLPC localizations would not exist. I wouldn't even know about OLPC: I only found out about it via the Pootle list, IIRC. The leading FLOSS i18n projects owe much of their success and virtually all of their progress to a few dedicated volunteer coordinators. So we can do it, but it takes time and effort to build a workable and sustainable process.
> 
> As far as long-form HTML-based text goes, there is something of a gap in the maturity of the tools, particularly with respect to the interfaces between the i18n and L10n (and then back to i18n) process, and while it is true that po4a is an attempt to address this, it is simply not at the same "plug-n-play" level as a toolset as what is currently available for code L10n.  No insult meant to the developers of po4a (I applaud and appreciate their work), but they themselves refer to these issues in the rather extensive manpage below.  Reading it gives you some flavor for the complexities involved.
> 
> http://po4a.alioth.debian.org/man/man7/po4a.7.php
> 
> It may seem a little unfair, but by the "eat your own cooking" standard, if you look at the repo for po4a, they only have their own documentation localized into a small handful of European languages, which must say something about the tool itself or at least the overall context in which it currently exists.

As a Debian translator, I can tell you I haven't translated that manpage because I haven't had the time. It's a matter of priorities for what resources you have available. I'd be interested to hear what the larger language teams have to say. Odds are a briefer manpage would get translated first.

>   Their L10n process still involves many manual steps that play themselves out on their e-mail lists, and this is not a truly scalable solution.

I agree with you entirely on HTML and websites in general. They are a pain to translate. It's very difficult to keep up with changes. I don't really know any project which handles website-translation effectively, although Debian probably comes closest: the pages are accessible via source control, there is good documentation on what to do, you have an RSS feed and diff for updates, but you're still working with wml files and no segmentation or effective metadata. I tend to avoid webpages for this reason. Nearly all web/wiki pages I've translated initially have languished without update due to the difficulty of following the pages up, compared to normal PO/XLIFF update. Of the few web/wiki pages I currently maintain:

1. Scratch website [1]
The files are on their Pootle. I have no idea what l10n process they go through, but I translated this website simply because it was promoting free software and available in PO format, so I could use my offline editor or Pootle. Time will show if it gets updated regularly. I have yet to see my translation on the website. The quality of the original strings was appalling, showing that review is an essential part of the L10n process.

2. TuxPaint website [2]
These files were available on the Locamotion Pootle, during the Decathlon project. I could use my existing workflow. We are currently setting up an ongoing process to update the files and continue using Pootle (I volunteered to support this project). Wordpress is also available on the Locamotion Pootle, so I translated that as well (hopefully someone will volunteer to support it).

3. Creative Commons, Frugalware, GNU/Linux Matters – all available on their individual Pootles.

Manuals and other docs are more accessible in some projects.

1. GNOME makes all manuals/docs available in PO format right next to the application files they support (e.g. [3]) in their translation interface, and can also be updated and committed via git. Even though we haven't had the resources to do many of the docs yet, we have done the licences, for example, and the fact that the docs are available in PO format, fully as easy to grab and as up-to-date as the application files, means they are more accessible to us than other docs. They will get translated simply because they are there and we can use an effective editing and update process.

2. KDE does the same as GNOME (initially inventing the stats/access interface, but now GNOME have invented their "Damned Lies" interface which I find superior). It works well for them. We didn't have the resources to start on the docs, but we will when we do.

3. Debian does get more results. AFAI can tell, all the manuals, manpages, release announcements, reference cards etc. (that's a truly enormous mass of documentation) are available in PO format via SVN. There are status webpages, reminders on the list, build logs, almost-live web display, plus I get RSS/email reminders when they need updating, and can just pull the updated file from source control, then commit my changes. While source control access can be a barrier to added participation, it works well for translators who are accustomed to it. I maintain more docs at Debian than anywhere else, simply because they are available, always updated, and the Deb18n project itself is (in my experience) the leading internationalization project in FLOSS. It reviews original strings, uses more recent gettext tools (e.g. msgid-previous, so you can compare the previous original string to the changed one) and invents its own tools, follows up on translation implementation, is extremely well-executed and innovative, and it is generally a pleasure and a satisfaction to volunteer for Deb18n. Quality is their highest priority. So, despite the initial load of setting up the po4a etc. process for their docs, the results have IMNSHO been worth it. However, economy of scale comes into play with projects as large as Debian. Create po-debconf or po4a, put in all the work to implement it, and the output grows. This may not apply as well to a small project, although in my experience the investment in tools and lowering access barriers for translators is generally a good return.

4. Pootle's docs are available on the Locamotion Pootle, and are fully-translated and updated in Vietnamese. The wiki pages are also going to be on Pootle, but meanwhile, I haven't been able to keep up with the updates to those pages. I find the contrast telling in this context, since the wiki information is often of higher (and more topical) priority. I would certainly work on those pages if I had a viable translate/update process, with string segmentation.

5. I also maintain occasional docs where the developer sends me update emails. These docs get lower priorities, however, because I have to work with a page of text and a manual diff, rather than a translation format. It's simply more time-consuming and more liable to create errors. Also, it's more difficult to maintain translation memory.
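
To illustrate the msgid-previous point in item 3: when a PO file is merged with "msgmerge --previous", a fuzzy entry keeps the old source string on a "#|" line, so the translator (or a tool) can see exactly what changed. The Python sketch below is only an illustration under my own assumptions: the entry is invented, it handles single-line strings only, and the function names are hypothetical.

```python
import difflib

# A made-up fuzzy entry, in the shape msgmerge --previous leaves behind.
ENTRY = '''\
#, fuzzy
#| msgid "Press the button to start the activity."
msgid "Press the green button to start the Activity."
msgstr "(existing translation)"
'''


def read_entry(text):
    """Return (previous_msgid, current_msgid) from a single-line PO entry."""
    previous = current = None
    for line in text.splitlines():
        if line.startswith('#| msgid "'):
            previous = line[len('#| msgid "'):-1]
        elif line.startswith('msgid "'):
            current = line[len('msgid "'):-1]
    return previous, current


def word_changes(previous, current):
    """List (operation, old_words, new_words) for each changed span."""
    old, new = previous.split(), current.split()
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    return [
        (op, old[i1:i2], new[j1:j2])
        for op, i1, i2, j1, j2 in matcher.get_opcodes()
        if op != "equal"
    ]


if __name__ == "__main__":
    before, after = read_entry(ENTRY)
    for change in word_changes(before, after):
        print(change)
```

With only a page of text and a manual diff (item 5), this comparison has to be done by eye for every changed paragraph.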

So, Pootle and PO/XLIFF conversions have worked for me with docs. Manual maintenance of pages of text has not. I recognize the amount of work necessary to set up the level of support we translators need.
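
Why segmentation matters so much for updates can be sketched in a few lines of Python. Assuming the translation memory is just a dictionary from source paragraph to translation (real TM formats are richer), an updated page is split into paragraphs and each one is classified as already done, fuzzy (close to something already translated) or new. The names, the sample strings and the 0.8 similarity cutoff are all assumptions for the example.

```python
import difflib
import re


def segment(text):
    """Split a document into paragraphs on blank lines, normalising spaces."""
    blocks = re.split(r"\n\s*\n", text)
    return [" ".join(block.split()) for block in blocks if block.strip()]


def classify(new_text, memory, cutoff=0.8):
    """Return (segment, status, translation) for each paragraph.

    memory maps source paragraphs to translations; status is
    "done", "fuzzy" or "new".
    """
    result = []
    for seg in segment(new_text):
        if seg in memory:
            result.append((seg, "done", memory[seg]))
            continue
        close = difflib.get_close_matches(seg, list(memory), n=1, cutoff=cutoff)
        if close:
            # Reuse the nearest translation as a starting point for review
            result.append((seg, "fuzzy", memory[close[0]]))
        else:
            result.append((seg, "new", ""))
    return result


if __name__ == "__main__":
    memory = {
        "Open the Journal to see your work.": "T1",
        "Click the Stop icon to close an Activity.": "T2",
    }
    updated = (
        "Open the Journal to see your work.\n\n"
        "Click the Stop icon to close the Activity.\n\n"
        "Charge the battery overnight.\n"
    )
    for seg, status, translation in classify(updated, memory):
        print(status, "|", seg, "|", translation)
```

Only the fuzzy and new paragraphs need a translator's attention, which is the part that is lost when a whole page arrives as one block of text.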
> 
> Importantly, a significant aspect of the maturity gap has to do with user training as much as the tools themselves.  As a rule, FOSS software developers have accepted the importance of localization and are easily induced to do their part on i18n

Ha! I can feel many i18n project coordinators rolling over in their graves (the ones who died of frustration, that is). Easily? Try "after a great deal of blood, sweat and tears". ;)

> but authors (both web-page developers and part-time documentation writers) have not yet been as well-indoctrinated by the localization community and so there is often a disconnect between the format of the document as produced by the authors as well as the tools they use to publish it (e.g. their site-hosting or wiki for instance) and the processes involved in i18n-L10n.  Mostly this means that there is much more manual intervention needed because the automation of these tools for a highly distributed environment like Sugar Labs / OLPC has not reached the same level of sophistication on the back-end, in order to provide the simplest interface for developers and localizers on the front-end.

Actually, many application developers neglect their own documentation. Either they aren't comfortable with explanatory writing and user support, or they simply give it lower priority due to ignorance ("Everyone knows how to do X") or lack of time. As with translators, the app strings come first. So docs need better support in general.

I agree, however, that many people creating documents rather than apps may not be accustomed to localization as an essential step in the process. And we prefer "supported" to "indoctrinated". ;)

There is indeed a "disconnect", even a series of them. I haven't been particularly impressed even with the internationalization built in to a CMS like Drupal. We don't have a workable process to replace the app-based PO setup. Everyone flounders around with their own process, and so far nothing really works for the entry-level doc writer.
> 
> The "right" way to address the localization of long-form text is a challenge that has been kicked around in a number of OLPC forums: the library list (for HTML-based content), the wiki gang (for wiki L10n) as well as the Support Gang list (for the localization of manuals and other documentation).  To date, no clear consensus around tools or methods has appeared.  Solutions remain a mix of one-off and intensively manual processes.   There have been various attempts to work towards using PO-file based methods (like an early attempt with the OLPC web-site itself), but they have not gained real momentum or sustainability.
> 
> The FLOSS manuals setup for localization has been explored, but I don't think it could be called truly successful (while different languages have been set up, the number of those completed and published is not great).  Whether that is due to the tools or the challenge of coordinating the human resources involved is not entirely clear to me.

I've registered there, joined Yet Another Mailing List™, and will give you some feedback once you provide the reviewed manuals on which you recommended I begin. From what I've seen of the FLOSS Manuals site [4] so far, I have concerns about access control (spam or inexperienced translating, and overwriting of existing translations) and the ability to use translation memory, back up your work, or get workable diffs with each update. So far, I haven't received any answers to specific questions on those issues on the FLOSS Manuals list, although they have been quick to welcome and support participation. I think it is likely that the site is set up more for writing/publishing than localizing.

>   Obviously, there are real advantages to be gained by providing a single interface (Pootle) to localizers, but all of the necessary pieces have not yet been assembled to enable that for long-form text L10n work yet.
>  
> I wish there was a happy answer to your question.  I do think that some individual elements of the solution are available (like po4a), but it will take some considerable effort by a fair number of people to establish and then maintain a long-form text publication process that works as smoothly as the current code L10n set-up.  Given that neither Sugar Labs nor OLPC has genuinely taken ownership of the textual content creation and curation process, I'm not sure that the committed resources to accomplish an overall solution will be forthcoming and so this remains a challenge that is not adequately addressed.
> 
> Those are just my thoughts, which are not intended as criticisms but as an acknowledgement of the status quo.  There are some content-creation projects related to health education on the XO laptop that I would love to take on, but I've put them on the back-burner because I don't think they would be sustainable (in a useful, localized form) with the level of resources I could commit by myself.
> 

It comes down to resources every time (would short-term investment, e.g. a grant or Google SoC project, make a difference?). Thank you very much for your thoughts. This is really a pan-project issue, and one affecting all of us.

I'd be interested to see suggestions from translators, coordinators and the Pootle/Translate-Toolkit devs on what we really need to make doc translations effective and sustainable. Can we simplify or build on the po4a/Translate-Toolkit/Pootle process? Can we integrate other existing XLIFF tools? What works for you? What would work better?

from Clytie 

Vietnamese Free Software Translation Team

[1] http://scratch.mit.edu/

[2] http://www.tuxpaint.org/

[3] http://l10n.gnome.org/teams/vi

[4] http://en.flossmanuals.net/FLOSSManuals/TranslatingAManual