[Localization] UDHR transcriptions needed

Alexander Dupuy alex.dupuy at mac.com
Thu Jul 7 17:17:25 EDT 2011


Chris Leonard wrote:

> Gonzalo and I have been collaborating on packaging the Universal
> Declaration of Human Rights as a content bundle for the XO.
>
> One of the things we have found is that size is an issue and as the
> HTML versions are typically smaller than the PDF versions provided on
> the UDHR web-site, we would prefer to use an HTML version wherever
> possible.
>
> On the UDHR website, there are a number of PDFs that are provided as
> image scans and no HTML version is present to be scraped from their
> web-site.
>   

Hi Chris,

You may not be aware of the following website at Unicode.org (I wasn't
aware of it myself until I picked one of the languages below at
random and started doing some internet searching):
http://unicode.org/udhr/index.html

While not all of the languages on your list are past the initial (sh =
status & history) stage, several of them have XML translations in
Unicode, which could be easily converted into HTML documents without any
special resources (other than appropriate Unicode fonts, I suppose, to
avoid inadvertent errors in the editing process).
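
For what it's worth, here is a minimal sketch of that conversion in Python. I am assuming the XML files use <title> elements for headings and put the running text directly inside other elements (e.g. <para>); I have not checked this against every file, so treat the element names, the function name udhr_xml_to_html and the "lang" parameter as assumptions rather than a description of the Unicode schema.

```python
import html
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip any '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def udhr_xml_to_html(xml_text, lang="und"):
    """Convert one UDHR XML document to a very plain HTML page.

    Assumption: headings are <title> elements and all other text sits
    directly inside elements. Text that follows a nested child element
    (so-called tail text) is ignored by this sketch.
    """
    root = ET.fromstring(xml_text)
    parts = []
    seen_title = False
    for elem in root.iter():
        # Collapse runs of whitespace; skip pure container elements.
        text = " ".join((elem.text or "").split())
        if not text:
            continue
        if local_name(elem.tag) == "title":
            # First title becomes the page heading, later ones subheadings.
            tag = "h2" if seen_title else "h1"
            seen_title = True
        else:
            tag = "p"
        parts.append("<%s>%s</%s>" % (tag, html.escape(text), tag))
    return (
        "<!DOCTYPE html>\n"
        '<html lang="%s">\n'
        '<head><meta charset="utf-8"></head>\n'
        "<body>\n%s\n</body>\n</html>\n"
        % (html.escape(lang, quote=True), "\n".join(parts))
    )
```

Writing the result out as UTF-8 (open(path, "w", encoding="utf-8")) should keep the output small and avoid needing any special tools beyond a Unicode-capable font for checking it.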

There are also some (about a dozen) languages on the Unicode site that
are not listed on the OHCHR site (e.g. Cherokee), including a handful of
alternate scripts, e.g. Traditional Chinese and Halh Mongolian script.

Note that the language codes used by the two sites don't correspond - I
would advise standardizing on ISO language_country@script codes for any
filenames (Unicode seems better that way than the OHCHR, but they still
use alternate codes, e.g. for Dari, which OLPC/Sugar have as fa_AF).
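
To keep track of the mismatch, a small lookup table may help. The sketch below uses only pairs taken from the list further down in this message (it is deliberately not complete), and the names OHCHR_TO_UNICODE and unicode_url are just ones I made up for illustration.

```python
# OHCHR LangID -> code used in the unicode.org/udhr file names.
# Only a subset of the pairs listed in this message; extend as needed.
OHCHR_TO_UNICODE = {
    "1121": "snn",    # Bai Coca
    "1123": "sey",    # Pai Koka
    "bng": "ben",     # Bengali
    "1122": "cbi",    # Chaa'pala
    "gbc": "pov",     # Guinea-Bissau Creole
    "src2": "hrv",    # Croatian
    "prs1": "pes_2",  # Dari
    "prs": "pes_1",   # Farsi/Persian
    "gjr": "guj",     # Gujarati
    "hbr": "heb",     # Hebrew
    "esb": "ike",     # Inuktitut
    "kjv": "kan",     # Kannada
    "crm": "csw",     # Swampy Cree
}


def unicode_url(ohchr_id):
    """Return the unicode.org HTML URL for an OHCHR LangID, or None."""
    code = OHCHR_TO_UNICODE.get(ohchr_id)
    if code is None:
        return None
    return "http://unicode.org/udhr/d/udhr_%s.html" % code
```

For example, unicode_url("bng") gives the Bengali page linked below, while an ID that is not in the table returns None so it can be looked up by hand.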

The translations are not the same for some of these languages; however,
they do appear to be in the correct languages and correspond at least
roughly.  I imagine that a lot of the variance is due to trying (at
different levels of effort) to render the somewhat abstract ideas in
the UDHR in easily understood vernacular speech.
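
I did my comparisons by eye, but where both versions exist as text (which is not the case for the image-scan PDFs until someone transcribes them), a rough similarity score could be computed along these lines. This is only a sketch using Python's standard difflib; the function name similarity is my own, and a score below 1.0 says nothing about which translation is better.

```python
import difflib
import unicodedata


def similarity(text_a, text_b):
    """Return a 0.0-1.0 similarity ratio between two texts.

    Both texts are put into Unicode NFC form and runs of whitespace are
    collapsed first, so that encoding and line-wrapping differences do
    not count as differences in the translation itself.
    """
    def clean(text):
        return " ".join(unicodedata.normalize("NFC", text).split())

    # autojunk=False keeps the ratio meaningful for long texts, at the
    # cost of speed.
    matcher = difflib.SequenceMatcher(
        None, clean(text_a), clean(text_b), autojunk=False
    )
    return matcher.ratio()
```

A ratio of exactly 1.0 would correspond to what I call "100% identical" below; anything lower needs a human who reads the language.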

> Achuar Chicham
> http://www.ohchr.org/EN/UDHR/Pages/Language.aspx?LangID=jiv
>   
http://unicode.org/udhr/d/udhr_jiv.html (same language, different
translation)

> Awapit
> http://www.ohchr.org/EN/UDHR/Pages/Language.aspx?LangID=kwi
>   
http://unicode.org/udhr/d/udhr_kwi.html (same language, different
translation)

> Bai Coca
> http://www.ohchr.org/EN/UDHR/Pages/Language.aspx?LangID=1121
>   
http://unicode.org/udhr/d/udhr_snn.html (text appears 100% identical)

> Pai Koka
> http://www.ohchr.org/EN/UDHR/Pages/Language.aspx?LangID=1123
>   
http://unicode.org/udhr/d/udhr_sey.html (text appears 100% identical)

This language appears to be extremely similar to Bai Coca above, except
for orthography - notably c/qu -> k, which is typical of the shift from
SIL to governmental orthographies in Spanish-speaking countries, e.g.
the ALMG official orthography in Guatemala vs. the Summer Institute of
Linguistics orthography for Bible translations. If space is an issue
here, it might make sense to pick just one of these; even if you
include both, note them as one language with two orthographic variants,
like Serbian Latin vs. Cyrillic.

> Bengali
> http://www.ohchr.org/EN/UDHR/Pages/Language.aspx?LangID=bng
>   

http://unicode.org/udhr/d/udhr_ben.html (appears to be identical based
on a few spot checks - I can't read Bengali at all, so this is just a
"squiggle-compatible" check)

> Chaa'pala
> http://www.ohchr.org/EN/UDHR/Pages/Language.aspx?LangID=1122
>   
http://unicode.org/udhr/d/udhr_cbi.html (text appears 100% identical)

> Crioulo da Guiné-Bissau (Guinea Bissau Creole)
> http://www.ohchr.org/EN/UDHR/Pages/Language.aspx?LangID=gbc
>   
http://unicode.org/udhr/d/udhr_pov.html (text appears 100% identical)

> Croatian
> http://www.ohchr.org/EN/UDHR/Pages/Language.aspx?LangID=src2
>   
http://unicode.org/udhr/d/udhr_hrv.html (text appears 100% identical)

> Dari
> http://www.ohchr.org/EN/UDHR/Pages/Language.aspx?LangID=prs1
>   
http://unicode.org/udhr/d/udhr_pes_2.html (appears pretty similar, but
although my Arabic script is better than my Bengali, the terrible
quality of the PDF scan makes it hard to be sure)

> Dine, Navajo (Navaho)
> http://www.ohchr.org/EN/UDHR/Pages/Language.aspx?LangID=nav
>   
http://unicode.org/udhr/d/udhr_nav.html (text appears 100% identical)

> Dzongkha/Bhutanese
> http://www.ohchr.org/EN/UDHR/Pages/Language.aspx?LangID=dzo
>   
http://unicode.org/udhr/d/udhr_dzo.html (this contains only article 1)

> Even
> http://www.ohchr.org/EN/UDHR/Pages/Language.aspx?LangID=eve
>   
http://unicode.org/udhr/d/udhr_eve.html (text appears 100% identical -
both are missing preamble, HTML version explicitly so)

> Farsi/Persian
> http://www.ohchr.org/EN/UDHR/Pages/Language.aspx?LangID=prs
>   
http://unicode.org/udhr/d/udhr_pes_1.html (appears very similar, but
again, the poor quality of the PDF scan makes it hard to be sure)


> Gujarati
> http://www.ohchr.org/EN/UDHR/Pages/Language.aspx?LangID=gjr
>   
http://unicode.org/udhr/d/udhr_guj.html (seems pretty similar,
squiggle-compatible)

> Hebrew
> http://www.ohchr.org/EN/UDHR/Pages/Language.aspx?LangID=hbr
>   
http://unicode.org/udhr/d/udhr_heb.html (text appears 100% identical)

> Inuktitut
> http://www.ohchr.org/EN/UDHR/Pages/Language.aspx?LangID=esb
>   
http://unicode.org/udhr/d/udhr_ike.html (text appears 100% identical)

> Kannada
> http://www.ohchr.org/EN/UDHR/Pages/Language.aspx?LangID=kjv
>   
http://unicode.org/udhr/d/udhr_kan.html (text appears 100% identical)

--- At this point I ran out of time to do more than download the HTML
versions (grabbing the PDFs from the OHCHR site was just too slow) ---

> Kazakh
> http://www.ohchr.org/EN/UDHR/Pages/Language.aspx?LangID=kaz
>   
http://unicode.org/udhr/d/udhr_kaz.html

> Kichwa
> http://www.ohchr.org/EN/UDHR/Pages/Language.aspx?LangID=qug
>   
http://unicode.org/udhr/d/udhr_qug.html


> Malayalam
> http://www.ohchr.org/EN/UDHR/Pages/Language.aspx?LangID=mjs
>   
http://unicode.org/udhr/d/udhr_mal.html

> Marathi
> http://www.ohchr.org/EN/UDHR/Pages/Language.aspx?LangID=mrt
>   
http://unicode.org/udhr/d/udhr_mar.html

> Nepali
> http://www.ohchr.org/EN/UDHR/Pages/Language.aspx?LangID=nep
>   
http://unicode.org/udhr/d/udhr_nep.html

> Ojibway (Ojibwe)
> http://www.ohchr.org/EN/UDHR/Pages/Language.aspx?LangID=ojb
>   
http://unicode.org/udhr/d/udhr_ojb.html

> Otuho
> http://www.ohchr.org/EN/UDHR/Pages/Language.aspx?LangID=lot
>   
http://unicode.org/udhr/d/udhr_lot.html

> Pashto/Pakhto
> http://www.ohchr.org/EN/UDHR/Pages/Language.aspx?LangID=pbu
>   
http://unicode.org/udhr/d/udhr_pbu.html (this contains only article 1)

> Sapara Atupama
> http://www.ohchr.org/EN/UDHR/Pages/Language.aspx?LangID=1124
>   
http://unicode.org/udhr/d/udhr_zro.html

> Saraiki
> http://www.ohchr.org/EN/UDHR/Pages/Language.aspx?LangID=skr
>   
http://unicode.org/udhr/d/udhr_skr.html

> Shilluk
> http://www.ohchr.org/EN/UDHR/Pages/Language.aspx?LangID=shk
>   
http://unicode.org/udhr/d/udhr_shk.html

> Shuar Chicham
> http://www.ohchr.org/EN/UDHR/Pages/Language.aspx?LangID=1125
>   
http://unicode.org/udhr/d/udhr_jiv.html (this is the same URL as for
Achuar Chicham at the start of this list - the ?LangID=1125 entry seems
like a duplicate anyhow)

> Swampy Cree
> http://www.ohchr.org/EN/UDHR/Pages/Language.aspx?LangID=crm
>   
http://unicode.org/udhr/d/udhr_csw.html

> Tamang (Tam)
> http://www.ohchr.org/EN/UDHR/Pages/Language.aspx?LangID=taj
>   
http://unicode.org/udhr/d/udhr_taj.html (this contains only article 1)

> Tamil
> http://www.ohchr.org/EN/UDHR/Pages/Language.aspx?LangID=tcv
>   
http://unicode.org/udhr/d/udhr_tam.html

> Tatar
> http://www.ohchr.org/EN/UDHR/Pages/Language.aspx?LangID=ttr
>   
http://unicode.org/udhr/d/udhr_tat.html

> Thai
> http://www.ohchr.org/EN/UDHR/Pages/Language.aspx?LangID=thj
>   
http://unicode.org/udhr/d/udhr_tha.html

> Ticuna
> http://www.ohchr.org/EN/UDHR/Pages/Language.aspx?LangID=tca
>   
http://unicode.org/udhr/d/udhr_tca.html

> Tsafiki
> http://www.ohchr.org/EN/UDHR/Pages/Language.aspx?LangID=cof
>   
http://unicode.org/udhr/d/udhr_cof.html (different language name - needs
PDF verification)

> Uighur
> http://www.ohchr.org/EN/UDHR/Pages/Language.aspx?LangID=uig
>   
http://unicode.org/udhr/d/udhr_uig_arab.html
http://unicode.org/udhr/d/udhr_uig_latn.html
(HTML has two different script variants - Arabic and Latin)

> Urdu
> http://www.ohchr.org/EN/UDHR/Pages/Language.aspx?LangID=urd
>   
http://unicode.org/udhr/d/udhr_urd.html

> Wao Tededo
> http://www.ohchr.org/EN/UDHR/Pages/Language.aspx?LangID=1127
>   
http://unicode.org/udhr/d/udhr_auc.html

> Yi
> http://www.ohchr.org/EN/UDHR/Pages/Language.aspx?LangID=iii
>   
http://unicode.org/udhr/d/udhr_iii.html (this contains only article 1)

> Yukagir
> http://www.ohchr.org/EN/UDHR/Pages/Language.aspx?LangID=yk


-- 
mailto:alex.dupuy at mac.com


