[Localization] UDHR transcriptions needed

Chris Leonard cjlhomeaddress at gmail.com
Thu Jul 7 17:39:49 EDT 2011


Thanks,

Oddly enough, Gonzalo came across the Unicode site a little earlier in the
day and sent me the link, but I had not had time to explore it.  Thanks for
the reviewing you've done.

I agree on not trusting OHCHR codes.  I think they mixed up aja and ajd (for
instance).  The Bai Coca - Pai Koka pair is odd, says it was submitted by
Ecuador, but I think it is a minority Chinese language.  Lots of QA to do.
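
Part of that QA could probably be scripted. Below is a minimal sketch,
assuming both versions of a translation are already available as plain
Unicode text (which is not the case for the image-only PDF scans). The
helper names `normalize` and `similarity` are made up for illustration.
It normalizes both texts and reports how close they are, so that "text
appears 100% identical" becomes a number rather than an eyeball check.

```python
import difflib
import unicodedata


def normalize(text):
    """NFC-normalize and collapse whitespace so layout differences are ignored."""
    return " ".join(unicodedata.normalize("NFC", text).split())


def similarity(a, b):
    """Return a ratio in [0, 1]; 1.0 means identical after normalization."""
    matcher = difflib.SequenceMatcher(None, normalize(a), normalize(b), autojunk=False)
    return matcher.ratio()
```

A ratio well below 1.0 would suggest a different translation (as with
the two jiv texts), while a ratio near 1.0 would still need a human
reader to confirm.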

cjl

On Thu, Jul 7, 2011 at 5:17 PM, Alexander Dupuy <alex.dupuy at mac.com> wrote:

> Chris Leonard wrote:
>
> > Gonzalo and I have been collaborating on packaging the Universal
> > Declaration of Human Rights as a content bundle for the XO.
> >
> > One of the things we have found is that size is an issue and as the
> > HTML versions are typically smaller than the PDF versions provided on
> > the UDHR web-site, we would prefer to use an HTML version wherever
> > possible.
> >
> > On the UDHR website, there are a number of PDFs that are provided as
> > image scans and no HTML version is present to be scraped from their
> > web-site.
> >
>
> Hi Chris,
>
> You may not be aware of the following website at Unicode.org (I wasn't
> aware of it myself until I picked one of the following languages at
> random and started doing some internet searching):
> http://unicode.org/udhr/index.html
>
> While not all of the languages on your list are past the initial (sh =
> status & history) stage, several of them have XML translations in
> Unicode, which could be easily converted into HTML documents without any
> special resources (other than appropriate Unicode fonts, I suppose, to
> avoid inadvertent errors in the editing process).
>
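
A rough sketch of what such a conversion might look like, using only
the Python standard library. The element names (`title`, `para`) and
the namespace in the sample are assumptions about the XML layout, not
taken from the Unicode files themselves, and `udhr_xml_to_html` is a
hypothetical helper name. Anything with text that is not a title is
simply emitted as a paragraph, and tail text is ignored.

```python
import html
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip any '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def udhr_xml_to_html(xml_text, lang="und"):
    """Render titles as headings and all other text-bearing elements as <p>."""
    root = ET.fromstring(xml_text)
    body = []
    seen_title = False
    for elem in root.iter():  # document order
        text = (elem.text or "").strip()
        if not text:
            continue
        escaped = html.escape(text)
        if local_name(elem.tag) == "title":
            # First title is the document title, later ones are article titles.
            level = "h2" if seen_title else "h1"
            seen_title = True
            body.append("<%s>%s</%s>" % (level, escaped, level))
        else:
            body.append("<p>%s</p>" % escaped)
    return (
        '<!DOCTYPE html>\n<html lang="%s">\n'
        '<head><meta charset="utf-8"></head>\n'
        "<body>\n%s\n</body>\n</html>\n" % (html.escape(lang), "\n".join(body))
    )


# Hypothetical sample input; the real files may be structured differently.
SAMPLE = """<udhr xmlns="http://example.invalid/udhr">
  <title>Universal Declaration of Human Rights</title>
  <article number="1">
    <title>Article 1</title>
    <para>All human beings are born free and equal in dignity and rights.</para>
  </article>
</udhr>"""

if __name__ == "__main__":
    print(udhr_xml_to_html(SAMPLE, lang="en"))
```

Declaring the charset as UTF-8 in the output keeps the result
independent of whatever fonts or editors are used afterwards.
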
> There are also some (about a dozen) languages on the Unicode site that
> are not listed on the OHCHR site (e.g. Cherokee), including a handful of
> alternate scripts, e.g. Traditional Chinese and Halh Mongolian script.
>
> Note that the language codes used don't correspond - I would advise
> standardizing on ISO language_country@script codes for any filenames
> (Unicode seems better that way than the OHCHR, but they still use
> alternate codes, e.g. for Dari, which OLPC/Sugar have as fa_AF).
>
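
One way to keep the two code sets straight when naming files is a small
lookup table. The sketch below copies a few OHCHR LangID / unicode.org
code pairs from the list further down in this message. The function
name `bundle_filename` and the `udhr_<code>.html` naming pattern are
only illustrative assumptions, and it does not attempt the full
language_country@script form.

```python
# OHCHR LangID -> code used in the unicode.org URL, copied from the
# pairs listed in this message (only the ones that differ).
OHCHR_TO_UNICODE = {
    "1121": "snn",  # Bai Coca
    "1123": "sey",  # Pai Koka
    "bng": "ben",   # Bengali
    "1122": "cbi",  # Chaa'pala
    "gbc": "pov",   # Guinea-Bissau Creole
    "src2": "hrv",  # Croatian
    "esb": "ike",   # Inuktitut
    "kjv": "kan",   # Kannada
    "crm": "csw",   # Swampy Cree
}


def bundle_filename(ohchr_id, prefix="udhr", ext="html"):
    """Build a filename from an OHCHR LangID.

    Assumption: IDs missing from the table (e.g. jiv, kwi, nav) are
    already the same on both sites and are passed through unchanged.
    """
    code = OHCHR_TO_UNICODE.get(ohchr_id, ohchr_id)
    return "%s_%s.%s" % (prefix, code, ext)
```

For example, `bundle_filename("bng")` gives `udhr_ben.html`. The
pass-through default is convenient but would hide an unmapped numeric
ID such as 1124, so a stricter version might raise an error instead.
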
> The translations are not the same for some of these languages; however,
> they do appear to be in the correct languages and correspond at least
> roughly.  I imagine that a lot of the variance is due to trying (at
> different levels of effort) to render the somewhat abstract ideas in
> the UDHR in easily understood vernacular speech.
>
> > Achuar Chicham
> > http://www.ohchr.org/EN/UDHR/Pages/Language.aspx?LangID=jiv
> >
> http://unicode.org/udhr/d/udhr_jiv.html (same language, different
> translation)
>
> > Awapit
> > http://www.ohchr.org/EN/UDHR/Pages/Language.aspx?LangID=kwi
> >
> http://unicode.org/udhr/d/udhr_kwi.html (same language, different
> translation)
>
> > Bai Coca
> > http://www.ohchr.org/EN/UDHR/Pages/Language.aspx?LangID=1121
> >
> http://unicode.org/udhr/d/udhr_snn.html (text appears 100% identical)
>
> > Pai Koka
> > http://www.ohchr.org/EN/UDHR/Pages/Language.aspx?LangID=1123
> >
> http://unicode.org/udhr/d/udhr_sey.html (text appears 100% identical)
>
> This language appears to be extremely similar to Bai Coca above, except
> for orthography - notably c/qu -> k, which is typical for SIL ->
> governmental orthographic changes in Spanish-speaking countries, e.g.
> ALMG official orthography in Guatemala vs. Summer Institute of
> Linguistics orthography for Bible translations. If space is an issue
> here, it might make sense to just pick up one of these, and even if you
> include both, note them as one language with two orthographic variants,
> like Serbian Latin vs. Cyrillic.
>
> > Bengali
> > http://www.ohchr.org/EN/UDHR/Pages/Language.aspx?LangID=bng
> >
>
> http://unicode.org/udhr/d/udhr_ben.html (appears to be identical based
> on a few spot checks - I can't read Bengali at all, so this is just a
> "squiggle-compatible" check)
>
> > Chaa'pala
> > http://www.ohchr.org/EN/UDHR/Pages/Language.aspx?LangID=1122
> >
> http://unicode.org/udhr/d/udhr_cbi.html (text appears 100% identical)
>
> > Crioulo da Guiné-Bissau (Guinea Bissau Creole)
> > http://www.ohchr.org/EN/UDHR/Pages/Language.aspx?LangID=gbc
> >
> http://unicode.org/udhr/d/udhr_pov.html (text appears 100% identical)
>
> > Croatian
> > http://www.ohchr.org/EN/UDHR/Pages/Language.aspx?LangID=src2
> >
> http://unicode.org/udhr/d/udhr_hrv.html (text appears 100% identical)
>
> > Dari
> > http://www.ohchr.org/EN/UDHR/Pages/Language.aspx?LangID=prs1
> >
> http://unicode.org/udhr/d/udhr_pes_2.html (appears pretty similar, but
> although my Arabic script is better than my Bengali, the terrible
> quality of the PDF scan makes it hard to be sure)
>
> > Dine, Navajo (Navaho)
> > http://www.ohchr.org/EN/UDHR/Pages/Language.aspx?LangID=nav
> >
> http://unicode.org/udhr/d/udhr_nav.html (text appears 100% identical)
>
> > Dzongkha/Bhutanese
> > http://www.ohchr.org/EN/UDHR/Pages/Language.aspx?LangID=dzo
> >
> http://unicode.org/udhr/d/udhr_dzo.html (this contains only article 1)
>
> > Even
> > http://www.ohchr.org/EN/UDHR/Pages/Language.aspx?LangID=eve
> >
> http://unicode.org/udhr/d/udhr_eve.html (text appears 100% identical -
> both are missing preamble, HTML version explicitly so)
>
> > Farsi/Persian
> > http://www.ohchr.org/EN/UDHR/Pages/Language.aspx?LangID=prs
> >
> http://unicode.org/udhr/d/udhr_pes_1.html (appears very similar, but
> again, the poor quality of the PDF scan makes it hard to be sure)
>
>
> > Gujarati
> > http://www.ohchr.org/EN/UDHR/Pages/Language.aspx?LangID=gjr
> >
> http://unicode.org/udhr/d/udhr_guj.html (seems pretty similar,
> squiggle-compatible)
>
> > Hebrew
> > http://www.ohchr.org/EN/UDHR/Pages/Language.aspx?LangID=hbr
> >
> http://unicode.org/udhr/d/udhr_heb.html (text appears 100% identical)
>
> > Inuktitut
> > http://www.ohchr.org/EN/UDHR/Pages/Language.aspx?LangID=esb
> >
> http://unicode.org/udhr/d/udhr_ike.html (text appears 100% identical)
>
> > Kannada
> > http://www.ohchr.org/EN/UDHR/Pages/Language.aspx?LangID=kjv
> >
> http://unicode.org/udhr/d/udhr_kan.html (text appears 100% identical)
>
> --- At this point I ran out of time to do more than download the HTML
> version (grabbing the PDFs from the OHCHR site was just too slow) ---
>
> > Kazakh
> > http://www.ohchr.org/EN/UDHR/Pages/Language.aspx?LangID=kaz
> >
> http://unicode.org/udhr/d/udhr_kaz.html
>
> > Kichwa
> > http://www.ohchr.org/EN/UDHR/Pages/Language.aspx?LangID=qug
> >
> http://unicode.org/udhr/d/udhr_qug.html
>
>
> > Malayalam
> > http://www.ohchr.org/EN/UDHR/Pages/Language.aspx?LangID=mjs
> >
> http://unicode.org/udhr/d/udhr_mal.html
>
> > Marathi
> > http://www.ohchr.org/EN/UDHR/Pages/Language.aspx?LangID=mrt
> >
> http://unicode.org/udhr/d/udhr_mar.html
>
> > Nepali
> > http://www.ohchr.org/EN/UDHR/Pages/Language.aspx?LangID=nep
> >
> http://unicode.org/udhr/d/udhr_nep.html
>
> > Ojibway (Ojibwe)
> > http://www.ohchr.org/EN/UDHR/Pages/Language.aspx?LangID=ojb
> >
> http://unicode.org/udhr/d/udhr_ojb.html
>
> > Otuho
> > http://www.ohchr.org/EN/UDHR/Pages/Language.aspx?LangID=lot
> >
> http://unicode.org/udhr/d/udhr_lot.html
>
> > Pashto/Pakhto
> > http://www.ohchr.org/EN/UDHR/Pages/Language.aspx?LangID=pbu
> >
> http://unicode.org/udhr/d/udhr_pbu.html (this contains only article 1)
>
> > Sapara Atupama
> > http://www.ohchr.org/EN/UDHR/Pages/Language.aspx?LangID=1124
> >
> http://unicode.org/udhr/d/udhr_zro.html
>
> > Saraiki
> > http://www.ohchr.org/EN/UDHR/Pages/Language.aspx?LangID=skr
> >
> http://unicode.org/udhr/d/udhr_skr.html
>
> > Shilluk
> > http://www.ohchr.org/EN/UDHR/Pages/Language.aspx?LangID=shk
> >
> http://unicode.org/udhr/d/udhr_shk.html
>
> > Shuar Chicham
> > http://www.ohchr.org/EN/UDHR/Pages/Language.aspx?LangID=1125
> >
> http://unicode.org/udhr/d/udhr_jiv.html (this is the same URL as for
> Achuar Chicham ?LangID=1125 at the start of this list - seems like a
> duplicate entry anyhow)
>
> > Swampy Cree
> > http://www.ohchr.org/EN/UDHR/Pages/Language.aspx?LangID=crm
> >
> http://unicode.org/udhr/d/udhr_csw.html
>
> > Tamang (Tam)
> > http://www.ohchr.org/EN/UDHR/Pages/Language.aspx?LangID=taj
> >
> http://unicode.org/udhr/d/udhr_taj.html (this contains only article 1)
>
> > Tamil
> > http://www.ohchr.org/EN/UDHR/Pages/Language.aspx?LangID=tcv
> >
> http://unicode.org/udhr/d/udhr_tam.html
>
> > Tatar
> > http://www.ohchr.org/EN/UDHR/Pages/Language.aspx?LangID=ttr
> >
> http://unicode.org/udhr/d/udhr_tat.html
>
> > Thai
> > http://www.ohchr.org/EN/UDHR/Pages/Language.aspx?LangID=thj
> >
> http://unicode.org/udhr/d/udhr_tha.html
>
> > Ticuna
> > http://www.ohchr.org/EN/UDHR/Pages/Language.aspx?LangID=tca
> >
> http://unicode.org/udhr/d/udhr_tca.html
>
> > Tsafiki
> > http://www.ohchr.org/EN/UDHR/Pages/Language.aspx?LangID=cof
> >
> http://unicode.org/udhr/d/udhr_cof.html (different language name - needs
> PDF verification)
>
> > Uighur
> > http://www.ohchr.org/EN/UDHR/Pages/Language.aspx?LangID=uig
> >
> http://unicode.org/udhr/d/udhr_uig_arab.html
> http://unicode.org/udhr/d/udhr_uig_latn.html
> (HTML has two different script variants - Arabic and Latin)
>
> > Urdu
> > http://www.ohchr.org/EN/UDHR/Pages/Language.aspx?LangID=urd
> >
> http://unicode.org/udhr/d/udhr_urd.html
>
> > Wao Tededo
> > http://www.ohchr.org/EN/UDHR/Pages/Language.aspx?LangID=1127
> >
> http://unicode.org/udhr/d/udhr_auc.html
>
> > Yi
> > http://www.ohchr.org/EN/UDHR/Pages/Language.aspx?LangID=iii
> >
> http://unicode.org/udhr/d/udhr_iii.html (this contains only article 1)
>
> > Yukagir
> > http://www.ohchr.org/EN/UDHR/Pages/Language.aspx?LangID=yk
>
>
> --
> mailto:alex.dupuy at mac.com
>
>