Thanks, <br><br>Oddly enough, Gonzalo came across the Unicode site a little earlier in the day and sent me the link, but I had not had time to explore it.  Thanks for the reviewing you've done.<br><br>I agree on not trusting OHCHR codes.  I think they mixed up aja and ajd (for instance).  The Bai Coca - Pai Koka pair is odd: the site says it was submitted by Ecuador, but I think it is a minority Chinese language.  Lots of QA to do.<br>

<br>cjl<br><br><div class="gmail_quote">On Thu, Jul 7, 2011 at 5:17 PM, Alexander Dupuy <span dir="ltr"><<a href="mailto:alex.dupuy@mac.com">alex.dupuy@mac.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin: 0pt 0pt 0pt 0.8ex; border-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;">

Chris Leonard wrote:<br>
<br>
> Gonzalo and I have been collaborating on packaging the Universal<br>
> Declaration of Human Rights as a content bundle for the XO.<br>
><br>
> One of the things we have found is that size is an issue and as the<br>
> HTML versions are typically smaller than the PDF versions provided on<br>
> the UDHR web-site, we would prefer to use an HTML version wherever<br>
> possible.<br>
><br>
> On the UDHR website, there are a number of PDFs that are provided as<br>
> image scans and no HTML version is present to be scraped from their<br>
> web-site.<br>
><br>
<br>
Hi Chris,<br>
<br>
You may not be aware of the following website at Unicode.org (I wasn't<br>
aware of it myself until I picked one of the following languages at<br>
random and started doing some internet searching):<br>
<a href="http://unicode.org/udhr/index.html" target="_blank">http://unicode.org/udhr/index.html</a><br>
<br>
While not all of the languages on your list are past the initial (sh =<br>
status & history) stage, several of them have XML translations in<br>
Unicode, which could be easily converted into HTML documents without any<br>
special resources (other than appropriate Unicode fonts, I suppose, to<br>
avoid inadvertent errors in the editing process).<br>
<br>
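As a rough illustration of that conversion, here is a minimal Python sketch.<br>
It assumes the XML uses "title" and "para" elements for headings and<br>
paragraphs (I have not checked this against the actual schema, so treat the<br>
element names and the function name as hypothetical):<br>

```python
import xml.etree.ElementTree as ET


def udhr_xml_to_html(xml_text, lang):
    """Convert a simple UDHR-style XML string into a minimal HTML page.

    Assumption: headings live in "title" elements and body text in "para"
    elements; any other element is only walked for its children.
    """
    root = ET.fromstring(xml_text)
    html = ET.Element("html", {"lang": lang})
    head = ET.SubElement(html, "head")
    ET.SubElement(head, "meta", {"charset": "utf-8"})
    body = ET.SubElement(html, "body")
    for node in root.iter():
        tag = node.tag.rsplit("}", 1)[-1]  # drop any XML namespace prefix
        text = (node.text or "").strip()
        if not text:
            continue
        if tag == "title":
            ET.SubElement(body, "h2").text = text
        elif tag == "para":
            ET.SubElement(body, "p").text = text
    # method="html" writes void elements such as meta without a closing tag
    return ET.tostring(html, encoding="unicode", method="html")
```

The result is a single unstyled page; fonts and any per-language styling<br>
would still have to be handled separately.<br>
<br>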
There are also some (about a dozen) languages on the Unicode site that<br>
are not listed on the OHCHR site (e.g. Cherokee), including a handful of<br>
alternate scripts, e.g. Traditional Chinese and Halh Mongolian script.<br>
<br>
Note that the language codes used on the two sites don't correspond - I would advise<br>
standardizing on ISO language_country@script codes for any filenames<br>
(Unicode seems better that way than the OHCHR, but they still use<br>
alternate codes, e.g. for Dari, which OLPC/Sugar have as fa_AF).<br>
<br>
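To make the naming suggestion concrete, here is a small Python sketch that<br>
builds language_country@script style names. The "udhr_" prefix and both<br>
function names are my own assumptions, not an existing OLPC/Sugar convention:<br>

```python
def locale_code(language, country=None, script=None):
    """Build a language_country@script style code, e.g. fa_AF or sr_RS@latin."""
    code = language.lower()
    if country:
        code += "_" + country.upper()
    if script:
        code += "@" + script
    return code


def udhr_filename(language, country=None, script=None, ext="html"):
    """Hypothetical bundle filename; the "udhr_" prefix is an assumption."""
    return "udhr_" + locale_code(language, country, script) + "." + ext
```

For example, udhr_filename("fa", "AF") gives "udhr_fa_AF.html", matching the<br>
fa_AF code mentioned above for Dari.<br>
<br>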
The translations are not the same for some of these languages; however,<br>
they do appear to be in the correct languages and correspond at least<br>
roughly.  I imagine that a lot of the variance is due to trying (at<br>
different levels of effort) to render the somewhat abstract ideas in<br>
the UDHR in easily understood vernacular speech.<br>
<br>
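My comparisons in the list below were done by eye. Where both versions are<br>
available as plain Unicode text (not the case for the image-scan PDFs), a<br>
check along these lines could put a number on "identical" versus "different<br>
translation". This is only a sketch, and the function names are mine:<br>

```python
import difflib
import unicodedata


def normalize(text):
    """NFC-normalize and collapse whitespace so layout differences are ignored."""
    return " ".join(unicodedata.normalize("NFC", text).split())


def similarity(a, b):
    """Return a ratio in [0, 1]; 1.0 means identical after normalization."""
    matcher = difflib.SequenceMatcher(None, normalize(a), normalize(b),
                                      autojunk=False)
    return matcher.ratio()
```

A ratio of 1.0 means the two texts match after normalization; anything lower<br>
still needs a human reader to say whether the difference matters.<br>
<br>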
> Achuar Chicham<br>
> <a href="http://www.ohchr.org/EN/UDHR/Pages/Language.aspx?LangID=jiv" target="_blank">http://www.ohchr.org/EN/UDHR/Pages/Language.aspx?LangID=jiv</a><br>
><br>
<a href="http://unicode.org/udhr/d/udhr_jiv.html" target="_blank">http://unicode.org/udhr/d/udhr_jiv.html</a> (same language, different<br>
translation)<br>
<br>
> Awapit<br>
> <a href="http://www.ohchr.org/EN/UDHR/Pages/Language.aspx?LangID=kwi" target="_blank">http://www.ohchr.org/EN/UDHR/Pages/Language.aspx?LangID=kwi</a><br>
><br>
<a href="http://unicode.org/udhr/d/udhr_kwi.html" target="_blank">http://unicode.org/udhr/d/udhr_kwi.html</a> (same language, different<br>
translation)<br>
<br>
> Bai Coca<br>
> <a href="http://www.ohchr.org/EN/UDHR/Pages/Language.aspx?LangID=1121" target="_blank">http://www.ohchr.org/EN/UDHR/Pages/Language.aspx?LangID=1121</a><br>
><br>
<a href="http://unicode.org/udhr/d/udhr_snn.html" target="_blank">http://unicode.org/udhr/d/udhr_snn.html</a> (text appears 100% identical)<br>
<br>
> Pai Koka<br>
> <a href="http://www.ohchr.org/EN/UDHR/Pages/Language.aspx?LangID=1123" target="_blank">http://www.ohchr.org/EN/UDHR/Pages/Language.aspx?LangID=1123</a><br>
><br>
<a href="http://unicode.org/udhr/d/udhr_sey.html" target="_blank">http://unicode.org/udhr/d/udhr_sey.html</a> (text appears 100% identical)<br>
<br>
This language appears to be extremely similar to Bai Coca above, except<br>
for orthography - notably c/qu -> k, which is typical for SIL -><br>
governmental orthographic changes in Spanish-speaking countries, e.g.<br>
ALMG official orthography in Guatemala vs. Summer Institute of<br>
Linguistics orthography for Bible translations. If space is an issue<br>
here, it might make sense to just pick up one of these, and even if you<br>
include both, note them as one language with two orthographic variants,<br>
like Serbian Latin vs. Cyrillic.<br>
<br>
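To see how much of the difference between two such texts is purely<br>
orthographic, one could fold both to a common spelling before comparing<br>
them. The Python sketch below is crude: it handles only the c/qu -> k<br>
correspondence mentioned above, leaving "ch" untouched is my assumption, and<br>
a real mapping between the two orthographies would need more rules:<br>

```python
import re


def fold_orthography(word):
    """Fold c/qu spellings to k so the two variants can be compared."""
    folded = word.lower()
    folded = folded.replace("qu", "k")
    # Assumption: "ch" is a separate digraph and must not become "kh".
    folded = re.sub(r"c(?!h)", "k", folded)
    return folded
```

With this, "Coca" and "Koka" both fold to "koka".<br>
<br>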
> Bengali<br>
> <a href="http://www.ohchr.org/EN/UDHR/Pages/Language.aspx?LangID=bng" target="_blank">http://www.ohchr.org/EN/UDHR/Pages/Language.aspx?LangID=bng</a><br>
><br>
<br>
<a href="http://unicode.org/udhr/d/udhr_ben.html" target="_blank">http://unicode.org/udhr/d/udhr_ben.html</a> (appears to be identical based<br>
on a few spot checks - I can't read Bengali at all, so this is just<br>
"squiggle-compatible" check)<br>
<br>
> Chaa'pala<br>
> <a href="http://www.ohchr.org/EN/UDHR/Pages/Language.aspx?LangID=1122" target="_blank">http://www.ohchr.org/EN/UDHR/Pages/Language.aspx?LangID=1122</a><br>
><br>
<a href="http://unicode.org/udhr/d/udhr_cbi.html" target="_blank">http://unicode.org/udhr/d/udhr_cbi.html</a> (text appears 100% identical)<br>
<br>
> Crioulo da Guiné-Bissau (Guinea Bissau Creole)<br>
> <a href="http://www.ohchr.org/EN/UDHR/Pages/Language.aspx?LangID=gbc" target="_blank">http://www.ohchr.org/EN/UDHR/Pages/Language.aspx?LangID=gbc</a><br>
><br>
<a href="http://unicode.org/udhr/d/udhr_pov.html" target="_blank">http://unicode.org/udhr/d/udhr_pov.html</a> (text appears 100% identical)<br>
<br>
> Croatian<br>
> <a href="http://www.ohchr.org/EN/UDHR/Pages/Language.aspx?LangID=src2" target="_blank">http://www.ohchr.org/EN/UDHR/Pages/Language.aspx?LangID=src2</a><br>
><br>
<a href="http://unicode.org/udhr/d/udhr_hrv.html" target="_blank">http://unicode.org/udhr/d/udhr_hrv.html</a> (text appears 100% identical)<br>
<br>
> Dari<br>
> <a href="http://www.ohchr.org/EN/UDHR/Pages/Language.aspx?LangID=prs1" target="_blank">http://www.ohchr.org/EN/UDHR/Pages/Language.aspx?LangID=prs1</a><br>
><br>
<a href="http://unicode.org/udhr/d/udhr_pes_2.html" target="_blank">http://unicode.org/udhr/d/udhr_pes_2.html</a> (appears pretty similar, but<br>
although my Arabic script is better than my Bengali, the terrible<br>
quality of the PDF scan makes it hard to be sure)<br>
<br>
> Dine, Navajo (Navaho)<br>
> <a href="http://www.ohchr.org/EN/UDHR/Pages/Language.aspx?LangID=nav" target="_blank">http://www.ohchr.org/EN/UDHR/Pages/Language.aspx?LangID=nav</a><br>
><br>
<a href="http://unicode.org/udhr/d/udhr_nav.html" target="_blank">http://unicode.org/udhr/d/udhr_nav.html</a> (text appears 100% identical)<br>
<br>
> Dzongkha/Bhutanese<br>
> <a href="http://www.ohchr.org/EN/UDHR/Pages/Language.aspx?LangID=dzo" target="_blank">http://www.ohchr.org/EN/UDHR/Pages/Language.aspx?LangID=dzo</a><br>
><br>
<a href="http://unicode.org/udhr/d/udhr_dzo.html" target="_blank">http://unicode.org/udhr/d/udhr_dzo.html</a> (this contains only article 1)<br>
<br>
> Even<br>
> <a href="http://www.ohchr.org/EN/UDHR/Pages/Language.aspx?LangID=eve" target="_blank">http://www.ohchr.org/EN/UDHR/Pages/Language.aspx?LangID=eve</a><br>
><br>
<a href="http://unicode.org/udhr/d/udhr_eve.html" target="_blank">http://unicode.org/udhr/d/udhr_eve.html</a> (text appears 100% identical -<br>
both are missing preamble, HTML version explicitly so)<br>
<br>
> Farsi/Persian<br>
> <a href="http://www.ohchr.org/EN/UDHR/Pages/Language.aspx?LangID=prs" target="_blank">http://www.ohchr.org/EN/UDHR/Pages/Language.aspx?LangID=prs</a><br>
><br>
<a href="http://unicode.org/udhr/d/udhr_pes_1.html" target="_blank">http://unicode.org/udhr/d/udhr_pes_1.html</a> (appears very similar, but<br>
again, the poor quality of the PDF scan makes it hard to be sure)<br>
<br>
<br>
> Gujarati<br>
> <a href="http://www.ohchr.org/EN/UDHR/Pages/Language.aspx?LangID=gjr" target="_blank">http://www.ohchr.org/EN/UDHR/Pages/Language.aspx?LangID=gjr</a><br>
><br>
<a href="http://unicode.org/udhr/d/udhr_guj.html" target="_blank">http://unicode.org/udhr/d/udhr_guj.html</a> (seems pretty similar,<br>
squiggle-compatible)<br>
<br>
> Hebrew<br>
> <a href="http://www.ohchr.org/EN/UDHR/Pages/Language.aspx?LangID=hbr" target="_blank">http://www.ohchr.org/EN/UDHR/Pages/Language.aspx?LangID=hbr</a><br>
><br>
<a href="http://unicode.org/udhr/d/udhr_heb.html" target="_blank">http://unicode.org/udhr/d/udhr_heb.html</a> (text appears 100% identical)<br>
<br>
> Inuktitut<br>
> <a href="http://www.ohchr.org/EN/UDHR/Pages/Language.aspx?LangID=esb" target="_blank">http://www.ohchr.org/EN/UDHR/Pages/Language.aspx?LangID=esb</a><br>
><br>
<a href="http://unicode.org/udhr/d/udhr_ike.html" target="_blank">http://unicode.org/udhr/d/udhr_ike.html</a> (text appears 100% identical)<br>
<br>
> Kannada<br>
> <a href="http://www.ohchr.org/EN/UDHR/Pages/Language.aspx?LangID=kjv" target="_blank">http://www.ohchr.org/EN/UDHR/Pages/Language.aspx?LangID=kjv</a><br>
><br>
<a href="http://unicode.org/udhr/d/udhr_kan.html" target="_blank">http://unicode.org/udhr/d/udhr_kan.html</a> (text appears 100% identical)<br>
<br>
--- At this point I ran out of time to do more than download the HTML<br>
versions (grabbing the PDFs from the OHCHR site was just too slow) ---<br>
<br>
> Kazakh<br>
> <a href="http://www.ohchr.org/EN/UDHR/Pages/Language.aspx?LangID=kaz" target="_blank">http://www.ohchr.org/EN/UDHR/Pages/Language.aspx?LangID=kaz</a><br>
><br>
<a href="http://unicode.org/udhr/d/udhr_kaz.html" target="_blank">http://unicode.org/udhr/d/udhr_kaz.html</a><br>
<br>
> Kichwa<br>
> <a href="http://www.ohchr.org/EN/UDHR/Pages/Language.aspx?LangID=qug" target="_blank">http://www.ohchr.org/EN/UDHR/Pages/Language.aspx?LangID=qug</a><br>
><br>
<a href="http://unicode.org/udhr/d/udhr_qug.html" target="_blank">http://unicode.org/udhr/d/udhr_qug.html</a><br>
<br>
<br>
> Malayalam<br>
> <a href="http://www.ohchr.org/EN/UDHR/Pages/Language.aspx?LangID=mjs" target="_blank">http://www.ohchr.org/EN/UDHR/Pages/Language.aspx?LangID=mjs</a><br>
><br>
<a href="http://unicode.org/udhr/d/udhr_mal.html" target="_blank">http://unicode.org/udhr/d/udhr_mal.html</a><br>
<br>
> Marathi<br>
> <a href="http://www.ohchr.org/EN/UDHR/Pages/Language.aspx?LangID=mrt" target="_blank">http://www.ohchr.org/EN/UDHR/Pages/Language.aspx?LangID=mrt</a><br>
><br>
<a href="http://unicode.org/udhr/d/udhr_mar.html" target="_blank">http://unicode.org/udhr/d/udhr_mar.html</a><br>
<br>
> Nepali<br>
> <a href="http://www.ohchr.org/EN/UDHR/Pages/Language.aspx?LangID=nep" target="_blank">http://www.ohchr.org/EN/UDHR/Pages/Language.aspx?LangID=nep</a><br>
><br>
<a href="http://unicode.org/udhr/d/udhr_nep.html" target="_blank">http://unicode.org/udhr/d/udhr_nep.html</a><br>
<br>
> Ojibway (Ojibwe)<br>
> <a href="http://www.ohchr.org/EN/UDHR/Pages/Language.aspx?LangID=ojb" target="_blank">http://www.ohchr.org/EN/UDHR/Pages/Language.aspx?LangID=ojb</a><br>
><br>
<a href="http://unicode.org/udhr/d/udhr_ojb.html" target="_blank">http://unicode.org/udhr/d/udhr_ojb.html</a><br>
<br>
> Otuho<br>
> <a href="http://www.ohchr.org/EN/UDHR/Pages/Language.aspx?LangID=lot" target="_blank">http://www.ohchr.org/EN/UDHR/Pages/Language.aspx?LangID=lot</a><br>
><br>
<a href="http://unicode.org/udhr/d/udhr_lot.html" target="_blank">http://unicode.org/udhr/d/udhr_lot.html</a><br>
<br>
> Pashto/Pakhto<br>
> <a href="http://www.ohchr.org/EN/UDHR/Pages/Language.aspx?LangID=pbu" target="_blank">http://www.ohchr.org/EN/UDHR/Pages/Language.aspx?LangID=pbu</a><br>
><br>
<a href="http://unicode.org/udhr/d/udhr_pbu.html" target="_blank">http://unicode.org/udhr/d/udhr_pbu.html</a> (this contains only article 1)<br>
<br>
> Sapara Atupama<br>
> <a href="http://www.ohchr.org/EN/UDHR/Pages/Language.aspx?LangID=1124" target="_blank">http://www.ohchr.org/EN/UDHR/Pages/Language.aspx?LangID=1124</a><br>
><br>
<a href="http://unicode.org/udhr/d/udhr_zro.html" target="_blank">http://unicode.org/udhr/d/udhr_zro.html</a><br>
<br>
> Saraiki<br>
> <a href="http://www.ohchr.org/EN/UDHR/Pages/Language.aspx?LangID=skr" target="_blank">http://www.ohchr.org/EN/UDHR/Pages/Language.aspx?LangID=skr</a><br>
><br>
<a href="http://unicode.org/udhr/d/udhr_skr.html" target="_blank">http://unicode.org/udhr/d/udhr_skr.html</a><br>
<br>
> Shilluk<br>
> <a href="http://www.ohchr.org/EN/UDHR/Pages/Language.aspx?LangID=shk" target="_blank">http://www.ohchr.org/EN/UDHR/Pages/Language.aspx?LangID=shk</a><br>
><br>
<a href="http://unicode.org/udhr/d/udhr_shk.html" target="_blank">http://unicode.org/udhr/d/udhr_shk.html</a><br>
<br>
> Shuar Chicham<br>
> <a href="http://www.ohchr.org/EN/UDHR/Pages/Language.aspx?LangID=1125" target="_blank">http://www.ohchr.org/EN/UDHR/Pages/Language.aspx?LangID=1125</a><br>
><br>
<a href="http://unicode.org/udhr/d/udhr_jiv.html" target="_blank">http://unicode.org/udhr/d/udhr_jiv.html</a> (this is the same Unicode URL as for<br>
Achuar Chicham at the start of this list - the OHCHR ?LangID=1125 entry<br>
seems like a duplicate anyhow)<br>
<br>
> Swampy Cree<br>
> <a href="http://www.ohchr.org/EN/UDHR/Pages/Language.aspx?LangID=crm" target="_blank">http://www.ohchr.org/EN/UDHR/Pages/Language.aspx?LangID=crm</a><br>
><br>
<a href="http://unicode.org/udhr/d/udhr_csw.html" target="_blank">http://unicode.org/udhr/d/udhr_csw.html</a><br>
<br>
> Tamang (Tam)<br>
> <a href="http://www.ohchr.org/EN/UDHR/Pages/Language.aspx?LangID=taj" target="_blank">http://www.ohchr.org/EN/UDHR/Pages/Language.aspx?LangID=taj</a><br>
><br>
<a href="http://unicode.org/udhr/d/udhr_taj.html" target="_blank">http://unicode.org/udhr/d/udhr_taj.html</a> (this contains only article 1)<br>
<br>
> Tamil<br>
> <a href="http://www.ohchr.org/EN/UDHR/Pages/Language.aspx?LangID=tcv" target="_blank">http://www.ohchr.org/EN/UDHR/Pages/Language.aspx?LangID=tcv</a><br>
><br>
<a href="http://unicode.org/udhr/d/udhr_tam.html" target="_blank">http://unicode.org/udhr/d/udhr_tam.html</a><br>
<br>
> Tatar<br>
> <a href="http://www.ohchr.org/EN/UDHR/Pages/Language.aspx?LangID=ttr" target="_blank">http://www.ohchr.org/EN/UDHR/Pages/Language.aspx?LangID=ttr</a><br>
><br>
<a href="http://unicode.org/udhr/d/udhr_tat.html" target="_blank">http://unicode.org/udhr/d/udhr_tat.html</a><br>
<br>
> Thai<br>
> <a href="http://www.ohchr.org/EN/UDHR/Pages/Language.aspx?LangID=thj" target="_blank">http://www.ohchr.org/EN/UDHR/Pages/Language.aspx?LangID=thj</a><br>
><br>
<a href="http://unicode.org/udhr/d/udhr_tha.html" target="_blank">http://unicode.org/udhr/d/udhr_tha.html</a><br>
<br>
> Ticuna<br>
> <a href="http://www.ohchr.org/EN/UDHR/Pages/Language.aspx?LangID=tca" target="_blank">http://www.ohchr.org/EN/UDHR/Pages/Language.aspx?LangID=tca</a><br>
><br>
<a href="http://unicode.org/udhr/d/udhr_tca.html" target="_blank">http://unicode.org/udhr/d/udhr_tca.html</a><br>
<br>
> Tsafiki<br>
> <a href="http://www.ohchr.org/EN/UDHR/Pages/Language.aspx?LangID=cof" target="_blank">http://www.ohchr.org/EN/UDHR/Pages/Language.aspx?LangID=cof</a><br>
><br>
<a href="http://unicode.org/udhr/d/udhr_cof.html" target="_blank">http://unicode.org/udhr/d/udhr_cof.html</a> (different language name - needs<br>
PDF verification)<br>
<br>
> Uighur<br>
> <a href="http://www.ohchr.org/EN/UDHR/Pages/Language.aspx?LangID=uig" target="_blank">http://www.ohchr.org/EN/UDHR/Pages/Language.aspx?LangID=uig</a><br>
><br>
<a href="http://unicode.org/udhr/d/udhr_uig_arab.html" target="_blank">http://unicode.org/udhr/d/udhr_uig_arab.html</a><br>
<a href="http://unicode.org/udhr/d/udhr_uig_latn.html" target="_blank">http://unicode.org/udhr/d/udhr_uig_latn.html</a><br>
(HTML has two different script variants - Arabic and Latin)<br>
<br>
> Urdu<br>
> <a href="http://www.ohchr.org/EN/UDHR/Pages/Language.aspx?LangID=urd" target="_blank">http://www.ohchr.org/EN/UDHR/Pages/Language.aspx?LangID=urd</a><br>
><br>
<a href="http://unicode.org/udhr/d/udhr_urd.html" target="_blank">http://unicode.org/udhr/d/udhr_urd.html</a><br>
<br>
> Wao Tededo<br>
> <a href="http://www.ohchr.org/EN/UDHR/Pages/Language.aspx?LangID=1127" target="_blank">http://www.ohchr.org/EN/UDHR/Pages/Language.aspx?LangID=1127</a><br>
><br>
<a href="http://unicode.org/udhr/d/udhr_auc.html" target="_blank">http://unicode.org/udhr/d/udhr_auc.html</a><br>
<br>
> Yi<br>
> <a href="http://www.ohchr.org/EN/UDHR/Pages/Language.aspx?LangID=iii" target="_blank">http://www.ohchr.org/EN/UDHR/Pages/Language.aspx?LangID=iii</a><br>
><br>
<a href="http://unicode.org/udhr/d/udhr_iii.html" target="_blank">http://unicode.org/udhr/d/udhr_iii.html</a> (this contains only article 1)<br>
<br>
> Yukagir<br>
> <a href="http://www.ohchr.org/EN/UDHR/Pages/Language.aspx?LangID=yk" target="_blank">http://www.ohchr.org/EN/UDHR/Pages/Language.aspx?LangID=yk</a><br>
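<br>
Since the codes differ between the two sites, a lookup table may save some<br>
retyping. The Python sketch below uses only pairs taken from the list above;<br>
the function names are mine, and Uighur is left out because the Unicode site<br>
has two script variants for it:<br>

```python
# OHCHR LangID -> Unicode UDHR file code, for pairs that differ in the list above.
OHCHR_TO_UNICODE = {
    "1121": "snn", "1122": "cbi", "1123": "sey", "1124": "zro",
    "1125": "jiv", "1127": "auc",
    "bng": "ben", "gbc": "pov", "src2": "hrv", "prs1": "pes_2",
    "prs": "pes_1", "gjr": "guj", "hbr": "heb", "esb": "ike",
    "kjv": "kan", "mjs": "mal", "mrt": "mar", "crm": "csw",
    "tcv": "tam", "ttr": "tat", "thj": "tha",
}


def ohchr_url(lang_id):
    """URL of the OHCHR page for a given LangID."""
    return "http://www.ohchr.org/EN/UDHR/Pages/Language.aspx?LangID=" + lang_id


def unicode_url(lang_id):
    """URL of the matching Unicode HTML file.

    Assumption: IDs missing from the table (jiv, kwi, nav, ...) are the
    same on both sites, which holds for the list above but is not guaranteed.
    """
    code = OHCHR_TO_UNICODE.get(lang_id, lang_id)
    return "http://unicode.org/udhr/d/udhr_%s.html" % code
```

Any ID that falls through to the default should be checked by hand before<br>
it is trusted.<br>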
<font color="#888888"><br>
<br>
--<br>
mailto:<a href="mailto:alex.dupuy@mac.com">alex.dupuy@mac.com</a><br>
<br>
</font></blockquote></div><br>