[sugar] Develop activity (Oops...)
Marc-Antoine Parent
maparent at gmail.com
Mon Aug 13 16:40:53 EDT 2007
Good day, all!
Finally had time to read and think a bit, especially Jameson's design.
(For the record: Andrew kindly included me in your discussion,
because I have worked on a few multilingual projects in the past, and
got to think about these issues somewhat.)
I like many aspects of the design a lot; especially the idea that
the .py files should be in English as much as possible, with
translation on load and save. This has two obvious advantages, namely:
a) we do not have to store translations between every pair of languages
(a matrix), but can use a simpler hub-and-spoke model with English as the hub;
b) the current interpreter can work without knowing about any of this.
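To make the hub-and-spoke point concrete, here is a minimal sketch. It assumes each language has a single table mapping English identifiers to translated ones; the table name `HUB_TABLES` and its contents are hypothetical. Translating between two non-English languages goes through English, so N languages need N tables rather than one per language pair.

```python
# Hypothetical per-language tables: English (hub) identifier -> translation.
HUB_TABLES = {
    "fr": {"a_function": "une_fonction", "first_module": "premier_module"},
    "es": {"a_function": "una_funcion", "first_module": "primer_modulo"},
}

def to_english(name, lang):
    """Map a translated identifier back to its English hub form."""
    if lang == "en":
        return name
    reverse = {v: k for k, v in HUB_TABLES[lang].items()}
    return reverse.get(name, name)  # untranslated names pass through unchanged

def from_english(name, lang):
    """Map an English hub identifier into the target language."""
    if lang == "en":
        return name
    return HUB_TABLES[lang].get(name, name)

def translate(name, src, dst):
    """Spoke -> hub -> spoke: no direct src/dst table is ever stored."""
    return from_english(to_english(name, src), dst)

print(translate("une_fonction", "fr", "es"))  # una_funcion
```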
However, Jameson, if I may, I would take issue with a few assumptions
of your model: most especially that of a "preferred language" for
modules.
I am referring to your point 3:
> 3-This dictionary ONLY contains translations for the "public
> interface" of somemodule.py, that is, those identifiers which are
> used in importer modules. It also defines a single, unchanging
> "preferred language" for that file, which is the assumed language
> for all non-translated identifiers in that file.
I am especially interested in collaborative work, and I believe it is
not unreasonable to hope that children in schools in different
countries will get to share some work.
That would mean that a given module may have many editors, possibly
introducing identifiers in more than one non-English language.
From that point of view, "preferred language" is a feature of an
editing environment, not of a module. New identifiers should be
individually tagged by language; I see that tagging as appropriate
work for the editing environment. Basically, upon loading the file,
all local identifiers would be read in memory; upon saving, new ones
would be saved with a language tag. (Plausibly as a suffix:
identifier__i18n_xx, where xx is the two-letter language code.)
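A minimal sketch of the save-time tagging, assuming the double-underscore form `name__i18n_xx` used in the examples further down and assuming untagged names are English. The helper names `tag` and `untag` are hypothetical.

```python
import re

# Hypothetical suffix convention: name__i18n_xx, xx = two-letter language code.
_TAG = re.compile(r"^(?P<base>.+?)__i18n_(?P<lang>[a-z]{2})$")

def tag(identifier, lang):
    """Attach a language tag to a new, not-yet-translated identifier on save."""
    if lang == "en":
        return identifier  # English is the hub and needs no tag
    return f"{identifier}__i18n_{lang}"

def untag(identifier):
    """Split a stored identifier into (base name, language code) on load."""
    m = _TAG.match(identifier)
    if m is None:
        return identifier, "en"  # assumption: untagged names are English
    return m.group("base"), m.group("lang")

print(tag("une_fonction", "fr"))       # une_fonction__i18n_fr
print(untag("une_fonction__i18n_fr"))  # ('une_fonction', 'fr')
print(untag("a_function"))             # ('a_function', 'en')
```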
(I would otherwise follow Mike's suggestion to use a fixed
transliteration table for non-latin scripts.)
This only applies until we have a valid English version of the
identifier, of course; at that point, the English version serves as the
hub. But that raises another issue, which you tackle in points 5 and 6:
what happens to imports in other modules that use the old generated
identifier? You suggest keeping a separate history. That is a
possibility, but I fear it runs counter to the goal of keeping the
files usable by the existing interpreter. (Though you may have
thought of a workaround for this that I have missed.)
My suggestion would be as follows:
(I will use French for my example.)
in premier_module.py:
def une_fonction__i18n_fr(): ...
EOF
in deuxieme_module.py:
du premier_module importe une_fonction__i18n_fr
...
EOF
Then the English translation a_function is introduced for une_fonction.
So premier_module.py becomes:
def a_function(): ...
# -*- Translation history block -*-
une_fonction__i18n_fr = a_function
# -*- End translation history block -*-
EOF
(N.B. 1: The translation history block could be hidden in a
knowledgeable editor; but we should have access to it, so as to
explain why that word is still reserved.)
(N.B. 2: Actually, it is likely that premier_module.py has been
renamed to first_module.py, and the package's __init__.py has a
similar equivalence in _its_ translation block!!!)
That way, the original import in deuxieme_module still works, in an
unmodified python interpreter.
(Until the knowledgeable editor gets to work on deuxieme_module.py
again, of course.)
If someone later decides on a better translation, more than one
version can be kept in the translation block.
This has the disadvantage of polluting the code, but the advantage of
polluting the filesystem less.
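The claim that an unmodified interpreter still accepts the old import can be checked with a small sketch. To stay self-contained, it builds `premier_module` in memory with `types.ModuleType` and registers it in `sys.modules` instead of reading a file; the import line is the stored form of deuxieme_module.py, with Python's own keywords rather than the French ones an editor would display.

```python
import sys
import types

# Contents of premier_module.py after the English translation was introduced.
PREMIER_MODULE_SRC = '''
def a_function():
    return "called a_function"

# -*- Translation history block -*-
une_fonction__i18n_fr = a_function
# -*- End translation history block -*-
'''

# Register the source as a module so the import below behaves as if the
# file existed on disk (only done to keep this sketch self-contained).
mod = types.ModuleType("premier_module")
exec(PREMIER_MODULE_SRC, mod.__dict__)
sys.modules["premier_module"] = mod

# deuxieme_module.py, saved before the rename, is still valid plain Python.
from premier_module import une_fonction__i18n_fr

print(une_fonction__i18n_fr())  # called a_function
```

Because the history entry is an ordinary assignment, the old name and the new one refer to the same function object.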
I now see a broader application of this mechanism: the translation
block could be tagged with a revision number (if __revision__ < 540:),
and the "import" statement could mention the last known revision, so
translation blocks would only be activated as needed. But that is
another story entirely.
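One possible reading of the revision idea, as a hypothetical sketch: each history entry records the revision at which an old name was replaced, and an importer that last saw revision R only needs aliases for renames made after R. The `Rename` record, the `HISTORY` contents and the second rename are invented for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rename:
    old_name: str   # identifier as importers may still spell it
    new_name: str   # current (usually English) identifier
    revision: int   # revision at which the rename happened

# Hypothetical translation history for premier_module.
HISTORY = [
    Rename("une_fonction__i18n_fr", "a_function", 540),
    Rename("compteur__i18n_fr", "counter", 612),
]

def aliases_needed(history, last_known_revision):
    """Return {old: new} for renames the importer has not yet seen."""
    return {r.old_name: r.new_name
            for r in history
            if r.revision > last_known_revision}

print(aliases_needed(HISTORY, 600))  # {'compteur__i18n_fr': 'counter'}
```

An importer saved at revision 700 would need no aliases at all, so the translation block could stay inactive for it.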
Another quick related note: what if someone adds a translation
between two non-English languages? In your first email, you
explicitly forbade it; I am not sure that is necessary. (Nor am I sure
you still consider it necessary in your later design.) Clearly,
however, X-to-Y translations may have to consult the history (as
language X is replaced by English) so as to become English-to-Y ones.
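A minimal sketch of that rebasing step, assuming French as X and Spanish as Y, and assuming the history is available as a plain French-to-English mapping; the function name and data are hypothetical.

```python
def rebase_translations(fr_to_es, fr_history):
    """
    fr_to_es:   direct French -> Spanish identifier translations.
    fr_history: French -> English renames recorded in the history block.
    Returns (en_to_es, still_fr_to_es): entries whose French name now has an
    English equivalent become English -> Spanish; the rest stay pending.
    """
    en_to_es = {}
    still_fr_to_es = {}
    for fr_name, es_name in fr_to_es.items():
        en_name = fr_history.get(fr_name)
        if en_name is None:
            still_fr_to_es[fr_name] = es_name  # no English hub name yet
        else:
            en_to_es[en_name] = es_name
    return en_to_es, still_fr_to_es

en_to_es, pending = rebase_translations(
    {"une_fonction": "una_funcion", "compteur": "contador"},
    {"une_fonction": "a_function"},
)
print(en_to_es)  # {'a_function': 'una_funcion'}
print(pending)   # {'compteur': 'contador'}
```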
To finish with your design points, you introduce what I see as a
severe limitation in your point 4:
> 4-There is good UI support for creating a new translation for a
> word. However, the assumed user model is that words will be
> translated INTO a users preferred language; FROM the context of an
> importer module (you'd generally not add translations for a module
> from that module itself, since generally you wouldn't even have
> modules open whose preferred language is not your own); and
> therefore WITH an explicit user decision as to which module this
> translation belongs in (they want to use their language for
> identifier X which is in English, well, they must have had a reason
> to write it in English rather than their language so they
> presumably know what imported module it comes from.)
What really made me jump is the notion that "you wouldn't even have
modules open whose preferred language is not your own". Again, this
assumes a single preferred language per module, which is something I
would rather avoid, and I believe is not necessary if identifiers
have a language mark.
However, I suspect your mention of "from the context of an importer
module" comes from the issues you encountered with memorizing the
import structure. I would like to hear more about the problems you
ran into there, because I believe knowing the import structure is
necessary (for reasons detailed below.)
Now, a few suggestions and pitfalls of my own:
a) I believe there should be one translation file per language: more
file pollution, but less parsing.
I suspect that something akin to the gettext file structure would be
appropriate:
package/module1.py
would be translated in
package/_t9n_/fr/module1.pyt
package/_t9n_/fr/module1.pyto (object, like a .mo file)
package/_t9n_/es/module1.pyt
package/_t9n_/es/module1.pyto
and so on.
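The layout above amounts to a simple path mapping. Here is a sketch, assuming the proposed `_t9n_` directory and the `.pyt`/`.pyto` extensions (neither is an existing Python convention); `PurePosixPath` keeps the output identical on every platform.

```python
from pathlib import PurePosixPath

T9N_DIR = "_t9n_"  # directory name proposed above (hypothetical convention)

def translation_paths(module_path, lang):
    """Return (source, compiled) translation file paths for one language."""
    p = PurePosixPath(module_path)
    base = p.parent / T9N_DIR / lang / p.stem
    return base.with_suffix(".pyt"), base.with_suffix(".pyto")

src, compiled = translation_paths("package/module1.py", "fr")
print(src)       # package/_t9n_/fr/module1.pyt
print(compiled)  # package/_t9n_/fr/module1.pyto
```

Because each language lives in its own directory, an editor only has to read the one file for its current language.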
b) A particularly fancy editor would color-code words in other
languages instead of showing the __i18n_xx tag.
Of course there would be a way to access online translation services
to get suggestions (as has been suggested by many.)
c) Sci-fi scenario: any new translation suggestion by a child or
educator should be made available to others using a distributed
database system... (they are likely to work on common projects, and
hence on common modules.)
The children or educators known to be knowledgeable about a given
language pair should have a way to vet translations in that database.
Oh, and let's send it to Planet Python so we have a basis for building
translation files for the standard library in very obscure
languages ;-)
(OK, that _is_ sci-fi. Still worth thinking about!)
d) Back to earth: I said we really had to know the import
structure... here is a slew of related problems:
Suppose we are editing a module that is importing something from the
core library:
from moduleX import f1
from moduleY import f2
f1()
f2()
Now, suppose f1 and f2 both translate as "sigma" in the current
editing language... Then, though the .py code is unambiguous, the
translated on-screen code looks ambiguous; worse, the untranslation
process on save is not well-defined.
The solution is to rewrite the imports in the source code so that
names are qualified by their modules:
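Detecting the problem is straightforward once the names visible in a namespace are known. A minimal sketch, assuming a flat English-to-translation table for the editing language (the table contents are hypothetical):

```python
from collections import defaultdict

def find_collisions(visible_names, table):
    """
    visible_names: English identifiers visible in one namespace.
    table: English -> translated identifier for the editing language.
    Returns {translated: [English names]} for translations that are not
    unique, i.e. the cases where untranslating on save would be ambiguous.
    """
    by_translation = defaultdict(list)
    for name in visible_names:
        by_translation[table.get(name, name)].append(name)
    return {t: sorted(names) for t, names in by_translation.items()
            if len(names) > 1}

# Hypothetical table in which f1 and f2 both become "sigma".
table = {"f1": "sigma", "f2": "sigma", "moduleX": "modulex"}
print(find_collisions(["f1", "f2", "moduleX"], table))  # {'sigma': ['f1', 'f2']}
```

Untranslated names are compared as-is, so an English name that happens to equal another name's translation is reported too.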
import moduleX
import moduleY
moduleX.sigma()
moduleY.sigma()
This refactoring should be possible in most cases, unless two
top-level modules have identical translations (say, moduleX and
moduleY both translate as "modula").
This situation should be flagged as an error, or alternately
_displayed_ as follows:
import moduleX__i18n_en_ as modula_1
import moduleY__i18n_en_ as modula_2
modula_1.sigma()
modula_2.sigma()
This is not an interpreter-level change, but a disguised display. (or
rather a refactoring which can be memorized, and reverted by the
untranslation machinery.)
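A sketch of how the display-only aliases might be generated and reverted, assuming a table of module-name translations; numbering colliding names in import order is an arbitrary choice made here, and the function name is hypothetical.

```python
def display_aliases(modules, table):
    """
    modules: English names of top-level modules imported in one file.
    table:   English -> translated module names.
    Returns (display, revert): display maps each English module name to the
    name shown on screen; revert maps shown names back to English for save.
    Colliding translations get numbered suffixes, as in modula_1, modula_2.
    """
    groups = {}
    for m in modules:
        groups.setdefault(table.get(m, m), []).append(m)
    display = {}
    for shown, members in groups.items():
        if len(members) == 1:
            display[members[0]] = shown
        else:
            for i, m in enumerate(members, start=1):
                display[m] = f"{shown}_{i}"
    revert = {v: k for k, v in display.items()}
    return display, revert

display, revert = display_aliases(
    ["moduleX", "moduleY"], {"moduleX": "modula", "moduleY": "modula"})
for m in ["moduleX", "moduleY"]:
    print(f"import {m} as {display[m]}")
```

A fuller version would also check that the numbered names do not themselves clash with another translation in the same file.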
Note that display-only import disambiguation may also be necessary if
the above code happens in a core library file (which we would never
modify.)
In any case it is useful to flag as an error any translation that
introduces ambiguity within the same namespace.
Similar transformations may be made necessary by "from moduleX import
*" syntax.
None of this is simple, as I said; but alas probably necessary.
e) Would we display numbers as the equivalent numerics in other
writing systems?
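If the answer is yes, the conversion itself is cheap. A sketch using Devanagari digits purely as an example script; numeric literals in Python source must use ASCII digits, so the save step has to convert back. Python's `int()` accepts any Unicode decimal digits, which makes the reverse direction easy.

```python
# Devanagari digits U+0966..U+096F, chosen here only as an example script.
DEVANAGARI = str.maketrans("0123456789", "०१२३४५६७८९")

def display_number(literal):
    """Render an ASCII integer literal in Devanagari digits for display."""
    return literal.translate(DEVANAGARI)

def to_source_number(shown):
    """Convert a displayed integer back to the ASCII form saved in the .py file."""
    return str(int(shown))  # int() accepts any Unicode decimal digits

shown = display_number("2007")
print(shown)                    # २००७
print(to_source_number(shown))  # 2007
```

This only covers integers; floats, hex literals and digit grouping conventions would need more care.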
f) Docstrings... are another issue entirely. I still like my idea of
a distributed database, so children puzzling out a foreign (to them)
docstring with online help can put their minds together.
OK, I am giving more problems than solutions, here; and
unfortunately, my spare time is otherwise quite occupied, so I doubt
I can contribute to implementation; still, I hope that spelling some
of these things out is useful to others. I'll try to keep my thinking
cap on as this discussion evolves.
Cheers,
Marc-Antoine Parent
http://maparent.ca/
P.S. I _love_ your idea of arrows in the margin to indicate flow!