No subject
Tue Oct 28 06:28:39 EDT 2008
by John Goldsmith, The University of Chicago.
http://en.wikipedia.org/wiki/John_Goldsmith
http://hum.uchicago.edu/~jagoldsm/Webpage/index.html
The author describes one of his research interests as "unsupervised
learning of morphology". Unfortunately for us, that means unsupervised
computers attempting to analyse word structure, with a 70-80% success
rate measured by words in the corpus. It has nothing to do with human
learning or the grammar of sentences.
An algorithm for the unsupervised learning of morphology
http://hum.uchicago.edu/~jagoldsm/Papers/algorithm.pdf
Abstract
This paper describes in detail an algorithm for the unsupervised
learning of natural language morphology, with emphasis on challenges
that are encountered in languages typologically similar to European
languages. It utilizes the Minimum Description Length analysis
described in Goldsmith (2001), and has been implemented in software
that is available for downloading and testing.
"The executable for this program, and the source code as well, is
available at http://linguistica.uchicago.edu."
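To give a feel for what "Minimum Description Length" means here, the
following is a toy sketch of my own. It is not Goldsmith's algorithm and
has nothing to do with the Linguistica source. It assumes a made-up cost
model: every letter costs log2(27) bits, and each word is stored as a
pair of pointers into a stem list and a suffix list. All function names
and the min_stem cutoff are hypothetical. The idea is only that a
candidate suffix set is "better" if it makes the total description of
the word list shorter.

```python
import math

# Toy assumption: 26 letters plus one end-of-entry marker, equally likely.
BITS_PER_CHAR = math.log2(27)


def list_cost(items):
    """Bits needed to spell out each distinct item letter by letter."""
    return sum((len(item) + 1) * BITS_PER_CHAR for item in set(items))


def pointer_bits(n):
    """Bits for one pointer into a list of n entries (zero if only one choice)."""
    return math.log2(n) if n > 1 else 0.0


def split_word(word, suffixes, min_stem=3):
    """Split off the longest listed suffix that leaves at least min_stem letters."""
    for suffix in sorted(suffixes, key=len, reverse=True):
        if suffix and word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[:-len(suffix)], suffix
    return word, ""  # no listed suffix applies: whole word is the stem


def description_length(words, suffixes):
    """Total bits: stem list + suffix list + one (stem, suffix) pointer pair per word."""
    analyses = [split_word(w, suffixes) for w in set(words)]
    stems = {stem for stem, _ in analyses}
    used_suffixes = {suffix for _, suffix in analyses}
    model_bits = list_cost(stems) + list_cost(used_suffixes)
    data_bits = len(analyses) * (
        pointer_bits(len(stems)) + pointer_bits(len(used_suffixes))
    )
    return model_bits + data_bits
```

With a word list such as walk/walks/walked/walking (and the same for
jump and talk), calling description_length with the suffix set
{"s", "ed", "ing"} gives a smaller total than calling it with an empty
set, because three short stems and three short suffixes are cheaper to
spell out than twelve full words.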
The conclusion we should draw is that, in their full generality, these
are very hard problems, on which we have barely made a beginning.
I would suggest that for selected semi-formal grammars, of the kind
commonly taught in textbooks for foreign students, it might be a
reasonable task. Basically, we remove nearly all of the colloquial
material and teach only what has known rules. Given a sufficient
research impetus, we can expect to make fairly rapid progress for some
considerable time in extending our analysis and teaching methods to
ever larger corpora of published and spoken or sung material. We must
exclude at any given moment the unconquered edge cases, such as
certain kinds of obscure humor, or James Joyce and the current
post-post-modernists. I think that they can safely be left to the
advanced classes in any case. If we can get our students to be fluent
in the few hundred cases in published transformational grammars of
English, with the common irregular verbs and plurals, and a set of
daily-use idioms, nobody can complain. (Well, actually, they can, and
they will, but I will feel free to ignore them. ;->)
I'll ask Goldsmith whether he would be interested in joining our
discussion, and whether he knows of relevant R&D.
> Thanks,
>
> - Chris.
> --
> Chris Ball <cjb at laptop.org>
--
http://wiki.sugarlabs.org/go/User:Mokurai
Give One, Get One, from Nov. 17
http://www.amazon.com/xo
http://wiki.laptop.org/go/XO_Giving/International