[OLPC library] Dictionary Software Analysis

LuYu luyufreeculture at gmail.com
Mon Mar 17 12:39:08 EDT 2008


    OLPC's Dictionary -- A Software Review

StarDict has been chosen by the OmegaWiki and OLPC projects as the
program to display and facilitate querying of dictionary data. Over the
last few weeks, I have taken a look at the documentation and played with
StarDict and compared it with ZBEDic <http://bedic.sourceforge.net/>
(with which I have recently become very impressed). This comparison
has led me to believe that both programs should be ported to Sugar (or
combined into a new OLPC dictionary) and that each has its own
strengths and weaknesses. A comparison of these programs is,
therefore, certain to benefit both OLPC and Free Software users everywhere.


    StarDict -- An Overview

StarDict is, in typical *nix fashion, a very versatile and useful
program. It supports Dict.org style dictionaries, can load and search
multiple dictionaries simultaneously, and can display definitions from
many dictionaries at once.

It is fast. Even with ten or more dictionaries loaded, there appears to
be no degradation in response time. Amazingly, it also has a "scan"
function which defines any word highlighted with the mouse. This is the
closest thing I have yet seen on any platform to Dr. Eye
<http://www.dreye.com/en/> (one of the few proprietary software programs
I would truly call awesome: it automagically defines -- in two
languages! -- any text on the desktop that is moused over, showing the
definitions in popup windows).


    StarDict's Drawbacks

Given all that, you are probably downloading and installing StarDict as
you read this. However, StarDict is also classically *nix in its glaring
weaknesses.


      Philosophy

Part of this has to do with the Dict.org philosophy, which leans toward
an always-connected mentality, as opposed to the on-again, off-again
mindset imposed by the early internet. Unfortunately, the OLPC is most
likely going to operate in the latter environment -- or, at the very
least, must be designed to. Dict.org is designed for experienced system
administrators, certainly not for children. The dictionary format that
StarDict uses seems to require at least the completion of an
undergraduate comp-sci degree to implement.  So much for kids adding or
creating their own dictionaries. As if this were not enough, there is no
graphical or even user-level method to add dictionaries.


      No Included Dictionaries

No dictionaries -- not even Public Domain ones -- come with the StarDict
packages, and there are no Debian packages of dictionaries. As far as I
know, there are no Fedora/Redhat packages, either.  Adding dictionaries
requires */root access/* to the system. This really is an absurd
requirement for a children's computer -- it is even absurd for a desktop
computer (in fact, this requirement was so onerous that, until a month
ago, I had given up on every one of my many attempts to use this
software over the last few years).


      Website

The StarDict website is also nearly impossible to navigate. I had to use
an anonymizer just to get Google to let me complete the search and show
me the dictionary list (For some reason, clicking the dictionaries link
brings up a page with a Google search. Searching causes Google to accuse
the user of being a bot and doing something illicit.  Maybe it is just
my connection, but why should I have to use Google to find a page of
dictionaries within the site I was already visiting?).


      Documentation

StarDict's documentation is not exactly comprehensive. For compiling
dictionaries, there appear to be exactly two text files: one describing
the StarDict dictionary format in programmerese
<http://web.archive.org/web/20070605151926/http://stardict.sourceforge.net/DICTFILE_FORMAT>,
and another containing terse tips on compiling dictionaries
<http://stardict.sourceforge.net/HowToCreateDictionary>. Both are
practically unreadable to a non-programmer. This is to say nothing of
documentation on how to use the program itself -- the program is of
course quite complicated and not always intuitive.


      Failure to Read Archives

In addition to the above problems, downloaded dictionaries have to be
unpacked before they will work. Each archive contains a folder with
three files. StarDict can read the dictzip compressed file (.dict.dz),
but not the .tar.bz2 archive containing all three files. I find it
absurd that a program that requires compression cannot read from a
larger archive containing three files. This extra step is probably
assumed to be easy for seasoned system administrators but is hardly so
for ordinary users -- especially ones that are in the process of
learning.  The .tar.bz2 archives should not have to be extracted, or, if
decompression would slow down the program too much, an uncompressed
archive (i.e. just .tar or something similar) should be used, since the
dictionaries are compressed anyway.
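
To illustrate that reading the files straight out of the archive is not
much work, here is a minimal Python sketch using the standard tarfile
module. It is not how StarDict works internally; the function names and
the sample file names are made up for the example.

```python
import io
import tarfile


def read_dictionary_members(archive_path):
    """Return {member name: bytes} for every regular file in a tar archive.

    Mode "r:*" lets tarfile detect the compression (bz2, gz, or none),
    so nothing has to be unpacked to disk first.
    """
    members = {}
    with tarfile.open(archive_path, mode="r:*") as archive:
        for info in archive.getmembers():
            if info.isfile():
                members[info.name] = archive.extractfile(info).read()
    return members


def build_sample_archive(archive_path):
    """Create a tiny .tar.bz2 holding three placeholder files (hypothetical names)."""
    with tarfile.open(archive_path, mode="w:bz2") as archive:
        for name, payload in [
            ("sample/sample.ifo", b"placeholder ifo"),
            ("sample/sample.idx", b"placeholder idx"),
            ("sample/sample.dict.dz", b"placeholder dict"),
        ]:
            info = tarfile.TarInfo(name)
            info.size = len(payload)
            archive.addfile(info, io.BytesIO(payload))
```

A dictionary program could call something like read_dictionary_members()
once when a dictionary is added and keep the three files in memory or in
a cache, so the user never has to touch an extraction tool.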


      Ugly Display

StarDict's tooltip and windowed definition presentation is kludgy.
Instead of treating paragraphs as a lump, it renders lines individually,
indenting new lines and giving the impression of random placement --
some are indented, some are not. This is not only ugly but also makes
reading quite difficult, especially in the tooltip display.


      Selection Problems

The scan feature requires text to be highlighted carefully by hand and
does not automagically use spaces to separate words. As a result,
StarDict often attempts to look up word fragments or even multiple word
fragments. As a word is being highlighted, it attempts to define almost
every group of letters until the word is completed. In this way, the
program's speed actually works against it, making highlighting more
difficult. Then again, this feature also makes phrase lookups possible.
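
One possible fix is to widen whatever has been highlighted to the
nearest whitespace on each side before looking it up. The following
Python sketch shows the idea; it is not StarDict code, and the function
name is made up. Because it only ever widens the range, a selection that
spans several words still becomes a phrase lookup.

```python
import string


def expand_to_words(text, start, end):
    """Widen the half-open range [start, end) to whole words.

    The range grows left and right until it hits whitespace or the end
    of the text; surrounding punctuation is then trimmed.
    """
    while start > 0 and not text[start - 1].isspace():
        start -= 1
    while end < len(text) and not text[end].isspace():
        end += 1
    return text[start:end].strip().strip(string.punctuation)


if __name__ == "__main__":
    sentence = "the quick brown fox"
    print(expand_to_words(sentence, 5, 8))   # "uic" was selected -> quick
    print(expand_to_words(sentence, 6, 12))  # "ick br" was selected -> quick brown
```

This assumes a language that separates words with spaces; it would not
help with text such as Chinese, where word boundaries are not marked.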


      No Non-English Dictionaries

For some reason, there are no non-English dictionaries on the list at
StarDict's site
<http://stardict.sourceforge.net/Dictionaries_dictd-www.dict.org.php>.
There appear to be many text-based dictionary files in other languages
on other parts of the web, but they are sprinkled across many different
sites and hard to find. These files need to be consolidated so that they
can easily be found without having to weed through thousands of
dictionary websites and Amazon-like sales entries.


    ZBEDic -- An Overview

ZBEDic is a really cool dictionary program for the Zaurus. It is small,
fast and integrates well with programs like FBReader (which has been
ported to Sugar). FBReader has a feature that allows automagic lookups
of any word pressed on the touch screen. Clicking on a special icon
returns one to the book being read. ZBEDic supports multiple
dictionaries, and the definitions are nicely formatted. Further, it uses
a human-readable dictionary format. The dictionary files can be stored anywhere
and added with the included file manager functionality (one need not be
root to add dictionaries). ZBEDic's definition presentation is pretty
and very readable. It not only displays paragraphs nicely but also
separates multiple definitions with blank lines so the reader does not
have to do extra work mentally separating the elements. It also displays
redundant definitions of a single word.


    ZBEDic's Drawbacks

While ZBEDic can access a collection of dictionaries, it can only use
one at a time. This means the only way to have multiple languages is to
compile a dictionary that way or to have two dictionaries.


      No Automagic Swapping or Dictionary Combining

Dictionaries must be swapped by hand. There is no reverse lookup within
a dictionary. In other words, checking the words contained in a
definition in another language or dictionary requires changing
dictionaries.


      Unidirectional Bilingual Limitation

These two points also make the dictionary unidirectional. An
English-Spanish dictionary will not do Spanish-English lookups. All of
these problems could, in theory, be solved in the dictionary files (i.e.
by compiling a dictionary that contained both English-Spanish and
Spanish-English entries, probably interleaved so that entries from both
languages are mixed freely and alphabetically), but such a solution
would be difficult to use (to say the least), and I have neither seen
nor heard of such a dictionary file. This problem is somewhat mitigated by the
fact that switching dictionaries does not affect the input field, so one
can display the definitions in as many languages as one might want
without going to the trouble of reentering the query item -- assuming,
of course, the entry happens to exist in multiple languages.
Fortunately, the lack of an entry in a given dictionary does not stomp
the input field, so one need not fear accidentally loading the wrong
dictionary.


      Affixes Cause Lookups to Fail

The book integration is nice. Any single word can be clicked, which
brings up the dictionary with a definition. However, the authors
failed to account for affixes. This means plural words or conjugated
words result in no definition. Switching the virtual keyboard on or off (on
the Zaurus, anyway) triggers a return to the program that initiated the
lookup.
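
A simple way to soften the affix problem is to retry a failed lookup
with common endings removed. The Python sketch below does this for a
handful of English suffixes only. The rule table and function names are
invented for the illustration and are not part of ZBEDic or FBReader;
other languages would need their own rules, and a real stemmer would
handle many more cases.

```python
# (suffix, replacement) pairs, tried in order. English only, and far
# from complete: this is an assumption made for the example.
SUFFIX_RULES = [
    ("ies", "y"),
    ("es", ""),
    ("s", ""),
    ("ing", ""),
    ("ed", ""),
]


def candidate_headwords(word):
    """Return the word itself followed by guesses at its base form."""
    word = word.lower()
    candidates = [word]
    for suffix, replacement in SUFFIX_RULES:
        if word.endswith(suffix) and len(word) > len(suffix) + 1:
            candidate = word[: -len(suffix)] + replacement
            if candidate not in candidates:
                candidates.append(candidate)
    return candidates


def lookup(word, dictionary):
    """Return (headword, definition) for the first candidate found, else None."""
    for candidate in candidate_headwords(word):
        if candidate in dictionary:
            return candidate, dictionary[candidate]
    return None


if __name__ == "__main__":
    sample = {"dictionary": "a book of words", "walk": "to go on foot"}
    print(lookup("Dictionaries", sample))  # found under "dictionary"
    print(lookup("walked", sample))        # found under "walk"
```

The exact word is always tried first, so entries that already exist in
an inflected form are not affected.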


      Limited to Supporting Programs

This feature, as implemented, is also specific to programs that support
it and has not been generalized to the entire desktop (then again, touch
screen devices really do not have hover capabilities, so this question
may be moot -- once again, though, the OLPC does have a pointer device).


      One Touch Lookup Not Available in Dictionary Itself

The one touch lookup is not available within definitions in ZBEDic
itself. Highlighted lookup is available, but only after a special lookup
icon/button has been added to the menubar.


      Many Dictionaries Too Small

ZBEDic's website
<http://bedic.sourceforge.net/dict-list-keyword-lang.html> has a lot of
dictionaries available and many appear to be quite nice. Many of the
bilingual dictionaries, however, contain very few entries. While normal
dictionaries contain 30,000 to 40,000 words, some of these dictionaries
have a mere 5,000 entries. These are clearly insufficient. Bilingual
dictionaries with at least 30,000 entries must exist in the Public
Domain for (at the very least) every combination of European languages
and many others.


    Problems with/Shortcomings of Both Programs


      Desktop Integration

Neither StarDict's scan function nor ZBEDic's book integration works for
all text on the desktop (unlike Dr. Eye), and neither allows hover
lookups. Both StarDict's scan and ZBEDic's book integration could be
improved (see above).


      Compatibility Lacking

Dictionaries compiled for one of the programs do not work with the other
program. StarDict does not recognise the existence of a .dict.dz file
outside of a folder and without the accompanying index files, and ZBEDic
gives the error "Database error:entry too long" after attempting and
failing to load StarDict's dictionary files. Both of these programs
should have the capability to read each other's dictionaries or a
mutually compatible format should be created (after all, this is Free
Software, not MacroSuck's proprietary universe -- and standards are
always nice ;-).

Now that I have more or less stated a lot about what I think the
software should be capable of and gotten most of my opinions out in the
open, I should talk about where I think this should be headed in the
short term.

As I have stated before, I think it is possible to obtain Public Domain
dictionaries in practically every major language spoken today and have
multilingual capabilities between many of them by this summer.  I am
hoping it will be possible to write Perl and/or Python scripts to easily
convert text files to dict.dz files by that time (however, if I am
alone in this endeavor, it might take considerably longer, as I am
generally not the best programmer around).
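
As a starting point, here is a rough Python sketch of such a conversion
script. It rests on several assumptions that should be checked against
the StarDict format description linked earlier: that the .idx file is a
sequence of NUL-terminated UTF-8 headwords, each followed by a 32-bit
big-endian offset and size into the .dict file; that entries are sorted
ASCII-case-insensitively; and that the .ifo header fields shown are the
required ones. The input format (one "headword<TAB>definition" per line)
and all function names are invented for the example.

```python
import struct


def stardict_sort_key(word):
    """Approximate the assumed index order: ASCII case folded, then exact bytes."""
    raw = word.encode("utf-8")
    folded = bytes(b + 32 if 65 <= b <= 90 else b for b in raw)
    return (folded, raw)


def parse_tab_separated(lines):
    """Turn 'headword<TAB>definition' lines into a {headword: definition} dict."""
    entries = {}
    for line in lines:
        line = line.rstrip("\n")
        if "\t" in line:
            word, definition = line.split("\t", 1)
            entries[word.strip()] = definition.strip()
    return entries


def write_stardict(entries, basename, bookname):
    """Write basename.ifo, basename.idx and basename.dict (uncompressed)."""
    idx = bytearray()
    data = bytearray()
    for word in sorted(entries, key=stardict_sort_key):
        definition = entries[word].encode("utf-8")
        # Index record: headword, NUL byte, then offset and size of the
        # definition as 32-bit big-endian unsigned integers (assumed layout).
        idx += word.encode("utf-8") + b"\x00"
        idx += struct.pack(">II", len(data), len(definition))
        data += definition
    with open(basename + ".dict", "wb") as f:
        f.write(data)
    with open(basename + ".idx", "wb") as f:
        f.write(idx)
    ifo = (
        "StarDict's dict ifo file\n"
        "version=2.4.2\n"
        f"wordcount={len(entries)}\n"
        f"idxfilesize={len(idx)}\n"
        f"bookname={bookname}\n"
        "sametypesequence=m\n"  # assumed to mean plain-text definitions
    )
    with open(basename + ".ifo", "w", encoding="utf-8", newline="\n") as f:
        f.write(ifo)
```

The sketch stops at an uncompressed .dict file; producing the .dict.dz
would be a separate step, for example by running the dictzip tool on the
result.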

In any case, I will do what I can, and I hope these comments lead to an
improvement in this overall situation.  While the software is not bad,
the difficulty in obtaining dictionaries is a problem.  For once,
copyright is not the major hurdle, so I hope it will be possible to
rapidly compile a large collection of good dictionaries for the good of
children and curious people everywhere.

Sincerely,

LuYu 	

"How a society produces its information environment goes to the very
core of freedom."


	-- Yochai Benkler
