[sugar] An Update about Speech Synthesis

Tue Feb 19 13:13:44 EST 2008

Hi,

> I'd like to see an eSpeak literacy project written up -- Once we have
> a play button, with text highlighting, we have most of the pieces to
> make a great read + speak platform that can load in texts and
> highlight words/sentences as they are being read.  Ping had a nice
> mental model for this a while back.

Great idea :). The button will soon be there :D. I had never expected this
to turn into something this big :). There are lots of things I want to get
done wrt this project and hope to accomplish them one by one.

> Thanks for the info Hemant!  Can you tell me more about your experiences
> with speech dispatcher and which version you are using?  The things I'm
> interested in are stability, ease of configuration, completeness of
> implementation, etc.

I'll try to explain whatever I can (I am not an expert like you all :) ).
We initially started out with a speech-synthesis DBUS API that connected
directly to eSpeak. Those results are available on the wiki page
[http://wiki.laptop.org/go/Screen_Reader]. After that we found out about
speech-dispatcher and decided to analyze it against our requirements,
primarily keeping the following things in mind:

   1. An API that provides configuration control on a per-client basis.
   2. A feature like printf(), but for speech, that developers can call;
   that's precisely how Free(b)soft describes their approach to
   speech-dispatcher.
   3. A Python interface for speech synthesis.
   4. Callbacks for developers after certain events.
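
To make the "printf() for speech" idea concrete, here is a minimal sketch. The
helper name sayf() is made up for illustration; the only thing it expects from
the client is a speak(text) method, which is what the speechd Python client
offers as far as I understand it. The demo() function assumes the speechd
bindings are installed and a speech-dispatcher server is running.

```python
def sayf(client, fmt, *args):
    """Format a message printf-style and hand it to a speech client.

    `client` is any object with a speak(text) method (hypothetical helper,
    not part of speech-dispatcher itself).
    """
    # Only apply %-formatting when arguments are given, so a plain
    # string containing '%' is spoken unchanged.
    text = fmt % args if args else fmt
    client.speak(text)
    return text


def demo():
    # Assumption: the speechd Python bindings are installed and
    # speech-dispatcher is running on this machine.
    import speechd
    client = speechd.SSIPClient("sayf-demo")
    try:
        sayf(client, "You have %d new messages", 3)
    finally:
        client.close()
```

Calling demo() on a machine with speech-dispatcher running should speak the
formatted sentence; sayf() itself can be tried with any stand-in client object.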

At this moment I am in a position to comment about the following:

   1. Which modules to use - I found it extremely easy to configure
   speech-dispatcher to use eSpeak as a TTS engine. There are configuration
   files that let you simply select or deselect the TTS module to be used.
   I have described how an older version of speech-dispatcher can be made to
   run on the XO here:
   http://wiki.laptop.org/go/Screen_Reader#Installing_speech-dispatcher_on_the_xo
   2. There were major issues with using eSpeak with the ALSA sound system
   some time back [http://dev.laptop.org/ticket/5769,
   http://dev.laptop.org/ticket/4002]. These are resolved by using
   speech-dispatcher, as it supports both ALSA and OSS. So in case OLPC ever
   shifts to OSS we are safe. I am guessing that speech-dispatcher does not
   let a TTS engine write directly to a sound device but instead accepts the
   audio buffer and then routes it to the audio subsystem itself.
   3. Another major issue we had to tackle with our DBUS interface was
   providing callbacks. The present implementation of speech-dispatcher
   provides callbacks for various events that are important wrt
   speech-synthesis. I have tested these out in Python and they worked quite
   nicely. In case you have not seen it, you might be interested in checking
   out their Python API [
   http://cvs.freebsoft.org/repository/speechd/src/python/speechd/client.py?hideattic=0&view=markup
   ].
   4. Voice configuration and language selection - The API gives us
   options to control voice parameters such as pitch, volume, voice etc. for
   each client.
   5. Message priorities and queuing - speech-dispatcher provides
   various levels of priority for speech synthesis, so we can give a higher
   priority to a message played by Sugar than to one played by an Activity.
   6. Compatibility with orca - I installed orca and used
   speech-dispatcher as the speech synth engine. It worked fine. We wanted to
   make sure that the speech synth server would work with orca if it was ported
   to XO in the future.
   7. Documentation - speech-dispatcher has a lot of documentation at the
   moment, and hence it's quite easy to find our way around and figure out
   how to do the things we really want to. I had intended to explore
   gnome-speech as well, however the lack of documentation and examples
   turned me away.
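
Points 3 to 5 above (callbacks, per-client voice settings, priorities) can be
sketched roughly as follows. The method names are the ones I know from the
speechd Python client (set_output_module, set_language, set_pitch, set_volume,
set_priority, speak with callback/event_types); please check them against the
client.py of your installed version. The helper functions, the client names
and the parameter values are only illustrative.

```python
def configure_voice(client, language="en", pitch=0, volume=0, module="espeak"):
    """Apply settings that affect only this client's connection."""
    client.set_output_module(module)
    client.set_language(language)
    client.set_pitch(pitch)     # as I understand it, range is -100..100
    client.set_volume(volume)   # as I understand it, range is -100..100


def speak_with_events(client, text, priority, on_event, event_types):
    """Queue `text` at `priority` and ask for the given event callbacks."""
    client.set_priority(priority)
    client.speak(text, callback=on_event, event_types=event_types)


def demo():
    # Assumption: speechd bindings installed, speech-dispatcher running.
    import speechd

    def on_event(event_type, index_mark=None):
        print("speech event:", event_type)

    events = (speechd.CallbackType.BEGIN, speechd.CallbackType.END)
    sugar = speechd.SSIPClient("sugar-shell")       # hypothetical client name
    activity = speechd.SSIPClient("some-activity")  # hypothetical client name
    try:
        configure_voice(sugar, pitch=10)
        configure_voice(activity, pitch=-10, volume=-20)
        # The Sugar message gets a higher priority than the Activity one.
        speak_with_events(activity, "Activity message",
                          speechd.Priority.TEXT, on_event, events)
        speak_with_events(sugar, "Sugar message",
                          speechd.Priority.IMPORTANT, on_event, events)
    finally:
        activity.close()
        sugar.close()
```

Each client keeps its own pitch and volume, so changing the Activity's voice
does not affect what Sugar itself says.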

The analysis that I did was mostly from a user point of view or simple
developer requirements that we realized had to be fulfilled wrt
speech-synthesis, and it was definitely not as detailed as you probably
might expect from me.

We are presently using speech-dispatcher 0.6.6.

A dedicated eSpeak module has been provided in the newer versions of
speech-dispatcher and that is a big advantage for us. In the older version
the eSpeak program was invoked with the various parameters passed as command
line arguments, which surely was not very efficient on the XO.
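
To illustrate why the command-line route is costly, here is a rough sketch of
what speaking one message that way involves. This is not the code of the old
speech-dispatcher module, just an illustration: the helper name and default
values are made up, while -v, -p, -s and -a are the usual espeak options for
voice, pitch, speed and amplitude.

```python
import subprocess


def espeak_argv(text, voice="en", pitch=50, speed=160, amplitude=100):
    """Build the argument list for one espeak invocation (illustrative)."""
    return ["espeak",
            "-v", voice,           # voice / language
            "-p", str(pitch),      # pitch
            "-s", str(speed),      # speed in words per minute
            "-a", str(amplitude),  # amplitude (volume)
            text]


def demo():
    # Assumption: the espeak binary is installed and on the PATH.
    # Every message starts a fresh process, which is the overhead
    # a dedicated module avoids.
    for message in ("Hello", "from", "the XO"):
        subprocess.call(espeak_argv(message))
```

Every utterance pays for a new process start and a reload of the voice data,
whereas a dedicated module can keep the engine loaded between messages.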

Stability - I think the main point that I tested here was how well
speech-dispatcher responds to long strings. The latest release,
speech-dispatcher 0.6.6, has some tests in which an entire story is read
out [
http://cvs.freebsoft.org/repository/speechd/src/tests/long_message.c?view=markup].
However, I still need to run this test on the XO. I will do so once I have
RPM packages to install on the XO.
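
A similar long-message check could also be scripted from Python. The sketch
below is not a port of long_message.c; it simply sends one long text and waits
for an end or cancel event with a timeout. The helper name, client name, text
and timeout are all illustrative, and it assumes the speak() callback
interface of the speechd Python client.

```python
import threading


def speak_and_wait(client, text, end_events, timeout=600.0):
    """Speak `text`; wait until one of `end_events` arrives or we time out.

    Returns (finished, seen): whether an end event arrived in time, and
    the list of event types received so far.
    """
    done = threading.Event()
    seen = []

    def on_event(event_type, index_mark=None):
        seen.append(event_type)
        if event_type in end_events:
            done.set()

    client.speak(text, callback=on_event, event_types=tuple(end_events))
    finished = done.wait(timeout)
    return finished, seen


def demo():
    # Assumption: speechd bindings installed, speech-dispatcher running.
    import speechd
    story = "Once upon a time there was a very long story. " * 200
    client = speechd.SSIPClient("long-message-test")  # hypothetical name
    try:
        ends = (speechd.CallbackType.END, speechd.CallbackType.CANCEL)
        finished, seen = speak_and_wait(client, story, ends)
        print("finished:", finished, "events:", seen)
    finally:
        client.close()
```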

In particular speech-dispatcher is quite customizable, easily controlled
through programming languages, provides callback support, and has
specialized support for eSpeak that makes it a good option for the XO.

All in all speech-dispatcher is very promising for our requirements wrt XO.
While I am not able to foresee all the problems that may come up wrt
speech-synthesis at this stage, it is the best option available at present,
as opposed to our original plan of providing our own DBUS API :P. I am
preparing myself to delve deeper and test speech-dispatcher 0.6.6 on the XO
once its RPMs are accepted by the Fedora community. As we progress I will
surely find limitations of speech-dispatcher, and I will report them and/or
help fix them along with the Free(b)soft team.

I hope you find this useful. I can try to answer more specific questions too.

Thanks!
Hemant