[sugar] An Update about Speech Synthesis

Tue Feb 19 19:45:05 EST 2008

Hemant and James,

Can you write something about this at a [[spoken texts]] page on the
wiki ('hear and read'?  some other more creative name... )?  The
Google Literacy Project is highlighting a number of literacy efforts
for the upcoming World Book Day, and your work would be a fine
suggestion for that list.

SJ

On Feb 19, 2008 1:13 PM, Hemant Goyal <goyal.hemant at gmail.com> wrote:
>
> Hi,
>
>
> > I'd like to see an eSpeak literacy project written up -- Once we have
> > a play button, with text highlighting, we have most of the pieces to
> > make a great read + speak platform that can load in texts and
> > highlight words/sentences as they are being read.  Ping had a nice
> > mental model for this a while back.
>
>
> Great idea :). The button will soon be there :D. I had never expected this
> to turn into something this big :). There are lots of things I want to get
> done wrt this project and hope to accomplish them one by one.
>
> > Thanks for the info Hemant!  Can you tell me more about your experiences
> > with speech dispatcher and which version you are using?  The things I'm
> > interested in are stability, ease of configuration, completeness of
> > implementation, etc.
>
>
> I'll try to tell whatever I am capable of explaining (I am not an expert
> like you all :) ). Well we had initially started out with a speech-synthesis
> DBUS API that directly connected to eSpeak. Those results are available on
> the wiki page [http://wiki.laptop.org/go/Screen_Reader]. From that point
> onwards we found out about speech-dispatcher and decided to analyze it for
> our requirements primarily keeping the following things in mind:
>
>
> - An API that provides configuration control on a per-client basis.
> - A feature like printf(), but for speech, that developers can call. That is
>   precisely how Free(b)soft describes their approach to speech-dispatcher.
> - A Python interface for speech synthesis.
> - Callbacks for developers after certain events.
>
> At this moment I am in a position to comment on the following:
>
>
> Which modules to use - I found it extremely easy to configure
> speech-dispatcher to use eSpeak as a TTS engine. There are configuration
> files available to simply select/unselect which TTS module needs to be used.
> I have described how an older version of speech-dispatcher can be made to
> run on the XO here
> http://wiki.laptop.org/go/Screen_Reader#Installing_speech-dispatcher_on_the_xo
>
> There were major issues with using eSpeak with the ALSA sound system some time
> back [http://dev.laptop.org/ticket/5769, http://dev.laptop.org/ticket/4002].
> This issue is resolved by using speech-dispatcher, as it supports both ALSA
> and OSS. So in case OLPC ever shifts to OSS we are safe. I am guessing that
> speech-dispatcher does not let a TTS engine write directly to a sound device,
> but instead accepts the audio buffer and then routes it to the audio
> subsystem.
>
> Another major issue we had to tackle was providing callbacks while providing
> the DBUS interface. The present implementation of speech-dispatcher provides
> callbacks for various events that are important wrt speech-synthesis. I have
> tested these out in Python and they were working quite nicely. In case you
> have not, you might be interested in checking out their Python API
> [http://cvs.freebsoft.org/repository/speechd/src/python/speechd/client.py?hideattic=0&view=markup].
>
> Voice configuration and language selection - The API provides options to
> control voice parameters such as pitch, volume, and voice for each client.
>
> Message priorities and queuing - speech-dispatcher provides several levels
> of priority for speech synthesis, so we can give a message played by Sugar
> a higher priority than one played by an Activity.
>
> Compatibility with orca - I installed orca and used speech-dispatcher as the
> speech synth engine. It worked fine. We wanted to make sure that the speech
> synth server would work with orca if it was ported to the XO in the future.
>
> Documentation - speech-dispatcher has a lot of documentation at the moment,
> and hence it's quite easy to find our way and figure out how to do the things
> we really want to. I had intended to explore gnome-speech as well; however,
> the lack of documentation and examples turned me away.
>
> The analysis that I did was mostly from a user point of view, or based on
> simple developer requirements that we realized had to be fulfilled wrt
> speech-synthesis, and it was definitely not as detailed as you probably
> might expect from me.
>
> We are presently using speech-dispatcher 0.6.6.
>
> A dedicated eSpeak module has been provided in the newer versions of
> speech-dispatcher and that is a big advantage for us. In the older version
> eSpeak was called with its parameters passed as command-line arguments,
> which surely was not very efficient on the XO.
>
> Stability - I think the main point that I tested here was how well
> speech-dispatcher responds to long strings. The latest release of
> speech-dispatcher 0.6.6 has some tests in which an entire story is read out
> [http://cvs.freebsoft.org/repository/speechd/src/tests/long_message.c?view=markup].
> However I still need to run this test on the XO. I will do so once I have
> RPM packages to install on the XO.
>
> In particular speech-dispatcher is quite customizable, easily controlled
> through programming languages, provides callback support, and has
> specialized support for eSpeak that makes it a good option for the XO.
>
> All in all, speech-dispatcher is very promising for our requirements wrt the
> XO. While I am not able to foresee all the problems that will come up wrt
> speech-synthesis at this stage, it is the best option available at
> present, as opposed to our original plan of providing a DBUS API :P. I am
> preparing to delve deeper and test speech-dispatcher 0.6.6 on the XO once
> its RPMs are accepted by the Fedora community. As we progress I will surely
> find limitations of speech-dispatcher, and I will report them and/or help
> fix them along with the Free(b)soft team.
>
> I hope you find this useful. I can try to answer more specific questions.
>
> Thanks!
> Hemant
>
>
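
As a rough illustration of the per-client voice settings, message priorities, and callbacks described in the message above, here is a minimal Python sketch. It is not taken from the thread. The method names (set_language, set_pitch, set_volume, set_priority, speak) are assumed to match the speechd Python client linked above, and RecordingClient is a hypothetical stand-in that lets the sketch run without a speech-dispatcher daemon.

```python
# Sketch only. The method names (set_language, set_pitch, set_volume,
# set_priority, speak) are assumed to match the speechd Python client linked
# in the message above; verify them against your installed version.

def configure_voice(client, language="en", pitch=0, volume=0):
    """Apply per-client voice settings."""
    client.set_language(language)
    client.set_pitch(pitch)
    client.set_volume(volume)


def say(client, text, priority, on_event=None):
    """Queue text at the given priority; on_event receives event names."""
    client.set_priority(priority)
    client.speak(text, callback=on_event)


class RecordingClient:
    """Hypothetical stand-in so the sketch runs without a running daemon."""

    def __init__(self):
        self.calls = []

    def set_language(self, language):
        self.calls.append(("set_language", language))

    def set_pitch(self, pitch):
        self.calls.append(("set_pitch", pitch))

    def set_volume(self, volume):
        self.calls.append(("set_volume", volume))

    def set_priority(self, priority):
        self.calls.append(("set_priority", priority))

    def speak(self, text, callback=None):
        self.calls.append(("speak", text))
        if callback is not None:
            # A real server reports events asynchronously; here they fire inline.
            callback("begin")
            callback("end")


if __name__ == "__main__":
    client = RecordingClient()
    events = []
    configure_voice(client, language="en", pitch=10, volume=-20)
    # A Sugar-level message would use a higher priority than an Activity's.
    say(client, "Battery is low", "important", on_event=events.append)
    print(client.calls)
    print(events)
```

Against a real daemon, the stand-in would be replaced by the client object from the speechd module. The real callback may receive additional arguments, so the exact callback signature should be checked in client.py before relying on this shape.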