More 'human' voice synth (TTS)

Paul Fox pgf at
Tue Jun 21 09:43:02 EDT 2011

sridhar wrote:
 > I'm wondering if there's anything we can do to make TTS sound more
 > 'human'. We'd like to be able to use the XOs to teach English
 > literacy, but the espeak voices are very robotic.
 > My understanding is that espeak is optimised for low-power devices
 > (great for XOs) and clear (if robotic) speech. Would it be feasible to
 > switch to something else, like festival?

i've run festival as part of my home automation system for many many
years, including the last 3 or so on an XO-1 (debxo) which acts as my
current HA server.

the first secret is to run it in client/server mode, to avoid the
server startup latency on every utterance.  but even after that, i
think the latency will be too high for your application.  i just
tested it:  given a moderate english sentence, it took 3 seconds to
produce output (5 seconds on an XO-1).  (i hide this on my system by
caching utterances -- that's more feasible in a menuing system than
when teaching literacy.)
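The utterance-caching trick mentioned above can be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not the setup described in this post: `UtteranceCache` and the `synthesize` callable are hypothetical names, and the real backend (for example, a call out to a running festival server) is left for you to plug in. The idea is simply to pay the multi-second synthesis cost once per distinct phrase and serve repeats from disk.

```python
import hashlib
import os
import tempfile
from typing import Callable

# Hypothetical backend signature: takes text, returns audio bytes (e.g. WAV).
Synthesizer = Callable[[str], bytes]


class UtteranceCache:
    """Cache synthesized audio on disk, keyed by a hash of the normalized text."""

    def __init__(self, synthesize: Synthesizer, cache_dir: str) -> None:
        self.synthesize = synthesize
        self.cache_dir = cache_dir
        os.makedirs(cache_dir, exist_ok=True)

    @staticmethod
    def _normalize(text: str) -> str:
        # Collapse runs of whitespace so trivially different strings share an entry.
        return " ".join(text.split())

    def _path_for(self, text: str) -> str:
        key = hashlib.sha1(text.encode("utf-8")).hexdigest()
        return os.path.join(self.cache_dir, key + ".wav")

    def get(self, text: str) -> bytes:
        text = self._normalize(text)
        path = self._path_for(text)
        if os.path.exists(path):
            # Fast path: this phrase was synthesized before.
            with open(path, "rb") as f:
                return f.read()
        # Slow path: run the (assumed) TTS backend once, then store the result.
        audio = self.synthesize(text)
        tmp = path + ".tmp"
        with open(tmp, "wb") as f:
            f.write(audio)
        os.replace(tmp, path)  # atomic rename, so readers never see a partial file
        return audio


if __name__ == "__main__":
    calls = []

    def fake_synthesize(text: str) -> bytes:
        # Stand-in for a slow TTS backend; records each real synthesis.
        calls.append(text)
        return b"RIFF" + text.encode("utf-8")

    with tempfile.TemporaryDirectory() as d:
        cache = UtteranceCache(fake_synthesize, d)
        cache.get("the garage door is open")
        cache.get("the  garage door is open")  # same text after normalization
        print(len(calls))  # 1
```

As the post notes, this pays off when the set of phrases is small and repeated (menus, status announcements); for open-ended literacy material most sentences would miss the cache and still incur the full synthesis delay.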

flite is a lighter-weight version of festival that might be appropriate.
it seems to reduce the conversion time to about half a second (0.5
seconds on XO-1), but the quality suffers as well.

fyi, current festival server process footprint:
root       999  0.0  9.4  26668 20004 ?        Ss   Jun06  10:03 /usr/bin/festival --server /usr/local/etc/nosil.scm

i haven't used espeak -- i suspect it has APIs that are far richer
than what i'm doing from the shell command line.  i don't know how
one might access festival at that level.


 > This is some food for thought:
 > Sridhar
 > Sridhar Dhanapalan
 > Technical Manager
 > One Laptop per Child Australia
 > M: +61 425 239 701
 > E: sridhar at
 > A: G.P.O. Box 731
 >      Sydney, NSW 2001
 > W:

 paul fox, pgf at
