More 'human' voice synth (TTS)
pgf at laptop.org
Tue Jun 21 09:43:02 EDT 2011
> I'm wondering if there's anything we can do to make TTS sound more
> 'human'. We'd like to be able to use the XOs to teach English
> literacy, but the espeak voices are very robotic.
> My understanding is that espeak is optimised for low-power devices
> (great for XOs) and clear (if robotic) speech. Would it be feasible to
> switch to something else, like festival?
i've run festival as part of my home automation system for many many
years, including the last 3 or so on an XO-1 (debxo) which acts as my
current HA server.
the first secret is to run it in client/server mode, to avoid the
server startup latency on every enunciation. but even after that, i
think the latency will be too high for your application. i just
tested it: given a moderate english sentence, it took 3 seconds to
produce output. (i hide this on my system by caching utterances --
that's more feasible in a menuing system than when teaching literacy.)
http://dev.laptop.org/~pgf/junk/festival_out.wav (5 seconds on XO-1)
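a rough sketch of the utterance-caching idea, in python. this is not the code from my HA system, just an illustration: the cache directory name and the `cached_wav` helper are made up for the example, and it assumes `festival_client` is installed and that its `--ttw` and `--output` options behave as documented (text on stdin, waveform written to the named file) against a server that is already running.

```python
import hashlib
import subprocess
from pathlib import Path

CACHE_DIR = Path("tts_cache")  # hypothetical cache location


def festival_synth(text, wav_path):
    """Ask an already-running festival server to render text to wav_path.

    Assumes festival_client is on PATH; --ttw reads the text from stdin.
    """
    subprocess.run(
        ["festival_client", "--ttw", "--output", str(wav_path)],
        input=text.encode("utf-8"),
        check=True,
    )


def cached_wav(text, synth=festival_synth, cache_dir=CACHE_DIR):
    """Return the path of a wav for text, synthesizing only on a cache miss."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Key the cache on the exact (whitespace-trimmed) text of the utterance.
    key = hashlib.sha1(text.strip().encode("utf-8")).hexdigest()
    wav_path = cache_dir / (key + ".wav")
    if not wav_path.exists():
        # Write to a temporary name first so a failed run leaves no bad entry.
        tmp_path = wav_path.with_suffix(".tmp")
        synth(text, tmp_path)
        tmp_path.replace(wav_path)
    return wav_path
```

the first request for a sentence pays the full synthesis time; repeats only cost a file lookup. that works for a fixed set of menu prompts, and much less well for arbitrary text typed by a learner.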
flite is a lower-cost version of festival that might be appropriate.
it seems to reduce the conversion time to about half a second, but
the quality suffers as well.
http://dev.laptop.org/~pgf/junk/flite_out.wav (.5 seconds on XO-1)
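if you want to repeat the comparison on your own hardware, something like the following would do. it is only a sketch: `time_command` and `compare` are names invented here, the test sentence is up to you, and it assumes the usual `flite -t TEXT -o FILE` and `festival_client --ttw --output FILE` invocations (with a festival server already running).

```python
import subprocess
import time


def time_command(argv, stdin_text=None):
    """Run argv once and return the elapsed wall-clock time in seconds."""
    data = None if stdin_text is None else stdin_text.encode("utf-8")
    start = time.perf_counter()
    subprocess.run(argv, input=data, check=True)
    return time.perf_counter() - start


def compare(sentence):
    """Time one text-to-wav conversion with flite and with festival.

    Assumes both tools are installed and a festival server is running.
    """
    flite_secs = time_command(["flite", "-t", sentence, "-o", "flite_out.wav"])
    festival_secs = time_command(
        ["festival_client", "--ttw", "--output", "festival_out.wav"],
        stdin_text=sentence,
    )
    return {"flite": flite_secs, "festival": festival_secs}
```

a single run is noisy, so it is worth calling `compare` a few times with the same sentence before drawing conclusions.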
fyi, current festival server process footprint:
USER PID %CPU %MEM   VSZ   RSS TTY STAT START  TIME COMMAND
root 999  0.0  9.4 26668 20004 ?   Ss   Jun06 10:03 /usr/bin/festival --server /usr/local/etc/nosil.scm
i haven't used espeak -- i suspect there are APIs that are far
richer than what i'm doing from the shell command line. i don't
know how one might access festival at that level.
> This is some food for thought:
> Sridhar Dhanapalan
> Technical Manager
> One Laptop per Child Australia
> M: +61 425 239 701
> E: sridhar at laptop.org.au
> A: G.P.O. Box 731
> Sydney, NSW 2001
> W: www.laptop.org.au
paul fox, pgf at laptop.org