Saturday, November 3, 2012

Text to Speech with correct intonation

Google has an unofficial text-to-speech API. It can be accessed through HTTP requests, but it is limited to strings of fewer than 100 characters. In this post we will see how to split a text longer than 100 characters so that the service still produces correct voice intonation. The approach is straightforward: we split the text at punctuation marks into sentences of fewer than 100 characters. Let's see how:
def parseText(text):
    """ returns a list of sentences with less than 100 characters """
    toSay = []
    punct = [',',':',';','.','?','!'] # punctuation
    words = text.split() # split on whitespace, skipping empty words
    sentence = ''
    for w in words:
        if w[-1] in punct: # encountered a punctuation mark
            if (len(sentence)+len(w)+1 < 100): # is there enough space?
                sentence += ' '+w # add the word
                toSay.append(sentence.strip()) # save the sentence
            else:
                if sentence.strip(): # skip an empty pending sentence
                    toSay.append(sentence.strip()) # save the sentence
                toSay.append(w.strip()) # save the word as a sentence
            sentence = '' # start another sentence
        else:
            if (len(sentence)+len(w)+1 < 100):
                sentence += ' '+w # add the word
            else:
                if sentence.strip(): # skip an empty pending sentence
                    toSay.append(sentence.strip()) # save the sentence
                sentence = w # start a new sentence
    if len(sentence) > 0:
        toSay.append(sentence.strip())
    return toSay
Now we can obtain the speech with an HTTP request for each sentence:
# Python 2 (uses urllib2 and the print statement)
import os
import urllib
import urllib2

text = 'Think of color, pitch, loudness, heaviness, and hotness. Each is the topic of a branch of physics.'

print text
toSay = parseText(text)

google_translate_url = 'http://translate.google.com/translate_tts'
opener = urllib2.build_opener()
opener.addheaders = [('User-agent', 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)')]

for i,sentence in enumerate(toSay):
    print i,len(sentence), sentence
    # urllib.quote percent-encodes spaces, commas and other reserved characters
    response = opener.open(google_translate_url+'?q='+urllib.quote(sentence)+'&tl=en')
    filename = str(i)+'speech_google.mp3'
    ofp = open(filename,'wb')
    ofp.write(response.read())
    ofp.close()
    os.system('cvlc --play-and-exit -q '+filename) # play the file with vlc
The API returns the speech in MP3 format. The code above saves the result of each query to a file and plays it using VLC.
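
The code in this post targets Python 2, where urllib2 and the print statement are available. For readers on Python 3, here is a minimal sketch of how the same request could be built, assuming the endpoint and the q and tl parameters shown above still apply (the service is unofficial, so it may have changed or stopped responding). The names build_tts_url and fetch_speech are hypothetical helpers introduced only for this illustration.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Endpoint, parameter names (q, tl) and user agent are taken from the post.
# Assumption: the unofficial service still accepts requests in this form.
TTS_URL = 'http://translate.google.com/translate_tts'
USER_AGENT = 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)'


def build_tts_url(sentence, language='en'):
    """Return the request URL for one sentence, with the text URL-encoded."""
    return TTS_URL + '?' + urlencode({'q': sentence, 'tl': language})


def fetch_speech(sentence, filename, language='en'):
    """Download the audio for one sentence and write it to filename."""
    request = Request(build_tts_url(sentence, language),
                      headers={'User-Agent': USER_AGENT})
    with urlopen(request) as response, open(filename, 'wb') as ofp:
        ofp.write(response.read())


if __name__ == '__main__':
    # Only builds and prints a URL; no network access happens here.
    print(build_tts_url('Think of color, pitch, loudness.'))
```

Letting urlencode do the escaping, instead of replacing spaces by hand, also covers commas, question marks and non-ASCII characters in the sentence. Calling fetch_speech once per item returned by parseText would reproduce the loop above.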

10 comments:

  1. Any way to do this without saving an intermediate mp3 file? For instance somehow streaming the audio directly?

    Thanks for posting this,
    Brian

  2. Hello Brian,

    You could use the library GStreamer to open the stream or a program that is able to open the stream directly. I'm sure that vlc and mplayer have this feature.

  3. This is really nice.
    I had some problems with vlc skipping the first and last bits of a file.
    I modified the code to this: (sorry, I don't know how to format code in blogger comments)

    # assumes parseText from the post is defined above
    import os, sys, urllib2

    def tts(text, language='en'):
        print text
        if len(text) >= 100:
            toSay = parseText(text)
        else:
            toSay = [text]

        google_translate_url = 'http://translate.google.com/translate_tts'
        opener = urllib2.build_opener()
        opener.addheaders = [('User-agent', 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)')]

        files = []

        for i,sentence in enumerate(toSay):
            print i,len(sentence), sentence
            response = opener.open(google_translate_url+'?q='+sentence.replace(' ','%20')+'&tl={0}'.format(language))

            filename = str(i)+'speech_google.mp3'
            ofp = open(filename,'wb')
            ofp.write(response.read())
            ofp.close()
            files += [filename]

        filenames = " ".join(files)
        os.system('mpg123 -q '+filenames)

    if __name__ == "__main__":
        tts(*sys.argv[1:])

    It 'saves up' all files and plays them in a single sequence using mpg123 (I'm on Ubuntu)

    Replies
    1. I'm also using mpg123 (with the original file). Everything is working great, except that every mpg file begins and ends with a loud click. I played a couple of the files in iTunes, and there's no click, so it doesn't seem to be in the file itself.

      The problem is that I have no idea which part of the code is producing the clicks, or if it's even possible to remove them. If anyone has any suggestions, that would be great

  4. Cool idea, the Google Translate TTS seems to produce much more natural speech than other free online services. I made a small GUI program from this: https://github.com/suurjaak/TextSpeak.

  5. That's really cool, Erki. I did the same thing but used hardware. The advantage of the hardware is that it's legal to use off the device.

    http://www.textspeak.com/store_o.htm

    GJob

  6. Wow! Thanks for posting this information, it really helped me a lot. I have also heard about some online text-to-speech apps that convert your website to audio.

  7. This is a relatively recent software program and one that is new to me. I have been experimenting with it in a number of contexts over the past few weeks and it is impressive.

    tts voice

  8. http://free-android-apk-games.blogspot.com/2012/09/classic-text-to-speech-engine-v3-1-4-i.html

  9. Thanks a lot!! I had some problems with vlc skipping the first and last bits of a file.

    I use mplayer instead, and it plays the file flawlessly. Hope someone finds it useful.

    os.system('mplayer '+str(i)+'speech_google.mp3')

