Speech synthesis

cosmos 21st December 2017 at 3:22am
Artificial intelligence Human-computer interaction Speech

aka text-to-speech, TTS

The counterpart to Speech recognition

Early video about TTS using Artificial neural networks (NETtalk)

WaveNet: A Generative Model for Raw Audio

"Interestingly, we found that training on many speakers made it better at modelling a single speaker than training on that speaker alone, suggesting a form of Transfer learning." Reminds me of the idea of contrasting cases

Deep Generative Models for Speech and Images

Generative Model-Based Text-to-Speech Synthesis. Novel idea (at the end): train to minimize listener error

https://google.github.io/tacotron/publications/tacotron2/


Baidu's Deep Voice

https://www.wikiwand.com/en/Speech_synthesis