aka text-to-speech, TTS
The counterpart to Speech recognition
Early video about TTS using Artificial neural networks (NETtalk)
WaveNet: A Generative Model for Raw Audio
"Interestingly, we found that training on many speakers made it better at modelling a single speaker than training on that speaker alone, suggesting a form of Transfer learning." Reminds me of the idea of contrasting cases
Deep Generative Models for Speech and Images
Generative Model-Based Text-to-Speech Synthesis. Novel idea (at the end): train to minimize listener error
https://google.github.io/tacotron/publications/tacotron2/
Baidu's Deep Voice