Microsoft’s new AI can simulate anyone’s voice with 3 seconds of audio

Jean-Philippe Encausse

Text-to-speech model can preserve speaker's emotional tone and acoustic environment.