This is what I'm currently doing on the Microsoft side, but it takes 300 to 2,000 very high-quality samples of 5 to 15 seconds each, with transcripts.
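To put those figures in perspective, here is a quick back-of-the-envelope calculation of how much recorded audio they imply. It is only a sketch: it assumes every sample has the same length, and the function name `total_minutes` is made up for illustration.

```python
# Rough bounds on total recording time, using the figures quoted above.
MIN_SAMPLES, MAX_SAMPLES = 300, 2000   # number of samples
MIN_SECONDS, MAX_SECONDS = 5, 15       # length of each sample, in seconds


def total_minutes(sample_count: int, seconds_per_sample: int) -> float:
    """Total audio duration in minutes, assuming equal-length samples."""
    return sample_count * seconds_per_sample / 60


low = total_minutes(MIN_SAMPLES, MIN_SECONDS)    # fewest, shortest samples
high = total_minutes(MAX_SAMPLES, MAX_SECONDS)   # most, longest samples
print(f"{low:.0f} min to {high:.0f} min ({high / 60:.1f} h)")
```

So the requirement spans roughly 25 minutes of audio at the low end to a little over 8 hours at the high end.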
So this is one of those patent applications that are equally creepy and fascinating.
Facebook is thinking about generating voice models for users that are designed to resemble their actual voice.
In the filing, they describe how voice models could be useful in the context of messaging. For example, if you’re driving in a car and I send you a text message, Facebook could use my voice model to read out the text message using my voice.
Hearing the message in the sender’s digital voice would help the recipient figure out who it’s from without diverting their attention away from their current activity – e.g. driving.
This could also be particularly useful in group chats with many participants. Using personalised voice models to read out the messages removes the need for a system-generated preamble explaining who sent each message.
To generate the voice models, Facebook would ask users to provide audio samples of themselves reading out certain phrases. Neural networks would then be trained on those samples to produce a voice model for each user.
Now, I’m pretty unconvinced of the utility of personalised voice models for reading out messages while driving a car. But one context where personalised voice models could become interesting is in helping people achieve digital immortality.
Digital immortality is when someone’s body dies, but they live on as a digital avatar that people can continue to interact with. In theory, we could fine-tune a text-generation model such as GPT-3 on everything someone has written online. Then, with a personalised voice model, we could turn that AI-generated text into speech. Put the two together, and people could meaningfully interact with avatars of people who have passed away.
I doubt that Facebook are necessarily thinking too much about this idea of digital immortality. But they are thinking about the metaverse, VR and avatars. At some point, all of these things will converge.
via Patent Drop: read the source article