Hacker News

A couple seconds? Which project does this?



Mission Impossible 3 was only the proof of concept


How many do you think "a couple" is?


Nope. If you actually tried those, you would quickly find out they don't work. It's actually really hard to clone a voice from a sample of only a few seconds.


A few seconds, yeah. I've seen fairly convincing reproductions from 30 seconds of reading text though.


Cloning a voice signature or timbre may need a bit more for good quality. Then there are the idiosyncrasies in one’s voice. In addition to that, there are tiny verbal tics, expressions, cadence, feel, and more that you need before you can say you have properly cloned someone’s voice. The two-second sample is like a shallow clone of sorts and is indeed vector space.


The last sentence is hilarious. What do you think "properly" cloned voices are? Not every model is few-shot, and not every model relies on its training set for paralanguage anymore. The easiest way to try it out is probably the pro voice cloning from ElevenLabs.


"Expressions" in particular is about choice of words. A few seconds definitely isn't enough to duplicate that.


The choice of words is usually yours with a TTS model.


There are many models; XTTS [1] is a good one.

[1]: https://coqui.ai/blog/tts/open_xtts


I’ve had success with ElevenLabs.



