Hacker News | sandslides's comments

So Ghostbusters II is now reality? :)


The model weights have been uploaded to Hugging Face: https://huggingface.co/pyp1/VoiceCraft
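If you'd rather grab the weights from a script than the browser, Hugging Face serves raw repo files from a predictable `resolve` URL. A stdlib-only sketch; the checkpoint filename below is a guess, so check the repo's file listing for the real names:

```python
import urllib.request
from pathlib import Path


def weight_url(repo_id: str, filename: str) -> str:
    # Hugging Face exposes raw files at /<repo_id>/resolve/<revision>/<filename>
    return f"https://huggingface.co/{repo_id}/resolve/main/{filename}"


def fetch(repo_id: str, filename: str, dest_dir: str = ".") -> Path:
    # Download the file next to the script (or into dest_dir)
    dest = Path(dest_dir) / filename
    urllib.request.urlretrieve(weight_url(repo_id, filename), dest)
    return dest


# Hypothetical checkpoint name, for illustration only:
# fetch("pyp1/VoiceCraft", "model.pth")
```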

This seems to be really high quality judging by the demos. I haven't had time to try it myself yet.

Demos: https://jasonppy.github.io/VoiceCraft_web/


The LibriTTS demo clones unseen speakers from a clip of around five seconds.


Ah ok, thanks. I tried the other demo.


I tried it. Sounds absolutely nothing like my voice or my wife's voice. I used the same sample files as I used 2 days ago on the Eleven Labs website, and they worked flawlessly there. So this is very, very far from being close to "Eleven Labs quality" when it comes to voice cloning.


Ah that's disappointing, have you tried https://git.ecker.tech/mrq/ai-voice-cloning ? I've had decent results with that, but inference is quite slow.


ElevenLabs is based on Tortoise-TTS, which was pre-trained on millions of hours of data, but this one was only trained on LibriTTS, which is 500 hours at best. If a model has seen millions of voices, some of them are definitely going to sound like you. It's just a matter of training data, but it's very difficult to get someone to collect that much data and train on it.


The speech generated is the best I've heard from an open source model. The one test I made didn't produce an exact clone either, but these are still early days; there's likely something not quite right. The cloned voice does speak without the artifacts or other weirdness that most TTS systems suffer from.


Yep, tried it as well. Tried a little clip of Tony Soprano and it came out as a British guy.

xTTSv2 does it much better. The quality of the trained voices is great, though.


Yes, same for my voice. Made me sound British and didn't capture anything special about my voice that makes it recognizable.


Yes, I noticed that. Doesn't seem right, does it?


Just tried the Colab notebooks. The quality seems very good. It also supports voice cloning.


Great stuff. I took a look through the README but... what are the minimum hardware requirements to run this? Is this gonna blow up my CPU / hard drive?


Not sure. The only inference demos are Colab notebooks. The models are approx. 700 MB each, so I imagine it will run on a modest GPU.
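Back-of-envelope, if the checkpoint is ~700 MB, inference memory is usually the weights plus activations/decoding state. The 2x multiplier here is purely my guess, not a measurement from this model:

```python
def vram_estimate_mb(checkpoint_mb: float, overhead_factor: float = 2.0) -> float:
    # Rough rule of thumb (assumed, not measured): weights plus
    # activations and decoding state often land near 2x checkpoint size.
    return checkpoint_mb * overhead_factor


# A ~700 MB checkpoint would then want very roughly ~1.4 GB of VRAM
print(vram_estimate_mb(700))  # 1400.0
```

So even a low-end card with 2-4 GB of VRAM would plausibly fit it, if the guess holds.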


Would it run on a cheap non-GPU server?


Seems to run at about 2x realtime on a 2015 4-core i7-6700HQ laptop, i.e. 5 seconds to generate 10 seconds of output. I can imagine that being 4x or greater on a real machine.
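For anyone comparing numbers, "2x realtime" here is just seconds of audio produced per second of wall-clock time:

```python
def realtime_factor(audio_seconds: float, wall_seconds: float) -> float:
    # >1.0 means faster than realtime; 10 s of audio in 5 s of
    # compute gives a factor of 2.0 ("2x realtime").
    return audio_seconds / wall_seconds


print(realtime_factor(10, 5))  # 2.0
```

(Some papers report the inverse, wall time / audio time, where lower is better, so watch which convention a benchmark uses.)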


I skimmed the GitHub repo but didn't see any info on this: how long does it take to fine-tune to a particular voice?

