Hey guys, just wondering if anyone has had success finetuning StyleTTS2 yet?
The only one I can find is the LJSpeech model, which sounds really good! But I'm wondering what other narrators / speakers would sound like, especially voices further outside the training dataset.
(Zero-shot prompting at runtime seems to give low quality, so we need real finetunes!)
Well, I played with the demo: https://huggingface.co/spaces/styletts2/styletts2
I dunno if training it on a specific voice is worth it or if RVC will do the job. Compared to XTTS, the output is much more natural, but the pitch of the cloned voice is wrong.