Hey guys, just wondering if anyone has had success finetuning StyleTTS2 yet?
The only model I can find is the LJSpeech one, which sounds really good! But I'm wondering what other narrators / speakers would sound like, especially voices further outside the training dataset.
(Zero-shot prompting at runtime seems to give low quality, so we need real finetunes!)
Well, I played with the demo: https://huggingface.co/spaces/styletts2/styletts2
I dunno if training it on a specific voice is worth it or if RVC will do the job. Compared to XTTS, the output is much more natural, but the pitch of the cloned voice is wrong.