I recently started playing with Coqui XTTS, and I have to say my results have been horrid. I am familiar with ElevenLabs and have always had great results there. My background is in audio/video production, so I am very capable of giving it whatever exact format it needs, but my results so far sound NOTHING like the source material: very robotic, very distorted. Given all the gushing I have seen about this tool, I assume it must be user error. Currently I am just using it as an extension in Oobabooga, as that was the easiest way to get it running with a UI. Please let me know any tips and tricks you guys have learned! Thank you!
Current workflow:
Record in Adobe Audition
24-bit, 22050 Hz sample rate
WAV Format
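One quick sanity check on a workflow like this is to read the exported file's header back and confirm it really has the sample rate, bit depth, and channel count that were set in Audition. Below is a minimal sketch using Python's standard `wave` module. The function name `describe_wav` is made up for illustration, and this says nothing about what XTTS itself expects; it only reports what is in the file. Note that older Python versions may refuse WAVs saved with the extensible header variant.

```python
import sys
import wave


def describe_wav(path):
    """Return basic header fields of a PCM WAV file (hypothetical helper)."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            # getsampwidth() is in bytes per sample, so 3 means 24-bit
            "bit_depth": wav.getsampwidth() * 8,
            "sample_rate": rate,
            # clip length in seconds, guarded against a zero rate
            "seconds": frames / rate if rate else 0.0,
        }


if __name__ == "__main__":
    # Usage: python check_wav.py my_reference_clip.wav
    if len(sys.argv) > 1:
        print(describe_wav(sys.argv[1]))
```

If the numbers printed do not match what was chosen in the export dialog, the problem is upstream of the TTS tool.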
Check out Piper TTS. It gives pretty good results and it’s super fast:
https://github.com/rhasspy/piper
https://www.youtube.com/watch?v=GGvdq3giiTQ&ab_channel=Thorsten-Voice