I recently started playing with Coqui XTTS and I have to say my results have been horrid. I am familiar with 11labs and have always had great results there. My background is originally in audio/video production, so I am very capable of giving it whatever exact formats it needs, but my results so far sound NOTHING like the source material. Very robotic, very distorted. Given all the gushing I have seen about this tool, I assume it must be user error. Currently I am just using it as an extension on Oobabooga, as that was the easiest way to get it running with a UI. Please let me know any tips and tricks you guys have learned! Thank you!
Current workflow:
Record in Adobe Audition
24-bit, 22050 Hz sample rate
WAV Format
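A quick way to confirm that an exported file really has the properties listed above is to read its header back. This is only a minimal sketch using Python's standard `wave` module; `describe_wav` is a made-up helper name, not part of XTTS or Oobabooga, and it assumes plain PCM WAV (some 24-bit exports use an extensible header that older Python versions may refuse to open).

```python
import wave


def describe_wav(path):
    """Return basic format facts read from a PCM WAV file's header."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate": rate,                    # e.g. 22050
            "bit_depth": wav.getsampwidth() * 8,    # bytes per sample -> bits
            "duration_seconds": frames / rate,
        }
```

Calling `describe_wav("sample.wav")` (the file name is a placeholder) and comparing the result against the intended settings rules out an export preset silently changing the format.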
Use 10-second clips of clean audio, no music, no background noise. I like to record samples from audiobooks. Free samples on Amazon recorded with Audacity work well for me.
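If the recording runs longer than that, cutting it down can be scripted. Here is a rough sketch with Python's standard `wave` module; `trim_wav` is a hypothetical helper name, it assumes an uncompressed PCM WAV, and it simply keeps the first N seconds without looking for a clean pause to cut on.

```python
import wave


def trim_wav(src_path, dst_path, seconds=10.0):
    """Copy the first `seconds` of a PCM WAV file to a new file.

    Returns the number of frames written.
    """
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        # Never ask for more frames than the source actually has.
        n_frames = min(src.getnframes(), int(seconds * src.getframerate()))
        frames = src.readframes(n_frames)

    with wave.open(dst_path, "wb") as dst:
        # Same channels, sample width, and rate as the source;
        # the frame count in the header is corrected on close.
        dst.setparams(params)
        dst.writeframes(frames)
    return n_frames
```

For example, `trim_wav("audiobook_sample.wav", "voice_10s.wav")` (both names are placeholders) writes a 10-second clip in the same format as the source.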
One thing to note: my install (an implementation for SillyTavern) somehow got corrupted, no idea how. It still worked but sounded way worse. A reinstall fixed that, so maybe that’s happening to you too.
Check out Piper TTS. It gives pretty good results and it’s super fast:
https://github.com/rhasspy/piper
https://www.youtube.com/watch?v=GGvdq3giiTQ&ab_channel=Thorsten-Voice