XTTSv2 has been released. I’d say it’s a big jump in quality.

  • Better voice cloning
  • Better audio
  • Impressive prosody and expressiveness
  • More languages added, 16 in total I think
  • Non-EN languages sound way better
  • Streaming latency under 200 ms (I have a 3090)
  • Finetuning code

You can try it here: https://huggingface.co/spaces/coqui/xtts