coqui/XTTS-v2
A text-to-speech (TTS) model that performs high-quality voice cloning and cross-language speech synthesis from a reference audio clip as short as 6 seconds. It supports 17 languages and emotion and style transfer, and it improves on the previous version with better speaker conditioning and greater overall stability.
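A minimal sketch of how cloning from a short reference clip might look with the Coqui `TTS` Python package. The helper names (`build_request`, `synthesize`), the file paths, and the language-code list are illustrative assumptions rather than part of this entry; the codes follow the upstream model card as best recalled and should be checked against the canonical source.

```python
# Sketch only: helper names and file paths below are hypothetical.
# Language codes are assumed from the upstream model card; verify before relying on them.
SUPPORTED_LANGUAGES = (
    "en", "es", "fr", "de", "it", "pt", "pl", "tr", "ru",
    "nl", "cs", "ar", "zh-cn", "ja", "hu", "ko", "hi",
)

# Model identifier used by the Coqui TTS package for XTTS-v2.
MODEL_NAME = "tts_models/multilingual/multi-dataset/xtts_v2"


def build_request(text, speaker_wav, language, file_path="output.wav"):
    """Validate inputs and return keyword arguments for synthesis."""
    if language not in SUPPORTED_LANGUAGES:
        raise ValueError(f"Unsupported language code: {language!r}")
    if not text.strip():
        raise ValueError("Text must not be empty")
    return {
        "text": text,
        "speaker_wav": speaker_wav,  # path to the short reference clip
        "language": language,        # language of the generated speech
        "file_path": file_path,      # where the synthesized audio is written
    }


def synthesize(request):
    """Load the model and write the cloned-voice audio to disk.

    Requires the Coqui `TTS` package; the first call downloads the model weights.
    """
    from TTS.api import TTS  # imported lazily so validation works without the package

    tts = TTS(MODEL_NAME)
    tts.tts_to_file(**request)
    return request["file_path"]
```

A typical call would be `synthesize(build_request("Hello there.", "reference.wav", "en"))`, where `reference.wav` stands in for a roughly 6-second recording of the target speaker. Passing a `language` that differs from the language spoken in the reference clip corresponds to the cross-language synthesis described above.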
Key Information
- Category: Audio Models
- Source: Hugging Face
- Tags: text-to-speech
- Last updated: January 09, 2026
Structured Metrics
No structured metrics captured yet.
Links
Canonical source: https://huggingface.co/coqui/XTTS-v2