coqui/XTTS-v2

A text-to-speech (TTS) model that provides high-quality voice cloning and cross-language speech synthesis from just a 6-second reference audio clip. It supports 17 languages and, compared with the previous version, adds emotion and style transfer, improved speaker conditioning, and better overall stability.
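
The cloning workflow described above can be sketched in Python. This is a minimal sketch, not a definitive implementation: the helper names (`wav_duration_seconds`, `clone_voice`), the choice to reject clips shorter than 6 seconds, and the default output path are assumptions made for illustration. The language codes and the `TTS.api` call follow the usage shown on the model card and may differ in your installed version of the Coqui `TTS` package.

```python
import wave

# Language codes listed on the XTTS-v2 model card (17 in total).
SUPPORTED_LANGUAGES = (
    "en", "es", "fr", "de", "it", "pt", "pl", "tr", "ru",
    "nl", "cs", "ar", "zh-cn", "ja", "hu", "ko", "hi",
)

# Assumption: treat the 6-second figure from the description as a minimum.
MIN_REFERENCE_SECONDS = 6.0


def wav_duration_seconds(path):
    """Return the duration of an uncompressed WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def clone_voice(text, speaker_wav, language, out_path="output.wav"):
    """Hypothetical helper: validate inputs, then synthesize with XTTS-v2."""
    if language not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported language code: {language!r}")
    if wav_duration_seconds(speaker_wav) < MIN_REFERENCE_SECONDS:
        raise ValueError("reference clip is shorter than 6 seconds")

    # Imported lazily so the checks above work without the TTS package.
    # Assumes the Coqui TTS package is installed; the first call
    # downloads the model weights.
    from TTS.api import TTS

    tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2")
    tts.tts_to_file(
        text=text,
        speaker_wav=speaker_wav,
        language=language,
        file_path=out_path,
    )
    return out_path
```

A call such as `clone_voice("Hello there.", "reference.wav", "en")` would write the synthesized speech to `output.wav`, where `reference.wav` is a placeholder for your own recording of the target speaker. Validating the language code and clip length first avoids loading the model for a request that cannot succeed.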

Key Information

  • Category: Audio Models
  • Source: Hugging Face
  • Tags: text-to-speech
  • Last updated: January 09, 2026

Structured Metrics

No structured metrics captured yet.

Links

Canonical source: https://huggingface.co/coqui/XTTS-v2