CSM (Conversational Speech Model)

CSM is a conversational speech generation model from SesameAILabs. It generates RVQ (residual vector quantization) audio codes from text and audio inputs. A Llama backbone processes the inputs, and a smaller, specialized audio decoder produces Mimi audio codes. Together they enable interactive, conversational speech synthesis.
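In RVQ, each audio frame is stored as a short tuple of indices, one per codebook. Each codebook quantizes the residual error left over by the codebooks before it. The toy sketch below shows only the general technique, in plain Python with made-up 2-D codebooks. It is not Mimi's codec, whose codebooks are learned and much larger, and the function names `rvq_encode` and `rvq_decode` are hypothetical.

```python
from typing import List, Sequence

Vector = List[float]


def nearest(codebook: Sequence[Vector], x: Vector) -> int:
    """Return the index of the codebook entry closest to x (squared Euclidean distance)."""
    def dist(c: Vector) -> float:
        return sum((ci - xi) ** 2 for ci, xi in zip(c, x))
    return min(range(len(codebook)), key=lambda i: dist(codebook[i]))


def rvq_encode(x: Vector, codebooks: Sequence[Sequence[Vector]]) -> List[int]:
    """Quantize x stage by stage; each stage encodes what the previous stages missed."""
    residual = list(x)
    codes: List[int] = []
    for cb in codebooks:
        idx = nearest(cb, residual)
        codes.append(idx)
        # Subtract the chosen entry so the next codebook sees only the remaining error.
        residual = [r - c for r, c in zip(residual, cb[idx])]
    return codes


def rvq_decode(codes: Sequence[int], codebooks: Sequence[Sequence[Vector]]) -> Vector:
    """Reconstruct by summing the selected entry from each codebook that has a code."""
    out = [0.0] * len(codebooks[0][0])
    for idx, cb in zip(codes, codebooks):
        out = [o + c for o, c in zip(out, cb[idx])]
    return out


# Hypothetical toy codebooks: a coarse stage followed by a finer stage.
CODEBOOKS = [
    [[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]],
    [[0.0, 0.0], [0.25, 0.0], [0.0, 0.25], [-0.25, 0.0]],
]

if __name__ == "__main__":
    frame = [1.2, 1.05]
    codes = rvq_encode(frame, CODEBOOKS)
    print(codes)                                    # [1, 1]
    print(rvq_decode(codes, CODEBOOKS))             # [1.25, 1.0]
    print(rvq_decode(codes[:1], CODEBOOKS[:1]))     # [1.0, 1.0] (coarse stage only)
```

Decoding with only the first codebook gives a coarse approximation. Each additional codebook refines it, which is why a model that emits RVQ codes has to predict several indices per frame.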

Key Information

  • Category: Audio Models
  • Source: GitHub
  • Tags: Python
  • Last updated: January 09, 2026


Links

Canonical source: https://github.com/SesameAILabs/csm