CSM (Conversational Speech Model)
CSM is a conversational speech generation model from SesameAILabs. Given text and audio inputs, it generates residual vector quantization (RVQ) audio codes: a Llama backbone handles language processing, and a specialized audio decoder produces the Mimi audio codes. This enables interactive conversational speech synthesis.
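To make the term "RVQ audio codes" concrete, here is a minimal sketch of residual vector quantization in general. It is not CSM's or Mimi's actual codec: the function names (`nearest`, `rvq_encode`, `rvq_decode`) and the tiny hand-written codebooks are hypothetical and chosen only for illustration, whereas a real codec learns its codebooks and operates on encoder latents rather than raw two-dimensional vectors.

```python
def nearest(codebook, vec):
    """Return the index of the codebook entry closest to vec (squared L2 distance)."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((c - v) ** 2 for c, v in zip(codebook[i], vec)),
    )


def rvq_encode(vec, codebooks):
    """Quantize vec in stages; each stage encodes what the previous stages missed."""
    residual = list(vec)
    codes = []
    for codebook in codebooks:
        idx = nearest(codebook, residual)
        codes.append(idx)
        # Subtract the chosen entry so the next stage sees only the leftover error.
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return codes


def rvq_decode(codes, codebooks):
    """Reconstruct an approximation by summing the selected entry from each stage."""
    out = [0.0] * len(codebooks[0][0])
    for idx, codebook in zip(codes, codebooks):
        out = [o + c for o, c in zip(out, codebook[idx])]
    return out


# Hypothetical two-stage codebooks (assumption: hand-picked, not learned).
codebooks = [
    [[0.0, 0.0], [1.0, 1.0]],   # coarse stage
    [[0.0, 0.0], [0.5, -0.5]],  # fine stage, refines the coarse residual
]

codes = rvq_encode([1.4, 0.6], codebooks)
approx = rvq_decode(codes, codebooks)
print(codes, approx)  # [1, 1] [1.5, 0.5]
```

Each input vector becomes one small integer per stage, which is the sense in which a model can generate "audio codes" as discrete tokens and leave waveform reconstruction to a separate decoder.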
Key Information
- Category: Audio Models
- Source: GitHub
- Tags: Python
- Last updated: January 09, 2026
Links
Canonical source: https://github.com/SesameAILabs/csm