ClearerVoice-Studio - AI Audio Tool
Overview
ClearerVoice-Studio is an open-source, AI-driven speech processing toolkit that bundles state-of-the-art pretrained models, training scripts, and evaluation tools for real-world audio workflows. The project focuses on four closely related tasks: speech enhancement (denoising), speech separation, speech super-resolution (bandwidth extension), and target speaker extraction (conditioned on a reference recording or on audio-visual input). Teams can therefore build end-to-end audio pipelines without assembling disparate research code. The repository includes production-ready pretrained weights for FRCRN and the MossFormer family, and provides CLI/Python inference helpers plus demo scripts. ([github.com](https://github.com/modelscope/ClearerVoice-Studio))
It also ships SpeechScore, a bundled speech-quality toolkit (PESQ, STOI, DNSMOS, SI-SDR, etc.) for reproducible evaluation and benchmarking, and supports automatic model downloads from HuggingFace/ModelScope for quick prototyping. Recent additions include a NumPy-in/NumPy-out inference interface (demo_Numpy2Numpy.py), expanded audio-format support, and a PyPI package (clearvoice) that simplifies installation. The project is aimed at researchers and engineers who need robust pretrained models and reproducible training/evaluation pipelines for production or research use. ([github.com](https://github.com/modelscope/ClearerVoice-Studio))
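As a rough illustration of the Python inference helpers mentioned above, the sketch below wraps a single-file enhancement call. The class name `ClearVoice`, the `task` and `model_names` arguments, the `online_write` flag, the `write` method, and the model name `MossFormer2_SE_48K` are taken from memory of the project README and should be treated as assumptions to verify against the installed version. The wrapper `enhance_file` and its file paths are hypothetical.

```python
from pathlib import Path


def enhance_file(input_path: str, output_path: str) -> None:
    """Denoise one audio file with a pretrained enhancement model (sketch)."""
    src = Path(input_path)
    if not src.is_file():
        raise FileNotFoundError(f"input audio not found: {src}")

    # Imported lazily so this module loads even where clearvoice is not installed.
    # Assumption: class name, arguments, and model name follow the project README.
    from clearvoice import ClearVoice

    model = ClearVoice(task="speech_enhancement", model_names=["MossFormer2_SE_48K"])
    # Assumption: online_write=False returns the enhanced waveform instead of
    # writing it to disk during inference.
    enhanced = model(input_path=str(src), online_write=False)
    model.write(enhanced, output_path=output_path)
```

A call such as `enhance_file("samples/input.wav", "samples/output.wav")` would, under these assumptions, download the model weights on first use and write the enhanced audio to the given path.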
GitHub Statistics
- Stars: 3,946
- Forks: 323
- Contributors: 7
- License: Apache-2.0
- Primary Language: Python
- Last Updated: 2025-08-14T08:26:30Z
Repository activity and community indicators show healthy adoption with a compact core team. According to the GitHub project page and aggregated listings, ClearerVoice-Studio has roughly 3.8–3.9k stars and ~310–323 forks, is licensed under Apache-2.0, and lists about seven contributors. The README and PyPI metadata indicate active maintenance through 2025, with recent feature commits such as pip packaging, the NumPy interface, and added super-resolution training scripts. The dozens of open issues and the active demos on ModelScope/HuggingFace suggest both real usage and user-reported edge cases still to be addressed. Overall, interest is strong and there is useful surface area for contributors, but the small core team suggests that most community involvement comes through usage and issue reports rather than code contributions. ([github.com](https://github.com/modelscope/ClearerVoice-Studio))
Installation
Install via pip:
pip install clearvoice
Or install from source:
git clone https://github.com/modelscope/ClearerVoice-Studio.git
cd ClearerVoice-Studio/clearvoice
pip install --editable .
Install ffmpeg if you need non-WAV formats:
sudo apt update && sudo apt install ffmpeg # Ubuntu/Debian
brew install ffmpeg # macOS (Homebrew)
On Windows: download a static build from ffmpeg.org and add its bin directory to PATH.
Key Features
- Pretrained speech enhancement models (FRCRN, MossFormer2) for 16 kHz and 48 kHz inference.
- Speech separation (MossFormer family) for multi‑speaker isolation at 8/16 kHz sample rates.
- Speech super-resolution (bandwidth extension) to upsample ≥16 kHz audio to 48 kHz.
- Audio‑visual target speaker extraction (face/lip conditioning) and audio‑only reference extraction.
- SpeechScore evaluation toolkit: PESQ, DNSMOS, STOI, SI‑SDR and additional non-intrusive metrics.
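To make one of the listed metrics concrete, here is a minimal pure-Python sketch of SI-SDR (scale-invariant signal-to-distortion ratio) using its standard definition. It is not SpeechScore's implementation, and the function name `si_sdr` is chosen here only for illustration. It assumes two time-aligned, equal-length mono signals.

```python
import math
from typing import Sequence


def si_sdr(reference: Sequence[float], estimate: Sequence[float]) -> float:
    """Scale-invariant signal-to-distortion ratio in dB (higher is better)."""
    if len(reference) != len(estimate) or len(reference) == 0:
        raise ValueError("signals must be non-empty and equal in length")

    # Remove the mean so a DC offset does not affect the score.
    ref_mean = sum(reference) / len(reference)
    est_mean = sum(estimate) / len(estimate)
    ref = [r - ref_mean for r in reference]
    est = [e - est_mean for e in estimate]

    ref_energy = sum(r * r for r in ref)
    if ref_energy == 0.0:
        raise ValueError("reference signal has zero energy")

    # Project the estimate onto the reference to obtain the scaled target;
    # whatever remains is counted as distortion.
    scale = sum(e * r for e, r in zip(est, ref)) / ref_energy
    target = [scale * r for r in ref]
    noise = [e - t for e, t in zip(est, target)]

    target_energy = sum(t * t for t in target)
    noise_energy = sum(n * n for n in noise)
    if noise_energy == 0.0:
        return math.inf
    if target_energy == 0.0:
        return -math.inf
    return 10.0 * math.log10(target_energy / noise_energy)
```

Because the estimate is projected onto the reference before the ratio is taken, multiplying the estimate by any nonzero gain leaves the score unchanged. This is why the metric suits enhancement and separation models whose output level may differ from the input.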
Community
Adoption and community feedback are strong: the repo has ~3.8–3.9k GitHub stars and ~310–323 forks, and the authors report heavy use of the core models on ModelScope (FRCRN ~3M uses, MossFormer ~2.5M uses). The project publishes demos on HuggingFace and ModelScope and maintains a PyPI package for easier installs. Community channels include GitHub Issues, repository PRs, and comments on the platform demos. Some users have reported task-specific failures, for example speaker separation not working in some cases. Overall, feedback points to active real-world usage and continued maintenance, with a compact core maintainer team and mostly issue-driven community contributions. ([github.com](https://github.com/modelscope/ClearerVoice-Studio))
Key Information
- Category: Audio Tools
- Type: AI Audio Tool