SpeechBrain - AI Audio Models Tool

Overview

SpeechBrain is an open-source, all-in-one speech toolkit built on PyTorch that provides end-to-end recipes, pretrained models, and building blocks for a wide range of speech tasks. It supports automatic speech recognition (ASR), text-to-speech (TTS), speaker recognition, speaker diarization, speech enhancement and separation, language identification, and other speech classification tasks.

SpeechBrain emphasizes reproducible research: each task ships as a recipe, a self-contained training script plus YAML hyperparameter file targeting a common benchmark (e.g., LibriSpeech, VoxCeleb, WSJ, CHiME). Many models trained with these recipes are published as pretrained checkpoints on the Hugging Face Hub under the speechbrain organization.

Designed for both research and production prototyping, SpeechBrain exposes high-level pretrained interfaces (loaded via from_hparams, as in the examples below) alongside lower-level modular components for customizing model architectures and training loops; a minimal training sketch follows. The project is actively developed as an open-source community effort (source code lives in the SpeechBrain GitHub repository, model entries in the SpeechBrain organization on Hugging Face), is released under the Apache 2.0 license, and integrates with standard PyTorch workflows and datasets, making it straightforward to fine-tune, evaluate, and export speech models.
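
The lower-level API centers on the speechbrain.Brain class, which implements the generic training/evaluation loop while the user supplies the forward pass and the loss. The sketch below is adapted from the pattern shown in the project README; the linear model and random tensors are toy placeholders, not part of SpeechBrain itself.

Example (python):

import torch
import speechbrain as sb

class SimpleBrain(sb.Brain):
    def compute_forward(self, batch, stage):
        # Forward pass: run the module registered under self.modules
        return self.modules.model(batch["input"])

    def compute_objectives(self, predictions, batch, stage):
        # Per-batch loss, used for both training and validation stages
        return torch.nn.functional.l1_loss(predictions, batch["target"])

# Toy module and data; real recipes declare these in YAML hyperparameter files
model = torch.nn.Linear(in_features=10, out_features=10)
brain = SimpleBrain(
    modules={"model": model},
    opt_class=lambda params: torch.optim.SGD(params, lr=0.1),
)
data = [{"input": torch.rand(10, 10), "target": torch.rand(10, 10)}]
brain.fit(epoch_counter=range(5), train_set=data)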

Key Features

  • Pretrained ASR models and end-to-end recipes for LibriSpeech and other benchmarks
  • Text-to-speech support including Tacotron2 and neural vocoder workflows
  • Speaker recognition and diarization recipes with pretrained ECAPA and x-vector models
  • Speech enhancement and source separation pipelines for noisy and multi-speaker audio (see the separation sketch after this list)
  • Modular, recipe-driven training with easy Hugging Face Hub integration
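
As a concrete instance of the separation pipelines mentioned above, the sketch below loads the pretrained SepFormer model from the Hub (speechbrain/sepformer-wsj02mix) and splits a two-speaker mixture into per-speaker files. The input and output file names are placeholders; the 8 kHz sample rate matches this particular WSJ0-2mix model.

Example (python):

import torchaudio
from speechbrain.inference.separation import SepformerSeparation

separator = SepformerSeparation.from_hparams(
    source="speechbrain/sepformer-wsj02mix", savedir="pretrained_models/sepformer"
)
# est_sources has shape (batch, time, num_speakers)
est_sources = separator.separate_file(path="mixture.wav")
torchaudio.save("speaker1.wav", est_sources[:, :, 0].detach().cpu(), 8000)
torchaudio.save("speaker2.wav", est_sources[:, :, 1].detach().cpu(), 8000)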

Example Usage

Example (python):

# SpeechBrain 1.0 moved the pretrained interfaces from speechbrain.pretrained
# to speechbrain.inference; the old import path still works but is deprecated.
from speechbrain.inference.ASR import EncoderDecoderASR
from speechbrain.inference.speaker import SpeakerRecognition

# Automatic Speech Recognition (ASR) example
asr_model = EncoderDecoderASR.from_hparams(
    source="speechbrain/asr-transformer-transformerlm-librispeech", savedir="pretrained_models/asr"
)
transcription = asr_model.transcribe_file("examples/audio.wav")
print("ASR transcription:", transcription)

# Speaker recognition (verification) example
verifier = SpeakerRecognition.from_hparams(
    source="speechbrain/spkrec-ecapa-voxceleb", savedir="pretrained_models/spkrec"
)
# score is a similarity tensor; prediction is a boolean "same speaker" decision
score, prediction = verifier.verify_files("enroll.wav", "test.wav")
print("Speaker verification score:", score, "-> same speaker?", prediction)

# Basic TTS example (Tacotron2 acoustic model + HiFi-GAN vocoder)
# import torchaudio
# from speechbrain.inference.TTS import Tacotron2
# from speechbrain.inference.vocoders import HIFIGAN
# tts_model = Tacotron2.from_hparams(source="speechbrain/tts-tacotron2-ljspeech", savedir="pretrained_models/tts")
# vocoder = HIFIGAN.from_hparams(source="speechbrain/tts-hifigan-ljspeech", savedir="pretrained_models/vocoder")
# mel_output, mel_length, alignment = tts_model.encode_text("Hello world")
# waveforms = vocoder.decode_batch(mel_output)  # shape: (batch, 1, time)
# torchaudio.save("tts_output.wav", waveforms.squeeze(1), 22050)
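
The overview also lists language identification among the supported classification tasks; the sketch below covers it with the pretrained EncoderClassifier interface and the VoxLingua107 model from the Hub (speechbrain/lang-id-voxlingua107-ecapa). The audio path is a placeholder.

from speechbrain.inference.classifiers import EncoderClassifier

lang_id = EncoderClassifier.from_hparams(
    source="speechbrain/lang-id-voxlingua107-ecapa", savedir="pretrained_models/langid"
)
signal = lang_id.load_audio("examples/audio.wav")
# classify_batch returns posteriors, best score, class index, and the label text
out_prob, score, index, text_lab = lang_id.classify_batch(signal)
print("Predicted language:", text_lab)
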
Last Refreshed: 2026-01-09

Key Information

  • Category: Audio Models
  • Type: AI Audio Models Tool