SpeechBrain - AI Audio Models Tool

Overview

SpeechBrain is an open-source, all-in-one speech toolkit built on PyTorch that provides end-to-end recipes, pretrained models, and building blocks for a wide range of speech tasks. It supports automatic speech recognition (ASR), text-to-speech (TTS), speaker recognition, speaker diarization, speech enhancement and separation, language identification, and other speech classification tasks.

SpeechBrain emphasizes reproducible research: each task ships as a recipe, a self-contained training script plus YAML hyperparameter file targeting a common benchmark (e.g., LibriSpeech, VoxCeleb, WSJ, CHiME). Many models trained with these recipes are published as pretrained checkpoints on the Hugging Face Hub under the speechbrain organization.

Designed for both research and production prototyping, SpeechBrain exposes high-level pretrained interfaces (loaded via from_hparams, as in the examples below) alongside lower-level modular components for customizing model architectures and training loops; a minimal training sketch follows. The project is actively developed as an open-source community effort (source code lives in the SpeechBrain GitHub repository, model entries in the SpeechBrain organization on Hugging Face), is released under the Apache 2.0 license, and integrates with standard PyTorch workflows and datasets, making it straightforward to fine-tune, evaluate, and export speech models.
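
The lower-level API centers on the speechbrain.Brain class, which implements the generic training/evaluation loop while the user supplies the forward pass and the loss. The sketch below is adapted from the pattern shown in the project README; the linear model and random tensors are toy placeholders, not part of SpeechBrain itself.

Example (python):

import torch
import speechbrain as sb

class SimpleBrain(sb.Brain):
    def compute_forward(self, batch, stage):
        # Forward pass: run the module registered under self.modules
        return self.modules.model(batch["input"])

    def compute_objectives(self, predictions, batch, stage):
        # Per-batch loss, used for both training and validation stages
        return torch.nn.functional.l1_loss(predictions, batch["target"])

# Toy module and data; real recipes declare these in YAML hyperparameter files
model = torch.nn.Linear(in_features=10, out_features=10)
brain = SimpleBrain(
    modules={"model": model},
    opt_class=lambda params: torch.optim.SGD(params, lr=0.1),
)
data = [{"input": torch.rand(10, 10), "target": torch.rand(10, 10)}]
brain.fit(epoch_counter=range(5), train_set=data)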

Key Features

  • Pretrained ASR models and end-to-end recipes for LibriSpeech and other benchmarks
  • Text-to-speech support including Tacotron2 and neural vocoder workflows
  • Speaker recognition and diarization recipes with pretrained ECAPA and x-vector models
  • Speech enhancement and source separation pipelines for noisy and multi-speaker audio (see the separation sketch after this list)
  • Modular, recipe-driven training with easy Hugging Face Hub integration
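
As a concrete instance of the separation pipelines mentioned above, the sketch below loads the pretrained SepFormer model from the Hub (speechbrain/sepformer-wsj02mix) and splits a two-speaker mixture into per-speaker files. The input and output file names are placeholders; the 8 kHz sample rate matches this particular WSJ0-2mix model.

Example (python):

import torchaudio
from speechbrain.inference.separation import SepformerSeparation

separator = SepformerSeparation.from_hparams(
    source="speechbrain/sepformer-wsj02mix", savedir="pretrained_models/sepformer"
)
# est_sources has shape (batch, time, num_speakers)
est_sources = separator.separate_file(path="mixture.wav")
torchaudio.save("speaker1.wav", est_sources[:, :, 0].detach().cpu(), 8000)
torchaudio.save("speaker2.wav", est_sources[:, :, 1].detach().cpu(), 8000)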

Example Usage

Example (python):

# SpeechBrain 1.0 moved the pretrained interfaces from speechbrain.pretrained
# to speechbrain.inference; the old import path still works but is deprecated.
from speechbrain.inference.ASR import EncoderDecoderASR
from speechbrain.inference.speaker import SpeakerRecognition

# Automatic Speech Recognition (ASR) example
asr_model = EncoderDecoderASR.from_hparams(
    source="speechbrain/asr-transformer-transformerlm-librispeech", savedir="pretrained_models/asr"
)
transcription = asr_model.transcribe_file("examples/audio.wav")
print("ASR transcription:", transcription)

# Speaker recognition (verification) example
verifier = SpeakerRecognition.from_hparams(
    source="speechbrain/spkrec-ecapa-voxceleb", savedir="pretrained_models/spkrec"
)
# score is a similarity tensor; prediction is a boolean "same speaker" decision
score, prediction = verifier.verify_files("enroll.wav", "test.wav")
print("Speaker verification score:", score, "-> same speaker?", prediction)

# Basic TTS example (Tacotron2 acoustic model + HiFi-GAN vocoder)
# import torchaudio
# from speechbrain.inference.TTS import Tacotron2
# from speechbrain.inference.vocoders import HIFIGAN
# tts_model = Tacotron2.from_hparams(source="speechbrain/tts-tacotron2-ljspeech", savedir="pretrained_models/tts")
# vocoder = HIFIGAN.from_hparams(source="speechbrain/tts-hifigan-ljspeech", savedir="pretrained_models/vocoder")
# mel_output, mel_length, alignment = tts_model.encode_text("Hello world")
# waveforms = vocoder.decode_batch(mel_output)  # shape: (batch, 1, time)
# torchaudio.save("tts_output.wav", waveforms.squeeze(1), 22050)
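
The overview also lists language identification among the supported classification tasks; the sketch below covers it with the pretrained EncoderClassifier interface and the VoxLingua107 model from the Hub (speechbrain/lang-id-voxlingua107-ecapa). The audio path is a placeholder.

from speechbrain.inference.classifiers import EncoderClassifier

lang_id = EncoderClassifier.from_hparams(
    source="speechbrain/lang-id-voxlingua107-ecapa", savedir="pretrained_models/langid"
)
signal = lang_id.load_audio("examples/audio.wav")
# classify_batch returns posteriors, best score, class index, and the label text
out_prob, score, index, text_lab = lang_id.classify_batch(signal)
print("Predicted language:", text_lab)
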
Last Refreshed: 2026-01-09

Key Information

  • Category: Audio Models
  • Type: AI Audio Models Tool