Deepgram - AI Audio Tool

Overview

Deepgram is a production-grade speech AI platform that provides cloud APIs and SDKs for automatic speech recognition (ASR) and text-to-speech (TTS). It targets workloads that need real-time or batch processing, such as call analytics, captioning, and voice assistants, and offers configurable models, speaker diarization, timestamps, and tooling for deployment and integration. Deepgram supports multiple languages and provides punctuation, profanity filtering, and speaker labeling to help turn audio into structured, searchable transcripts.

Designed for integration at scale, Deepgram exposes REST and streaming (WebSocket) endpoints plus official SDKs for common server environments (Node.js, Python, Go). Developers can choose prebuilt models optimized for settings such as phone calls or meetings and extend base capabilities with custom vocabularies and model configuration. The platform also includes neural TTS voices and configurable synthesis options suitable for IVR and voice agents.

Authentication is via API key. Typical workflows include low-latency streaming for live transcription and asynchronous batch jobs for large archives of recorded audio.

Key Features

  • Real-time streaming ASR over low-latency WebSocket connections, with REST endpoints for recorded audio
  • Batch transcription for large audio archives with asynchronous processing
  • Speaker diarization to label and separate multiple speakers in transcripts
  • Word-level timestamps and confidence scores for fine-grained analysis
  • Multilingual models and automatic language detection
  • Neural text-to-speech with configurable voices and SSML support
  • Custom vocabulary and model configuration for domain-specific terms
  • Official SDKs for Node.js, Python, and Go plus WebSocket streaming
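
To make the diarization and word-level timestamp features concrete, here is a minimal Python sketch that groups words into speaker turns. It assumes the transcription response nests words under results, channels, alternatives, and words, and that each word carries word, start, end, and (when diarization is requested) speaker fields. The function name speaker_turns and the sample payload are illustrative only; the sample is hand-written, not real API output.

```python
from typing import Any, Dict, List


def speaker_turns(response: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Group consecutive words from the same speaker into turns."""
    # Assumed layout: results -> channels[0] -> alternatives[0] -> words
    alternative = response["results"]["channels"][0]["alternatives"][0]
    turns: List[Dict[str, Any]] = []
    for w in alternative.get("words", []):
        speaker = w.get("speaker")  # None if diarization was not requested
        if turns and turns[-1]["speaker"] == speaker:
            # Same speaker keeps talking: extend the current turn.
            turns[-1]["text"] += " " + w["word"]
            turns[-1]["end"] = w["end"]
        else:
            # Speaker changed (or first word): start a new turn.
            turns.append({
                "speaker": speaker,
                "start": w["start"],
                "end": w["end"],
                "text": w["word"],
            })
    return turns


# Hand-written sample in the assumed shape (not real API output).
sample = {
    "results": {"channels": [{"alternatives": [{
        "transcript": "hello there hi",
        "words": [
            {"word": "hello", "start": 0.0, "end": 0.4, "confidence": 0.99, "speaker": 0},
            {"word": "there", "start": 0.5, "end": 0.9, "confidence": 0.97, "speaker": 0},
            {"word": "hi", "start": 1.2, "end": 1.5, "confidence": 0.95, "speaker": 1},
        ],
    }]}]}
}

for turn in speaker_turns(sample):
    print(f"[{turn['start']:.1f}-{turn['end']:.1f}] speaker {turn['speaker']}: {turn['text']}")
```

Grouping by consecutive speaker label keeps the original word order, so the turns can be rendered directly as a readable transcript or indexed for search.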

Code Examples

Python

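import requests

API_KEY = "YOUR_DEEPGRAM_API_KEY"
AUDIO_PATH = "audio.wav"

# The API key is sent in a "Token" authorization header.
headers = {
    "Authorization": f"Token {API_KEY}",
    "Content-Type": "audio/wav",
}
# Query parameters select transcription features.
params = {
    "punctuate": "true",
    "language": "en-US",
}

# Send the raw audio bytes as the request body.
with open(AUDIO_PATH, "rb") as f:
    resp = requests.post(
        "https://api.deepgram.com/v1/listen",
        headers=headers,
        params=params,
        data=f,
        timeout=300,  # avoid hanging indefinitely on long uploads
    )

print(resp.status_code)
resp.raise_for_status()  # raise on 4xx/5xx before parsing the body
print(resp.json())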

Curl

DEEPGRAM_API_KEY="YOUR_DEEPGRAM_API_KEY"
AUDIO_FILE="audio.wav"

curl -X POST "https://api.deepgram.com/v1/listen?language=en-US&punctuate=true" \
  -H "Authorization: Token ${DEEPGRAM_API_KEY}" \
  -H "Content-Type: audio/wav" \
  --data-binary "@${AUDIO_FILE}"

JavaScript

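// Uses the Deepgram class from older @deepgram/sdk releases; newer major
// versions changed the client interface, so check the installed version.
const { Deepgram } = require('@deepgram/sdk');
const fs = require('fs');

const DEEPGRAM_API_KEY = 'YOUR_DEEPGRAM_API_KEY';
const deepgram = new Deepgram(DEEPGRAM_API_KEY);

async function transcribe() {
  // Buffer sources need an explicit mimetype so the audio format is known.
  const audio = {
    buffer: fs.readFileSync('audio.wav'),
    mimetype: 'audio/wav',
  };
  const options = { punctuate: true, language: 'en-US' };

  const response = await deepgram.transcription.preRecorded(audio, options);
  console.log(JSON.stringify(response.results, null, 2));
}

transcribe().catch(console.error);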

API Overview

  • Authentication: API key
  • Base URL: https://api.deepgram.com/v1
  • Last Refreshed: 2026-01-09
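
As a rough illustration of how the base URL and API key combine into a request, the Python sketch below builds the URL and headers for the /listen endpoint used in the examples above, without sending anything. The helper name build_listen_request and its keyword-argument interface are assumptions made for illustration and are not part of any Deepgram SDK.

```python
from typing import Dict, Tuple
from urllib.parse import urlencode

BASE_URL = "https://api.deepgram.com/v1"


def build_listen_request(
    api_key: str, content_type: str = "audio/wav", **options: object
) -> Tuple[str, Dict[str, str]]:
    """Return (url, headers) for a transcription request. Hypothetical helper."""
    # Render booleans as lowercase "true"/"false", as in the examples above.
    query = {
        key: str(value).lower() if isinstance(value, bool) else str(value)
        for key, value in options.items()
    }
    url = f"{BASE_URL}/listen"
    if query:
        url += "?" + urlencode(query)
    headers = {
        "Authorization": f"Token {api_key}",  # API key authentication
        "Content-Type": content_type,
    }
    return url, headers


url, headers = build_listen_request(
    "YOUR_DEEPGRAM_API_KEY", punctuate=True, language="en-US"
)
print(url)
print(headers["Authorization"])
```

Keeping request construction separate from the network call makes it straightforward to log or unit-test the exact URL and headers before any audio is uploaded.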

Key Information

  • Category: Audio Tools
  • Type: AI Audio Tool