DeepSeek-V3.2-Exp - AI Language Models Tool

Overview

DeepSeek-V3.2-Exp is an experimental, open-weight large language model from DeepSeek that introduces DeepSeek Sparse Attention (DSA) to reduce compute and memory for long-context training and inference while preserving output quality comparable to V3.1-Terminus.

The release is provided under an MIT license, with safetensors weights available in multiple tensor formats (BF16, FP8/E4M3, F32), and is published on Hugging Face with an updated inference demo and conversion tools for local deployment. (Source: Hugging Face model card: https://huggingface.co/deepseek-ai/DeepSeek-V3.2-Exp)

Designed as an intermediate step toward DeepSeek's next-generation models, V3.2-Exp keeps training configurations aligned with V3.1 to allow apples-to-apples comparison; public benchmark tables published by DeepSeek show near-parity with V3.1 across reasoning and agentic tool-use tasks.

Operationally, V3.2-Exp has day-0 support on ecosystem runtimes such as SGLang and vLLM, and ships example scripts and Docker images for H200, MI350, and various NPUs. DeepSeek also published open-source GPU kernels (TileLang, DeepGEMM, FlashMLA) and a tech report for researchers. (Sources: Hugging Face model card; DeepSeek API docs; DeepSeek GitHub.)
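Because the runtimes above expose an OpenAI-compatible HTTP API, a served copy of the model can be queried with a few lines of client code. The sketch below is illustrative only: it assumes you have already launched vLLM or SGLang locally with this model and that the server listens on port 8000 (a common default in serving recipes); the endpoint URL and placeholder API key are assumptions, not part of the model card.

Example (python):

from openai import OpenAI

# Assumed local server: vLLM or SGLang serving DeepSeek-V3.2-Exp through its
# OpenAI-compatible API on port 8000. Adjust base_url/model name to your setup.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-V3.2-Exp",
    messages=[{"role": "user", "content": "What does sparse attention change at decode time?"}],
    max_tokens=256,
)
print(response.choices[0].message.content)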

Model Statistics

  • Downloads: 69,742
  • Likes: 932
  • Pipeline: text-generation
  • Parameters: 685.4B

License: MIT

Model Details

Architecture and size: DeepSeek-V3.2-Exp is a ~685B-parameter transformer-class model (checkpoint listed as 685B on the Hugging Face model card) built from the same structural family as V3.1-Terminus, with a new sparse attention component (DeepSeek Sparse Attention, DSA) layered into the attention/KV/indexer pipeline to prune unnecessary token-to-token computation in long windows. (Source: Hugging Face: https://huggingface.co/deepseek-ai/DeepSeek-V3.2-Exp)

Sparsity mechanism and training: DSA uses a trainable indexer that learns to approximate the dense, head-summed attention distribution (an indexer warm-up phase followed by sparse training with KL losses over large token counts, per DeepSeek's technical notes and third-party writeups). The implementation is designed for MLA/MQA decoding modes and KV reuse to maximize throughput. GPU/kernel support includes TileLang (research-friendly kernels), DeepGEMM (indexer logits and paged kernels), and FlashMLA (sparse attention kernels). (Sources: DeepSeek GitHub tech report: https://github.com/deepseek-ai/DeepSeek-V3.2-Exp; analysis articles.)

Formats and deployment: The authors provide safetensors checkpoints in multiple numeric formats (BF16, FP8/E4M3, F32). Deployment is supported with example conversion and torchrun scripts, SGLang Docker images (H200/MI350/NPUs), and day-0 vLLM recipes. The model card also records community adoption signals (downloads and likes) and a recent inference-code RoPE fix published on 2025-11-17. (Sources: Hugging Face; SGLang docs; repository updates.)
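To make the warm-up idea above concrete, here is a toy, self-contained sketch of training a lightweight indexer to imitate a dense, head-summed attention distribution with a KL loss. All shapes, module sizes, and the omission of causal masking are simplifying assumptions for illustration; this is not DeepSeek's implementation.

Example (python):

import torch
import torch.nn.functional as F

torch.manual_seed(0)
B, H, T, D = 2, 8, 128, 64    # batch, attention heads, sequence length, head dim
d_idx = 32                    # indexer projection width (assumed)

# Dense "teacher": standard softmax attention, averaged over heads.
q = torch.randn(B, H, T, D)
k = torch.randn(B, H, T, D)
attn = torch.softmax(q @ k.transpose(-1, -2) / D**0.5, dim=-1)   # (B, H, T, T)
target = attn.mean(dim=1)                                        # (B, T, T)

# Lightweight indexer: low-dimensional projections score every (query, key) pair.
hidden = torch.randn(B, T, 256)
wq, wk = torch.nn.Linear(256, d_idx), torch.nn.Linear(256, d_idx)
opt = torch.optim.Adam(list(wq.parameters()) + list(wk.parameters()), lr=1e-3)

for _ in range(200):
    scores = wq(hidden) @ wk(hidden).transpose(-1, -2) / d_idx**0.5  # (B, T, T)
    loss = F.kl_div(F.log_softmax(scores, dim=-1), target, reduction="batchmean")
    opt.zero_grad()
    loss.backward()
    opt.step()

# After warm-up, the indexer's top-k scores per query decide which keys the
# sparse attention pass actually visits.
top_idx = scores.topk(16, dim=-1).indices                        # (B, T, 16)
print("selected keys per query:", top_idx.shape)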

Key Features

  • DeepSeek Sparse Attention (DSA) — trainable fine-grained sparse attention for long contexts (see the decode-time sketch after this list).
  • Open-weight safetensors in BF16, FP8 (E4M3), and F32 formats for flexible precision.
  • Day‑0 support in vLLM and SGLang with Docker images and example launch commands.
  • Conversion and torchrun inference demo for multi-GPU model-parallel local hosting.
  • Open-source performance kernels: TileLang, DeepGEMM, and FlashMLA for research and production.
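As referenced in the DSA bullet above, the decode-time payoff of a trained indexer is that each new query attends over only the top-k cached positions instead of the whole context. The sketch below shows that gather-then-attend pattern with random stand-in values; the dimensions, the single-head layout, and the random indexer scores are assumptions for illustration.

Example (python):

import torch

T_ctx, D, k = 100_000, 128, 2048           # cached tokens, head dim, tokens kept
k_cache = torch.randn(T_ctx, D)
v_cache = torch.randn(T_ctx, D)
q = torch.randn(D)                         # query for the newest token

indexer_scores = torch.randn(T_ctx)        # stand-in for learned indexer output
idx = indexer_scores.topk(k).indices       # keep the k highest-scoring positions

k_sel, v_sel = k_cache[idx], v_cache[idx]  # gather only the selected KV entries
attn = torch.softmax(k_sel @ q / D**0.5, dim=-1)
out = attn @ v_sel                         # O(k*D) work instead of O(T_ctx*D)
print(out.shape)                           # torch.Size([128])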

Example Usage

Example (python):

import transformers

# Example: encode DeepSeek-style chat messages using DeepSeek's encoding helper
# (adapted from DeepSeek V3.2 examples; use encoding_dsv32 utilities from the repo)

from encoding_dsv32 import encode_messages, parse_message_from_completion_text

# load tokenizer from Hugging Face (tokens/encoding compatible with DeepSeek tooling)
tokenizer = transformers.AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-V3.2-Exp")

messages = [
    {"role": "user", "content": "Summarize the main benefits of sparse attention."},
]

encode_config = dict(thinking_mode="thinking", drop_thinking=True, add_default_bos_token=True)
prompt = encode_messages(messages, **encode_config)

# Tokenize and inspect
tokens = tokenizer.encode(prompt)
print("Prompt tokens:", tokens)

# Note: DeepSeek provides inference examples and a conversion script in the `inference/` folder
# for running the model locally (torchrun + generate.py), and also documents SGLang/vLLM recipes.
# See: https://huggingface.co/deepseek-ai/DeepSeek-V3.2-Exp and the model GitHub for full runbook.
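The snippet above imports parse_message_from_completion_text but never calls it. Continuing from the same variables, a plausible round trip looks like the following; note that the helper's exact call signature is an assumption inferred from its name, so check encoding_dsv32 in the DeepSeek repo before relying on it.

Example (python):

# Continues the snippet above (reuses `tokenizer`, `tokens`, and the import).
text = tokenizer.decode(tokens)
print("Round-tripped prompt:", text[:80])

# Assumed call signature: the helper takes raw completion text and returns a
# structured message. Verify against encoding_dsv32 in the repo before use.
completion_text = "Sparse attention reduces compute and memory at long context."
message = parse_message_from_completion_text(completion_text)
print(message)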

Pricing

DeepSeek announced a price cut of more than 50% for its hosted API at the V3.2-Exp launch; exact commercial pricing tiers are not published in the model card, so see the DeepSeek API docs for details. (Source: DeepSeek API announcement: https://api-docs.deepseek.com/news/news250929; coverage: Reuters.)

Benchmarks

All scores below are taken from DeepSeek's benchmark table on the Hugging Face model card (https://huggingface.co/deepseek-ai/DeepSeek-V3.2-Exp):

  • MMLU-Pro (reasoning mode): 85.0
  • LiveCodeBench (code generation): 74.1
  • AIME 2025: 89.3
  • Codeforces (rating equivalent): 2121
  • SimpleQA (agentic tool use): 97.1

Last Refreshed: 2026-01-09

Key Information

  • Category: Language Models
  • Type: AI Language Models Tool