DeepSeek-V3.2 - AI Language Models Tool

Overview

DeepSeek-V3.2 is an open-weight, reasoning-optimized large language model from DeepSeek, designed for efficient long-context reasoning and agentic tool use. The release centers on DeepSeek Sparse Attention (DSA), a sparse-attention mechanism engineered to cut computational cost for long-context workloads while maintaining output quality, and on a scaled reinforcement-learning post-training pipeline that the authors say brings V3.2 close to contemporary closed-source frontier models. ([huggingface.co](https://huggingface.co/deepseek-ai/DeepSeek-V3.2))

The distribution on Hugging Face includes BF16, FP8 (F8_E4M3), and F32 weight formats under an MIT license, plus a new chat template that explicitly supports “thinking with tools” and structured tool calls for agent workflows. The project supplies encoding helpers and examples for converting OpenAI-style messages into the model's input format, along with guidance for local inference and recommended sampling defaults. Community uptake has been rapid (see the downloads and likes on Hugging Face), and the launch prompted wide coverage of its price-performance and of its support for non-CUDA hardware. ([huggingface.co](https://huggingface.co/deepseek-ai/DeepSeek-V3.2))

Model Statistics

  • Downloads: 107,141
  • Likes: 1088
  • Pipeline: text-generation
  • Parameters: 685.4B

License: mit

Model Details

Architecture and scale: DeepSeek-V3.2 is described as a very large Mixture-of-Experts (MoE) transformer family with approximately 685 billion total parameters; MoE routing activates only a subset of experts per token, which reduces inference cost. Released weight formats include BF16, FP8 (F8_E4M3), and F32 safetensors. ([huggingface.co](https://huggingface.co/deepseek-ai/DeepSeek-V3.2))

Sparse attention and long context: The model introduces DeepSeek Sparse Attention (DSA), a fine-grained sparse-attention variant intended to preserve quality while lowering compute and memory for long sequences; the family is documented to support large context windows (reported in the ecosystem as up to 128K tokens for recent V3/R1 models). ([huggingface.co](https://huggingface.co/papers/2512.02556))

Training and post-training: DeepSeek reports extensive post-training with a scaled reinforcement-learning framework and a large agentic task-synthesis pipeline used to generate tool-use training data. The team states that the high-compute Speciale variant was tuned for deep reasoning (and is claimed to exceed GPT-5 on their internal evaluations). The main V3.2 model supports tool calling and the new "thinking with tools" chat mode; the Speciale variant is stated to focus on reasoning only and does not support tool calling. ([huggingface.co](https://huggingface.co/papers/2512.02556))

Chat / tool interface: The distributed repo includes an encoding folder with Python scripts (encode_messages / parse_message_from_completion_text) to translate OpenAI-compatible message lists into the model's prompt format. A new role named "developer" is reserved for search-agent scenarios in the template. The model authors caution that the parsing helpers assume well-formatted output and are not production-grade without additional error handling (a minimal sketch of this interface follows below). ([huggingface.co](https://huggingface.co/deepseek-ai/DeepSeek-V3.2))

Deployment notes: Hugging Face and vendor docs provide local inference guidance and recommend sampling settings (for example, temperature=1.0 and top_p=0.95 for local runs). Multiple inference providers and quantizations are available in the model tree for practical deployment. ([huggingface.co](https://huggingface.co/deepseek-ai/DeepSeek-V3.2))
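
To make the chat/tool interface above concrete, here is a minimal sketch of an OpenAI-style message list with a tool definition, prepared for the repo's encoding helpers. The helper names come from the repo's encoding folder, but the tool-schema layout and the keyword arguments shown (for example a tools parameter) are assumptions; check encoding_dsv32.py and the model card before relying on them.

# Sketch only: the helper signatures below are assumptions; verify against encoding_dsv32.py in the repo.
# from encoding_dsv32 import encode_messages, parse_message_from_completion_text

tools = [
    {
        "type": "function",
        "function": {
            "name": "search_web",  # hypothetical tool, for illustration only
            "description": "Search the web and return the top results.",
            "parameters": {
                "type": "object",
                "properties": {"query": {"type": "string"}},
                "required": ["query"],
            },
        },
    }
]

messages = [
    # The template also reserves a "developer" role for search-agent scenarios.
    {"role": "system", "content": "You can call tools when they help answer the question."},
    {"role": "user", "content": "Who won the 2024 Nobel Prize in Physics?"},
]

# prompt = encode_messages(messages, tools=tools, thinking_mode="thinking", add_default_bos_token=True)
# ...run generation on the encoded prompt, then recover any structured tool call:
# parsed = parse_message_from_completion_text(completion_text)
# Note: per the model card, these parsers assume well-formatted output and need extra
# error handling before production use.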

Key Features

  • DeepSeek Sparse Attention (DSA) — sparse attention for long‑context efficiency (a toy illustration of the selection idea follows this list).
  • Large‑scale MoE family with ~685B total parameters and lower per‑token activation cost.
  • 128K-scale long‑context support reported across V3 family for document and code tasks.
  • New chat template with "thinking with tools" and OpenAI‑compatible message encoding helpers.
  • Weights released under MIT license in BF16, FP8 (F8_E4M3), and F32 safetensors.
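
For intuition about the first bullet, the toy below keeps only the top-k highest-scoring keys for each query and masks the rest before the softmax. It is a generic illustration of the selection idea behind sparse attention, not DeepSeek's DSA implementation, and it still materializes the full score matrix, so it shows the selection step without the memory and compute savings a real kernel would deliver.

# Toy top-k sparse attention (illustrative only; not DeepSeek's DSA kernel).
import torch
import torch.nn.functional as F

def topk_sparse_attention(q, k, v, topk=64):
    # q, k, v: (batch, heads, seq_len, head_dim)
    scores = (q @ k.transpose(-2, -1)) / (k.shape[-1] ** 0.5)       # full (L x L) scores, kept for clarity
    keep = min(topk, scores.shape[-1])
    threshold = scores.topk(keep, dim=-1).values[..., -1:]          # k-th largest score per query
    scores = scores.masked_fill(scores < threshold, float("-inf"))  # drop everything below it
    return F.softmax(scores, dim=-1) @ v                            # attend only over the kept keys

# Example: each query in a 1,024-token toy sequence attends to its 64 strongest keys
q = k = v = torch.randn(1, 8, 1024, 64)
out = topk_sparse_attention(q, k, v, topk=64)
print(out.shape)  # torch.Size([1, 8, 1024, 64])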

Example Usage

Example (python):

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load tokenizer (encoding helpers are provided in the model repo)
# See the model's encoding folder for encode_messages/parse_message_from_completion_text
tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-V3.2")

# Load model (device_map="auto" requires the accelerate package for automatic device placement)
model = AutoModelForCausalLM.from_pretrained(
    "deepseek-ai/DeepSeek-V3.2",
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True  # required for some vendor-specific model implementations
)

# Example using the repository's encoding helper (pseudo-call; the repo contains encoding_dsv32.py)
# from encoding_dsv32 import encode_messages, parse_message_from_completion_text

messages = [
    {"role": "user", "content": "Explain the Collatz conjecture in two sentences."}
]
# prompt = encode_messages(messages, thinking_mode="non-thinking", add_default_bos_token=True)
# Tokenize and generate. If the repo's encoding helper is unavailable, the tokenizer's
# built-in chat template is a reasonable fallback for a plain (non-tool) conversation.
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=256,
    do_sample=True,   # temperature/top_p only take effect when sampling is enabled
    temperature=1.0,
    top_p=0.95,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

# Notes: check the model repo for official encode_messages / parsing utilities and for recommended sampling parameters.
# See Hugging Face model card and repo for exact usage and production guidance. 
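
Beyond local inference, the weights are also served by hosted providers and through DeepSeek's own pay-as-you-go API (see Pricing below). The sketch below uses the OpenAI-compatible endpoint documented at api-docs.deepseek.com; the model identifier "deepseek-chat" is DeepSeek's general chat alias and may not map exactly to the V3.2 checkpoint, so confirm the current model name in the API docs.

# Hosted-API sketch (OpenAI-compatible endpoint); assumes DEEPSEEK_API_KEY is set in the environment.
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["DEEPSEEK_API_KEY"], base_url="https://api.deepseek.com")

resp = client.chat.completions.create(
    model="deepseek-chat",  # placeholder alias; confirm which name serves V3.2 in the API docs
    messages=[{"role": "user", "content": "Explain the Collatz conjecture in two sentences."}],
    temperature=1.0,
    top_p=0.95,
    max_tokens=256,
)
print(resp.choices[0].message.content)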

Pricing

DeepSeek publishes pay‑as‑you‑go API pricing (per 1M tokens): example rates listed in official docs include $0.028 (input cache hit), $0.28 (input cache miss), and $0.42 (output) for DeepSeek‑V3 models. A free web/chat tier with fair‑use limits is also available; enterprise or private deployments may have separate pricing. Verify live rates and regional taxes on DeepSeek's API docs before production use. ([api-docs.deepseek.com](https://api-docs.deepseek.com/quick_start/pricing/?utm_source=openai))
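
As a quick sanity check on these rates, the sketch below estimates a per-request cost from token counts. The dollar figures are the example rates quoted above and should be re-verified against DeepSeek's live pricing page before budgeting.

# Rough cost estimate using the per-1M-token rates quoted above (verify live rates first).
RATES_PER_MTOK = {"input_cache_hit": 0.028, "input_cache_miss": 0.28, "output": 0.42}  # USD

def estimate_cost(input_tokens: int, output_tokens: int, cache_hit: bool = False) -> float:
    input_rate = RATES_PER_MTOK["input_cache_hit" if cache_hit else "input_cache_miss"]
    return (input_tokens / 1e6) * input_rate + (output_tokens / 1e6) * RATES_PER_MTOK["output"]

# Example: 50K input tokens (cache miss) plus 4K output tokens
print(f"${estimate_cost(50_000, 4_000):.4f}")  # ≈ $0.0157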

Benchmarks

Model size (parameters): ≈685 billion parameters (Source: https://huggingface.co/deepseek-ai/DeepSeek-V3.2)

Hugging Face downloads (last 30 days): 107,141, per the model page (Source: https://huggingface.co/deepseek-ai/DeepSeek-V3.2)

License & weight formats: MIT license; BF16, FP8 (F8_E4M3), F32 safetensors (Source: https://huggingface.co/deepseek-ai/DeepSeek-V3.2)

Long-context capability (reported): Support for long contexts (ecosystem reports up to 128K tokens for the V3 family) (Source: https://huggingface.co/papers/2512.02556)

High‑level reasoning claims: The authors report parity with or superiority to GPT‑5 on internal reasoning benchmarks; the Speciale variant is reported to reach gold‑medal performance on the IMO and IOI 2025 problem sets. (Source: https://huggingface.co/papers/2512.02556)

Last Refreshed: 2026-01-09

Key Information

  • Category: Language Models
  • Type: AI Language Models Tool