DeepSeek-V3.1-Terminus - AI Language Models Tool
Overview
DeepSeek-V3.1-Terminus is a targeted stability-and-agent-improvement release in the DeepSeek V3.1 family. It preserves the hybrid "thinking / non-thinking" capability of V3.1 while fixing user-reported multilingual artifacts (sporadic Chinese/English mixing and abnormal characters) and improving agentic tool use, especially for the Code Agent and Search Agent. The release is distributed as open weights under the MIT license, with multi-precision checkpoints (F32 / BF16 / FP8-style formats) and an updated inference demo for community use. ([huggingface.co](https://huggingface.co/deepseek-ai/DeepSeek-V3.1-Terminus))
DeepSeek-V3.1-Terminus is intended for large-scale research and production agent workflows, such as structured tool calling, long-context question answering, and code-generation assistants, and can be served via standard inference stacks such as Hugging Face Transformers or Text Generation Inference (TGI). The model card tracks community uptake (downloads and likes) and flags a known FP8-scale parameter formatting issue that will be corrected in future checkpoints. ([huggingface.co](https://huggingface.co/deepseek-ai/DeepSeek-V3.1-Terminus))
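As a concrete illustration of the tool-calling workflow mentioned above, here is a minimal sketch against an OpenAI-compatible endpoint. The base URL, API key, model name, and get_weather tool schema are illustrative assumptions, not values from the model card; adapt them to your provider or self-hosted server.
from openai import OpenAI

# Assumed endpoint and key; any OpenAI-compatible server (DeepSeek API, vLLM, TGI, ...)
# accepts the same request shape.
client = OpenAI(base_url="https://api.deepseek.com", api_key="YOUR_API_KEY")

# Illustrative tool schema in the OpenAI function-calling format.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = client.chat.completions.create(
    model="deepseek-chat",  # provider-specific name; adjust for your deployment
    messages=[{"role": "user", "content": "What's the weather in Hangzhou?"}],
    tools=tools,
)
# The model either answers directly or emits a structured tool call for your agent to run.
msg = resp.choices[0].message
print(msg.tool_calls or msg.content)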
Model Statistics
- Downloads: 22,026
- Likes: 358
- Pipeline: text-generation
- Parameters: 684.5B
- License: MIT
Model Details
Architecture: Terminus keeps the DeepSeek-V3 family architecture (a Mixture-of-Experts Transformer) used across the V3 line; DeepSeek describes it as a hybrid reasoning model that supports both thinking and non-thinking modes via its chat template. Total parameters are roughly 685B (the model card lists 684.5B), with only a fraction of parameters activated per token under MoE routing at inference.
Context length and long-context training: the V3.1 lineage was extended to 128K-token contexts via a two-phase long-context extension; Terminus inherits that capability. ([huggingface.co](https://huggingface.co/deepseek-ai/DeepSeek-V3.1-Terminus))
Precision & weights: Terminus is offered in multiple tensor formats: F32, BF16, and FP8-style safetensors. The model card explicitly warns that some checkpoint parameters do not yet conform to the UE8M0 FP8 scaling format and will be corrected in a future release. It also points developers to an updated inference demo and chat templates for agent/tool workflows. For production serving, providers and inference stacks (Hugging Face/TGI, DeepInfra, NVIDIA integrations) already list the model and offer endpoints or deployment guidance. ([huggingface.co](https://huggingface.co/deepseek-ai/DeepSeek-V3.1-Terminus))
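To make the thinking / non-thinking switch concrete, here is a minimal sketch of building a prompt with the model's chat template via Transformers. It assumes Terminus keeps the thinking keyword documented for DeepSeek-V3.1's template; only the tokenizer is downloaded, not the 685B weights.
from transformers import AutoTokenizer

# Fetches the tokenizer and chat template only (a few MB, not the full checkpoint).
tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-V3.1-Terminus")

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Who are you?"},
]

# thinking=True selects the reasoning template; thinking=False the direct-answer one
# (assumption: Terminus keeps the kwarg documented for DeepSeek-V3.1).
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, thinking=True, add_generation_prompt=True
)
print(prompt)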
Key Features
- Targeted fixes for mixed Chinese/English artifacts and abnormal characters.
- Improved Code Agent and Search Agent reliability for agentic workflows.
- Multi-precision weights available: F32, BF16 and FP8-style safetensors.
- Long-context support inherited from V3.1 (up to ~128K tokens).
- Open MIT license and downloadable checkpoints on Hugging Face (see the download sketch after this list).
- Updated inference demo and chat templates for thinking/non-thinking modes.
- Works with Transformers, TGI, vLLM and various provider deployments.
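A minimal sketch of pulling the open checkpoints (or just their configs) from Hugging Face with huggingface_hub. The allow_patterns filter is an illustrative assumption for grabbing metadata first; the full multi-precision weights run to hundreds of gigabytes, so check the repo file list before downloading everything.
from huggingface_hub import snapshot_download

# Download only the JSON config/tokenizer files first; drop allow_patterns to fetch
# all weight shards (several hundred GB across the F32/BF16/FP8-style formats).
local_dir = snapshot_download(
    repo_id="deepseek-ai/DeepSeek-V3.1-Terminus",
    allow_patterns=["*.json"],
)
print(local_dir)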
Example Usage
Example (python):
### Example: call DeepSeek-V3.1-Terminus via a hosted inference endpoint (Text-Generation-Inference / Hugging Face-style API)
# This snippet uses huggingface_hub's InferenceClient as an example for TGI-like endpoints.
from huggingface_hub import InferenceClient
# Replace with your endpoint URL (self-hosted TGI or provider endpoint)
endpoint_url = "https://your-tgi-endpoint.example.com"
# The endpoint already serves the Terminus model, so no model= override is needed.
client = InferenceClient(model=endpoint_url)
prompt = "Write a short Python function that merges two sorted lists into one sorted list."
# Choose generation parameters appropriate for your deployment; do_sample=False
# requests greedy decoding (TGI rejects temperature=0.0, so omit it).
resp = client.text_generation(
    prompt,
    max_new_tokens=256,
    do_sample=False,
)
print(resp)
# Notes:
# - For local inference you may use Transformers or vendor runtimes (TGI/vLLM) and the model's provided 'inference' demo.
# - At ~685B parameters, Terminus needs an accelerated, typically multi-GPU serving stack (TGI, vLLM, or similar) rather than naive single-device loading.
# See the model card and vendor docs for recommended server/container parameters.
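As a sketch of what such specialized serving looks like, here is an offline vLLM example. The tensor_parallel_size and sampling values are placeholder assumptions; a checkpoint of this size generally needs a large multi-GPU (often multi-node) deployment, so treat this as the shape of the API rather than a working single-box recipe.
from vllm import LLM, SamplingParams

# Placeholder parallelism for illustration; size the deployment to your cluster.
llm = LLM(
    model="deepseek-ai/DeepSeek-V3.1-Terminus",
    tensor_parallel_size=8,
    trust_remote_code=True,
)

params = SamplingParams(max_tokens=256, temperature=0.6)
outputs = llm.generate(["Write a haiku about merge sort."], params)
print(outputs[0].outputs[0].text)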
Pricing
DeepSeek documents its API pricing publicly. Published rates, in USD per 1M tokens and approximate, are: $0.028 for cached input (cache hit), $0.28 for input on a cache miss, and $0.42 for output. Prices vary by provider, endpoint, and time; consult DeepSeek's official pricing page and your provider for current rates. ([api-docs.deepseek.com](https://api-docs.deepseek.com/quick_start/pricing/))
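To show how those per-million-token rates translate into per-request cost, here is a small back-of-envelope helper. The rates are the ones quoted above; the token counts are made-up examples.
# Rates quoted above, USD per 1M tokens; check the pricing page for current values.
RATES = {"input_cache_hit": 0.028, "input_cache_miss": 0.28, "output": 0.42}

def request_cost(cached_in: int, fresh_in: int, out: int) -> float:
    # Convert per-1M rates to per-token and weight by each token class.
    return (cached_in * RATES["input_cache_hit"]
            + fresh_in * RATES["input_cache_miss"]
            + out * RATES["output"]) / 1_000_000

# Example: 30K cache-hit prompt tokens, 2K fresh prompt tokens, 1K output tokens.
print(f"${request_cost(30_000, 2_000, 1_000):.6f}")  # ≈ $0.001820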
Benchmarks
All scores are from the model card (https://huggingface.co/deepseek-ai/DeepSeek-V3.1-Terminus):
- MMLU-Pro (reasoning, no tools): 85.0
- GPQA-Diamond: 80.7
- Humanity's Last Exam: 21.7
- SimpleQA (agentic tool use): 96.8
- BrowseComp (agent browsing): 38.5
- Terminal-bench (agent terminal tasks): 36.7
- LiveCodeBench: 74.9
- Codeforces rating: 2046
Key Information
- Category: Language Models
- Type: AI Language Models Tool