DeepSeek-V3.1-Terminus - AI Language Models Tool

Overview

DeepSeek-V3.1-Terminus is a post-release refresh of DeepSeek's V3.1 instruction-tuned family that targets two practical problems reported by downstream users: inconsistent multilingual decoding (mixed Chinese/English artifacts) and occasional instability in agent/tool workflows. The release keeps the same core architecture as DeepSeek‑V3 while shipping updated inference tooling, an updated search-agent template, and weight variants (FP8 / BF16 / FP32) under an MIT license for self-hosting and research. According to the Hugging Face model card, Terminus focuses on language consistency and improved agentic tool use (Code and Search agents) while delivering modest benchmark uplifts on several standard suites. ([huggingface.co](https://huggingface.co/deepseek-ai/DeepSeek-V3.1-Terminus))

Terminus is intended as a drop‑in refinement for teams already using DeepSeek‑V3.1: it ships an updated inference demo and local-deployment guidance, and the maintainers note a single known checkpoint issue (a self_attn.o_proj FP8 scale mismatch) to be fixed in a future patch. The model is distributed as safetensors weights and is usable via Transformers or Text-Generation-Inference (TGI) for text generation and agent pipelines, making it suitable for code generation, multi-step web-search agents, and long-context retrieval workflows (the V3.1 family supports very long contexts). ([huggingface.co](https://huggingface.co/deepseek-ai/DeepSeek-V3.1-Terminus))

Model Statistics

  • Downloads: 8,350
  • Likes: 362
  • Pipeline: text-generation

License: MIT

Model Details

Architecture and weights: DeepSeek-V3.1-Terminus retains the same model structure as DeepSeek‑V3 and is published as an instruction-tuned follow-on checkpoint to DeepSeek‑V3.1-Base (the base checkpoint is referenced in the model tree). The Hugging Face model card lists weight builds in multiple tensor formats (BF16, FP8/E4M3, F32 variants) and distributes safetensors checkpoints under the MIT license for both research and commercial use. The model card also lists the checkpoint at ~685B parameters (the family's published scale) and tags the model for text-generation and agentic workflows. ([huggingface.co](https://huggingface.co/deepseek-ai/DeepSeek-V3.1-Terminus))

Capabilities and intended use: Terminus targets reliable multilingual generation (reducing unexpected mixed Chinese/English output), improved programmatic tool calling, and stronger end‑to‑end agent success rates for search and code agents. The model card shows it used in large agent setups (browser/search agents, terminal automation, SWE benchmarks) and in long-context scenarios, consistent with the DeepSeek‑V3 family's long-context extension work. For local or on-prem use, the model card and the accompanying DeepSeek-V3 repository include an inference demo and guidance; the maintainers recommend loading certain parameters (e.g., MLP gate corrections) in FP32 where applicable and note a current FP8-format caveat for one projection layer. ([huggingface.co](https://huggingface.co/deepseek-ai/DeepSeek-V3.1-Terminus))
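As a rough sizing check for the ~685B-parameter figure above, raw weight storage scales linearly with per-parameter width. The sketch below is a back-of-envelope estimate only: it ignores KV cache, activation memory, and quantization scale tensors.

```python
# Back-of-envelope weight-storage estimate for a ~685B-parameter checkpoint.
# Ignores KV cache, activations, and per-tensor quantization scales.
PARAMS = 685e9  # approximate parameter count from the model card

def weight_bytes(params: float, bytes_per_param: float) -> float:
    """Raw weight storage in bytes for a given per-parameter width."""
    return params * bytes_per_param

fp8_tb = weight_bytes(PARAMS, 1) / 1e12   # FP8 (E4M3): 1 byte per parameter
bf16_tb = weight_bytes(PARAMS, 2) / 1e12  # BF16: 2 bytes per parameter
fp32_tb = weight_bytes(PARAMS, 4) / 1e12  # FP32: 4 bytes per parameter

print(f"FP8: ~{fp8_tb:.2f} TB, BF16: ~{bf16_tb:.2f} TB, FP32: ~{fp32_tb:.2f} TB")
```

Even the FP8 build is far beyond a single accelerator, which is why the model card points self-hosters toward multi-node inference or managed providers.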

Key Features

  • Reduced mixed Chinese/English decoding artifacts for better multilingual consistency.
  • Optimized Code Agent for improved code repair and fewer iterative attempts.
  • Updated Search Agent template and toolset for more reliable multi-step web reasoning.
  • Weights provided in BF16, FP8 (E4M3 / UE8M0‑style) and FP32 formats (safetensors).
  • MIT license enabling research and commercial self-hosting with permissive reuse.
  • Drop‑in architecture compatibility with DeepSeek‑V3 tooling and inference demos.
  • Modest benchmark improvements on MMLU-Pro, BrowseComp and terminal/agent benchmarks.
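The improved tool calling listed above is exercised through DeepSeek's hosted API, which follows the OpenAI-compatible chat-completions request shape. The sketch below assembles such a request body; the `get_weather` tool is a hypothetical example, and `deepseek-chat` is the hosted API's model name rather than a Terminus-specific identifier.

```python
# Sketch of an OpenAI-compatible tool-calling request body, as accepted by
# DeepSeek's hosted API. The get_weather tool is a hypothetical example.
import json

def build_tool_call_request(model: str, user_msg: str, tools: list) -> dict:
    """Assemble a chat-completions request body with tool definitions."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_msg}],
        "tools": tools,
        "tool_choice": "auto",  # let the model decide when to call a tool
    }

weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool for illustration
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

body = build_tool_call_request(
    "deepseek-chat", "What's the weather in Paris?", [weather_tool]
)
print(json.dumps(body, indent=2))
```

If the model decides to call the tool, the response carries a `tool_calls` entry whose arguments your application executes before sending the result back as a `tool` role message.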

Example Usage

Example (python):

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Note: this example shows a typical Transformers load pattern. The full Terminus
# checkpoint is very large — use appropriate hardware or a TGI / remote inference provider.
MODEL_ID = "deepseek-ai/DeepSeek-V3.1-Terminus"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
# trust_remote_code=True is often required for community models with custom code.
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    device_map="auto",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    use_safetensors=True,
)

prompt = "Write a short, well-documented Python function that computes Fibonacci numbers."
inputs = tokenizer(prompt, return_tensors="pt")
# Move inputs to model device
inputs = {k: v.to(model.device) for k, v in inputs.items()}

# Generation config: adjust for your environment and needs.
# With do_sample=False decoding is greedy and sampling parameters such as
# temperature are ignored, so none are passed here; set do_sample=True to sample.
outputs = model.generate(
    **inputs,
    max_new_tokens=256,
    do_sample=False,
)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))

# For production agent workflows, consider running Terminus behind a Text-Generation-Inference
# (TGI) server or a managed provider for streaming, tools, and function-calling integrations.
# See the model card for an updated inference demo and agent templates. (Hugging Face model card)
# Reference: https://huggingface.co/deepseek-ai/DeepSeek-V3.1-Terminus
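For the TGI path suggested in the comments above, Text-Generation-Inference exposes a `POST /generate` endpoint that takes an `inputs` string and a `parameters` object. A minimal request body can be built as below; the server URL is a placeholder for your own deployment.

```python
# Minimal request body for Text-Generation-Inference's POST /generate endpoint.
# The URL is a placeholder; point it at your own TGI deployment.
import json

TGI_URL = "http://localhost:8080/generate"  # placeholder deployment URL

def build_tgi_request(prompt: str, max_new_tokens: int = 256) -> dict:
    """Build a TGI /generate request body using greedy decoding."""
    return {
        "inputs": prompt,
        "parameters": {
            "max_new_tokens": max_new_tokens,
            "do_sample": False,  # greedy, matching the Transformers example above
        },
    }

payload = build_tgi_request("Write a Python function that computes Fibonacci numbers.")
print(json.dumps(payload))
# Send with, e.g., requests.post(TGI_URL, json=payload, timeout=120)
```

TGI also offers `/generate_stream` for token streaming, which suits interactive agent loops better than blocking generation.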

Pricing

DeepSeek operates commercial API endpoints (deepseek-chat / deepseek-reasoner) with published per‑million token pricing; example published rates for the V3 family are approximately: input tokens $0.028 (cache hit) / $0.28 (cache miss) per 1M, and $0.42 per 1M output tokens — consult DeepSeek’s official pricing page for exact, up‑to‑date tiers and model mappings. Note: Terminus is available as MIT‑licensed checkpoints for self‑hosting (no charge for the weights themselves). ([api-docs.deepseek.com](https://api-docs.deepseek.com/quick_start/pricing/?utm_source=openai))
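Using the example rates quoted above, a rough per-request cost works out as follows. These are the illustrative figures from this section, not guaranteed current pricing; check DeepSeek's pricing page before budgeting.

```python
# Rough API cost estimate using the example per-1M-token rates quoted above.
RATE_IN_MISS = 0.28   # USD per 1M input tokens (cache miss)
RATE_IN_HIT = 0.028   # USD per 1M input tokens (cache hit)
RATE_OUT = 0.42       # USD per 1M output tokens

def estimate_cost(input_tokens: int, output_tokens: int, cache_hit: bool = False) -> float:
    """Estimated USD cost of one request at the quoted example rates."""
    in_rate = RATE_IN_HIT if cache_hit else RATE_IN_MISS
    return (input_tokens * in_rate + output_tokens * RATE_OUT) / 1_000_000

# e.g. a 4,000-token prompt with a 1,000-token completion, cache miss
print(f"${estimate_cost(4000, 1000):.6f}")
```

At these rates a cache hit cuts the input-token portion by 10x, so prompt caching matters most for agent workflows that resend long system prompts or tool schemas.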

Benchmarks

  • MMLU-Pro: 85.0
  • GPQA-Diamond: 80.7
  • Humanity's Last Exam: 21.7
  • BrowseComp (agentic web browsing): 38.5
  • SimpleQA (tool-assisted QA): 96.8
  • Terminal-bench (terminal / automation): 36.7

All figures are for DeepSeek-V3.1-Terminus. (Source: https://huggingface.co/deepseek-ai/DeepSeek-V3.1-Terminus)

Last Refreshed: 2026-02-24

Key Information

  • Category: Language Models
  • Type: AI Language Models Tool