DeepSeek V3.1 - AI Language Models Tool

Overview

DeepSeek V3.1 is an open-source, agent-oriented large language model released by DeepSeek-AI and published on Hugging Face under the MIT license. It is a member of DeepSeek's Mixture-of-Experts (MoE) decoder-only transformer family and ships both a base checkpoint (DeepSeek-V3.1-Base) and a post-trained chat variant.

The release emphasizes agent readiness: built-in chat templates for “thinking” and “non-thinking” modes, explicit tool-calling formats, and example code-agent/search-agent trajectories that simplify building multi-step tool-using systems (search, code agents, and function-calling workflows). According to the model card, V3.1 was further pre-trained with large-scale long-context extension phases to support a 128K-token context window, and the team optimized the model for improved tool use and faster “thinking”-mode responses (Hugging Face model card; DeepSeek technical report).

Technically, DeepSeek V3.1 targets developers who need long-context reasoning, multi-step agents, or affordable self-hosting. The project publishes both full weights and quantized variants (BF16, F32, and FP8/E4M3 formats) and provides guidance for FP8/UE8M0 usage. Community benchmarks on the model card show strong performance on coding benchmarks and multi-step agent tasks in the authors' evaluation suite, while independent reporting and community threads document trade-offs (accuracy and safety audits, regulatory scrutiny, and mixed user feedback). The model remains fully downloadable for on-premise use and is integrated into DeepSeek's API offering for managed inference.

Model Statistics

  • Downloads: 52,635
  • Likes: 811
  • Pipeline: text-generation
  • Parameters: 684.5B

License: mit

Model Details

Architecture and scale: DeepSeek V3.1 is a 671B-parameter Mixture-of-Experts (MoE) decoder-only transformer with roughly 37B parameters activated per token (MoE routing selects a small subset of experts at inference). The V3 family uses DeepSeekMoE and Multi-head Latent Attention (MLA) and was trained with a multi-token prediction objective (DeepSeek V3 technical report).

Pretraining and long-context extension: The V3 technical report and model card state that the family was pre-trained on very large corpora (the DeepSeek-V3 report cites ~14.8 trillion tokens for V3), and V3.1 adds two long-context extension phases (a 32K phase expanded to ~630B tokens and a 128K phase expanded to ~209B tokens), enabling a 128K-token context window for both the base and chat variants (Hugging Face model card; arXiv technical report).

Precision, deployment, and tooling: Weights are published in BF16 and multiple quantized formats (F8_E4M3 / UE8M0-compatible FP8), and the maintainers document FP8 handling recommendations (e.g., keeping some gate/MLP parameters in FP32). The model card includes templated chat formats (non-thinking vs. thinking tokens), a strict tool-call schema for function/agent integration, and example code-agent/search-agent trajectories.

Training and evaluation: The V3 family reports internal benchmark comparisons (MMLU variants, code benchmarks, SWE-bench, LiveCodeBench, math-contest pass@1 scores) showing improved agent and code performance versus earlier DeepSeek checkpoints. The model is released under the MIT license and is available via Hugging Face and ModelScope; DeepSeek also offers managed API endpoints (chat / reasoner) for hosted access.
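To make the MoE activation pattern above concrete, here is a minimal, illustrative top-k routing sketch in PyTorch. It is not DeepSeek's DeepSeekMoE implementation (which adds shared experts, load-balancing strategies, and MLA); the expert count, hidden sizes, and top_k values below are arbitrary placeholders chosen for readability.

import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyTopKMoE(nn.Module):
    """Illustrative top-k expert routing; not DeepSeek's actual DeepSeekMoE code."""
    def __init__(self, d_model=64, d_ff=256, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):  # x: (num_tokens, d_model)
        scores = F.softmax(self.router(x), dim=-1)      # routing probabilities per expert
        weights, idx = scores.topk(self.top_k, dim=-1)  # keep only the top-k experts per token
        weights = weights / weights.sum(dim=-1, keepdim=True)
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e                # tokens routed to expert e in this slot
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

tokens = torch.randn(4, 64)
print(ToyTopKMoE()(tokens).shape)  # torch.Size([4, 64]); only 2 of 8 experts run per token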

Key Features

  • 128K token context window for single-call long-document reasoning and retrieval.
  • Hybrid 'thinking' and 'non-thinking' chat templates to toggle chain-of-thought behavior (see the template sketch after this list).
  • Mixture-of-Experts (MoE) 671B total / ~37B activated parameters for cost-effective inference.
  • Built-in tool-call schema and example code/search agent templates for multi-step agents.
  • Published weights in BF16/F32 and FP8 (UE8M0) formats with quantization guidance.
  • Open-source MIT license; weights downloadable from Hugging Face and ModelScope.
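
The hybrid template behavior referenced above can be inspected directly by rendering the same conversation in both modes. A minimal sketch, assuming the `thinking` keyword documented on the model card is the only mode switch needed (only the tokenizer is downloaded here):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-V3.1")

messages = [{"role": "user", "content": "What is 17 * 24?"}]

# Render the same conversation with and without the thinking-mode tokens.
non_thinking_prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, thinking=False, add_generation_prompt=True)
thinking_prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, thinking=True, add_generation_prompt=True)

# The two strings should differ only in the mode-control tokens inserted by the template.
print(non_thinking_prompt)
print(thinking_prompt)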

Example Usage

Example (python):

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Example: load tokenizer and model (requires sufficient hardware or quantized weights)
MODEL = "deepseek-ai/DeepSeek-V3.1"

tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL, torch_dtype=torch.bfloat16, device_map='auto')

# Build chat messages and apply DeepSeek chat template (thinking=True for chain-of-thought)
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize key points from the pasted document and list follow-ups."},
]
# apply_chat_template renders the DeepSeek chat format documented on the model card
prompt = tokenizer.apply_chat_template(messages, tokenize=False, thinking=True, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors='pt')

# Generate (small example; adjust generation parameters for real workloads)
out = model.generate(**{k: v.to(model.device) for k, v in inputs.items()}, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(out[0], skip_special_tokens=True))

# Note: This example assumes local hardware with enough memory or quantized weights. See the model card for FP8/UE8M0 guidance and agent/tool-call formats.
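
For agent/tool integration, the model card defines a strict tool-call format that the chat template renders from structured tool definitions. The sketch below (reusing the tokenizer and model loaded above) assumes the template accepts the standard `tools` argument of apply_chat_template and that tool calls are issued in non-thinking mode, as the model card describes; the get_weather tool is hypothetical and only illustrates the JSON-schema shape. Check the model card's tool-call section for the authoritative format.

# Hypothetical tool definition in OpenAI-style JSON schema, passed through the
# standard `tools` argument of apply_chat_template (assumption: the DeepSeek V3.1
# template consumes this argument; verify against the model card).
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool, for illustration only
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

tool_messages = [{"role": "user", "content": "What's the weather in Paris right now?"}]

# Tool calls are documented for non-thinking mode, hence thinking=False here.
tool_prompt = tokenizer.apply_chat_template(
    tool_messages, tools=tools, tokenize=False, thinking=False, add_generation_prompt=True)
tool_inputs = tokenizer(tool_prompt, return_tensors="pt")
tool_out = model.generate(
    **{k: v.to(model.device) for k, v in tool_inputs.items()}, max_new_tokens=128)

# Decode only the newly generated portion; keep special tokens so any tool-call
# markers emitted by the model remain visible for parsing.
print(tokenizer.decode(tool_out[0][tool_inputs["input_ids"].shape[1]:], skip_special_tokens=False))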

Pricing

DeepSeek publishes both downloadable open-source weights and managed API endpoints. The official API docs list metered pay-as-you-go rates for the managed endpoints (deepseek-chat / deepseek-reasoner), billed separately for input and output tokens, with distinct cache-hit and cache-miss input rates; the per-1M-token prices are low relative to many competitors. DeepSeek has also offered off-peak discounts in the past and maintains per-model pricing pages. (Sources: DeepSeek API documentation; DeepSeek Hugging Face model card; industry reporting on off-peak pricing.)
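
For a sense of how the metered endpoints are consumed, here is a hedged sketch using the OpenAI-compatible Python client that DeepSeek's API docs describe; the base URL, model names ("deepseek-chat" for non-thinking, "deepseek-reasoner" for thinking), and environment variable are assumptions to verify against the current docs.

import os
from openai import OpenAI

# DeepSeek's docs describe an OpenAI-compatible endpoint; confirm base_url and
# model names against the current API documentation before use.
client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],  # metered, pay-as-you-go key
    base_url="https://api.deepseek.com",
)

resp = client.chat.completions.create(
    model="deepseek-chat",  # or "deepseek-reasoner" for thinking mode
    messages=[{"role": "user", "content": "List three follow-up questions about MoE models."}],
    max_tokens=256,
)
print(resp.choices[0].message.content)
# Billing is per input/output token, with separate cache-hit and cache-miss input rates.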

Benchmarks

  • MMLU-Redux (Exact Match), DeepSeek V3.1 Non-Thinking: 91.8
  • LiveCodeBench (Pass@1), DeepSeek V3.1 Non-Thinking: 56.4%
  • LiveCodeBench (Pass@1), DeepSeek V3.1 Thinking: 74.8%
  • AIME 2024 (Pass@1): 66.3%
  • SWE Verified (Agent mode): 66.0%

Source for all figures: https://huggingface.co/deepseek-ai/DeepSeek-V3.1

Last Refreshed: 2026-01-09

Key Information

  • Category: Language Models
  • Type: AI Language Models Tool