DeepSeek V3.1 - AI Language Models Tool
Overview
DeepSeek V3.1 is an open-source, agent-oriented large language model released by DeepSeek-AI and published on Hugging Face under an MIT license. It is a member of the Mixture-of-Experts (MoE) decoder-only transformer family and ships both a base checkpoint (DeepSeek-V3.1-Base) and a post-trained chat variant. The release emphasizes agent readiness: built-in chat templates for “thinking” and “non-thinking” modes, explicit tool-calling formats, and example code-agent and search-agent trajectories that simplify building multi-step tool-using systems (search, code agents, and function-calling workflows). According to the model card, V3.1 was further pre-trained with large-scale long-context extension phases to support a 128K-token context window, and the team optimized the model for improved tool use and faster responses in “thinking” mode (Hugging Face model card; DeepSeek technical report).
In practice, DeepSeek V3.1 targets developers who need long-context reasoning, multi-step agents, or affordable self-hosting. The project publishes weights in BF16 and F32 as well as FP8 (E4M3) formats, with guidance for FP8/UE8M0 usage. Benchmarks on the model card show strong performance on coding and multi-step agent tasks in the authors’ evaluation suite, while independent reporting and community threads document trade-offs (accuracy and safety audits, regulatory scrutiny, and mixed user feedback). The model remains fully downloadable for on-premise use and is also available through DeepSeek’s managed API for hosted inference.
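Because the weights are fully downloadable, a local copy can be fetched straight from Hugging Face. The sketch below is a minimal download example using huggingface_hub; the local_dir path is illustrative, and the full BF16 checkpoint is several hundred gigabytes, so many users restrict file patterns or pull a quantized variant instead.
Example (python):
from huggingface_hub import snapshot_download

# Pull the published checkpoint for on-premise use (requires `pip install huggingface_hub`).
# NOTE: the full repository is several hundred GB; the local_dir path is just an example.
snapshot_download(
    repo_id="deepseek-ai/DeepSeek-V3.1",
    local_dir="./DeepSeek-V3.1",
)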
Model Statistics
- Downloads: 52,635
- Likes: 811
- Pipeline: text-generation
- Parameters: 684.5B
- License: mit
Model Details
Architecture and scale: DeepSeek V3.1 is a 671B-parameter Mixture-of-Experts (MoE) decoder-only transformer with roughly 37B parameters activated per token (MoE routing activates a small subset of experts at inference); the ~684.5B figure reported by Hugging Face additionally counts the multi-token-prediction module weights. The V3 family uses DeepSeekMoE and Multi-head Latent Attention (MLA) and was trained with a multi-token prediction objective (DeepSeek-V3 technical report).
Pretraining and long-context extension: the V3 technical report and model card state that the family was pre-trained on very large corpora (the DeepSeek-V3 report cites ~14.8 trillion tokens) and that V3.1 adds two long-context extension phases, a 32K phase expanded to ~630B tokens and a 128K phase expanded to ~209B tokens, enabling a 128K-token context window for both base and chat variants (Hugging Face model card; arXiv technical report).
Precision, deployment, and tooling: weights are published in BF16 and in quantized formats (F8_E4M3 / UE8M0-compatible FP8), and the maintainers document FP8 handling recommendations (e.g., keeping some gate/MLP parameters in FP32). The model card includes templated chat formats (non-thinking vs. thinking tokens), a strict tool-call schema for function/agent integration, and example code-agent/search-agent trajectories.
Training and evaluation: the V3 family reports internal benchmark comparisons (MMLU variants, code benchmarks, SWE-bench, LiveCodeBench, math-contest pass@1 scores) showing improved agent and code performance versus earlier DeepSeek checkpoints. The model is released under the MIT license and is available via Hugging Face and ModelScope; DeepSeek also offers managed API endpoints (chat / reasoner) for hosted access.
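To make the precision options above concrete, the sketch below is back-of-envelope arithmetic estimating raw weight storage for a 671B-parameter checkpoint at each published precision; it deliberately ignores KV cache, activations, and the gate/MLP parameters the maintainers recommend keeping in FP32, so treat the numbers as lower bounds for real deployments.
Example (python):
# Rough weight-storage estimate for a 671B-parameter checkpoint at each precision.
# Ignores KV cache, activations, and any tensors kept in higher precision.
TOTAL_PARAMS = 671e9

for label, bytes_per_param in [("F32", 4), ("BF16", 2), ("FP8 (E4M3)", 1)]:
    gib = TOTAL_PARAMS * bytes_per_param / 1024**3
    print(f"{label:11s} ~{gib:,.0f} GiB of raw weights")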
Key Features
- 128K token context window for single-call long-document reasoning and retrieval.
- Hybrid 'thinking' and 'non-thinking' chat templates to toggle chain-of-thought behavior.
- Mixture-of-Experts (MoE) 671B total / ~37B activated parameters for cost-effective inference.
- Built-in tool-call schema and example code/search agent templates for multi-step agents (a hosted function-calling sketch follows this list).
- Published weights in BF16/F32 and FP8 (UE8M0) formats with quantization guidance.
- Open-source MIT license; weights downloadable from Hugging Face and ModelScope.
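For hosted use, DeepSeek's managed API is OpenAI-compatible, so the tool-call feature can be exercised with the standard openai Python SDK pointed at https://api.deepseek.com. The sketch below is illustrative only: the get_weather tool and its parameters are hypothetical, and the raw tool-call token format used by the open weights is documented on the model card rather than reproduced here.
Example (python):
from openai import OpenAI

# DeepSeek's managed endpoints speak the OpenAI chat-completions protocol.
client = OpenAI(api_key="YOUR_DEEPSEEK_API_KEY", base_url="https://api.deepseek.com")

# Hypothetical tool definition for illustration; any OpenAI-style function schema works.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = client.chat.completions.create(
    model="deepseek-chat",
    messages=[{"role": "user", "content": "What's the weather in Paris right now?"}],
    tools=tools,
)
# If the model decides to call the tool, the structured call(s) appear here.
print(resp.choices[0].message.tool_calls)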
Example Usage
Example (python):
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch
# Example: load tokenizer and model (requires sufficient hardware or quantized weights)
MODEL = "deepseek-ai/DeepSeek-V3.1"
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL, torch_dtype=torch.bfloat16, device_map='auto')
# Build chat messages and apply DeepSeek chat template (thinking=True for chain-of-thought)
messages = [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Summarize key points from the pasted document and list follow-ups."},
]
# apply_chat_template follows the chat format documented on the model card
prompt = tokenizer.apply_chat_template(messages, tokenize=False, thinking=True, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors='pt')
# Generate (small example; adjust generation parameters for real workloads)
out = model.generate(**{k: v.to(model.device) for k, v in inputs.items()}, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(out[0], skip_special_tokens=True))
# Note: This example assumes local hardware with enough memory or quantized weights.
# See the model card for FP8/UE8M0 guidance and agent/tool-call formats.
Pricing
DeepSeek publishes both downloadable open-source weights and managed API endpoints. The official API documentation lists metered pay-as-you-go rates for the managed endpoints (deepseek-chat / deepseek-reasoner), billed separately for input and output tokens with distinct cache-hit and cache-miss input rates, and the per-1M-token prices are low compared with many competitors. DeepSeek has also offered off-peak discounts in the past and maintains pricing pages for each model and mode. (Sources: DeepSeek API documentation; DeepSeek Hugging Face model card; industry reporting on off-peak pricing.)
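Because billing is metered per input and output token, estimating spend is simple arithmetic. The sketch below uses placeholder rates only (substitute the current figures from the DeepSeek pricing page) and ignores the cache-hit discount mentioned above.
Example (python):
def estimate_cost(input_tokens: int, output_tokens: int,
                  in_price_per_m: float, out_price_per_m: float) -> float:
    # USD cost of one request under per-1M-token metered billing.
    return input_tokens / 1e6 * in_price_per_m + output_tokens / 1e6 * out_price_per_m

# Placeholder rates for illustration only; check the current DeepSeek pricing page.
print(f"${estimate_cost(120_000, 4_000, in_price_per_m=1.0, out_price_per_m=2.0):.4f}")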
Benchmarks
- MMLU-Redux (Exact Match) — DeepSeek V3.1 (Non-Thinking): 91.8 (Source: https://huggingface.co/deepseek-ai/DeepSeek-V3.1)
- LiveCodeBench (Pass@1) — DeepSeek V3.1 (Non-Thinking): 56.4% (Source: https://huggingface.co/deepseek-ai/DeepSeek-V3.1)
- LiveCodeBench (Pass@1) — DeepSeek V3.1 (Thinking): 74.8% (Source: https://huggingface.co/deepseek-ai/DeepSeek-V3.1)
- AIME 2024 (Pass@1) — DeepSeek V3.1 (Non-Thinking): 66.3% (Source: https://huggingface.co/deepseek-ai/DeepSeek-V3.1)
- SWE Verified (Agent mode): 66.0% (Source: https://huggingface.co/deepseek-ai/DeepSeek-V3.1)
Key Information
- Category: Language Models
- Type: AI Language Models Tool