DeepSeek-V3.1-Base - AI Language Models Tool

Overview

DeepSeek-V3.1-Base is an open-source, long-context Mixture-of-Experts (MoE) language model designed for complex conversational, reasoning, and code-generation workflows. The base checkpoint exposes a 128K-token context window and a MoE capacity reported as 671B total parameters with roughly 37B activated per token, enabling the model to operate with high capacity while keeping per-token compute more affordable. ([huggingface.co](https://huggingface.co/deepseek-ai/DeepSeek-V3.1))

V3.1 emphasizes a hybrid "thinking vs non-thinking" workflow: a single model can be driven into either fast direct-answer (non-thinking) or explicit stepwise-reasoning (thinking) behavior by changing the chat template.

The release also focuses on smarter tool-calling and agent workflows (structured ToolCall formats and code/search agent templates), plus FP8 micro-scaling (UE8M0) optimizations for weights and activations to improve throughput and memory efficiency. The model and weights are MIT-licensed and available on Hugging Face for download. ([huggingface.co](https://huggingface.co/deepseek-ai/DeepSeek-V3.1))

Model Statistics

  • Downloads: 19,473
  • Likes: 1,009
  • Pipeline: text-generation

License: MIT

Model Details

Architecture and scale: DeepSeek-V3.1-Base is built on the DeepSeek-V3 family — a parameter-efficient MoE design that uses Multi-head Latent Attention (MLA) and a DeepSeekMoE routing scheme to reduce inference cost while keeping large capacity (671B total / 37B activated). The V3 technical report describes the MoE design and training corpus scale. ([arxiv.org](https://arxiv.org/abs/2412.19437?utm_source=openai))

Long-context extension: V3.1 extends context length to 128K via a two-phase long-context pretraining strategy. The team reports increasing the 32K extension phase to ~630B tokens and the 128K phase to ~209B tokens for the V3.1 continuous-pretraining effort, which improves retention across very long inputs. ([huggingface.co](https://huggingface.co/deepseek-ai/DeepSeek-V3.1))

Precision & infra: The model was trained and released in an FP8/UE8M0 data format (weights and activations) to align with microscaling inference kernels (DeepGEMM) and reduce memory/latency overhead. The model card also lists practical recommendations (e.g., certain gating parameters should be computed in FP32). ([huggingface.co](https://huggingface.co/deepseek-ai/DeepSeek-V3.1))
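The "671B total / 37B activated" split comes from expert routing: each token is sent to only a small subset of experts, so most parameters sit idle for any given token. The toy sketch below illustrates generic top-k softmax routing only — DeepSeek's actual DeepSeekMoE gating (shared experts, bias-adjusted routing, MLA) is more involved, and all shapes and names here are illustrative.

```python
import numpy as np

def topk_moe_forward(x, expert_weights, router_weights, k=2):
    """Toy top-k MoE layer: route each token to k of n experts.

    Only the selected experts' parameters touch each token, which is
    why a MoE model's activated parameter count is far below its total.
    """
    logits = x @ router_weights                    # (tokens, n_experts)
    top = np.argsort(logits, axis=1)[:, -k:]       # indices of the k best experts
    # Softmax over the selected logits only
    sel = np.take_along_axis(logits, top, axis=1)
    gates = np.exp(sel - sel.max(axis=1, keepdims=True))
    gates /= gates.sum(axis=1, keepdims=True)
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        for j in range(k):
            e = top[t, j]
            out[t] += gates[t, j] * (x[t] @ expert_weights[e])
    return out

rng = np.random.default_rng(0)
d, n_experts, tokens = 8, 16, 4
x = rng.standard_normal((tokens, d))
experts = rng.standard_normal((n_experts, d, d))   # one weight matrix per expert
router = rng.standard_normal((d, n_experts))
y = topk_moe_forward(x, experts, router, k=2)
print(y.shape)                                     # (4, 8)
print(f"active expert fraction per token: {2 / n_experts:.2%}")  # 12.50%
```

In this toy setup only 2 of 16 experts run per token; at DeepSeek-V3 scale the same principle keeps per-token compute near the ~37B activated figure rather than the 671B total.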

Key Features

  • Hybrid thinking/non-thinking modes toggled via chat template.
  • 128K-token context window for long documents and codebases.
  • MoE design: 671B total parameters with ~37B activated per token.
  • Trained and packaged in UE8M0 FP8 format for high-throughput inference.
  • Structured ToolCall & agent templates for code, search, and function calls.
  • MIT license and downloadable weights on Hugging Face.

Example Usage

Example (python):

import transformers

# Example from the model card: use the tokenizer helper to build V3.1 chat prompts
tokenizer = transformers.AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-V3.1")

messages = [
    {"role": "system", "content": "You are a helpful assistant"},
    {"role": "user", "content": "Who are you?"},
    {"role": "assistant", "content": "<think>Hmm</think>I am DeepSeek"},
    {"role": "user", "content": "1+1=?"}
]

# Prepare thinking-mode prompt (tokenize=False shows template application)
tokenizer.apply_chat_template(messages, tokenize=False, thinking=True, add_generation_prompt=True)
# => '<|begin▁of▁sentence|>You are a helpful assistant<|User|>Who are you?<|Assistant|></think>I am DeepSeek<|end▁of▁sentence|><|User|>1+1=?<|Assistant|><think>'

# Prepare non-thinking prompt
tokenizer.apply_chat_template(messages, tokenize=False, thinking=False, add_generation_prompt=True)
# => '<|begin▁of▁sentence|>You are a helpful assistant<|User|>Who are you?<|Assistant|></think>I am DeepSeek<|end▁of▁sentence|><|User|>1+1=?<|Assistant|></think>'

# Note: the model ships FP8-formatted weights and relies on specialized inference
# kernels. See the model card and DeepSeek repo for the recommended runtime/server
# setup before attempting local FP8 inference:
# https://huggingface.co/deepseek-ai/DeepSeek-V3.1
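To make the UE8M0 idea concrete: reading it as an exponent-only (power-of-two) block scale in the style of the OCP microscaling convention, a quantizer picks `scale = 2**e` so the block's largest value fits the FP8 range. The sketch below is our own simplified illustration under that assumption — it is not DeepSeek's kernel code, and `fp8_max=448.0` assumes the FP8 E4M3 element format.

```python
import math

def ue8m0_scale(block, fp8_max=448.0):
    """Pick a power-of-two scale for a block of values.

    An E8M0-style scale stores only an 8-bit exponent (no sign, no
    mantissa), so it must be an exact power of two: scale = 2**e. We
    choose e so the block's largest magnitude maps inside the FP8 range.
    """
    amax = max(abs(v) for v in block)
    if amax == 0.0:
        return 1.0
    e = math.ceil(math.log2(amax / fp8_max))
    return 2.0 ** e

block = [0.01, -3.5, 700.0, 42.0]
s = ue8m0_scale(block)
print(s)                                         # 2.0
print(max(abs(v) for v in block) / s <= 448.0)   # True: scaled values fit FP8
```

Exponent-only scales are cheap to store and to apply (a shift rather than a multiply in integer terms), which is part of why microscaling kernels like DeepGEMM can keep memory and latency overhead low.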

Pricing

The DeepSeek-V3.1-Base weights are MIT-licensed and free to download from Hugging Face. For hosted use, DeepSeek publishes usage-based API pricing; the API docs list per-1M-token reference rates for their 128K models — e.g., $0.028 per 1M input tokens (cache hit), $0.28 per 1M input tokens (cache miss), and $0.42 per 1M output tokens. Check the official API docs for current, region-specific rates, and consult DeepSeek's API documentation or sales team for precise commercial terms, platform billing, or enterprise agreements. ([huggingface.co](https://huggingface.co/deepseek-ai/DeepSeek-V3.1))
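Using the reference rates quoted above, a request's cost is straightforward arithmetic over the three token buckets. The helper below is a minimal sketch with those rates baked in as defaults; the function name and example token counts are hypothetical, and actual rates should be taken from the current API docs.

```python
def estimate_cost(input_hit, input_miss, output,
                  rate_hit=0.028, rate_miss=0.28, rate_out=0.42):
    """Estimate API cost in USD from token counts and per-1M-token rates.

    Defaults use the reference rates quoted above; verify against the
    official DeepSeek API docs before relying on them.
    """
    per_m = 1_000_000
    return (input_hit * rate_hit
            + input_miss * rate_miss
            + output * rate_out) / per_m

# e.g. 80K cached input tokens, 20K uncached input tokens, 4K output tokens
cost = estimate_cost(80_000, 20_000, 4_000)
print(f"${cost:.4f}")  # $0.0095
```

Note how the cache-hit rate is an order of magnitude cheaper than cache-miss input: for long, repeated prefixes (system prompts, shared documents), prompt caching dominates the bill.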

Benchmarks

  • Context length: 128K tokens
  • Total / activated parameters: 671B / 37B
  • MMLU-Redux (EM): 91.8 (non-thinking)
  • LiveCodeBench (Pass@1): 56.4 (non-thinking)
  • Terminal-bench (tool/agent capability): 31.3 (non-thinking)
  • AIME 2024 (Pass@1): 66.3 (non-thinking) / 93.1 (thinking)

Source: ([huggingface.co](https://huggingface.co/deepseek-ai/DeepSeek-V3.1))

Last Refreshed: 2026-02-24

Key Information

  • Category: Language Models
  • Type: AI Language Models Tool