DeepSeek-V3.1-Base - AI Language Models Tool

Overview

DeepSeek-V3.1-Base is an open-source, long-context, decoder-only LLM designed for complex conversational, reasoning, and code-generation workloads. The model ships as a single hybrid checkpoint that supports both “thinking” and “non-thinking” inference modes, selected by switching the chat template, so clients can obtain succinct replies or multi-step, chain-of-thought-style reasoning from the same weights. DeepSeek-V3.1-Base exposes an expanded 128K-token context window and a Mixture-of-Experts (MoE) design in which roughly 37B parameters are activated per token within a 671B–685B total-parameter model, pairing large representational capacity with comparatively low per-token compute during inference. ([huggingface.co](https://huggingface.co/deepseek-ai/DeepSeek-V3.1-Base))

The V3.1 family emphasizes agent and tool-usage improvements: post-training optimizations target tool calling, multi-step agent tasks, and strict function-calling workflows. The authors also publish operational recommendations for the FP8 (UE8M0) weight/activation format, including FP32 handling for gating-parameter computation, which enable more compact storage and faster inference on compatible runtimes. The model and weights are released under an MIT license and are available for local deployment and for integration via DeepSeek’s APIs and Hugging Face artifacts. ([huggingface.co](https://huggingface.co/deepseek-ai/DeepSeek-V3.1-Base))
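
As a rough illustration of the deployment figures quoted above, the back-of-envelope sketch below estimates weight-storage size at FP8 versus BF16 and the activated-parameter fraction per token. The bytes-per-parameter values are simplifying assumptions and the estimates ignore block scales, the KV cache, and activation memory.

Sketch (python):

# Back-of-envelope arithmetic only; real deployments also need memory for
# block scales, the KV cache, activations, and framework overhead.
TOTAL_PARAMS = 685e9   # ~685B total parameters (upper end of the quoted range)
ACTIVE_PARAMS = 37e9   # ~37B parameters activated per token

BYTES_FP8 = 1.0        # assumption: 1 byte per parameter at FP8
BYTES_BF16 = 2.0       # assumption: 2 bytes per parameter at BF16

print(f"FP8 weight storage : ~{TOTAL_PARAMS * BYTES_FP8 / 1e9:.0f} GB")
print(f"BF16 weight storage: ~{TOTAL_PARAMS * BYTES_BF16 / 1e9:.0f} GB")
print(f"Activated per token: ~{ACTIVE_PARAMS / TOTAL_PARAMS:.1%} of total parameters")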

Model Statistics

  • Downloads: 13,491
  • Likes: 1006
  • Pipeline: text-generation
  • Parameters: 684.5B

License: mit

Model Details

Architecture and scale: DeepSeek-V3.1-Base is a decoder-only transformer augmented with MoE-style expert routing, so the model's effective per-token compute is delivered by a subset of the total parameters (reported as ~37B activated parameters within a 671B–685B total-parameter footprint). The model uses standard self-attention with long-context extensions and Rotary Positional Embeddings (RoPE) to preserve ordering over very long inputs. ([huggingface.co](https://huggingface.co/deepseek-ai/DeepSeek-V3.1-Base))

Training and long-context method: V3.1 is built by a two-phase long-context extension on top of the V3 base checkpoint. DeepSeek reports substantially expanded long-document pretraining (a scaled-up 32K extension phase followed by an extended 128K phase), and the model is trained with the UE8M0 FP8 scale data format for weights and activations to support microscaling and efficient storage. Recommended inference/precision notes include loading certain gating bias parameters in FP32 while using FP8 for the majority of weights and activations. ([huggingface.co](https://huggingface.co/deepseek-ai/DeepSeek-V3.1-Base))

Capabilities and intended use: V3.1 is optimized for multi-turn assistants, agent tool calling, long-document ingestion (research papers, books, and large codebases), and code generation. The model supports a tool-call template (detailed in its model assets) and offers both non-thinking fast replies and a thinking mode for chain-of-thought-style, multi-step reasoning. Recent DeepSeek API updates also add strict function-calling beta support and Anthropic-compatible API formatting for easier integration. ([huggingface.co](https://huggingface.co/deepseek-ai/DeepSeek-V3.1-Base))
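
The routing idea behind the activated-parameter figure can be shown with a generic top-k MoE sketch: each token's gate picks a few experts, so only that subset of expert weights participates in the forward pass. This is a minimal, illustrative router, not DeepSeek's actual implementation; the layer sizes, expert count, and top_k below are arbitrary, and the FP32 cast on the gate simply mirrors the FP32-gating recommendation above.

Sketch (python):

import torch
import torch.nn.functional as F

# Minimal top-k MoE routing sketch (illustrative only, not DeepSeek's router).
# Only the k experts selected per token are evaluated, so per-token compute
# scales with k rather than with the total number of experts.
d_model, n_experts, top_k = 64, 8, 2   # arbitrary small sizes for illustration
experts = torch.nn.ModuleList(torch.nn.Linear(d_model, d_model) for _ in range(n_experts))
gate = torch.nn.Linear(d_model, n_experts)

def moe_forward(x):                                   # x: [tokens, d_model]
    probs = F.softmax(gate(x.float()), dim=-1)        # gating computed in FP32
    weights, idx = torch.topk(probs, top_k, dim=-1)   # pick k experts per token
    out = torch.zeros_like(x)
    for t in range(x.size(0)):                        # naive per-token dispatch for clarity
        for j in range(top_k):
            e = idx[t, j].item()
            out[t] += weights[t, j] * experts[e](x[t])
    return out

print(moe_forward(torch.randn(4, d_model)).shape)     # torch.Size([4, 64])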

Key Features

  • 128K token context window for very large documents and codebases.
  • Hybrid inference modes: single checkpoint supports thinking and non-thinking templates.
  • Large MoE-style footprint (≈671–685B total, ~37B activated per token).
  • Post-training improvements for tool calling, agent workflows, and strict function-calling beta.
  • FP8 (UE8M0) weight/activation format with FP32 gating recommendations for efficient deployment (see the scale-format sketch after this list).
  • MIT license and openly published model weights, templates, and technical report.
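
UE8M0 is commonly described as an exponent-only encoding for block scales (8 exponent bits, no mantissa), i.e. scales constrained to powers of two. The toy quantizer below illustrates that idea under those assumptions; it is a conceptual sketch, not DeepSeek's actual kernel or storage layout, and it omits rounding values onto the FP8 grid.

Sketch (python):

import numpy as np

FP8_E4M3_MAX = 448.0   # assumed representable maximum of an E4M3-style FP8 format

def quantize_block_pow2(block):
    """Scale one block with a power-of-two (UE8M0-style) scale stored as an exponent."""
    amax = float(np.max(np.abs(block))) + 1e-12
    exp = int(np.ceil(np.log2(amax / FP8_E4M3_MAX)))   # smallest 2**exp covering the block
    q = np.clip(block / 2.0 ** exp, -FP8_E4M3_MAX, FP8_E4M3_MAX)
    return q, exp                                      # q would be stored as FP8, exp as one byte

def dequantize_block(q, exp):
    return q * 2.0 ** exp

weights = (np.random.randn(128) * 0.02).astype(np.float32)
q, exp = quantize_block_pow2(weights)
print("scale = 2 **", exp, "| max |q| =", float(np.abs(q).max()))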

Example Usage

Example (python):

from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline

# Load tokenizer and model (local or from Hugging Face cache).
# Note: this is a ~685B-parameter checkpoint; in practice it requires a
# sharded multi-GPU setup (e.g. device_map="auto" via accelerate).
tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-V3.1")
model = AutoModelForCausalLM.from_pretrained(
    "deepseek-ai/DeepSeek-V3.1",
    torch_dtype="auto",
    device_map="auto",
)

# Create convenience function to apply the chat template (thinking/non-thinking)
def format_messages(messages, thinking=True):
    # Extra keyword arguments such as `thinking` are forwarded to the chat
    # template, which selects the thinking or non-thinking prompt format.
    return tokenizer.apply_chat_template(
        messages, tokenize=False, thinking=thinking, add_generation_prompt=True
    )

# Example: switch between fast reply and thinking reply
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain SQL joins and give an example."}
]

fast_prompt = format_messages(messages, thinking=False)
think_prompt = format_messages(messages, thinking=True)

generator = pipeline("text-generation", model=model, tokenizer=tokenizer)

# Non-thinking (faster) reply
fast_out = generator(fast_prompt, max_new_tokens=256, do_sample=False)[0]["generated_text"]
print("--Fast reply--\n", fast_out)

# Thinking (chain-of-thought style) reply
think_out = generator(think_prompt, max_new_tokens=512, do_sample=False)[0]["generated_text"]
print("--Thinking reply--\n", think_out)

# Note: For tool calls or agent workflows follow the model's ToolCall template described in the model assets.
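
For completeness, a hedged sketch of tool calling through the Hugging Face tokenizer follows. Recent transformers releases let you pass JSON-schema tool definitions to apply_chat_template via its tools argument; whether and how they are rendered is governed by the model's bundled chat template, so consult the model assets for the authoritative format. The get_weather schema below is hypothetical and reuses the tokenizer loaded above.

Sketch (python):

# Hypothetical tool schema for illustration; the rendered tool-call format is
# defined by the model's own chat template (see the model assets).
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

tool_messages = [{"role": "user", "content": "What's the weather in Paris?"}]
tool_prompt = tokenizer.apply_chat_template(
    tool_messages, tools=tools, tokenize=False, add_generation_prompt=True
)
print(tool_prompt)  # inspect how the template injects the tool schema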

Benchmarks

  • MMLU-Redux (Exact Match) — V3.1 Non-Thinking: 91.8
  • MMLU-Redux (Exact Match) — V3.1 Thinking: 93.7
  • LiveCodeBench (Pass@1) — V3.1 Thinking: 74.8
  • SWE Verified (Agent mode) — V3.1 Non-Thinking: 66.0
  • AIME 2024 (Pass@1) — V3.1 Thinking: 93.1

Source for all figures: https://huggingface.co/deepseek-ai/DeepSeek-V3.1-Base

Last Refreshed: 2026-01-09

Key Information

  • Category: Language Models
  • Type: AI Language Models Tool