DeepSeek-Coder-V2 - AI Language Models Tool

Overview

DeepSeek-Coder-V2 is an open-source, Mixture-of-Experts (MoE) code-specialized language model family designed for high-quality code generation, long-context program reasoning, and math reasoning. The project ships two main sizes (a 16B “Lite” variant and a 236B MoE variant) and was further pre-trained from DeepSeek-V2 checkpoints with an additional ~6 trillion tokens to boost coding and mathematical reasoning abilities. The series expands language support to hundreds of programming languages and targets tasks from single-function completion to multi-file codebase understanding. ([huggingface.co](https://huggingface.co/deepseek-ai/DeepSeek-Coder-V2-Base))

Unlike dense code models, DeepSeek-Coder-V2 uses an MoE design that activates a much smaller set of parameters per token (reported active parameters: 2.4B for the 16B Lite and 21B for the 236B model), enabling a very large total parameter count (236B) while keeping inference compute and memory costs lower than a fully dense model of the same size. It also supports an extended 128K-token context window for maintaining cross-file or long-repository context, making it suitable for tasks such as whole-repo code understanding, long-form code insertion, and multi-file refactors.

The model and its evaluation results, code, and usage examples are published on GitHub and Hugging Face (the model card and repo contain the full evaluation tables and download artifacts). ([github.com](https://github.com/deepseek-ai/DeepSeek-Coder-V2))

Model Statistics

  • Downloads: 212
  • Likes: 82
  • Pipeline: text-generation
  • Parameters: 235.7B

License: other

Model Details

Architecture and variants: DeepSeek-Coder-V2 is built on the DeepSeekMoE design (a specialized Mixture-of-Experts architecture) and is released in 16B (Lite) and 236B total-parameter variants. The MoE design yields a much smaller activated-parameter footprint per request (reported active params: ~2.4B for the 16B Lite, ~21B for the 236B model). The paper describing the DeepSeekMoE principles is available on arXiv. ([github.com](https://github.com/deepseek-ai/DeepSeek-Coder-V2))

Context, precision, and inference requirements: Models support a 128K-token context window and are distributed as BF16 tensors on Hugging Face (safetensors). For full BF16 inference of the large variant the model card recommends multi-GPU setups (examples mention 8×80GB GPUs); the smaller Lite variants and quantized community builds make local testing feasible. ([huggingface.co](https://huggingface.co/deepseek-ai/DeepSeek-Coder-V2-Base))

Training and data: According to the project materials, DeepSeek-Coder-V2 was further pre-trained on about 6 trillion tokens from a mixed corpus emphasizing source code and math content to improve code and mathematical reasoning. The team reports extended language coverage (338 programming languages supported; a supported_langs list is included in the repo). A tokenizer/chat template and example code for Hugging Face Transformers and vLLM integration are provided in the repo and model card. ([github.com](https://github.com/deepseek-ai/DeepSeek-Coder-V2))
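To make the active-vs-total parameter distinction concrete, the sketch below shows generic top-k expert routing: a router scores all experts for each token, but only the k highest-scoring experts actually run. This is an illustrative toy in NumPy, not DeepSeekMoE's actual implementation (which adds shared experts, load-balancing losses, and other refinements); all names here are hypothetical.

```python
import numpy as np

def topk_moe_forward(x, gate_w, experts, k=2):
    """Route each token to its top-k experts; mix outputs by softmax gate weights.
    Illustrative sketch only -- not the real DeepSeekMoE layer."""
    logits = x @ gate_w                         # (tokens, n_experts) router scores
    topk = np.argsort(logits, axis=-1)[:, -k:]  # indices of the k best experts per token
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        sel = logits[t, topk[t]]
        w = np.exp(sel - sel.max())
        w /= w.sum()                            # softmax over the selected experts only
        for weight, e in zip(w, topk[t]):
            out[t] += weight * experts[e](x[t]) # only k of n_experts execute per token
    return out

# Toy setup: 4 experts (small linear maps), but only 2 run per token --
# total parameters scale with n_experts, active compute scales with k.
rng = np.random.default_rng(0)
d, n_experts = 8, 4
gate_w = rng.normal(size=(d, n_experts))
experts = [(lambda W: (lambda v: v @ W))(rng.normal(size=(d, d))) for _ in range(n_experts)]
x = rng.normal(size=(3, d))
y = topk_moe_forward(x, gate_w, experts, k=2)
print(y.shape)  # (3, 8)
```

The same principle, scaled up, is how a 236B-parameter model can serve requests at roughly the cost of a ~21B dense model.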

Key Features

  • Mixture-of-Experts architecture: large total params with smaller active-parameter footprint per request.
  • 128K token context window for whole-repo, long-file, and multi-file reasoning tasks.
  • Wide programming-language coverage: reported support for 338 programming languages (supported_langs in repo).
  • Enhanced code & math reasoning after +6T token continued pre-training from DeepSeek-V2 checkpoints.
  • Open-source distribution (model weights on Hugging Face; MIT code licensing; model license allows commercial use).

Example Usage

Example (python):

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# example: code completion with the Lite base model
model_name = "deepseek-ai/DeepSeek-Coder-V2-Lite-Base"

tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    trust_remote_code=True,
    torch_dtype=torch.bfloat16
).cuda()

prompt = "# write a quick sort algorithm in python\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

# Note: the model card includes chat templates and vLLM examples for chat-style usage. ([huggingface.co](https://huggingface.co/deepseek-ai/DeepSeek-Coder-V2-Base))

Pricing

DeepSeek offers pay-as-you-go API access (public API docs list per‑million-token pricing examples). The DeepSeek API documentation shows example rates such as $0.028 per 1M input tokens (cache hit), $0.28 per 1M input tokens (cache miss), and $0.42 per 1M output tokens for certain V3 endpoints; DeepSeek also publishes free web/mobile chat access and permits self-hosting of released weights (self-hosting incurs infrastructure costs). Exact, model-specific API prices, enterprise contracts, and discounts should be confirmed on DeepSeek's official pricing/docs pages (platform.deepseek.com / api-docs.deepseek.com). ([api-docs.deepseek.com](https://api-docs.deepseek.com/quick_start/pricing/?utm_source=openai))
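As a quick back-of-the-envelope aid, the sketch below applies the example per-million-token rates quoted above to a hypothetical request. The rate constants mirror the figures cited in this section and may be outdated; confirm current, model-specific prices on DeepSeek's official pricing pages before relying on them.

```python
# Example rates quoted above (USD per 1M tokens) -- verify against official docs.
RATE_IN_HIT = 0.028   # input tokens, cache hit
RATE_IN_MISS = 0.28   # input tokens, cache miss
RATE_OUT = 0.42       # output tokens

def estimate_cost(input_tokens, output_tokens, cache_hit_ratio=0.0):
    """Rough USD cost for one request under the example rates."""
    hit = input_tokens * cache_hit_ratio
    miss = input_tokens - hit
    return (hit * RATE_IN_HIT + miss * RATE_IN_MISS + output_tokens * RATE_OUT) / 1_000_000

# e.g. a 100K-token prompt producing 4K output tokens, with no cache reuse:
print(f"${estimate_cost(100_000, 4_000):.4f}")  # $0.0297
```

Note how strongly cache hits matter at long contexts: with `cache_hit_ratio=1.0` the same request drops to about $0.0045.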

Benchmarks

HumanEval (DeepSeek-Coder-V2-Instruct, 236B): 90.2 (pass@1-style metric reported in project evaluation table) (Source: https://github.com/deepseek-ai/DeepSeek-Coder-V2)

MBPP+ (DeepSeek-Coder-V2-Instruct, 236B): 76.2 (reported MBPP+ score in project evaluation table) (Source: https://github.com/deepseek-ai/DeepSeek-Coder-V2)

GSM8K (mathematical reasoning, DeepSeek-Coder-V2-Instruct, 236B): 94.9 (reported on project evaluation table) (Source: https://github.com/deepseek-ai/DeepSeek-Coder-V2)

LiveCodeBench (coding leaderboard-style metric, 236B): 43.4 (reported in project evaluation table against closed-source baselines) (Source: https://github.com/deepseek-ai/DeepSeek-Coder-V2)
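The HumanEval and MBPP+ figures above are pass@1-style scores. For reference, the standard unbiased pass@k estimator (from the Codex paper, Chen et al., 2021) is commonly used to compute such numbers from n sampled generations per problem, of which c pass the tests:

```python
from math import comb

def pass_at_k(n, c, k):
    """Unbiased pass@k: probability that at least one of k samples drawn
    (without replacement) from n generations, c of them correct, passes."""
    if n - c < k:
        return 1.0  # every size-k draw must contain a correct sample
    return 1.0 - comb(n - c, k) / comb(n, k)

print(pass_at_k(10, 3, 1))  # ≈ 0.3 -- with k=1 this reduces to c/n
```

Per-problem values are then averaged over the benchmark to get the reported percentage.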

Maximum context window: 128K tokens (supported and evaluated up to 128K) (Source: https://huggingface.co/deepseek-ai/DeepSeek-Coder-V2-Base)
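When planning whole-repo prompts against the 128K-token window, a crude chars-per-token heuristic gives a quick feasibility check. The ~4 chars/token figure below is an assumption, not a property of the model's tokenizer; use the published tokenizer for exact counts.

```python
# Rough check: will a set of source files fit in the 128K-token context window?
CONTEXT_WINDOW = 128_000
CHARS_PER_TOKEN = 4  # crude average for source code (assumption)

def fits_in_context(file_sizes_chars, reserve_for_output=4_096):
    """Estimate token usage from character counts, leaving room for generation."""
    est_tokens = sum(file_sizes_chars) // CHARS_PER_TOKEN
    return est_tokens + reserve_for_output <= CONTEXT_WINDOW

print(fits_in_context([200_000, 150_000]))  # True  (~87.5K tokens)
print(fits_in_context([400_000, 300_000]))  # False (~175K tokens)
```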

Last Refreshed: 2026-03-03

Key Information

  • Category: Language Models
  • Type: AI Language Models Tool