DeepSeek-V2-Lite - AI Language Models Tool

Overview

DeepSeek-V2-Lite is an open-weight Mixture-of-Experts (MoE) language model released by DeepSeek as a research-friendly "lite" variant of DeepSeek-V2. It has 15.7B total parameters (commonly rounded to 16B) with a sparse activation budget of ~2.4B parameters per token, and is optimized for cost-efficient pretraining and inference while preserving strong performance across English and Chinese benchmarks. DeepSeek-V2-Lite was trained from scratch on a 5.7T-token pretraining corpus and ships as both a base causal model and an SFT chat variant, supporting long contexts (32K) and BF16 inference on a single 40GB GPU. (Source: https://huggingface.co/deepseek-ai/DeepSeek-V2-Lite)

Model Statistics

  • Downloads: 186,742
  • Likes: 167
  • Pipeline: text-generation
  • Parameters: 15.7B

License: other

Model Details

Architecture and implementation: DeepSeek-V2-Lite adopts the DeepSeek-V2 family's two core innovations: Multi-head Latent Attention (MLA), which compresses the KV cache into a low-rank latent, and the DeepSeekMoE feed-forward design, which enables sparse, economical training. The Lite model has 27 layers, a hidden dimension of 2048, and 16 attention heads with a per-head dimension of 128; MLA uses a KV compression dimension of 512 with decoupled query/key components, so rotary position information is carried separately from the compressed latent. Every FFN layer except the first is an MoE layer comprising 2 shared experts plus 64 routed experts (expert intermediate dimension 1408), with 6 routed experts activated per token, giving ~15.7B total parameters and ~2.4B activated parameters per token. Pretraining used AdamW with a warmup-and-step-decay learning-rate schedule, large constant batch sizes, and a maximum pretraining sequence length of 4K, later extended for long-context use. For local inference the model card recommends BF16 on a single 40GB GPU and notes vLLM integration for better throughput. For full technical details and training recipes, see the paper and the Hugging Face model card. (Source: https://huggingface.co/deepseek-ai/DeepSeek-V2-Lite)
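As a rough sanity check on those figures, the MoE expert parameters can be estimated from the stated configuration. This is a back-of-envelope sketch under one assumption: it counts only SwiGLU-style expert FFN weights (three hidden-dim × intermediate-dim matrices per expert) and ignores attention, embeddings, routers, and the dense first FFN layer, which account for the remaining parameters.

```python
# Back-of-envelope parameter estimate from the DeepSeek-V2-Lite config above.
# Assumes SwiGLU-style experts (gate, up, down projections) -- an assumption
# about the FFN layout, not a statement from the model card.
hidden_dim = 2048
expert_inter_dim = 1408
num_layers = 27
moe_layers = num_layers - 1          # all FFN layers except the first are MoE
routed_experts = 64
shared_experts = 2
activated_routed = 6                 # top-6 routed experts per token

# One expert = three weight matrices: gate/up (hidden -> inter), down (inter -> hidden)
params_per_expert = 3 * hidden_dim * expert_inter_dim

total_expert_params = moe_layers * (routed_experts + shared_experts) * params_per_expert
active_expert_params = moe_layers * (shared_experts + activated_routed) * params_per_expert

print(f"expert params, total:     {total_expert_params / 1e9:.2f}B")   # ~14.84B
print(f"expert params, activated: {active_expert_params / 1e9:.2f}B")  # ~1.80B
```

Expert weights alone land close to the reported 15.7B total / 2.4B activated; the gap is attention (MLA), embeddings, routers, and the dense first FFN.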

Key Features

  • Mixture‑of‑Experts design: 16B total params, ~2.4B activated per token (sparse inference).
  • Multi‑head Latent Attention (MLA) to compress KV cache and improve inference throughput.
  • Deployable for BF16 inference on a single 40GB GPU (vLLM recommended for best performance).
  • Long‑context support (32k context length for Lite variants and chat SFT).
  • All-but-first FFN layers use DeepSeekMoE: 64 routed experts, 6 experts activated per token.
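The sparse routing in the last bullet can be illustrated with a minimal top-k gate. This is a simplified sketch, not DeepSeek's actual router: it takes a plain softmax over raw per-expert scores and keeps the top 6 of 64 routed experts, with the 2 shared experts always active (the real model uses a learned gating projection plus load-balancing objectives).

```python
import math
import random

NUM_ROUTED = 64      # routed experts per MoE layer
NUM_SHARED = 2       # shared experts, always active
TOP_K = 6            # routed experts activated per token

def route_token(scores):
    """Pick the top-k routed experts for one token from raw router scores.

    Hypothetical helper: a real router would compute `scores` as a learned
    projection of the token's hidden state. Returns (indices, gate weights).
    """
    # softmax over routed-expert scores (max-shifted for numerical stability)
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    probs = [e / z for e in exps]
    # keep the k highest-probability experts and renormalize their gates
    top = sorted(range(NUM_ROUTED), key=lambda i: probs[i], reverse=True)[:TOP_K]
    gate_sum = sum(probs[i] for i in top)
    gates = {i: probs[i] / gate_sum for i in top}
    return top, gates

random.seed(0)
scores = [random.gauss(0.0, 1.0) for _ in range(NUM_ROUTED)]
experts, gates = route_token(scores)
active = NUM_SHARED + len(experts)
print(f"activated experts per token: {active} of {NUM_SHARED + NUM_ROUTED}")
```

Each token thus touches only 8 of 66 expert FFNs per MoE layer, which is where the ~2.4B-of-15.7B activation budget comes from.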

Example Usage

Example (python):

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, GenerationConfig

model_name = "deepseek-ai/DeepSeek-V2-Lite"
# Use trust_remote_code=True because the repo provides custom model/tokenizer code
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
).cuda()
# load generation defaults from the repo
model.generation_config = GenerationConfig.from_pretrained(model_name)
model.generation_config.pad_token_id = model.generation_config.eos_token_id

prompt = "Write an efficient implementation of quicksort in C++ and explain its time complexity."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs.to(model.device), max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Note: the official model card recommends vLLM for improved throughput in production use. (Source: https://huggingface.co/deepseek-ai/DeepSeek-V2-Lite)
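To see why MLA matters for deployment, a quick estimate compares per-token KV-cache memory for a conventional full multi-head cache against MLA's compressed latent, using the dimensions listed under Model Details. The 64-dim decoupled RoPE key component is an assumption about the cache layout, so treat the numbers as illustrative rather than measured.

```python
# Illustrative per-token KV-cache sizing for DeepSeek-V2-Lite in BF16.
# Assumes a conventional cache stores full per-head keys and values, while
# MLA caches one compressed latent per token per layer plus one decoupled
# RoPE key (the 64-dim RoPE component is our assumption, not from the card).
BYTES = 2            # BF16 element size
LAYERS = 27
HEADS = 16
HEAD_DIM = 128
KV_LATENT_DIM = 512  # MLA KV compression dimension
ROPE_KEY_DIM = 64    # assumed decoupled positional key dimension

# Conventional cache: keys + values for every head, every layer
standard_per_token = 2 * LAYERS * HEADS * HEAD_DIM * BYTES
# MLA cache: one latent + one shared RoPE key per layer
mla_per_token = LAYERS * (KV_LATENT_DIM + ROPE_KEY_DIM) * BYTES

print(f"standard cache: {standard_per_token / 1024:.0f} KiB per token")  # 216 KiB
print(f"MLA cache:      {mla_per_token / 1024:.1f} KiB per token")       # 30.4 KiB
print(f"compression:    {standard_per_token / mla_per_token:.1f}x")
```

Under these assumptions the cache shrinks by roughly 7x per token, which is what makes long 32K contexts and single-40GB-GPU serving practical.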

Benchmarks

All scores are from the Hugging Face model card (https://huggingface.co/deepseek-ai/DeepSeek-V2-Lite):

  • MMLU (base model): 58.3
  • C-Eval (Chinese, base model): 60.3
  • GSM8K (base model): 41.1
  • HumanEval (chat SFT): 57.3
  • MMLU (chat SFT): 55.7

Last Refreshed: 2026-03-03

Key Information

  • Category: Language Models
  • Type: AI Language Models Tool