DeepSeek-R1-Distill-Llama-8B - AI Language Models Tool
Overview
DeepSeek-R1-Distill-Llama-8B is an open-source distilled 8B-parameter language model released by DeepSeek-AI as part of the DeepSeek-R1 family. It is a distilled checkpoint built on the Llama-3.1-8B base and optimized to carry over the chain-of-thought (CoT) and reasoning patterns discovered by the larger DeepSeek-R1 models. The model card and accompanying paper describe a training pipeline that emphasizes large-scale reinforcement learning (RL) to elicit reasoning behaviors, followed by selective supervised fine-tuning (SFT) and distillation to produce smaller, inference-friendly checkpoints for math, code, and reasoning tasks. ([huggingface.co](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-8B))

Practically, DeepSeek-R1-Distill-Llama-8B is distributed under an MIT license and intended for both research and commercial use where license terms permit. The model supports very long contexts (up to 32,768 tokens in the DeepSeek-R1 family), is packaged in safetensors on Hugging Face, and is recommended to be served with high-throughput backends such as vLLM or SGLang for production inference.

Community discussion threads on the Hugging Face repository show active user engagement (issues about tokenizer behavior, inference performance, and usage tips). If you need a smaller, reasoning-oriented LLM that is readily deployable and permissively licensed, this distilled Llama-8B checkpoint is positioned for that use case. ([huggingface.co](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-8B))
Model Statistics
- Downloads: 542,527
- Likes: 845
- Pipeline: text-generation
- License: mit
Model Details
Architecture and lineage: DeepSeek-R1-Distill-Llama-8B is a distilled autoregressive transformer derived from Meta's Llama-3.1-8B base. The public model card states it was fine-tuned using reasoning traces generated by the larger DeepSeek-R1 family (which were produced with a two-stage RL pipeline and SFT seeds). The research claims that reinforcement learning alone (DeepSeek-R1-Zero) can elicit chain-of-thought behaviors, and that those behaviors can be distilled into smaller dense models. ([huggingface.co](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-8B))

Context and generation: The DeepSeek-R1 family supports extremely long contexts (32,768 tokens) and uses generation defaults recommended by the authors (temperature ~0.6, top-p ~0.95 for sampled benchmarks). The distilled Llama-8B inherits the tokenization and model-configuration changes required for compatibility with Llama-3.1-derived toolchains. The model is provided as safetensors on Hugging Face, and the developers recommend serving it with backends such as vLLM or SGLang for best throughput and memory efficiency. ([huggingface.co](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-8B))

Capabilities and intended uses: The distilled checkpoint is optimized for step-by-step reasoning, math problem solving, and code generation (the model card reports competitive scores on math and code benchmarks for distilled models). The authors recommend prompt practices to trigger CoT (e.g., "Please reason step by step") and provide usage guidance to avoid degenerate outputs. The model card includes a "Usage Recommendations" section describing known quirks (for example, special "<think>" tokens used in some outputs) and configuration suggestions to reproduce the reported benchmark behavior. ([huggingface.co](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-8B))
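Because outputs may wrap the chain of thought in "<think>" tokens as noted above, downstream code often needs to separate the reasoning trace from the final answer. The following is a minimal sketch of that post-processing step; the function name `split_think` and the sample string are illustrative, not part of the model card.

```python
import re

def split_think(output: str):
    """Split a raw completion into (reasoning, answer).

    Distilled R1 checkpoints may emit their chain of thought inside a
    <think>...</think> block before the final answer; if no such block
    is present, the whole output is treated as the answer.
    """
    match = re.search(r"<think>(.*?)</think>", output, flags=re.DOTALL)
    if match is None:
        return "", output.strip()
    reasoning = match.group(1).strip()
    answer = output[match.end():].strip()
    return reasoning, answer

# Illustrative completion (not actual model output)
raw = "<think>2 + 3 + 5 + 7 + 11 = 28</think>\nThe sum is \\boxed{28}."
reasoning, answer = split_think(raw)
```

This keeps the visible answer clean while retaining the reasoning trace for logging or evaluation.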
Key Features
- Distilled from DeepSeek-R1 reasoning models to an 8B Llama-3.1 base.
- Optimized for chain-of-thought style reasoning and multi-step math prompts.
- Supports very long contexts (up to 32,768 tokens in the family).
- Packaged as safetensors on Hugging Face with MIT license for broad use.
- Benchmarked on math and code tasks (MATH-500, AIME, LiveCodeBench, Codeforces).
- Recommended for serving with vLLM/SGLang for higher throughput and stability.
- Includes usage guidance to trigger CoT and avoid repetition (temperature ~0.6).
Example Usage
Example (python):
# Minimal example using vLLM to load and generate with the HF model.
# (The authors recommend vLLM / SGLang for serving; see the model card.)
# Install vLLM first: pip install vllm
from vllm import LLM, SamplingParams

# Load the model via vLLM
llm = LLM(model="deepseek-ai/DeepSeek-R1-Distill-Llama-8B")

prompts = [
    "Please reason step by step and put your final answer within \\boxed{}:\n"
    "What is the sum of the first five prime numbers?"
]

# Sampling settings recommended by the model card (temperature ~0.6, top-p ~0.95)
sampling_params = SamplingParams(temperature=0.6, top_p=0.95, max_tokens=1024)
outputs = llm.generate(prompts, sampling_params)
for out in outputs:
    print(out.outputs[0].text)

# The model card also gives example serve commands, e.g. for the 32B Qwen distill:
# vllm serve deepseek-ai/DeepSeek-R1-Distill-Qwen-32B --tensor-parallel-size 2 --max-model-len 32768 --enforce-eager
# See the vLLM documentation for the full API and server options. (Sources: model card, vLLM docs.)
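For math prompts like the one above, the model card's recommended prompt style asks the model to place its final answer in \boxed{}. A small sketch of extracting that answer from a completion is shown below; the helper name `extract_boxed` is an assumption for illustration, not an API from the model card.

```python
import re

def extract_boxed(text: str):
    """Return the content of the last \\boxed{...} in a completion, or None.

    Handles one level of nested braces, which covers typical numeric or
    short-expression answers such as \\boxed{\\frac{1}{2}}.
    """
    matches = re.findall(r"\\boxed\{((?:[^{}]|\{[^{}]*\})*)\}", text)
    return matches[-1] if matches else None

print(extract_boxed("The primes are 2, 3, 5, 7, 11, so the sum is \\boxed{28}."))  # prints: 28
```

Extracting only the boxed span makes automated grading robust to the model's surrounding prose and reasoning text.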
Benchmarks
AIME 2024 (pass@1) — DeepSeek-R1-Distill-Llama-8B (distilled): 50.4 (Source: https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-8B)
MATH-500 (pass@1) — DeepSeek-R1-Distill-Llama-8B (distilled): 89.1 (Source: https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-8B)
LiveCodeBench (pass@1) — DeepSeek-R1-Distill-Llama-8B (distilled): 39.6 (Source: https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-8B)
Codeforces (rating-equivalent) — DeepSeek-R1-Distill-Llama-8B (distilled): 1205 (Source: https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-8B)
Model downloads (last month, Hugging Face): 542,527 (Source: https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-8B)
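The pass@1 figures above are computed, per the DeepSeek-R1 report, by sampling several completions per problem at non-zero temperature and averaging their 0/1 correctness. A minimal sketch of that scoring scheme (with made-up outcome data, not the actual benchmark results):

```python
def pass_at_1(correct_flags):
    """Mean correctness over k sampled responses to one problem.

    pass@1 here is estimated by sampling k completions per question
    (temperature ~0.6, top-p ~0.95) and averaging the 0/1 correctness
    across the samples, rather than greedy single-shot decoding.
    """
    return sum(correct_flags) / len(correct_flags)

# Hypothetical per-problem sample outcomes (k = 4 samples each)
problems = [
    [True, True, False, True],   # per-problem pass@1 = 0.75
    [False, False, True, False], # per-problem pass@1 = 0.25
]
# Benchmark-level pass@1 is the mean over problems
score = sum(pass_at_1(p) for p in problems) / len(problems)  # 0.5
```

Averaging over multiple samples reduces the variance that single greedy decoding would introduce on small benchmarks such as AIME.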
Key Information
- Category: Language Models
- Type: AI Language Models Tool