DeepSeek-R1-Distill-Llama-8B - AI Language Models Tool
Overview
DeepSeek-R1-Distill-Llama-8B is an 8-billion-parameter distilled language model in DeepSeek’s R1 family, produced by fine-tuning Llama-3.1-8B with reasoning samples generated by the larger DeepSeek-R1. It is designed to provide strong chain-of-thought reasoning, math, and code performance in a compact, self-hostable checkpoint suitable for research and production use. According to the official model card, the distillation pipeline uses DeepSeek-R1 outputs and selective tokenizer/config adjustments to transfer reasoning behaviours into smaller dense checkpoints. ([huggingface.co](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-8B))

The model is distributed under an MIT license and is available on the Hugging Face Hub as a safetensors checkpoint (BF16 weights) with a 32k-token context window, making it appropriate for long-form reasoning and document-level tasks when paired with an inference engine that supports large contexts.

The project’s public evaluation table reports competitive pass@1 and reasoning-centered results for the distilled Llama-8B variant on benchmarks such as AIME and MATH-500, showing the practical value of the RL-driven R1 pipeline for creating smaller but capable models. ([huggingface.co](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-8B))
Model Statistics
- Downloads: 574,861
- Likes: 835
- Pipeline: text-generation
- Parameters: 8.0B
- License: mit
Model Details
Architecture and provenance: DeepSeek-R1-Distill-Llama-8B is a distilled derivative of Llama-3.1-8B (base) that was fine-tuned using data generated by DeepSeek-R1’s reinforcement-learning-driven pipeline; the model weights are released under the MIT license. The Hugging Face model card lists the checkpoint as an 8B-parameter BF16 safetensors file and explicitly notes that distilled variants were produced using R1-generated samples and slight tokenizer/config changes. ([huggingface.co](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-8B))

Context and inference: The model supports up to 32,768 tokens (32k) for generation and was benchmarked with recommended sampling parameters of temperature ≈0.6 and top-p ≈0.95. The maintainers recommend avoiding an external system prompt and placing all instructions in the user prompt for best behaviour; they also document a specific "thinking" marker (<think>) to encourage chain-of-thought outputs when needed. ([huggingface.co](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-8B))

Compatibility and deployment: The model is intended to run similarly to other Llama/Qwen checkpoints and is commonly deployed with inference engines such as vLLM or SGLang (examples and CLI invocations are provided in the model card). Quantized/community builds (GGUF/4-bit) and many finetunes/adapter variants exist in the Hugging Face model tree and Spaces ecosystem. ([huggingface.co](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-8B))
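The prompting guidance above (no system prompt, instructions in the user turn, optional <think> pre-seed for raw completion calls) can be sketched as a small helper. This is an illustrative sketch only; the helper name and structure are not part of any official API.

```python
# Recommended sampling parameters per the model card.
RECOMMENDED_SAMPLING = {"temperature": 0.6, "top_p": 0.95}


def build_prompt(task: str, force_think: bool = False) -> str:
    """Build a prompt following the maintainers' guidance: put all
    instructions in the user turn (no separate system prompt). For raw
    (non-chat) completion calls, appending "<think>\n" pre-seeds the
    chain-of-thought block the model is trained to emit."""
    prompt = task.strip() + "\nPlease reason step by step before answering."
    if force_think:
        prompt += "\n<think>\n"
    return prompt
```

In practice the returned string is passed as the sole user message (or raw completion prompt), with the sampling dictionary forwarded to whichever engine serves the model.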
Key Features
- 8B-parameter distilled Llama-3.1 checkpoint optimized with R1-generated reasoning data.
- 32k-token context window for long-form reasoning and document-level tasks.
- BF16 safetensors weights suitable for high-throughput GPU inference.
- Tuned for chain-of-thought / step-by-step reasoning via RL-derived samples.
- Benchmarked on math and code suites (AIME, MATH-500, LiveCodeBench) with public scores.
- Provided under MIT license and compatible with vLLM, SGLang, and many community quantizations.
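The 32k-token context window in the list above is a hard budget shared by input and output. A rough pre-flight check for whether a document fits can be sketched as follows; the ~4-characters-per-token ratio is a common English-text heuristic of mine, not a tokenizer guarantee, so exact counts require the model's actual tokenizer.

```python
CONTEXT_WINDOW = 32_768  # tokens, per the model card


def fits_in_context(doc_chars: int,
                    reserved_output_tokens: int = 2048,
                    chars_per_token: float = 4.0) -> bool:
    """Rough estimate of whether a document of `doc_chars` characters,
    plus a reserved output budget, fits in the 32k context window."""
    est_input_tokens = doc_chars / chars_per_token
    return est_input_tokens + reserved_output_tokens <= CONTEXT_WINDOW
```

For precise budgeting, tokenize with the checkpoint's own tokenizer instead of estimating.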
Example Usage
Example (python):
# Example: local inference with vLLM (downloads the model from the Hugging Face Hub).
# Install: pip install vllm
from vllm import LLM, SamplingParams

sampling_params = SamplingParams(temperature=0.6, top_p=0.95, max_tokens=512)
llm = LLM(model="deepseek-ai/DeepSeek-R1-Distill-Llama-8B", dtype="auto", trust_remote_code=True)

prompt = "Please reason step-by-step and solve: If 3x+5=20, what's x?"
outputs = llm.generate([prompt], sampling_params)
for out in outputs:
    # vLLM returns a list of RequestOutput objects; the generated text is at out.outputs[0].text
    print(out.outputs[0].text)

# Note: DeepSeek R1 distill models are often served via vLLM or SGLang; see the model card for
# recommended runtime flags (max_model_len, tensor-parallel, enforce-eager, etc.).
Pricing
Model weights are published under the MIT license and are free to download and self-host. DeepSeek also offers a pay-as-you-go API with published token pricing (example: input cache-hit $0.028 / 1M tokens, input cache-miss $0.28 / 1M tokens, output $0.42 / 1M tokens) — see DeepSeek API docs for current rates and enterprise options. Self-hosting infrastructure costs vary by hardware and cloud provider. ([huggingface.co](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-8B))
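As a quick sanity check on the pay-as-you-go rates quoted above, costs can be estimated from token counts. The per-million-token prices below are the example figures from this page and may be out of date; consult DeepSeek's API docs before relying on them.

```python
# Example per-1M-token rates quoted above (USD); check DeepSeek's API docs for current values.
RATES = {"input_cache_hit": 0.028, "input_cache_miss": 0.28, "output": 0.42}


def api_cost(cache_hit_tokens: int, cache_miss_tokens: int, output_tokens: int) -> float:
    """Estimate request cost in USD from input (cache-hit / cache-miss) and output token counts."""
    per_million = 1_000_000
    return (cache_hit_tokens * RATES["input_cache_hit"]
            + cache_miss_tokens * RATES["input_cache_miss"]
            + output_tokens * RATES["output"]) / per_million


# 1M cache-miss input tokens plus 1M output tokens:
print(round(api_cost(0, 1_000_000, 1_000_000), 3))  # → 0.7
```

Self-hosting avoids these per-token charges entirely, at the cost of provisioning GPU hardware.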
Benchmarks
AIME 2024 (pass@1): 50.4
AIME 2024 (cons@64): 80.0
MATH-500 (pass@1): 89.1
LiveCodeBench (pass@1-COT): 39.6
Codeforces (rating): 1205
Source: distilled-model evaluation table on the model card. ([huggingface.co](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-8B))
Key Information
- Category: Language Models
- Type: AI Language Models Tool