DeepSeek-R1-Distill-Qwen-1.5B - AI Language Models Tool
Overview
DeepSeek-R1-Distill-Qwen-1.5B is an open-source, distilled dense language model released by DeepSeek that packages reasoning, math, and code capabilities from the larger DeepSeek-R1 family into a compact Qwen-based checkpoint. It is a distilled variant of Qwen2.5-Math-1.5B, intended for developers and researchers who need higher-quality chain-of-thought reasoning and math/code performance in a small footprint while retaining long-context support. ([huggingface.co](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B))
The DeepSeek-R1 research pipeline emphasizes reinforcement-learning-driven reasoning discovery (DeepSeek-R1-Zero) followed by cold-start data and SFT+RL refinement; the distilled models (including this 1.5B variant) are fine-tuned on ~800k samples generated by DeepSeek-R1 to transfer its reasoning patterns to smaller dense models. The model card documents recommended inference settings (temperature ~0.6, top-p 0.95) and a maximum generation length of 32,768 tokens for benchmark runs. The checkpoint is released under the MIT license and is available on Hugging Face for local use and downstream packaging/quantization. ([github.com](https://github.com/deepseek-ai/DeepSeek-R1/blob/main/DeepSeek_R1.pdf))
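The distillation step described above amounts to supervised fine-tuning on teacher-generated traces rather than logit-level distillation. Below is a minimal toy sketch of that idea; the (prompt, trace) pair is hypothetical, and the real pipeline uses ~800k curated samples with proper batching and prompt-token loss masking.
Sketch (python):
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical teacher data: a prompt paired with a reasoning trace sampled
# from DeepSeek-R1. The production pipeline uses ~800k curated samples.
pairs = [
    ("If 3x + 5 = 20, what is x?",
     "<think>3x = 20 - 5 = 15, so x = 15 / 3 = 5.</think>\nThe answer is x = 5."),
]

base = "Qwen/Qwen2.5-Math-1.5B"  # the base checkpoint named in the model card
tok = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)
model.train()
opt = torch.optim.AdamW(model.parameters(), lr=1e-5)

for prompt, trace in pairs:
    enc = tok(prompt + "\n" + trace, return_tensors="pt")
    # Plain causal-LM loss over the concatenated sequence; a real pipeline
    # would mask the prompt tokens out of the loss.
    loss = model(**enc, labels=enc["input_ids"]).loss
    loss.backward()
    opt.step()
    opt.zero_grad()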
Model Statistics
- Downloads: 932,613
- Likes: 1432
- Pipeline: text-generation
- Parameters: 1.8B
- License: mit
Model Details
Architecture and origin: DeepSeek-R1-Distill-Qwen-1.5B is a distilled dense causal LM derived from Qwen2.5-Math-1.5B, fine-tuned on data generated by the larger DeepSeek-R1 models to capture chain-of-thought (CoT) and advanced reasoning patterns. The project describes a multi-stage training pipeline for R1 itself: two RL stages aimed at discovering improved reasoning patterns and aligning with human preferences, plus two SFT stages that seed the model's reasoning and non-reasoning capabilities. ([github.com](https://github.com/deepseek-ai/DeepSeek-R1/blob/main/DeepSeek_R1.pdf))
Precision, size, and tokenizer: the Hugging Face model card lists model files in safetensors with BF16 tensors and reports the model as a compact distilled checkpoint (exact size and packaging vary by uploaded file; the page lists model-size metadata alongside frequently available quantized variants). DeepSeek recommends no separate system prompt (instructions go in the user turn), temperature 0.5–0.7 (0.6 recommended), top-p 0.95, and forcing responses to begin with "<think>\n" to encourage CoT traces. The model supports long-context runs (the maximum generation length used in evaluation was 32,768 tokens) and is distributed under the MIT license. ([huggingface.co](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B))
Deployment and compatibility: while the full-family DeepSeek-R1 models document some differences in runtime support, the Distill models are explicitly packaged to be compatible with common inference engines (vLLM, SGLang, and GGUF/llama.cpp-based runtimes, including community llamafile packages). Community quantizations and runnable packages (GGUF, llamafile, INT8 builds) are available for low-memory or CPU/edge deployment; a sketch of the latter follows below. Users should follow the model card and community guides for preferred flags (e.g., trust_remote_code, tensor-parallel settings) and recommended sampling parameters. ([huggingface.co](https://huggingface.co/RedHatAI/DeepSeek-R1-Distill-Qwen-1.5B-quantized.w8a8?utm_source=openai))
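As a sketch of a CPU/edge GGUF deployment via llama-cpp-python: the repo id and quantization filename below are illustrative placeholders, so substitute whichever community GGUF build you actually use.
Sketch (python):
from llama_cpp import Llama

# Hypothetical community GGUF build; repo_id and filename are placeholders.
llm = Llama.from_pretrained(
    repo_id="bartowski/DeepSeek-R1-Distill-Qwen-1.5B-GGUF",
    filename="*Q4_K_M.gguf",   # pick a quantization level that fits your RAM
    n_ctx=4096,                # context window; raise it if memory allows
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "If 3x + 5 = 20, what is x? Reason step by step."}],
    temperature=0.6,           # the model card's recommended sampling settings
    top_p=0.95,
    max_tokens=512,
)
print(out["choices"][0]["message"]["content"])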
Key Features
- Distilled from Qwen2.5-Math-1.5B using ~800k DeepSeek-R1-generated samples.
- Designed for chain-of-thought reasoning in math, logic, and code tasks.
- Supports long-context generation (evaluations used up to 32,768 tokens).
- Distributed under an MIT license, allowing commercial use and derivatives.
- Compatible with vLLM, SGLang, GGUF/llama.cpp packages and community quantizations.
- Recommended inference settings: temperature 0.6, top-p 0.95, with CoT prompting encouraged (see the sketch following this list).
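The documented "<think>\n" prefix trick can be applied with plain transformers as well as vLLM: render the chat template, then append the think token before generating. A minimal sketch, assuming the stock chat template shipped with the checkpoint (some template revisions may already prepend the token, making the manual append redundant):
Sketch (python):
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="auto")

messages = [{"role": "user", "content": "What is 17 * 23? Please reason step by step."}]
# Render the chat template, then append "<think>\n" so the model opens its
# reply inside a reasoning trace, as the model card recommends.
prompt = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True) + "<think>\n"
inputs = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=512, do_sample=True, temperature=0.6, top_p=0.95)
print(tok.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))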
Example Usage
Example (python):
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams
# Replace model_name with the Hugging Face repo name or a local path
model_name = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
# Recommended sampling params from DeepSeek model card
sampling_params = SamplingParams(temperature=0.6, top_p=0.95, max_tokens=512)
# Create vLLM instance (tensor_parallel_size / gpus depends on your hardware)
llm = LLM(model=model_name, tensor_parallel_size=1, trust_remote_code=True)
prompt = "Please reason step by step to solve: If 3x+5=20, what is x? Provide final answer inside \boxed{}"
prompt_ids = tokenizer.encode(prompt, add_special_tokens=False)
outputs = llm.generate(prompt_token_ids=[prompt_ids], sampling_params=sampling_params)
print(outputs[0].outputs[0].text)
# Notes: DeepSeek recommends applying the chat template and its documented
# inference settings (e.g., a <think> prefix to elicit CoT traces). For
# lower-memory deployments, use community quantized builds (GGUF / INT8) or
# the llamafile/packaged distributions. See the model card and vLLM examples
# for deployment details.
Benchmarks
- AIME 2024 (pass@1): 28.9% (Source: https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B)
- AIME 2024 (cons@64): 52.7% (majority vote; see the sketch below) (Source: https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B)
- MATH-500 (pass@1): 83.9% (Source: https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B)
- LiveCodeBench (pass@1): 16.9% (Source: https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B)
- Codeforces (rating): 954 (Source: https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B)
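The gap between pass@1 and cons@64 on AIME reflects the scoring rule: pass@1 averages per-sample accuracy (as the R1 paper computes it over k samples), while cons@64 takes a single majority vote over 64 sampled answers per problem. A toy illustration with made-up sampled answers:
Sketch (python):
from collections import Counter

def pass_at_1(samples, gold):
    # pass@1 as averaged in the R1 paper: mean per-sample accuracy.
    return sum(s == gold for s in samples) / len(samples)

def cons_at_k(samples, gold):
    # cons@k: a single majority-vote answer over all k samples.
    majority, _ = Counter(samples).most_common(1)[0]
    return float(majority == gold)

# Hypothetical sampled final answers for one problem (gold answer "5").
samples = ["3", "5", "5", "7", "5"]
print(pass_at_1(samples, "5"))  # 0.6 -- each sample scored independently
print(cons_at_k(samples, "5"))  # 1.0 -- the vote picks "5"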
Key Information
- Category: Language Models
- Type: AI Language Models Tool