DeepSeek-R1-Distill-Qwen-1.5B - AI Language Models Tool

Overview

DeepSeek-R1-Distill-Qwen-1.5B is an open-source, distilled dense language model released by DeepSeek that packages reasoning, math, and code capabilities from the larger DeepSeek-R1 family into a compact Qwen-based checkpoint. The model is a distilled variant of Qwen2.5-Math-1.5B, intended for developers and researchers who need stronger chain-of-thought reasoning and math/code performance in a small footprint while retaining long-context support (up to 32,768 generated tokens in evaluation). ([huggingface.co](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B))

The DeepSeek-R1 research pipeline emphasizes reinforcement-learning-driven reasoning discovery (DeepSeek-R1-Zero) followed by cold-start data and SFT+RL refinement; the distilled models (including this 1.5B variant) are fine-tuned on ~800k samples generated by DeepSeek-R1 to transfer reasoning patterns to smaller dense models. The model card documents recommended inference settings (temperature ~0.6, top-p 0.95) and a maximum generation length of 32,768 tokens for benchmark runs. The checkpoint is released under the MIT license and is available on Hugging Face for local use and downstream packaging/quantization. ([github.com](https://github.com/deepseek-ai/DeepSeek-R1/blob/main/DeepSeek_R1.pdf))

Model Statistics

  • Downloads: 932,613
  • Likes: 1432
  • Pipeline: text-generation
  • Parameters: 1.8B

License: MIT

Model Details

Architecture and origin: DeepSeek-R1-Distill-Qwen-1.5B is a distilled dense causal LM derived from Qwen2.5-Math-1.5B, fine-tuned on data generated by the larger DeepSeek-R1 models to capture chain-of-thought (CoT) and advanced reasoning patterns. The project describes a multi-stage training pipeline with two RL stages, aimed at discovering improved reasoning patterns and aligning with human preferences, and two SFT stages that seed the model's reasoning and non-reasoning capabilities. ([github.com](https://github.com/deepseek-ai/DeepSeek-R1/blob/main/DeepSeek_R1.pdf))

Precision, size, and tokenizer: the Hugging Face model card lists model files in safetensors with BF16 tensors; exact on-disk size varies by uploaded file, and quantized variants are widely available from the community. DeepSeek recommends using no separate system prompt, temperature 0.5–0.7 (0.6 recommended), top-p 0.95, and starting responses with a <think> token to encourage CoT traces. The model supports long-context runs (maximum generation length used in evaluation: 32,768 tokens) and is distributed under the MIT license. ([huggingface.co](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B))

Deployment and compatibility: while the full-family DeepSeek-R1 models document some differences in runtime support, the Distill models are explicitly packaged for compatibility with common inference engines (vLLM, SGLang, and GGUF/llama.cpp-based runtimes, including community llamafile packages). Community quantizations and runnable packages (GGUF, llamafile, INT8 builds) are available for low-memory or CPU/edge deployment. Follow the model card and community guides for preferred flags (trust_remote_code, tensor-parallel settings) and recommended sampling parameters. ([huggingface.co](https://huggingface.co/RedHatAI/DeepSeek-R1-Distill-Qwen-1.5B-quantized.w8a8?utm_source=openai))
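For quick local experimentation outside a serving engine, the recommendations above map directly onto a standard transformers generation loop. The sketch below is a minimal illustration under stated assumptions (a recent transformers release with chat-template support, the accelerate package for device_map, and enough memory for BF16 weights); the prompt text and max_new_tokens value are illustrative, not from the model card.

Example (python):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.bfloat16, device_map="auto"
)

# Per the model card: no system prompt; put all instructions in the user turn.
messages = [{"role": "user", "content": "What is 7 * 8? Reason step by step."}]
text = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
# Enforce the recommended "<think>\n" response prefix (newer tokenizer versions
# may already include it in the generation prompt, so check before appending).
if not text.endswith("<think>\n"):
    text += "<think>\n"

inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs, do_sample=True, temperature=0.6, top_p=0.95, max_new_tokens=512
)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))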

Key Features

  • Distilled from Qwen2.5-Math-1.5B on ~800k samples generated by DeepSeek-R1.
  • Designed for chain-of-thought reasoning in math, logic, and code tasks.
  • Supports long-context generation (evaluations used up to 32,768 tokens).
  • Distributed under an MIT license, allowing commercial use and derivatives.
  • Compatible with vLLM, SGLang, GGUF/llama.cpp packages and community quantizations.
  • Recommended inference settings: temperature 0.6, top-p 0.95, CoT prompts encouraged.

Example Usage

Example (python):

from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

# Replace model_name with the Hugging Face repo name or a local path
model_name = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"

tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
# Recommended sampling params from DeepSeek model card
sampling_params = SamplingParams(temperature=0.6, top_p=0.95, max_tokens=512)

# Create the vLLM instance (tensor_parallel_size depends on your hardware)
llm = LLM(model=model_name, tensor_parallel_size=1, trust_remote_code=True)

prompt = "Please reason step by step to solve: If 3x+5=20, what is x? Provide final answer inside \boxed{}"
prompt_ids = tokenizer.encode(prompt, add_special_tokens=False)

outputs = llm.generate(prompt_token_ids=[prompt_ids], sampling_params=sampling_params)
print(outputs[0].outputs[0].text)

# Notes: DeepSeek recommends applying the model's chat template and enforcing a
# <think> response prefix for CoT traces. For lower-memory deployments, use the
# available quantized community builds (GGUF / INT8) or llamafile/packaged
# distributions; a minimal GGUF sketch follows below. (See the model card and
# vLLM examples for deployment details.)
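For CPU or low-memory deployments, a community GGUF build can be run with llama-cpp-python. The sketch below is illustrative, not from the model card: the repo id and filename are hypothetical placeholders (substitute a real community GGUF repository), and it assumes llama-cpp-python is installed with Hugging Face Hub support.

Example (python):

from llama_cpp import Llama

# Placeholder repo/filename: substitute an actual community GGUF build.
llm = Llama.from_pretrained(
    repo_id="example-org/DeepSeek-R1-Distill-Qwen-1.5B-GGUF",  # hypothetical
    filename="*q4_k_m.gguf",  # glob pattern matching a quantized file
    n_ctx=4096,  # context window; raise it if memory allows
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "If 3x+5=20, what is x?"}],
    temperature=0.6,  # model-card recommended sampling settings
    top_p=0.95,
    max_tokens=512,
)
print(out["choices"][0]["message"]["content"])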

Benchmarks

  • AIME 2024 (pass@1): 28.9%
  • AIME 2024 (cons@64): 52.7%
  • MATH-500 (pass@1): 83.9%
  • LiveCodeBench (pass@1): 16.9%
  • Codeforces (rating): 954

Source: https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
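For readers unfamiliar with the metric names: pass@1 here is the average per-sample accuracy over multiple sampled generations, while cons@64 scores the single majority-vote answer taken across 64 samples. A minimal illustrative sketch (the helper names are ours, not from the evaluation harness):

Example (python):

from collections import Counter

def pass_at_1(answers, correct):
    # Average per-sample accuracy across all sampled generations.
    return sum(a == correct for a in answers) / len(answers)

def cons_at_k(answers, correct):
    # Majority-vote (consensus) answer across k samples, scored once.
    majority, _ = Counter(answers).most_common(1)[0]
    return float(majority == correct)

samples = ["5", "5", "4", "5"]  # extracted final answers from 4 generations
print(pass_at_1(samples, "5"))  # 0.75
print(cons_at_k(samples, "5"))  # 1.0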

Last Refreshed: 2026-01-16

Key Information

  • Category: Language Models
  • Type: AI Language Models Tool