DeepSeek-R1-Distill-Qwen-1.5B - AI Language Models Tool

Overview

DeepSeek-R1-Distill-Qwen-1.5B is an open-source, distilled dense language model released by DeepSeek that packages reasoning, math, and code capabilities from the larger DeepSeek-R1 family into a compact Qwen-based checkpoint. The model is a distilled variant of Qwen2.5-Math-1.5B, intended for developers and researchers who need stronger chain-of-thought reasoning and math/code performance in a small footprint while retaining long-context support (up to 32,768 generated tokens in evaluation). ([huggingface.co](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B))

The DeepSeek-R1 research pipeline emphasizes reinforcement-learning-driven reasoning discovery (DeepSeek-R1-Zero) followed by cold-start data and SFT+RL refinement; the distilled models (including this 1.5B variant) are fine-tuned on ~800k samples generated by DeepSeek-R1 to transfer reasoning patterns to smaller dense models. The model card documents recommended inference settings (temperature ~0.6, top-p 0.95) and a maximum generation length of 32,768 tokens for benchmark runs. The checkpoint is released under the MIT license and is available on Hugging Face for local use and downstream packaging/quantization. ([github.com](https://github.com/deepseek-ai/DeepSeek-R1/blob/main/DeepSeek_R1.pdf))

Model Statistics

  • Downloads: 932,613
  • Likes: 1432
  • Pipeline: text-generation
  • Parameters: 1.8B

License: MIT

Model Details

Architecture and origin: DeepSeek-R1-Distill-Qwen-1.5B is a distilled dense causal LM derived from Qwen2.5-Math-1.5B, fine-tuned on data generated by the larger DeepSeek-R1 models to capture chain-of-thought (CoT) and advanced reasoning patterns. The project describes a multi-stage training pipeline with two RL stages, aimed at discovering improved reasoning patterns and aligning with human preferences, and two SFT stages that seed the model's reasoning and non-reasoning capabilities. ([github.com](https://github.com/deepseek-ai/DeepSeek-R1/blob/main/DeepSeek_R1.pdf))

Precision, size, and tokenizer: the Hugging Face model card lists model files in safetensors with BF16 tensors; exact on-disk size varies by uploaded file, and quantized variants are widely available from the community. DeepSeek recommends using no separate system prompt, temperature 0.5–0.7 (0.6 recommended), top-p 0.95, and starting responses with a <think> token to encourage CoT traces. The model supports long-context runs (maximum generation length used in evaluation: 32,768 tokens) and is distributed under the MIT license. ([huggingface.co](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B))

Deployment and compatibility: while the full-family DeepSeek-R1 models document some differences in runtime support, the Distill models are explicitly packaged for compatibility with common inference engines (vLLM, SGLang, and GGUF/llama.cpp-based runtimes, including community llamafile packages). Community quantizations and runnable packages (GGUF, llamafile, INT8 builds) are available for low-memory or CPU/edge deployment. Follow the model card and community guides for preferred flags (trust_remote_code, tensor-parallel settings) and recommended sampling parameters. ([huggingface.co](https://huggingface.co/RedHatAI/DeepSeek-R1-Distill-Qwen-1.5B-quantized.w8a8?utm_source=openai))
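For quick local experimentation outside a serving engine, the recommendations above map directly onto a standard transformers generation loop. The sketch below is a minimal illustration under stated assumptions (a recent transformers release with chat-template support, the accelerate package for device_map, and enough memory for BF16 weights); the prompt text and max_new_tokens value are illustrative, not from the model card.

Example (python):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.bfloat16, device_map="auto"
)

# Per the model card: no system prompt; put all instructions in the user turn.
messages = [{"role": "user", "content": "What is 7 * 8? Reason step by step."}]
text = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
# Enforce the recommended "<think>\n" response prefix (newer tokenizer versions
# may already include it in the generation prompt, so check before appending).
if not text.endswith("<think>\n"):
    text += "<think>\n"

inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs, do_sample=True, temperature=0.6, top_p=0.95, max_new_tokens=512
)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))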

Key Features

  • Distilled from Qwen2.5-Math-1.5B on ~800k samples generated by DeepSeek-R1.
  • Designed for chain-of-thought reasoning in math, logic, and code tasks.
  • Supports long-context generation (evaluations used up to 32,768 tokens).
  • Distributed under an MIT license, allowing commercial use and derivatives.
  • Compatible with vLLM, SGLang, GGUF/llama.cpp packages and community quantizations.
  • Recommended inference settings: temperature 0.6, top-p 0.95, CoT prompts encouraged.

Example Usage

Example (python):

from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

# Replace model_name with the Hugging Face repo name or a local path
model_name = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"

tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
# Recommended sampling params from DeepSeek model card
sampling_params = SamplingParams(temperature=0.6, top_p=0.95, max_tokens=512)

# Create the vLLM instance (tensor_parallel_size depends on your hardware)
llm = LLM(model=model_name, tensor_parallel_size=1, trust_remote_code=True)

prompt = "Please reason step by step to solve: If 3x+5=20, what is x? Provide final answer inside \boxed{}"
prompt_ids = tokenizer.encode(prompt, add_special_tokens=False)

outputs = llm.generate(prompt_token_ids=[prompt_ids], sampling_params=sampling_params)
print(outputs[0].outputs[0].text)

# Notes: DeepSeek recommends applying the model's chat template and enforcing a
# <think> response prefix for CoT traces. For lower-memory deployments, use the
# available quantized community builds (GGUF / INT8) or llamafile/packaged
# distributions; a minimal GGUF sketch follows below. (See the model card and
# vLLM examples for deployment details.)
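For CPU or low-memory deployments, a community GGUF build can be run with llama-cpp-python. The sketch below is illustrative, not from the model card: the repo id and filename are hypothetical placeholders (substitute a real community GGUF repository), and it assumes llama-cpp-python is installed with Hugging Face Hub support.

Example (python):

from llama_cpp import Llama

# Placeholder repo/filename: substitute an actual community GGUF build.
llm = Llama.from_pretrained(
    repo_id="example-org/DeepSeek-R1-Distill-Qwen-1.5B-GGUF",  # hypothetical
    filename="*q4_k_m.gguf",  # glob pattern matching a quantized file
    n_ctx=4096,  # context window; raise it if memory allows
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "If 3x+5=20, what is x?"}],
    temperature=0.6,  # model-card recommended sampling settings
    top_p=0.95,
    max_tokens=512,
)
print(out["choices"][0]["message"]["content"])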

Benchmarks

  • AIME 2024 (pass@1): 28.9%
  • AIME 2024 (cons@64): 52.7%
  • MATH-500 (pass@1): 83.9%
  • LiveCodeBench (pass@1): 16.9%
  • Codeforces (rating): 954

Source: https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
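For readers unfamiliar with the metric names: pass@1 here is the average per-sample accuracy over multiple sampled generations, while cons@64 scores the single majority-vote answer taken across 64 samples. A minimal illustrative sketch (the helper names are ours, not from the evaluation harness):

Example (python):

from collections import Counter

def pass_at_1(answers, correct):
    # Average per-sample accuracy across all sampled generations.
    return sum(a == correct for a in answers) / len(answers)

def cons_at_k(answers, correct):
    # Majority-vote (consensus) answer across k samples, scored once.
    majority, _ = Counter(answers).most_common(1)[0]
    return float(majority == correct)

samples = ["5", "5", "4", "5"]  # extracted final answers from 4 generations
print(pass_at_1(samples, "5"))  # 0.75
print(cons_at_k(samples, "5"))  # 1.0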

Last Refreshed: 2026-01-16

Key Information

  • Category: Language Models
  • Type: AI Language Models Tool