DeepSeek-R1-Distill-Qwen-14B - AI Language Models Tool
Overview
DeepSeek-R1-Distill-Qwen-14B is an open-source, distilled 14.8B-parameter dense language model released by DeepSeek that transfers the reasoning behaviors of the larger DeepSeek-R1 family into a Qwen2.5-14B base. The distillation pipeline fine-tunes the Qwen2.5 base on outputs generated by DeepSeek-R1 (including long chain-of-thought traces), producing a model optimized for multi-step reasoning, math, and code tasks while remaining easy to run locally or on inference servers. The model card and paper describe a reinforcement-learning-first research approach: DeepSeek applied large-scale RL to elicit strong chain-of-thought behaviors, then distilled those behaviors into smaller dense checkpoints, one of which is this Qwen-14B distill. ([huggingface.co](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-14B))

Practically, the distilled checkpoint offers a long context configuration (32,768 tokens used as the generation length in evaluations), a permissive MIT license for commercial use, and pre-tuned inference recommendations (e.g., temperature ≈0.6, top-p ≈0.95). Evaluation tables published by DeepSeek report strong results on math and coding benchmarks for the 14B distilled checkpoint (AIME, MATH-500, LiveCodeBench, and Codeforces-derived metrics), making it a useful model for research, on-premise deployment, and cost-conscious production use where open weights are required. ([huggingface.co](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-14B))
Model Statistics
- Downloads: 210,387
- Likes: 597
- Pipeline: text-generation
- Parameters: 14.8B
- License: MIT
Model Details
Architecture and lineage: DeepSeek-R1-Distill-Qwen-14B is a distilled, instruction-style causal transformer built from the Qwen2.5-14B family and fine-tuned on data generated by DeepSeek-R1. The Qwen2.5-14B base is reported at ~14.7B parameters; the distilled checkpoint is listed at ~14.8B parameters on its Hugging Face model card. ([huggingface.co](https://huggingface.co/Qwen/Qwen2.5-14B?utm_source=openai))

Context window and tokenization: DeepSeek's evaluations and usage guidance set the maximum generation length to 32,768 tokens for the distilled models, and the repository includes configuration and tokenizer files for running distilled checkpoints in Qwen/Llama-compatible runtimes. The recommended inference settings from the model card are temperature 0.6 and top-p 0.95, the same sampling setup used to estimate pass@1. ([huggingface.co](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-14B))

Training and capabilities: the parent DeepSeek-R1 research applied multi-stage reinforcement learning (including an RL-only R1-Zero variant and later SFT+RL stages) to incentivize chain-of-thought and self-verification behaviors, then distilled roughly 800k curated samples (including long reasoning traces) into smaller dense models ranging from 1.5B to 70B parameters. The result is a distilled Qwen-14B that preserves many reasoning behaviors (step-by-step solving, reflection, self-checking) while costing far less to run than the original R1 MoE models. ([arxiv.org](https://arxiv.org/abs/2501.12948?utm_source=openai))

Runtime and deployment: DeepSeek recommends serving the distilled models with vLLM or SGLang and provides example commands. The Hugging Face model page includes runnable safetensors weights in BF16; the checkpoint can also be loaded with standard Hugging Face transformers tooling, and some runtimes may require trust_remote_code. ([huggingface.co](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-14B))
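To complement the serving-focused guidance above, here is a minimal local-inference sketch using Hugging Face transformers. It assumes a GPU with enough memory for the BF16 weights (roughly 30 GB for a 14.8B model); the model ID is the real one from the card, but the prompt and generation length are illustrative.

Example (python):
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-14B"

# Load tokenizer and BF16 weights; device_map="auto" spreads layers across available GPUs.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# R1-style distills are trained to think step by step; put all instructions in the
# user turn (DeepSeek advises against using a system prompt with these models).
messages = [{"role": "user", "content": "What is 17 * 24? Reason step by step."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Recommended sampling settings from the model card: temperature 0.6, top-p 0.95.
outputs = model.generate(inputs, max_new_tokens=512, do_sample=True, temperature=0.6, top_p=0.95)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))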
Key Features
- Distilled from DeepSeek-R1 outputs into a Qwen2.5-14B checkpoint for reasoning gains.
- Large generation context (configured/evaluated to 32,768 tokens) for long CoT use-cases.
- Strong math and code benchmark performance versus other dense 14B-class models.
- Released under an MIT license allowing commercial use and derivative works.
- Recommended serving with vLLM or SGLang for best compatibility and performance; see the sketch after this list.
- Inference presets included (temperature ≈0.6, top-p ≈0.95) to reduce repetition.
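As referenced above, here is a minimal offline-inference sketch using vLLM's Python API, an alternative to the server workflow shown in Example Usage below. The model ID and sampling presets come from the model card; the prompt, max_tokens, and max_model_len values are illustrative assumptions.

Example (python):
from vllm import LLM, SamplingParams

# Sampling presets recommended on the model card.
params = SamplingParams(temperature=0.6, top_p=0.95, max_tokens=1024)

# max_model_len can be raised toward 32768 if GPU memory allows.
llm = LLM(model="deepseek-ai/DeepSeek-R1-Distill-Qwen-14B", max_model_len=8192)

# llm.chat applies the tokenizer's chat template; keep instructions in the user turn.
messages = [{"role": "user", "content": "How many primes are below 30? Reason step by step."}]
outputs = llm.chat(messages, params)
print(outputs[0].outputs[0].text)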
Example Usage
Example (python):
# Start a vLLM server (shell command):
# vllm serve deepseek-ai/DeepSeek-R1-Distill-Qwen-14B --tensor-parallel-size 2 --max-model-len 32768 --enforce-eager

# Simple Python client using requests to call a running vLLM OpenAI-compatible endpoint
import requests
import json

API_URL = "http://localhost:8000/v1/completions"  # typical vLLM/OpenAI-compatible route

payload = {
    "model": "deepseek-ai/DeepSeek-R1-Distill-Qwen-14B",
    "prompt": "Please solve step-by-step: What is the sum of the first ten positive integers?",
    "max_tokens": 200,
    "temperature": 0.6,
    "top_p": 0.95,
    "n": 1,
}

resp = requests.post(API_URL, json=payload, timeout=120)
if resp.status_code == 200:
    result = resp.json()
    # vLLM/OpenAI-style responses return a 'choices' array with the generated text
    print(json.dumps(result, indent=2))
else:
    print("Error", resp.status_code, resp.text)
# Notes:
# - DeepSeek recommends using vLLM or SGLang for serving distilled models and provides example commands on the model card.
# - If you run via Hugging Face providers or a hosted endpoint, use their endpoint and auth headers instead of localhost.
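For chat-style calls against the same vLLM server, the OpenAI Python client also works, since vLLM exposes an OpenAI-compatible /v1/chat/completions route. This is a hedged sketch: the base URL and placeholder API key assume a default local vLLM launch.

Example (python):
from openai import OpenAI

# vLLM's OpenAI-compatible server accepts any placeholder key by default.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1-Distill-Qwen-14B",
    # DeepSeek advises putting all instructions in the user turn (no system prompt).
    messages=[{"role": "user", "content": "Prove that the sum of two even numbers is even."}],
    temperature=0.6,
    top_p=0.95,
    max_tokens=512,
)
print(response.choices[0].message.content)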
Pricing
Model weights are open-source and distributed under the MIT License, so the checkpoint can be downloaded and self-hosted at no licensing cost. DeepSeek also offers hosted API access with per-token billing. Published developer docs have listed indicative token-based rates for R1-style endpoints (for example, input/output tiers such as $0.14 / $0.28 per 1M tokens), but hosted pricing, quotas, and enterprise terms change, and should be confirmed on DeepSeek's official API and pricing pages. ([huggingface.co](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-14B))
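As a back-of-the-envelope aid, here is a tiny cost-estimation sketch for per-token billing. The default rates are placeholders taken from the indicative figures above; substitute the current official rates before relying on the output.

Example (python):
def estimate_cost(input_tokens: int, output_tokens: int,
                  usd_per_m_input: float = 0.14, usd_per_m_output: float = 0.28) -> float:
    """Estimate hosted-API cost in USD; default rates are illustrative only."""
    return (input_tokens * usd_per_m_input + output_tokens * usd_per_m_output) / 1_000_000

# Example: a long-CoT request with 2,000 prompt tokens and 8,000 generated tokens.
print(f"${estimate_cost(2_000, 8_000):.4f}")  # -> $0.0025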
Benchmarks
- AIME 2024 (pass@1): 69.7%
- AIME 2024 (cons@64): 80.0%
- MATH-500 (pass@1): 93.9%
- LiveCodeBench (pass@1, CoT): 53.1%
- Codeforces (rating-equivalent): 1481
- Model size / context: ~14.8B params; 32,768-token generation length

Source: https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-14B
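For context on how these numbers are computed: the model card describes generating k = 64 samples per question (temperature 0.6, top-p 0.95), reporting pass@1 as mean correctness across samples and cons@64 as majority-vote accuracy. Below is a minimal sketch of both estimators, assuming you already have per-question sampled answers and a gold answer; the toy data is illustrative.

Example (python):
from collections import Counter

def pass_at_1(samples: list[str], gold: str) -> float:
    """Mean correctness over k samples (unbiased pass@1 estimate)."""
    return sum(s == gold for s in samples) / len(samples)

def cons_at_k(samples: list[str], gold: str) -> bool:
    """Majority vote over k samples (consensus accuracy)."""
    majority, _ = Counter(samples).most_common(1)[0]
    return majority == gold

samples = ["42", "42", "41", "42"]  # toy k=4 example; the card uses k=64
print(pass_at_1(samples, "42"))     # 0.75
print(cons_at_k(samples, "42"))     # True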
Key Information
- Category: Language Models
- Type: AI Language Models Tool