DeepSeek-R1-Distill-Qwen-14B - AI Language Models Tool

Overview

DeepSeek-R1-Distill-Qwen-14B is an open-source, distilled 14.8B-parameter dense language model released by DeepSeek that transfers reasoning behaviors from the larger DeepSeek-R1 family into a Qwen2.5-14B base. The distillation pipeline fine-tunes the Qwen2.5 base on outputs generated by DeepSeek-R1 (including long chain-of-thought traces), producing a model optimized for multi-step reasoning, math, and code tasks while remaining easy to run locally or on inference servers. The model card and paper describe a reinforcement-learning-first research approach: DeepSeek applied large-scale RL to elicit strong chain-of-thought behaviors, then distilled those behaviors into smaller dense checkpoints, one of which is this Qwen-14B distill (Source: https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-14B).

Practically, the distilled checkpoint offers a long context configuration (32,768 tokens used for generation in evaluations), a permissive MIT license for commercial use, and pre-tuned inference recommendations (temperature ≈ 0.6, top-p ≈ 0.95). Evaluation tables published by DeepSeek report strong results for the 14B distilled checkpoint on math and coding benchmarks (AIME, MATH-500, LiveCodeBench, and Codeforces-derived metrics), making it a useful model for research, on-premise deployment, and cost-conscious production use where open weights are required.

Model Statistics

  • Downloads: 210,387
  • Likes: 597
  • Pipeline: text-generation
  • Parameters: 14.8B

License: MIT

Model Details

Architecture and lineage: DeepSeek-R1-Distill-Qwen-14B is a distilled, instruction-style causal transformer built from the Qwen2.5-14B family and fine-tuned on data generated by DeepSeek-R1. The Qwen2.5-14B base is reported at roughly 14.7B parameters; the distilled checkpoint is listed at 14.8B parameters on its Hugging Face model card (Source: https://huggingface.co/Qwen/Qwen2.5-14B).

Context window and tokenization: DeepSeek's evaluations and usage guidance set the maximum generation length to 32,768 tokens for the distilled models, and the model card provides configuration and tokenizer notes for running distilled checkpoints in Qwen/Llama-compatible runtimes. Recommended inference settings from the model card are temperature 0.6 and top-p 0.95, with multi-sample setups used to estimate pass@1 (Source: https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-14B).

Training and capabilities: the parent DeepSeek-R1 research applied multi-stage reinforcement learning (including the RL-only R1-Zero variant and later SFT+RL stages) to incentivize chain-of-thought and self-verification behaviors, then distilled roughly 800k curated samples into smaller dense models (1.5B to 70B). The result is a distilled Qwen-14B that preserves many reasoning behaviors (step-by-step solving, reflection, self-checking) while costing far less to run than the original DeepSeek-R1 MoE model (Source: https://arxiv.org/abs/2501.12948).

Runtime and deployment: DeepSeek recommends serving the distilled models with vLLM or SGLang and provides example launch commands. The Hugging Face model page ships runnable safetensors weights in BF16; the checkpoint loads with standard Hugging Face transformers (the Qwen2 architecture is supported natively in recent releases) or can be served via vLLM (Source: https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-14B).
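For quick experimentation without a serving stack, the checkpoint can also be loaded directly with Hugging Face transformers. A minimal sketch follows, assuming a GPU with enough memory for the BF16 weights (roughly 30 GB); the prompt and generation lengths are illustrative:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-14B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # weights ship as BF16 safetensors
    device_map="auto",
)

# Use the bundled chat template so the prompt matches the distill's expected format.
messages = [{"role": "user", "content": "What is 17 * 24? Think step by step."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Sampling presets from the model card: temperature 0.6, top-p 0.95.
out = model.generate(inputs, max_new_tokens=512, do_sample=True, temperature=0.6, top_p=0.95)
print(tokenizer.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))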

Key Features

  • Distilled from DeepSeek-R1 outputs into a Qwen2.5-14B checkpoint for reasoning gains.
  • Large generation context (configured and evaluated at 32,768 tokens) for long chain-of-thought use cases.
  • Strong math and code benchmark performance versus other dense 14B-class models.
  • Released under an MIT license allowing commercial use and derivative works.
  • Recommended serving with vLLM or SGLang for best compatibility and performance (see the offline-inference sketch after this list).
  • Inference presets included (temperature ≈ 0.6, top-p ≈ 0.95) to reduce repetition and incoherent output.
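Beyond the HTTP server shown in the next section, vLLM also supports offline batch inference from Python. A minimal sketch, assuming vLLM is installed and two GPUs are available (the tensor_parallel_size value is an illustrative assumption, not a requirement):

from vllm import LLM, SamplingParams

# Load the distilled checkpoint; adjust tensor_parallel_size to your GPU count.
llm = LLM(
    model="deepseek-ai/DeepSeek-R1-Distill-Qwen-14B",
    tensor_parallel_size=2,
    max_model_len=32768,
)

# Sampling presets recommended on the model card.
params = SamplingParams(temperature=0.6, top_p=0.95, max_tokens=512)

outputs = llm.generate(
    ["Please reason step by step: how many primes are below 30?"],
    params,
)
for out in outputs:
    print(out.outputs[0].text)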

Example Usage

Example (python):

# Start a vLLM server (shell command)
# vllm serve deepseek-ai/DeepSeek-R1-Distill-Qwen-14B --tensor-parallel-size 2 --max-model-len 32768 --enforce-eager

# Simple Python client using requests to call a running vLLM OpenAI-compatible endpoint
import requests
import json

API_URL = "http://localhost:8000/v1/completions"  # typical vLLM/OpenAI-compatible route
payload = {
    "model": "deepseek-ai/DeepSeek-R1-Distill-Qwen-14B",
    "prompt": "Please solve step-by-step: What is the sum of the first ten positive integers?",
    "max_tokens": 200,
    "temperature": 0.6,
    "top_p": 0.95,
    "n": 1
}

resp = requests.post(API_URL, json=payload, timeout=600)  # long CoT generations can take a while
if resp.status_code == 200:
    result = resp.json()
    # vLLM/OpenAI-style responses return a 'choices' array; print the generated text
    print(result["choices"][0]["text"])
    # Or inspect the full response: print(json.dumps(result, indent=2))
else:
    print("Error", resp.status_code, resp.text)

# Notes:
# - DeepSeek recommends using vLLM or SGLang for serving distilled models and provides example commands on the model card.
# - If you run via Hugging Face providers or a hosted endpoint, use their endpoint and auth headers instead of localhost. 
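The same vLLM server also exposes an OpenAI-compatible chat route, so the official openai Python SDK works as a client. A sketch assuming the server above is running on localhost; the api_key value is a placeholder, since vLLM does not require one by default:

from openai import OpenAI

# Point the SDK at the local vLLM server; the key is a dummy placeholder.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

resp = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1-Distill-Qwen-14B",
    messages=[{"role": "user", "content": "Solve step-by-step: what is the sum of the first ten positive integers?"}],
    temperature=0.6,
    top_p=0.95,
    max_tokens=200,
)
print(resp.choices[0].message.content)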

Pricing

Model weights are open-source and distributed under the MIT License, so the checkpoint can be downloaded and self-hosted at no licensing cost. DeepSeek also offers hosted API access to its R1-family models with per-token billing, and third-party inference providers host the distilled checkpoints as well; hosted rates, quotas, and enterprise terms vary and should be confirmed on the respective official API/pricing pages. (Source: https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-14B)
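As a back-of-the-envelope helper for comparing hosted rates against self-hosting, a tiny sketch; the rate arguments are placeholders to be filled in from the official pricing page, not confirmed figures:

def estimate_cost_usd(input_tokens: int, output_tokens: int,
                      in_rate_per_m: float, out_rate_per_m: float) -> float:
    """Cost of one request given per-million-token rates (placeholder rates)."""
    return input_tokens / 1e6 * in_rate_per_m + output_tokens / 1e6 * out_rate_per_m

# Example: a 2,000-token prompt with an 8,000-token CoT answer at a
# hypothetical $0.50 in / $2.00 out per 1M tokens.
print(f"${estimate_cost_usd(2_000, 8_000, 0.50, 2.00):.4f}")  # -> $0.0170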

Benchmarks

  • AIME 2024 (pass@1): 69.7%
  • AIME 2024 (cons@64): 80.0%
  • MATH-500 (pass@1): 93.9%
  • LiveCodeBench (pass@1, CoT): 53.1%
  • Codeforces (rating-equivalent): 1481
  • Model size / context: ~14.8B params; 32,768-token generation length

Source: https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-14B
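For clarity on the two AIME metrics, a small illustrative sketch of how pass@1 (mean accuracy over k sampled generations) and cons@64 (majority vote over 64 samples) are typically computed; the function names and data shapes are assumptions, not DeepSeek's evaluation code:

from collections import Counter

def pass_at_1(correct_flags):
    """Mean accuracy over k sampled generations for one problem (flags are bools)."""
    return sum(correct_flags) / len(correct_flags)

def cons_at_k(sampled_answers, reference):
    """Majority-vote (self-consistency) answer over k samples, checked against the reference."""
    majority_answer, _ = Counter(sampled_answers).most_common(1)[0]
    return majority_answer == reference

# Toy example: 64 samples of which 40 produce the correct final answer "42".
samples = ["42"] * 40 + ["41"] * 24
print(pass_at_1([a == "42" for a in samples]))  # 0.625 -> this problem's pass@1 contribution
print(cons_at_k(samples, "42"))                 # True  -> counts toward cons@64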

Last Refreshed: 2026-01-16

Key Information

  • Category: Language Models
  • Type: AI Language Models Tool