DeepSeek-R1 Distill Qwen 14B GGUF - AI Language Models Tool

Overview

DeepSeek-R1 Distill Qwen 14B GGUF is a community-distributed, quantized conversion of DeepSeek's R1-distilled 14B checkpoint in the GGUF format, intended for fast local inference with llama.cpp toolchains. The package, published under the lmstudio-community account, bundles multiple GGUF quant variants (Q3/Q4/Q6/Q8), enabling trade-offs between disk/RAM footprint and latency while exposing the model's 128k-token context window for long-document and multi-step reasoning tasks. ([huggingface.co](https://huggingface.co/lmstudio-community/DeepSeek-R1-Distill-Qwen-14B-GGUF)) The underlying checkpoint, DeepSeek-R1-Distill-Qwen-14B, is a Qwen2.5-14B base fine-tuned on reasoning data generated by DeepSeek-R1 and tuned specifically for chain-of-thought, step-by-step reasoning behavior, making it suited to math, code, and multi-step QA applications that benefit from explicit intermediate reasoning. The community GGUF builds run on local setups via llama.cpp / llama-cpp-python / text-generation-webui, and the model card provides usage recommendations and distilled evaluation results for the 14B family. ([huggingface.co](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-14B))

Model Statistics

  • Downloads: 4,785
  • Likes: 38
  • Pipeline: text-generation

Model Details

Architecture and lineage: the GGUF artifacts are quantized conversions of DeepSeek-R1-Distill-Qwen-14B, a 14B-class checkpoint built on the Qwen2.5-14B base and fine-tuned on reasoning data generated by DeepSeek-R1. The community page lists the architecture as qwen2 and reports a model size class near 15B parameters for the quant builds. ([huggingface.co](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-14B))

Context and tuning: the model supports an extended 128k-token context window (n_ctx ≈ 131,072) and is tuned for reasoning and chain-of-thought (CoT) style outputs. The DeepSeek project documents evaluation results and prompt-design guidance (for example, placing all instructions in the user prompt rather than a system prompt, and using the recommended sampling settings) to get consistent reasoning outputs. ([huggingface.co](https://huggingface.co/lmstudio-community/DeepSeek-R1-Distill-Qwen-14B-GGUF))

Quantization and compatibility: the GGUF release includes multiple quantization levels (Q3_K_L, Q4_K_M, Q6_K, Q8_0) with reported disk/RAM footprints (e.g., Q4_K_M ≈ 8.99 GB, Q6_K ≈ 12.1 GB, Q8_0 ≈ 15.7 GB) to help choose the right trade-off for CPU-only or accelerated inference. The GGUF conversion is credited to bartowski and references llama.cpp release b4514 as the basis for the quantized builds. ([huggingface.co](https://huggingface.co/lmstudio-community/DeepSeek-R1-Distill-Qwen-14B-GGUF))
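
Individual quant files can be fetched directly from the Hugging Face repository before loading them locally. A minimal sketch using huggingface_hub follows; the exact GGUF filename is an assumption based on the repository's naming pattern and should be checked against the file list on the model page.

from huggingface_hub import hf_hub_download

# Assumption: the Q4_K_M file is named "DeepSeek-R1-Distill-Qwen-14B-Q4_K_M.gguf";
# verify the exact name on the repository's Files tab before running.
gguf_path = hf_hub_download(
    repo_id="lmstudio-community/DeepSeek-R1-Distill-Qwen-14B-GGUF",
    filename="DeepSeek-R1-Distill-Qwen-14B-Q4_K_M.gguf",
)
print(gguf_path)  # local cache path to pass to llama.cpp / llama-cpp-python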

Key Features

  • Supports very long 128k token context windows for long-document reasoning tasks.
  • Multiple GGUF quant variants (Q3/Q4/Q6/Q8) to trade off memory and latency.
  • Distilled and tuned for chain-of-thought and multi-step reasoning tasks.
  • Prepared for local inference via llama.cpp and llama-cpp-python toolchains (see the GPU-offload sketch after this list).
  • Community-maintained GGUF conversions (credited to bartowski) built against llama.cpp for fast local inference.
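
For the memory/latency trade-off above, llama-cpp-python exposes an n_gpu_layers parameter that offloads part of the network to a GPU while keeping the rest in system RAM. A minimal sketch, with an illustrative path and layer count (neither value comes from the model card):

from llama_cpp import Llama

# Illustrative values: match the path to the quant you downloaded and the
# layer count to the VRAM available on your machine.
llm = Llama(
    model_path="/path/to/DeepSeek-R1-Distill-Qwen-14B-Q4_K_M.gguf",
    n_ctx=32768,       # a smaller context window reduces KV-cache memory
    n_gpu_layers=20,   # offload some layers to GPU; -1 offloads all layers
)

A lower quant (Q3/Q4) with partial offload suits mid-range GPUs; Q6/Q8 with full offload needs correspondingly more VRAM.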

Example Usage

Example (python):

from llama_cpp import Llama

# Replace this path with the path to the downloaded GGUF file (Q4/Q6/Q8, etc.)
model_path = "/path/to/DeepSeek-R1-Distill-Qwen-14B-Q4_K_M.gguf"

llm = Llama(model_path=model_path, n_ctx=131072)

prompt = (
    "You are a step-by-step reasoning assistant. Please show chain-of-thought and final answer.\n"
    "Question: A staircase has 12 steps. If you take two steps at a time, how many moves to reach the top? Show reasoning."
)

# The DeepSeek-R1 model card recommends sampling with temperature in the 0.5-0.7
# range (0.6 suggested) rather than greedy decoding, which can cause repetition.
resp = llm.create_completion(prompt=prompt, max_tokens=256, temperature=0.6)
print(resp['choices'][0]['text'])

# Notes:
# - Ensure your llama.cpp / llama-cpp-python build supports large n_ctx values.
# - Choose a GGUF quant that fits your RAM/GPU budget (Q4/Q6 recommended for mid-range systems).
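
Because the distilled checkpoint is chat-tuned, it can also be driven through llama-cpp-python's create_chat_completion, which applies the chat template embedded in the GGUF file. A minimal sketch, following the DeepSeek-R1 model card's guidance to place all instructions in the user message (no system prompt) and to sample within the recommended temperature range:

from llama_cpp import Llama

model_path = "/path/to/DeepSeek-R1-Distill-Qwen-14B-Q4_K_M.gguf"
llm = Llama(model_path=model_path, n_ctx=131072)

# All instructions go in the user turn; DeepSeek recommends avoiding a system prompt.
resp = llm.create_chat_completion(
    messages=[
        {
            "role": "user",
            "content": (
                "Solve step by step: a staircase has 12 steps and you take "
                "two steps at a time. How many moves reach the top?"
            ),
        }
    ],
    max_tokens=512,
    temperature=0.6,
)
print(resp["choices"][0]["message"]["content"])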

Pricing

Free to download and use; the DeepSeek-R1 release, including the distilled checkpoints, is published under the MIT license. Community GGUF builds are distributed on Hugging Face with no commercial pricing listed.

Benchmarks

Downloads (last month): 4,785 (Source: https://huggingface.co/lmstudio-community/DeepSeek-R1-Distill-Qwen-14B-GGUF)

Model size (reported class): 15B parameters (GGUF artifacts reported on HF) (Source: https://huggingface.co/lmstudio-community/DeepSeek-R1-Distill-Qwen-14B-GGUF)

Quantized footprint examples: Q3_K_L 7.92 GB; Q4_K_M 8.99 GB; Q6_K 12.1 GB; Q8_0 15.7 GB (Source: https://huggingface.co/lmstudio-community/DeepSeek-R1-Distill-Qwen-14B-GGUF)

Distilled model evaluation — AIME 2024 (pass@1): 69.7 (DeepSeek-R1-Distill-Qwen-14B) (Source: https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-14B)

Distilled model evaluation — MATH-500 (pass@1): 93.9 (DeepSeek-R1-Distill-Qwen-14B) (Source: https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-14B)

Last Refreshed: 2026-01-16

Key Information

  • Category: Language Models
  • Type: AI Language Models Tool