DeepSeek-R1 Distill Qwen 14B GGUF - AI Language Models Tool
Overview
DeepSeek-R1 Distill Qwen 14B GGUF is a community-provided, GGUF-quantized build of DeepSeek's distilled reasoning model (14B), optimized for local inference via llama.cpp and compatible runtimes. The variant available from the LM Studio community was prepared to make DeepSeek's reasoning-focused checkpoint easier to run on CPU/GPU setups that support GGUF; the quantization and GGUF packaging were contributed by community maintainers and exposed through a Hugging Face community model page. ([huggingface.co](https://huggingface.co/lmstudio-community/DeepSeek-R1-Distill-Qwen-14B-GGUF))

The model is a distilled student of DeepSeek-R1 (originally derived from Qwen2.5-14B) and is explicitly tuned for chain-of-thought and multi-step reasoning tasks, while supporting very long contexts (up to 128k tokens in the DeepSeek family). The GGUF variant lists multiple quantization options (3-/4-/6-/8-bit) to trade off memory and speed, making it practical for local experimentation and deployment with llama.cpp, ctransformers, or llama-cpp-python. ([huggingface.co](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-14B))
Model Statistics
- Downloads: 8,807
- Likes: 39
- Pipeline: text-generation
Model Details
Base and lineage: the distilled model was derived from DeepSeek-R1 (which itself leverages DeepSeek-V3 and Qwen2.5 family components). The DeepSeek project applied a reinforcement-learning (RL)-centric pipeline to discover and incentivize chain-of-thought reasoning behaviors, then generated reasoning data used to distill smaller student models, including the Qwen-14B distilled checkpoint. The DeepSeek model card documents the RL/distillation pipeline and evaluation methodology. ([huggingface.co](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-14B))

Architecture & sizing: the Hugging Face model pages list the distilled Qwen-14B variant at roughly 15B parameters (both the checkpoint metadata and the quantized GGUF builds report ~15B params). The GGUF distribution offered by the LM Studio community exposes multiple quantization formats (Q3_K_L, Q4_K_M, Q6_K, Q8_0) with corresponding compressed sizes (for example Q4_K_M ≈ 8.99 GB, Q3_K_L ≈ 7.92 GB). These quantizations were prepared with llama.cpp tooling. ([huggingface.co](https://huggingface.co/lmstudio-community/DeepSeek-R1-Distill-Qwen-14B-GGUF))

Capabilities & recommended usage: the distilled Qwen-14B is tuned for reasoning, step-by-step math, and code reasoning tasks. DeepSeek's documentation recommends prompting for explicit chain-of-thought (e.g., "please reason step-by-step"), suggests generation settings of temperature ≈ 0.6 and top-p ≈ 0.95, and advises against using a separate system prompt for best reasoning behavior. The model supports very long context lengths in the DeepSeek family (128k token capability is reported for R1-family models). ([huggingface.co](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-14B))
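As a rough sanity check on those file sizes, a GGUF footprint can be estimated as parameter count × average bits per weight ÷ 8. The sketch below uses approximate llama.cpp bits-per-weight averages for the k-quant formats (the ~4.85 and ~4.27 figures are assumptions, not values from the model card), and lands close to the listed 8.99 GB and 7.92 GB:

```python
# Rough GGUF size estimate: params * bits_per_weight / 8 bytes, ignoring
# small metadata overhead. Bits-per-weight values are approximate llama.cpp
# averages for k-quant formats (assumption, not from the model card).
PARAMS = 14.8e9  # ~15B reported for the distilled Qwen-14B checkpoint

BITS_PER_WEIGHT = {
    "Q3_K_L": 4.27,
    "Q4_K_M": 4.85,
    "Q6_K": 6.56,
    "Q8_0": 8.50,
}

def est_size_gb(quant: str, params: float = PARAMS) -> float:
    """Estimated file size in GB (10^9 bytes) for a given quantization."""
    return params * BITS_PER_WEIGHT[quant] / 8 / 1e9

for quant in BITS_PER_WEIGHT:
    print(f"{quant}: ~{est_size_gb(quant):.2f} GB")
```

This kind of estimate is useful for checking whether a given quantization will fit in available RAM/VRAM before downloading.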
Key Features
- Tuned for chain-of-thought and multi-step reasoning tasks (DeepSeek RL-distillation pipeline).
- Supports very long contexts in the DeepSeek family (reported 128k token capability).
- Provided as GGUF quantized builds for llama.cpp compatibility and local inference.
- Multiple quantization levels available (Q3_K_L, Q4_K_M, Q6_K, Q8_0) for memory/speed tradeoffs.
- Distilled from DeepSeek-R1 using reasoning samples (documentation cites ~800k finetuning samples).
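The prompting guidance above (explicit step-by-step instruction, no separate system prompt, and a \boxed{} final answer for math) can be wrapped in a small helper. This is a hypothetical convenience function, not part of any official API:

```python
# Hypothetical helper illustrating DeepSeek's recommended prompting style:
# put everything in the user prompt (no system prompt), ask for explicit
# step-by-step reasoning, and for math request the answer in \boxed{}.
def build_reasoning_prompt(question: str, math: bool = False) -> str:
    parts = [question.strip(), "Please reason step by step."]
    if math:
        parts.append("Put your final answer within \\boxed{}.")
    return "\n".join(parts)

prompt = build_reasoning_prompt("If f(x)=2x+3 and g(x)=x^2, what is f(g(3))?", math=True)
print(prompt)
```

The resulting string can be passed directly to any of the GGUF runtimes shown in the usage example below.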
Example Usage
Example (python):
from ctransformers import AutoModelForCausalLM
# Example: load a GGUF-quantized model from Hugging Face (local copy or repo name).
# Replace model_file with the actual .gguf filename you downloaded from the repo.
model = AutoModelForCausalLM.from_pretrained(
    "lmstudio-community/DeepSeek-R1-Distill-Qwen-14B-GGUF",
    model_file="deepseek-r1-distill-qwen-14b-q4_k_m.gguf",
    model_type="llama",  # ctransformers uses the 'llama' type for llama.cpp-style GGUFs
    gpu_layers=0,        # set >0 to offload layers to GPU if available
)
prompt = (
    "Solve step-by-step: If f(x)=2x+3 and g(x)=x^2, what is f(g(3))?\n"
    "Please show chain-of-thought reasoning."
)
# Call the model (ctransformers returns a string or structured output depending on version)
output = model(prompt, max_new_tokens=256, temperature=0.6, top_p=0.95)
print(output)
# Alternative: use llama-cpp-python (llama_cpp) if you prefer that binding; both libraries support GGUF models.
# See the llama-cpp-python or ctransformers documentation for GPU build and server examples.
Benchmarks
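A minimal llama-cpp-python sketch of the same workflow, assuming the Q4_K_M .gguf file has already been downloaded locally (the path and helper name are placeholders):

```python
# DeepSeek-recommended sampling settings for reasoning output.
SAMPLING = {"temperature": 0.6, "top_p": 0.95, "max_tokens": 256}

def run_local(prompt: str,
              model_path: str = "deepseek-r1-distill-qwen-14b-q4_k_m.gguf") -> str:
    """Load a local GGUF file with llama-cpp-python and run one completion."""
    from llama_cpp import Llama  # imported lazily; requires `pip install llama-cpp-python`
    llm = Llama(model_path=model_path, n_ctx=8192, n_gpu_layers=0)
    result = llm(prompt, **SAMPLING)
    return result["choices"][0]["text"]
```

Raising n_gpu_layers offloads transformer layers to the GPU when llama-cpp-python was built with GPU support.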
- AIME 2024 (pass@1): 69.7% (Source: https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-14B)
- MATH-500 (pass@1): 93.9% (Source: https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-14B)
- LiveCodeBench (pass@1, COT): 53.1% (Source: https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-14B)
- Codeforces (rating equivalent): 1481 (Source: https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-14B)
- DeepSeek-R1-Distill-Qwen-14B (GGUF) downloads last month (LM Studio community page): 8,807 (Source: https://huggingface.co/lmstudio-community/DeepSeek-R1-Distill-Qwen-14B-GGUF)
- DeepSeek-R1-Distill-Qwen-14B (original repo) downloads last month: 737,135 (Source: https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-14B)
Key Information
- Category: Language Models
- Type: AI Language Models Tool