DeepSeek-R1 - AI RAG & Search Tool
Overview
DeepSeek‑R1 is an open, reasoning‑focused large language model family released by DeepSeek that emphasizes explicit chain‑of‑thought (CoT) reasoning, long context, and distilled smaller variants for practical use. The project publishes full weights under an MIT license and provides both downloadable checkpoints (R1 and R1‑Zero) and a hosted, OpenAI‑compatible API, making it usable for research, product integrations, and self‑hosting. ([huggingface.co](https://huggingface.co/deepseek-ai/DeepSeek-R1))

Technically positioned as a Mixture‑of‑Experts (MoE) reasoning engine, R1 was trained with a multi‑stage pipeline that includes reinforcement learning to encourage long‑form reasoning behaviors and later supervised fine‑tuning and distillation to produce practical smaller models. DeepSeek has pushed minor updates and published the updated R1 on Hugging Face; the company also exposes a "thinking" mode in its API that returns both the internal reasoning trace and the final answer for inspection and downstream processing.

Community reaction has been strong but mixed: users praise its coding and math performance while reporting availability, hallucination, and censorship concerns in some deployments. ([huggingface.co](https://huggingface.co/deepseek-ai/DeepSeek-R1))
Model Statistics
- Downloads: 946,257
- Likes: 13,103
- Pipeline: text-generation
- Parameters: 684.5B
- License: MIT
Model Details
Architecture and sizes: DeepSeek‑R1 is implemented as a MoE model with ~671B total parameters and ~37B activated parameters during inference (the MoE routing activates a subset of experts), and the released models support a 128K token context window. Distilled dense variants (1.5B–70B) were produced from R1 outputs and are published for easier local deployment. ([huggingface.co](https://huggingface.co/deepseek-ai/DeepSeek-R1))

Training & capability notes: R1 and R1‑Zero were developed using a pipeline that emphasizes large‑scale reinforcement learning to discover chain‑of‑thought patterns (R1‑Zero) and then adds cold‑start / supervised seeds to improve readability and alignment (R1). Distillation produces smaller dense models (e.g., Qwen‑ and Llama‑based checkpoints) that retain much of the reasoning quality for math, code, and QA tasks. The Hugging Face model card documents benchmark results (MMLU, MATH, Codeforces, etc.), recommends inference settings (temperature ~0.6, avoiding a separate system prompt for best results), and warns that Hugging Face Transformers may not directly support the full R1 weights; vLLM and other inference backends are recommended for the distilled checkpoints. ([huggingface.co](https://huggingface.co/deepseek-ai/DeepSeek-R1))

Deployment & API: DeepSeek offers an OpenAI‑compatible API (base URL: https://api.deepseek.com) with two primary model IDs — deepseek‑chat (non‑thinking mode) and deepseek‑reasoner (thinking mode that returns both reasoning_content and final content). The API supports JSON output, tool calls (in chat mode), large context lengths, and explicit reasoning traces for analysis. ([api-docs.deepseek.com](https://api-docs.deepseek.com/quick_start/pricing/))
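A quick back‑of‑envelope check of the MoE figures above: with ~671B total parameters but only ~37B activated per token, only a small fraction of the network participates in any single forward pass. This is a rough illustration of the sparsity claim, not a profiler; real routing overhead and shared (non‑expert) layers complicate the picture.

```python
# Back-of-envelope: fraction of DeepSeek-R1's MoE parameters active per token.
# Figures taken from the model card above; purely illustrative arithmetic.
TOTAL_PARAMS = 671e9    # total parameters across all experts
ACTIVE_PARAMS = 37e9    # parameters activated per forward pass

active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS
print(f"Active fraction per token: {active_fraction:.1%}")  # ≈ 5.5%
```

This sparsity is why the hosted model can serve 128K‑context requests at a per‑token compute cost far below what a dense 671B model would require.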
Key Features
- Explicit chain‑of‑thought (CoT) 'thinking' mode that returns internal reasoning traces.
- Mixture‑of‑Experts (MoE) core: ~671B total params, ~37B activated during inference.
- Very long context support — models published with up to 128K token windows.
- Open weights and permissive MIT license for commercial and research use on HF.
- Distilled dense variants (1.5B–70B) for local deployment and lower‑cost inference.
- OpenAI‑compatible REST API with JSON output, function/tool calling, and caching.
- Training pipeline emphasizes RL‑first discovery of reasoning followed by SFT.
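The API‑facing features above follow the OpenAI request shape, so a tool‑calling request to the chat (non‑thinking) mode can be sketched as a plain request body. This is a minimal sketch assuming the standard OpenAI tools schema; the get_weather function and its parameters are illustrative inventions, not part of DeepSeek's API, and the temperature setting follows the model card's ~0.6 recommendation.

```python
import json

# Sketch of an OpenAI-style request body for DeepSeek's chat (non-thinking) mode.
# The tool definition below is hypothetical, for illustration only.
payload = {
    "model": "deepseek-chat",
    "temperature": 0.6,  # recommended setting per the model card
    "messages": [
        {"role": "user", "content": "What's the weather in Paris?"}
    ],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",  # hypothetical tool for illustration
                "description": "Look up current weather for a city.",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }
    ],
}

body = json.dumps(payload)  # this JSON string is what goes on the wire
print(body[:60] + "...")
```

An OpenAI‑compatible client would POST this body to https://api.deepseek.com/chat/completions with a Bearer token, as in the full example below.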
Example Usage
Example (python):
import os
import requests
# Example: call DeepSeek's OpenAI-compatible chat API in reasoning mode
API_KEY = os.environ.get("DEEPSEEK_API_KEY")
BASE_URL = "https://api.deepseek.com"
headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {API_KEY}",
}
payload = {
    "model": "deepseek-reasoner",  # thinking/reasoning mode
    "messages": [
        {"role": "user", "content": "Solve: What is the sum of the first five prime numbers? Please show your reasoning."}
    ],
    "max_tokens": 2048,
    "stream": False,
}
resp = requests.post(f"{BASE_URL}/chat/completions", json=payload, headers=headers, timeout=120)
resp.raise_for_status()
data = resp.json()
# The reasoning model returns both the chain-of-thought and the final answer fields
# (field names shown in DeepSeek docs: 'reasoning_content' and 'content')
choice = data.get("choices", [])[0]
output = choice.get("message", {})
final_answer = output.get("content")
reasoning = output.get("reasoning_content")
print("--- Reasoning trace ---")
print(reasoning)
print("\n--- Final answer ---")
print(final_answer)
# Note: this example follows DeepSeek's official API shape and is compatible with
# OpenAI-style clients if the base_url and model names are adjusted accordingly.
# See DeepSeek docs for streaming, tool-calling, and JSON-output examples.
Pricing
DeepSeek publishes per‑token API pricing for its hosted models (DeepSeek‑V3.2 family). Per the official DeepSeek API docs (DeepSeek‑V3.2 / V3.2‑Exp): input tokens are priced at $0.028 per 1M tokens (cache hit) and $0.28 per 1M tokens (cache miss); output tokens are $0.42 per 1M tokens. The R1 weights themselves are released under an MIT license for self‑hosting, but hosted API pricing and availability are documented on DeepSeek's site and may change. ([api-docs.deepseek.com](https://api-docs.deepseek.com/quick_start/pricing/))
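For planning purposes, the published per‑token rates reduce to simple arithmetic. The sketch below uses the figures quoted above; since hosted pricing may change, treat the rates as inputs rather than constants.

```python
# Estimate hosted-API cost from the per-1M-token rates quoted above.
RATE_INPUT_CACHE_HIT = 0.028   # USD per 1M input tokens (cache hit)
RATE_INPUT_CACHE_MISS = 0.28   # USD per 1M input tokens (cache miss)
RATE_OUTPUT = 0.42             # USD per 1M output tokens

def estimate_cost(hit_tokens: int, miss_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of one workload at the rates above."""
    return (hit_tokens * RATE_INPUT_CACHE_HIT
            + miss_tokens * RATE_INPUT_CACHE_MISS
            + output_tokens * RATE_OUTPUT) / 1_000_000

# Example: 2M uncached input tokens plus 1M output tokens
cost = estimate_cost(0, 2_000_000, 1_000_000)
print(f"${cost:.2f}")  # $0.98
```

Note the 10x gap between cache‑hit and cache‑miss input pricing: workloads that reuse long shared prefixes (e.g., a fixed RAG system context) benefit disproportionately from the cache.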
Benchmarks
MMLU (Pass@1): 90.8 (Source: https://huggingface.co/deepseek-ai/DeepSeek-R1)
MMLU‑Redux (Exact Match): 92.9 (Source: https://huggingface.co/deepseek-ai/DeepSeek-R1)
MATH‑500 (Pass@1): 97.3 (Source: https://huggingface.co/deepseek-ai/DeepSeek-R1)
Codeforces (Rating equivalent): 2029 (Source: https://huggingface.co/deepseek-ai/DeepSeek-R1)
Context window (maximum supported): 128K tokens (models published with 128K context support) (Source: https://huggingface.co/deepseek-ai/DeepSeek-R1)
Key Information
- Category: RAG & Search
- Type: AI RAG & Search Tool