DeepSeek-Prover-V1.5-RL - AI Language Models Tool
Overview
DeepSeek-Prover-V1.5-RL is an open-source language model specialized for formal theorem proving in Lean 4. Building on the DeepSeekMath foundation and the earlier DeepSeek-Prover family, V1.5 introduces reinforcement learning from proof-assistant feedback (RLPAF) and an exploration-driven Monte‑Carlo tree search variant (RMaxTS) to produce diverse, verifiable proof candidates. The authors released Base, supervised fine-tuned (SFT), and RL-refined checkpoints, along with tooling that integrates model generations with the Lean 4 verifier via a truncate-and-resume pipeline.
The model targets automated formal proof generation and research in neural-symbolic reasoning: it accepts Lean 4 problem statements and generates tactic-based Lean proofs that are checked by the Lean prover. According to the authors and the public model card, DeepSeek-Prover-V1.5-RL (≈7B parameters) sets new state-of-the-art results on prominent formal-proving benchmarks (miniF2F and ProofNet) when combined with RMaxTS. It is available under DeepSeek's model license, with source code and training/inference recipes published on the project's GitHub repository and Hugging Face model card.
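For a concrete sense of the input/output format, here is a minimal illustration (hand-written for this page, not an actual model generation) of a Lean 4 statement of the kind the model accepts together with one proof term the Lean 4 verifier would accept; the same statement reappears in the usage example below.
Illustration (Lean 4):
-- A Lean 4 theorem statement left for the prover to complete
example (a b : Nat) : a + b = b + a :=
  -- one valid completion: commutativity of natural-number addition
  Nat.add_comm a b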
Model Statistics
- Downloads: 657
- Likes: 64
- Parameters: 6.9B
- License: other
Model Details
Core design and training pipeline: DeepSeek-Prover-V1.5 is derived from DeepSeekMath-Base and released as three main checkpoints: Base, SFT (supervised fine-tuned), and RL (reinforcement-learned). Training follows a three-stage workflow: continued mathematical pre-training; supervised fine-tuning on a large Lean 4 code-completion dataset augmented with chain-of-thought comments and tactic-state annotations; and a reinforcement learning phase (GRPO-style) that uses binary pass/fail signals from the Lean 4 verifier as the reward (the authors call this RLPAF).
Inference and algorithmic advances: the paper and model card describe a truncate-and-resume mechanism that bridges whole-proof generation and proof-step verification. When a generated whole proof fails, the code is truncated at the first failing tactic and the model resumes generation conditioned on the latest tactic-state comment; this mechanism integrates with Monte‑Carlo tree search. The authors propose RMaxTS, an MCTS variant that assigns intrinsic (curiosity-like) rewards to encourage diverse exploration of the tactic-state space, which substantially increases pass rates under a limited sampling budget.
Technical specifics reported by the authors and model card: the released prover models have approximately 7B parameters (the public metadata lists 7B / ~6.9B), use the BF16 tensor format for weights, and are trained and evaluated with a 4,096-token context. The repository includes examples, quantized variants, and instructions for running inference; code and issues are hosted on GitHub, and the model card and paper provide reproducibility details (model card, GitHub, arXiv paper).
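To make the truncate-and-resume mechanism concrete, below is a minimal Python sketch of such a loop. The generate and verify callables are hypothetical placeholders standing in for the model's sampler and the Lean 4 verifier; the project's actual integration scripts live in its GitHub repository.
Sketch (python):
from typing import Callable, Optional, Tuple

def truncate_and_resume(
    statement: str,
    generate: Callable[[str], str],   # prompt -> candidate proof continuation (e.g. model sampling)
    verify: Callable[[str], Tuple[bool, Optional[int], str]],  # proof -> (ok, first-error offset, tactic state)
    max_rounds: int = 8,
) -> Optional[str]:
    """Generate whole proofs; on failure, truncate at the first failing tactic and
    resume generation conditioned on the latest tactic-state comment."""
    prefix = statement
    for _ in range(max_rounds):
        candidate = prefix + generate(prefix)
        ok, error_offset, tactic_state = verify(candidate)
        if ok:
            return candidate              # Lean accepted the whole proof
        if error_offset is None:
            continue                      # unrecoverable failure: resample from scratch
        # Keep only the code before the first failing tactic and append the
        # verifier-reported tactic state as a comment to steer the next attempt.
        prefix = candidate[:error_offset] + "\n  /- tactic state:\n  " + tactic_state + "\n  -/\n"
    return None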
Key Features
- Reinforcement learning from prover feedback (RLPAF) using Lean 4 verification signals
- RMaxTS: intrinsic-reward-driven Monte‑Carlo tree search for diverse proof exploration (see the sketch after this list)
- Truncate-and-resume pipeline combining whole-proof generation with tactic-state resumption
- Pretraining on DeepSeekMath-Base and SFT with thought-augmented Lean 4 datasets
- Open-source release with Base, SFT, and RL checkpoints and quantized variants
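As a rough picture of the RMaxTS item above, the sketch below assumes an intrinsic reward of 1 the first time a tactic state is discovered and a UCB1-style selection score; the actual reward definition and selection rule are specified in the paper, so treat this as a conceptual illustration rather than the released algorithm.
Sketch (python):
import math
from collections import defaultdict

# Bookkeeping for an RMaxTS-style search over tactic states (illustrative only).
visit_count = defaultdict(int)     # node -> number of visits
value_sum = defaultdict(float)     # node -> accumulated extrinsic + intrinsic reward
seen_tactic_states = set()         # tactic states discovered so far

def intrinsic_reward(tactic_state: str) -> float:
    """Curiosity-like reward: 1.0 the first time a tactic state is reached, 0.0 afterwards."""
    if tactic_state in seen_tactic_states:
        return 0.0
    seen_tactic_states.add(tactic_state)
    return 1.0

def ucb_score(node, parent, c: float = 1.4) -> float:
    """UCB1-style score used to decide which child of `parent` to explore next."""
    if visit_count[node] == 0:
        return float("inf")        # always try unvisited children first
    exploit = value_sum[node] / visit_count[node]
    explore = c * math.sqrt(math.log(max(visit_count[parent], 1)) / visit_count[node])
    return exploit + explore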
Example Usage
Example (python):
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch
# Model id on Hugging Face
MODEL_ID = "deepseek-ai/DeepSeek-Prover-V1.5-RL"
# Install prerequisites: pip install transformers accelerate safetensors
# Load tokenizer and model (use bfloat16 if supported on your hardware)
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16 if torch.cuda.is_available() else torch.float16,
    device_map="auto",
    trust_remote_code=True,
)
# Example Lean 4 theorem prompt (very short illustrative example)
prompt = "-- Lean 4 theorem statement\nexample (a b : Nat) : a + b = b + a :=\n"
inputs = tokenizer(prompt, return_tensors="pt").to(next(model.parameters()).device)
# Generate a candidate proof (tune max_new_tokens / sampling for your workload)
outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    do_sample=True,
    top_k=50,
    temperature=0.8,
    num_return_sequences=3,
)
for i, out in enumerate(outputs):
    text = tokenizer.decode(out, skip_special_tokens=True)
    print(f"--- Candidate {i+1} ---\n{text}\n")
# Note: the project provides integration tools to run Lean 4 verification and the
# truncate-and-resume / RMaxTS search loop. For the full prover/verification loop, see
# the project's GitHub repository and model card for recommended scripts and dependencies.
Benchmarks
- miniF2F (test) — with RMaxTS: 63.5% pass rate (Source: https://huggingface.co/deepseek-ai/DeepSeek-Prover-V1.5-RL)
- miniF2F (test) — single-pass whole-proof generation (RL model): 60.2% pass rate (Source: https://arxiv.org/abs/2408.08152)
- ProofNet (test) — with RMaxTS: 25.3% pass rate (Source: https://huggingface.co/deepseek-ai/DeepSeek-Prover-V1.5-RL)
- Model size: ≈7B parameters (reported as 7B / ~6.9B) (Source: https://huggingface.co/deepseek-ai/DeepSeek-Prover-V1.5-RL)
Key Information
- Category: Language Models
- Type: AI Language Models Tool