DeepSeek-Prover-V1.5-RL - AI Language Models Tool
Overview
DeepSeek-Prover-V1.5-RL is an open-source language model specialized for formal theorem proving in Lean 4. The model builds on a DeepSeekMath-Base backbone and is fine-tuned on supervised proof data, followed by reinforcement learning from proof assistant feedback (RLPAF). It also integrates RMaxTS, a novel Monte-Carlo tree search variant that encourages exploration of diverse tactic-state trajectories and improves the discovery of nontrivial proofs. ([arxiv.org](https://arxiv.org/abs/2408.08152))
In the authors' evaluations, DeepSeek-Prover-V1.5-RL achieves state-of-the-art formal proof success rates on standard benchmarks (miniF2F and ProofNet) while remaining available as a 7B-parameter open model distributed in safetensors (BF16) format on Hugging Face. The project provides scripts and a quick-start example that integrate the model with Lean 4 for verification and for launching RMaxTS search experiments. ([arxiv.org](https://arxiv.org/abs/2408.08152))
Model Statistics
- Downloads: 14,313
- Likes: 64
- Parameters: 6.9B
- License: other
Model Details
Architecture and scale: DeepSeek-Prover-V1.5-RL is a 7B-parameter transformer-style causal language model specialized for formal mathematics and Lean 4 tactic scripts; model files on Hugging Face are provided as safetensors with BF16 tensor type. ([huggingface.co](https://huggingface.co/deepseek-ai/DeepSeek-Prover-V1.5-RL))
Training & algorithms: development uses a two-stage pipeline—supervised fine-tuning (SFT) on curated formal-proof data, then reinforcement learning from proof assistant feedback (RLPAF). For inference and proof search, the authors pair the model with RMaxTS, a Monte-Carlo tree search variant that mixes extrinsic proof-success rewards with an intrinsic novelty reward for visiting previously unseen tactic states. The model supports both whole-proof (single-pass) generation and stepwise, tactic-state conditioned generation that interleaves model outputs with Lean 4 verification. ([arxiv.org](https://arxiv.org/abs/2408.08152))
System & usage notes: the repository includes quick-start scripts, examples for launching RMaxTS experiments, and instructions to build mathlib4 and run Lean 4 locally. The codebase is released under an MIT-style code license; the model files are hosted on Hugging Face and intended for research and engineering integration with Lean 4 pipelines. ([github.com](https://github.com/deepseek-ai/DeepSeek-Prover-V1.5))
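The intrinsic novelty reward described above can be illustrated with a small sketch. Note that the `rmax_reward` and `ucb_score` helpers below are illustrative assumptions about how an RMax-style bonus and UCB selection might be combined, not the project's actual implementation:

```python
import math

def rmax_reward(tactic_state: str, seen_states: set, extrinsic: float) -> float:
    """Mix an extrinsic proof-success reward with an RMax-style novelty bonus.

    A previously unseen tactic state contributes an intrinsic reward of 1.0,
    steering the search toward unexplored parts of the proof tree.
    """
    intrinsic = 0.0 if tactic_state in seen_states else 1.0
    seen_states.add(tactic_state)
    return extrinsic + intrinsic

def ucb_score(child_value: float, child_visits: int,
              parent_visits: int, c: float = 1.4) -> float:
    """Standard UCB1 score for choosing which branch to expand next."""
    if child_visits == 0:
        return float("inf")  # always try an unvisited child first
    return child_value / child_visits + c * math.sqrt(
        math.log(parent_visits) / child_visits)
```

Under this scheme a failed but novel trajectory still earns a nonzero reward, so the search keeps learning about new tactic states even when no proof is found on that branch.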
Key Features
- Reinforcement learning from proof assistant feedback (RLPAF) to optimize verified proof outcomes.
- RMaxTS: MCTS variant with intrinsic novelty reward for diverse tactic-state exploration.
- Tactic-state conditioning: stepwise generation that reads Lean 4 proof state for context.
- Supports whole-proof (single-pass) and truncate-and-resume stepwise proof generation modes.
- Open-source distribution with quick-start scripts for Lean 4 integration and evaluation.
Example Usage
Example (python):
# Minimal example: load the model and generate a candidate Lean 4 proof.
# Note: follow the project's Quick Start to install Lean 4 and mathlib4 for verification.
# See the repo quick-start and Hugging Face model page for full examples. ([github.com](https://github.com/deepseek-ai/DeepSeek-Prover-V1.5))
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch
MODEL_ID = "deepseek-ai/DeepSeek-Prover-V1.5-RL"
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,  # model artifacts provided as BF16 on HF
    device_map="auto",
    trust_remote_code=True,
)
# Example prompt: a small Lean 4 theorem statement (replace with a real miniF2F problem)
prompt = "-- Lean 4 theorem\nexample (n : Nat) : n = n := by\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
# Generate continuation (adjust generation params as needed)
outputs = model.generate(
    **inputs,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.8,
    top_p=0.95,
    num_return_sequences=4,
)
for i, out in enumerate(outputs):
    text = tokenizer.decode(out, skip_special_tokens=True)
    print(f"--- Candidate {i+1} ---")
    print(text)
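A candidate can then be handed to a local Lean 4 toolchain for checking. The sketch below assumes Lean 4 and mathlib4 are installed per the project's quick start; the `lake env lean` invocation, the `lean_cmd` parameter, and the Lake-project working directory are assumptions about a typical mathlib4 setup, not the project's own verification script:

```python
import os
import pathlib
import subprocess
import tempfile

def lean_check(proof_text: str, project_dir: str,
               lean_cmd: tuple = ("lake", "env", "lean")) -> bool:
    """Write the candidate proof to a .lean file and run Lean 4 on it.

    Returns True when the checker exits with status 0 (the proof
    type-checks). Assumes `project_dir` is a Lake project with mathlib4
    already built; `lean_cmd` can be swapped out for testing.
    """
    fd, name = tempfile.mkstemp(suffix=".lean", dir=project_dir)
    os.close(fd)
    path = pathlib.Path(name)
    try:
        path.write_text(proof_text, encoding="utf-8")
        result = subprocess.run(
            list(lean_cmd) + [str(path)],
            cwd=project_dir, capture_output=True, text=True, timeout=300)
        return result.returncode == 0
    finally:
        path.unlink(missing_ok=True)
```

Each generated candidate from the loop above could be filtered through `lean_check`, keeping only proofs that Lean actually accepts.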
# To verify candidate proofs, run the generated script through a local Lean 4 environment per the project's quick start. ([github.com](https://github.com/deepseek-ai/DeepSeek-Prover-V1.5))
Benchmarks
- miniF2F (test set) pass rate: 63.5% (Source: https://arxiv.org/abs/2408.08152)
- ProofNet pass rate: 25.3% (Source: https://arxiv.org/abs/2408.08152)
- Model size: 7B parameters (Source: https://huggingface.co/deepseek-ai/DeepSeek-Prover-V1.5-RL)
- File format / tensor type: safetensors / BF16 (Source: https://huggingface.co/deepseek-ai/DeepSeek-Prover-V1.5-RL)
- Hugging Face downloads (last month): ≈14k (Source: https://huggingface.co/deepseek-ai/DeepSeek-Prover-V1.5-RL)
Key Information
- Category: Language Models
- Type: AI Language Models Tool