DeepSeek-Prover-V1.5-RL - AI Language Models Tool

Overview

DeepSeek-Prover-V1.5-RL is an open-source language model specialized for formal theorem proving in Lean 4. Building on the DeepSeekMath foundation and the earlier DeepSeek-Prover family, V1.5 introduces reinforcement learning from proof-assistant feedback (RLPAF) and an exploration-driven Monte-Carlo tree search variant (RMaxTS) to produce diverse, verifiable proof candidates. The authors released Base, supervised fine-tuned (SFT), and RL-refined checkpoints and provide tooling to integrate model generations with the Lean 4 verifier via a truncate-and-resume pipeline.

The model targets automated formal proof generation and research in neural-symbolic reasoning: it accepts Lean 4 problem statements and generates tactic-based proofs that are then checked by the Lean 4 verifier. According to the authors and the public model card, DeepSeek-Prover-V1.5-RL (≈7B parameters) achieves state-of-the-art results on prominent formal-proving benchmarks (miniF2F and ProofNet) when combined with RMaxTS. The model is available under DeepSeek’s model license, with source code and training/inference recipes published on the project’s GitHub and Hugging Face model card.
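
As a quick illustration of the task format, the statement below is a Lean 4 goal of the kind the model receives, and the proof is one the Lean 4 checker accepts; it is a hand-written example, not a model output.

Illustrative example (Lean 4):

-- Theorem statement given to the prover
example (a b : Nat) : a + b = b + a := by
  -- A tactic-style proof that the Lean 4 verifier accepts; the model emits
  -- proofs of this form, optionally interleaved with tactic-state comments.
  exact Nat.add_comm a b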

Model Statistics

  • Downloads: 657
  • Likes: 64
  • Parameters: 6.9B

License: other

Model Details

Core design and training pipeline: DeepSeek-Prover-V1.5 is derived from DeepSeekMath-Base and released as three main checkpoints: Base, SFT (supervised fine-tuned), and RL (reinforcement-learned). Training follows a three-stage workflow: continued mathematical pre-training; supervised fine-tuning on a large Lean 4 code-completion dataset augmented with chain-of-thought comments and tactic-state annotations; and a GRPO-style reinforcement learning phase that uses binary pass/fail signals from the Lean 4 verifier as the reward, which the authors call RLPAF (reinforcement learning from proof-assistant feedback).

Inference and algorithmic advances: the paper and model card describe a truncate-and-resume mechanism that bridges whole-proof generation and proof-step verification. When a generated whole proof fails, the code is truncated at the first failing tactic and the model resumes generation conditioned on the latest tactic-state comment; this mechanism also integrates with Monte-Carlo tree search. The authors propose RMaxTS, an MCTS variant that assigns intrinsic (curiosity-like) rewards to encourage diverse exploration of the tactic-state space, which substantially increases pass rates under a limited sampling budget.

Technical specifics reported by the authors and model card: the released prover models have approximately 7B parameters (the public metadata lists 7B / ~6.9B), use BF16 weights, and are trained and evaluated with a 4,096-token context. The repository includes examples, quantized variants, and instructions for running inference; code and issues are hosted on GitHub, and the model card and arXiv paper provide reproducibility details.
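
To make the truncate-and-resume mechanism concrete, here is a minimal sketch of the loop under stated assumptions: the `generate` and `verify_with_lean` helpers are hypothetical stand-ins for the model call and the Lean 4 integration that the project's own scripts provide, and the real pipeline interleaves this with RMaxTS search rather than a simple retry loop.

Illustrative sketch (python):

# Minimal sketch of truncate-and-resume (illustrative, not the project's API).
# Assumed, hypothetical helpers:
#   generate(prompt) -> str
#       samples a whole-proof completion from the model
#   verify_with_lean(statement, proof) -> (ok, first_failing_line, tactic_state)
#       runs the Lean 4 verifier and reports the first failing tactic, if any

def truncate_and_resume(statement, generate, verify_with_lean, max_rounds=4):
    proof_so_far = ""
    for _ in range(max_rounds):
        candidate = proof_so_far + generate(statement + proof_so_far)
        ok, fail_line, tactic_state = verify_with_lean(statement, candidate)
        if ok:
            return candidate  # the Lean 4 verifier accepts the whole proof
        # Keep only the code preceding the first failing tactic ...
        proof_so_far = "\n".join(candidate.splitlines()[:fail_line])
        # ... and append the reported tactic state as a comment so the model
        # can resume generation from a verified prefix.
        proof_so_far += f"\n  /- tactic state:\n  {tactic_state}\n  -/\n"
    return None  # no verified proof found within the sampling budget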

Key Features

  • Reinforcement learning from prover feedback (RLPAF) using Lean 4 verification signals
  • RMaxTS: intrinsic-reward-driven Monte‑Carlo tree search for diverse proof exploration (sketched after this list)
  • Truncate-and-resume pipeline combining whole-proof generation with tactic-state resumption
  • Pretraining on DeepSeekMath-Base and SFT with thought-augmented Lean 4 datasets
  • Open-source release with Base, SFT, and RL checkpoints and quantized variants
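
The RMaxTS exploration bonus referenced above can be illustrated with a small sketch: an intrinsic reward is granted the first time the search reaches a tactic state it has not seen before, and nothing afterwards, which pushes the tree search toward novel states. The class below is an assumption-laden illustration, not the released implementation; node selection and value backup are omitted.

Illustrative sketch (python):

# RMax-style intrinsic reward for tree search (illustrative only): pay 1 the
# first time a tactic state is encountered, 0 on every later visit.

class IntrinsicReward:
    def __init__(self):
        self.seen_states = set()

    def __call__(self, tactic_state: str) -> float:
        # Normalize the pretty-printed tactic state so trivial whitespace
        # differences do not count as "new" states.
        key = " ".join(tactic_state.split())
        if key in self.seen_states:
            return 0.0
        self.seen_states.add(key)
        return 1.0  # novel tactic state: grant the exploration bonus

# During search, the backed-up value of a rollout would combine the sparse
# extrinsic signal (proof verified or not) with this intrinsic bonus.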

Example Usage

Example (python):

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Model id on Hugging Face
MODEL_ID = "deepseek-ai/DeepSeek-Prover-V1.5-RL"

# Install prerequisites: pip install transformers accelerate safetensors
# Load tokenizer and model (use bfloat16 if supported on your hardware)

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16 if torch.cuda.is_available() else torch.float32,  # fp16 on CPU is often unsupported
    device_map="auto",
    trust_remote_code=True
)

# Example Lean 4 theorem prompt (very short illustrative example)
prompt = "-- Lean 4 theorem statement\nexample (a b : Nat) : a + b = b + a :=\n"

inputs = tokenizer(prompt, return_tensors="pt").to(next(model.parameters()).device)

# Generate a candidate proof (tune max_new_tokens / sampling for your workload)
outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    do_sample=True,
    top_k=50,
    temperature=0.8,
    num_return_sequences=3
)

for i, out in enumerate(outputs):
    text = tokenizer.decode(out, skip_special_tokens=True)
    print(f"--- Candidate {i+1} ---\n{text}\n")

# Note: the project provides integration tools to run Lean 4 verification and the
# truncate-and-resume / RMaxTS search loop. For the full proving/verification loop,
# see the project's GitHub repository and model card for recommended scripts and dependencies.
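
The snippet above only generates candidates. For orientation, here is a minimal, hedged sketch of checking a candidate by invoking the Lean 4 command-line checker on a temporary file; the helper name and the plain `lean` invocation are assumptions, and proofs that import mathlib typically need to be compiled inside a Lean project (e.g. via `lake env lean`), with timeouts and error parsing handled by the project's own scripts.

Illustrative sketch (python):

import subprocess
import tempfile
from pathlib import Path

def lean_accepts(candidate: str, lean_cmd: str = "lean", timeout: int = 120) -> bool:
    """Write a candidate proof to a .lean file and ask the Lean 4 checker to compile it.

    Assumes a working Lean 4 toolchain on PATH; this is an illustrative helper,
    not the project's verification script.
    """
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "candidate.lean"
        path.write_text(candidate, encoding="utf-8")
        try:
            result = subprocess.run(
                [lean_cmd, str(path)],
                capture_output=True,
                text=True,
                timeout=timeout,
            )
        except subprocess.TimeoutExpired:
            return False
        return result.returncode == 0  # exit code 0 means Lean reported no errors

# Example: check each generated candidate from the snippet above.
# for i, out in enumerate(outputs):
#     text = tokenizer.decode(out, skip_special_tokens=True)
#     print(f"Candidate {i+1} verified:", lean_accepts(text))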

Benchmarks

miniF2F (test) — with RMaxTS: 63.5% pass rate (Source: https://huggingface.co/deepseek-ai/DeepSeek-Prover-V1.5-RL)

miniF2F (test) — single-pass whole-proof generation (RL model): 60.2% pass rate (Source: https://arxiv.org/abs/2408.08152)

ProofNet (test) — with RMaxTS: 25.3% pass rate (Source: https://huggingface.co/deepseek-ai/DeepSeek-Prover-V1.5-RL)

Model size: ≈7B parameters (public metadata lists 7B / ~6.9B) (Source: https://huggingface.co/deepseek-ai/DeepSeek-Prover-V1.5-RL)

Last Refreshed: 2026-01-16

Key Information

  • Category: Language Models
  • Type: AI Language Models Tool