DeepSeek-Prover-V1.5-RL - AI Language Models Tool

Overview

DeepSeek-Prover-V1.5-RL is an open-source language model specialized for formal theorem proving in Lean 4. Building on the DeepSeekMath foundation and the earlier DeepSeek-Prover family, V1.5 introduces reinforcement learning from proof-assistant feedback (RLPAF) and an exploration-driven Monte-Carlo tree search variant (RMaxTS) to produce diverse, verifiable proof candidates. The authors released Base, supervised fine-tuned (SFT), and RL-refined checkpoints and provide tooling to integrate model generations with the Lean 4 verifier via a truncate-and-resume pipeline.

The model targets automated formal proof generation and research in neural-symbolic reasoning: it accepts Lean 4 problem statements and generates tactic-based proofs that are then checked by the Lean 4 verifier. According to the authors and the public model card, DeepSeek-Prover-V1.5-RL (≈7B parameters) achieves state-of-the-art results on prominent formal-proving benchmarks (miniF2F and ProofNet) when combined with RMaxTS. The model is available under DeepSeek’s model license, with source code and training/inference recipes published on the project’s GitHub and Hugging Face model card.
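
As a quick illustration of the task format, the statement below is a Lean 4 goal of the kind the model receives, and the proof is one the Lean 4 checker accepts; it is a hand-written example, not a model output.

Illustrative example (Lean 4):

-- Theorem statement given to the prover
example (a b : Nat) : a + b = b + a := by
  -- A tactic-style proof that the Lean 4 verifier accepts; the model emits
  -- proofs of this form, optionally interleaved with tactic-state comments.
  exact Nat.add_comm a b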

Model Statistics

  • Downloads: 657
  • Likes: 64
  • Parameters: 6.9B

License: other

Model Details

Core design and training pipeline: DeepSeek-Prover-V1.5 is derived from DeepSeekMath-Base and released as three main checkpoints: Base, SFT (supervised fine-tuned), and RL (reinforcement-learned). Training follows a three-stage workflow: continued mathematical pre-training; supervised fine-tuning on a large Lean 4 code-completion dataset augmented with chain-of-thought comments and tactic-state annotations; and a GRPO-style reinforcement learning phase that uses binary pass/fail signals from the Lean 4 verifier as the reward, which the authors call RLPAF (reinforcement learning from proof-assistant feedback).

Inference and algorithmic advances: the paper and model card describe a truncate-and-resume mechanism that bridges whole-proof generation and proof-step verification. When a generated whole proof fails, the code is truncated at the first failing tactic and the model resumes generation conditioned on the latest tactic-state comment; this mechanism also integrates with Monte-Carlo tree search. The authors propose RMaxTS, an MCTS variant that assigns intrinsic (curiosity-like) rewards to encourage diverse exploration of the tactic-state space, which substantially increases pass rates under a limited sampling budget.

Technical specifics reported by the authors and model card: the released prover models have approximately 7B parameters (the public metadata lists 7B / ~6.9B), use BF16 weights, and are trained and evaluated with a 4,096-token context. The repository includes examples, quantized variants, and instructions for running inference; code and issues are hosted on GitHub, and the model card and arXiv paper provide reproducibility details.
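
To make the truncate-and-resume mechanism concrete, here is a minimal sketch of the loop under stated assumptions: the `generate` and `verify_with_lean` helpers are hypothetical stand-ins for the model call and the Lean 4 integration that the project's own scripts provide, and the real pipeline interleaves this with RMaxTS search rather than a simple retry loop.

Illustrative sketch (python):

# Minimal sketch of truncate-and-resume (illustrative, not the project's API).
# Assumed, hypothetical helpers:
#   generate(prompt) -> str
#       samples a whole-proof completion from the model
#   verify_with_lean(statement, proof) -> (ok, first_failing_line, tactic_state)
#       runs the Lean 4 verifier and reports the first failing tactic, if any

def truncate_and_resume(statement, generate, verify_with_lean, max_rounds=4):
    proof_so_far = ""
    for _ in range(max_rounds):
        candidate = proof_so_far + generate(statement + proof_so_far)
        ok, fail_line, tactic_state = verify_with_lean(statement, candidate)
        if ok:
            return candidate  # the Lean 4 verifier accepts the whole proof
        # Keep only the code preceding the first failing tactic ...
        proof_so_far = "\n".join(candidate.splitlines()[:fail_line])
        # ... and append the reported tactic state as a comment so the model
        # can resume generation from a verified prefix.
        proof_so_far += f"\n  /- tactic state:\n  {tactic_state}\n  -/\n"
    return None  # no verified proof found within the sampling budget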

Key Features

  • Reinforcement learning from prover feedback (RLPAF) using Lean 4 verification signals
  • RMaxTS: intrinsic-reward-driven Monte‑Carlo tree search for diverse proof exploration (sketched after this list)
  • Truncate-and-resume pipeline combining whole-proof generation with tactic-state resumption
  • Pretraining on DeepSeekMath-Base and SFT with thought-augmented Lean 4 datasets
  • Open-source release with Base, SFT, and RL checkpoints and quantized variants
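
The RMaxTS exploration bonus referenced above can be illustrated with a small sketch: an intrinsic reward is granted the first time the search reaches a tactic state it has not seen before, and nothing afterwards, which pushes the tree search toward novel states. The class below is an assumption-laden illustration, not the released implementation; node selection and value backup are omitted.

Illustrative sketch (python):

# RMax-style intrinsic reward for tree search (illustrative only): pay 1 the
# first time a tactic state is encountered, 0 on every later visit.

class IntrinsicReward:
    def __init__(self):
        self.seen_states = set()

    def __call__(self, tactic_state: str) -> float:
        # Normalize the pretty-printed tactic state so trivial whitespace
        # differences do not count as "new" states.
        key = " ".join(tactic_state.split())
        if key in self.seen_states:
            return 0.0
        self.seen_states.add(key)
        return 1.0  # novel tactic state: grant the exploration bonus

# During search, the backed-up value of a rollout would combine the sparse
# extrinsic signal (proof verified or not) with this intrinsic bonus.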

Example Usage

Example (python):

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Model id on Hugging Face
MODEL_ID = "deepseek-ai/DeepSeek-Prover-V1.5-RL"

# Install prerequisites: pip install transformers accelerate safetensors
# Load tokenizer and model (use bfloat16 if supported on your hardware)

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16 if torch.cuda.is_available() else torch.float32,  # fp16 on CPU is often unsupported
    device_map="auto",
    trust_remote_code=True
)

# Example Lean 4 theorem prompt (very short illustrative example)
prompt = "-- Lean 4 theorem statement\nexample (a b : Nat) : a + b = b + a :=\n"

inputs = tokenizer(prompt, return_tensors="pt").to(next(model.parameters()).device)

# Generate a candidate proof (tune max_new_tokens / sampling for your workload)
outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    do_sample=True,
    top_k=50,
    temperature=0.8,
    num_return_sequences=3
)

for i, out in enumerate(outputs):
    text = tokenizer.decode(out, skip_special_tokens=True)
    print(f"--- Candidate {i+1} ---\n{text}\n")

# Note: the project provides integration tools to run Lean 4 verification and the
# truncate-and-resume / RMaxTS search loop. For the full proving/verification loop,
# see the project's GitHub repository and model card for recommended scripts and dependencies.
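
The snippet above only generates candidates. For orientation, here is a minimal, hedged sketch of checking a candidate by invoking the Lean 4 command-line checker on a temporary file; the helper name and the plain `lean` invocation are assumptions, and proofs that import mathlib typically need to be compiled inside a Lean project (e.g. via `lake env lean`), with timeouts and error parsing handled by the project's own scripts.

Illustrative sketch (python):

import subprocess
import tempfile
from pathlib import Path

def lean_accepts(candidate: str, lean_cmd: str = "lean", timeout: int = 120) -> bool:
    """Write a candidate proof to a .lean file and ask the Lean 4 checker to compile it.

    Assumes a working Lean 4 toolchain on PATH; this is an illustrative helper,
    not the project's verification script.
    """
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "candidate.lean"
        path.write_text(candidate, encoding="utf-8")
        try:
            result = subprocess.run(
                [lean_cmd, str(path)],
                capture_output=True,
                text=True,
                timeout=timeout,
            )
        except subprocess.TimeoutExpired:
            return False
        return result.returncode == 0  # exit code 0 means Lean reported no errors

# Example: check each generated candidate from the snippet above.
# for i, out in enumerate(outputs):
#     text = tokenizer.decode(out, skip_special_tokens=True)
#     print(f"Candidate {i+1} verified:", lean_accepts(text))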

Benchmarks

miniF2F (test) — with RMaxTS: 63.5% pass rate (Source: https://huggingface.co/deepseek-ai/DeepSeek-Prover-V1.5-RL)

miniF2F (test) — single-pass whole-proof generation (RL model): 60.2% pass rate (Source: https://arxiv.org/abs/2408.08152)

ProofNet (test) — with RMaxTS: 25.3% pass rate (Source: https://huggingface.co/deepseek-ai/DeepSeek-Prover-V1.5-RL)

Model size: ≈7B parameters (public metadata lists 7B / ~6.9B) (Source: https://huggingface.co/deepseek-ai/DeepSeek-Prover-V1.5-RL)

Last Refreshed: 2026-01-16

Key Information

  • Category: Language Models
  • Type: AI Language Models Tool