deepseek-ai/DeepSeek-Prover-V1.5-RL

Commit 944fec2 (1 parent: 1982f70)
zqh11 committed on Aug 16

Update README.md

README.md +8 -63

@@ -43,14 +43,11 @@ license_link: LICENSE
   </a>
 </div>
 <p align="center">
-  <a href="#3-evaluation-results">Evaluation Results</a> |
   <a href="#3-model-downloads">Model Download</a> |
-  <a href="#4-setup-environment">Setup Environment</a> |
-  <a href="#5-quick-start">Quick Start</a> |
-  <a href="#6-questions-and-bugs">Questions and Bugs</a> |
-  <a href="#7-license">License</a> |
-  <a href="#8-citation">Citation</a> |
-  <a href="#9-contact">Contact</a>
 </p>
@@ -66,7 +63,7 @@ license_link: LICENSE
 We introduce DeepSeek-Prover-V1.5, an open-source language model designed for theorem proving in Lean 4, which enhances DeepSeek-Prover-V1 by optimizing both training and inference processes. Pre-trained on DeepSeekMath-Base with specialization in formal mathematical languages, the model undergoes supervised fine-tuning using an enhanced formal theorem proving dataset derived from DeepSeek-Prover-V1. Further refinement is achieved through reinforcement learning from proof assistant feedback (RLPAF). Beyond the single-pass whole-proof generation approach of DeepSeek-Prover-V1, we propose RMaxTS, a variant of Monte-Carlo tree search that employs an intrinsic-reward-driven exploration strategy to generate diverse proof paths. DeepSeek-Prover-V1.5 demonstrates significant improvements over DeepSeek-Prover-V1, achieving new state-of-the-art results on the test set of the high school level miniF2F benchmark (63.5%) and the undergraduate level ProofNet benchmark (25.3%).
 <p align="center">
-  <img width="100%" src="figures/performance.png">
 </p>
@@ -102,64 +99,12 @@ We release the DeepSeek-Prover-V1.5 with 7B parameters, including base, SFT and
 </div>
-## 4. Setup Environment
-### Requirements
-* Supported platform: Linux
-* Python 3.10
-### Installation
-1. **Install Lean 4**
-   Follow the instructions on the [Lean 4 installation page](https://leanprover.github.io/lean4/doc/quickstart.html) to set up Lean 4.
-2. **Clone the repository**
-```sh
-git clone --recurse-submodules git@github.com:deepseek-ai/DeepSeek-Prover-V1.5.git
-cd DeepSeek-Prover-V1.5
-```
-3. **Install Dependencies**
-```sh
-pip install -r requirements.txt
-```
-4. **Build Mathlib4**
-```sh
-cd mathlib4
-lake build
-```
-## 5. Quick Start
-You can directly use [Huggingface's Transformers](https://github.com/huggingface/transformers) for model inference. A simple example of generating a proof for a problem from miniF2F and verifying it can be found in [quick_start.py](https://github.com/deepseek-ai/DeepSeek-Prover-V1.5/blob/master/quick_start.py).
-To run paper experiments, you can use the following script to launch a RMaxTS proof search agent:
-```sh
-python -m prover.launch --config=configs/RMaxTS.py --log_dir=logs/RMaxTS_results
-```
-You can use `CUDA_VISIBLE_DEVICES=0,1,···` to specify the GPU devices. The experiment results can be gathered using the following script:
-```sh
-python -m prover.summarize --config=configs/RMaxTS.py --log_dir=logs/RMaxTS_results
-```
-## 6. Questions and Bugs
-* For general questions and discussions, please use [GitHub Discussions](https://github.com/deepseek-ai/DeepSeek-Prover-V1.5/discussions).
-* To report a potential bug, please open an issue.
-## 7. License
 This code repository is licensed under the MIT License. The use of DeepSeekMath models is subject to the Model License. DeepSeekMath supports commercial use.
 See the [LICENSE-CODE](LICENSE-CODE) and [LICENSE-MODEL](LICENSE-MODEL) for more details.
-## 8. Citation
 ```latex
 @article{xin2024deepseekproverv15harnessingproofassistant,
       title={DeepSeek-Prover-V1.5: Harnessing Proof Assistant Feedback for Reinforcement Learning and Monte-Carlo Tree Search},
@@ -172,6 +117,6 @@ See the [LICENSE-CODE](LICENSE-CODE) and [LICENSE-MODEL](LICENSE-MODEL) for more
 }
 ```
-## 9. Contact
 If you have any questions, please raise an issue or contact us at [service@deepseek.com](mailto:service@deepseek.com).

   </a>
 </div>
 <p align="center">
+  <a href="#2-evaluation-results">Evaluation Results</a> |
   <a href="#3-model-downloads">Model Download</a> |
+  <a href="#4-license">License</a> |
+  <a href="#5-citation">Citation</a> |
+  <a href="#6-contact">Contact</a>
 </p>
 We introduce DeepSeek-Prover-V1.5, an open-source language model designed for theorem proving in Lean 4, which enhances DeepSeek-Prover-V1 by optimizing both training and inference processes. Pre-trained on DeepSeekMath-Base with specialization in formal mathematical languages, the model undergoes supervised fine-tuning using an enhanced formal theorem proving dataset derived from DeepSeek-Prover-V1. Further refinement is achieved through reinforcement learning from proof assistant feedback (RLPAF). Beyond the single-pass whole-proof generation approach of DeepSeek-Prover-V1, we propose RMaxTS, a variant of Monte-Carlo tree search that employs an intrinsic-reward-driven exploration strategy to generate diverse proof paths. DeepSeek-Prover-V1.5 demonstrates significant improvements over DeepSeek-Prover-V1, achieving new state-of-the-art results on the test set of the high school level miniF2F benchmark (63.5%) and the undergraduate level ProofNet benchmark (25.3%).
 <p align="center">
+  <img width="100%" src="https://github.com/deepseek-ai/DeepSeek-Prover-V1.5/blob/main/figures/performance.png?raw=true">
 </p>
 </div>
+## 4. License
 This code repository is licensed under the MIT License. The use of DeepSeekMath models is subject to the Model License. DeepSeekMath supports commercial use.
 See the [LICENSE-CODE](LICENSE-CODE) and [LICENSE-MODEL](LICENSE-MODEL) for more details.
+## 5. Citation
 ```latex
 @article{xin2024deepseekproverv15harnessingproofassistant,
       title={DeepSeek-Prover-V1.5: Harnessing Proof Assistant Feedback for Reinforcement Learning and Monte-Carlo Tree Search},
 }
 ```
+## 6. Contact
 If you have any questions, please raise an issue or contact us at [service@deepseek.com](mailto:service@deepseek.com).
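
Because this commit renumbers the table-of-contents anchors (`#2-evaluation-results`, `#4-license`, and so on), a quick check that every `href="#..."` link still matches a heading can catch stale links. The following is a minimal sketch, assuming GitHub-style heading slugs (lowercase, punctuation dropped, spaces turned into hyphens); the Hugging Face renderer may slug headings differently, and the helper names `slugify` and `missing_anchors` are hypothetical, not part of the repository.

```python
import re


def slugify(heading: str) -> str:
    """Approximate a GitHub-style anchor for one Markdown heading line."""
    text = heading.lstrip("#").strip().lower()
    # Drop punctuation such as "." but keep word characters, hyphens and spaces.
    text = re.sub(r"[^\w\- ]", "", text)
    return text.replace(" ", "-")


def missing_anchors(markdown: str) -> list[str]:
    """Return in-page link targets that have no matching heading slug.

    Assumption: every line starting with "#" is a heading, so "#" comment
    lines inside fenced code blocks would be miscounted as headings.
    """
    headings = {
        slugify(line) for line in markdown.splitlines() if line.startswith("#")
    }
    links = re.findall(r'href="#([^"]+)"', markdown)
    return [anchor for anchor in links if anchor not in headings]
```

Running `missing_anchors` on the full text of README.md returns the link targets that need attention; an empty list means every in-page link resolves under the assumed slug rule.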