---
license: cc-by-nc-4.0
---
# ReFT: Reasoning with REinforced Fine-Tuning
Paper: https://arxiv.org/pdf/2401.08967.pdf

Repo: https://github.com/lqtrung1998/mwp_ReFT (under the [Apache 2.0 License](https://github.com/lqtrung1998/mwp_ReFT/blob/main/License.txt))

## Introduction
We introduce REinforced Fine-Tuning (ReFT), a method that enhances the generalizability of LLMs trained for reasoning.

This repository contains:
- A warmup Supervised Fine-tuned model on the GSM8k benchmark: [lqtrung1998/galactica-6.7b-SFT-warmup-GSM8k](https://huggingface.co/lqtrung1998/galactica-6.7b-SFT-warmup-GSM8k)
- A Supervised Fine-tuned model on the GSM8k benchmark: [lqtrung1998/galactica-6.7b-SFT-GSM8k](https://huggingface.co/lqtrung1998/galactica-6.7b-SFT-GSM8k)
- A Rerank model that scores the outputs of the SFT model: [lqtrung1998/galactica-6.7b-SFT-Rerank-GSM8k](https://huggingface.co/lqtrung1998/galactica-6.7b-SFT-Rerank-GSM8k)
- A REinforced Fine-tuned model on the GSM8k benchmark: [lqtrung1998/galactica-6.7b-ReFT-GSM8k](https://huggingface.co/lqtrung1998/galactica-6.7b-ReFT-GSM8k)
- A Rerank model that scores the outputs of the ReFT model: [lqtrung1998/galactica-6.7b-ReFT-Rerank-GSM8k](https://huggingface.co/lqtrung1998/galactica-6.7b-ReFT-Rerank-GSM8k)

Note: Our models are fine-tuned from Galactica, so licenses that apply to Galactica, such as the non-commercial CC BY-NC 4.0 license, also apply to these models.

| Model                                                              | Top-1 | Voting@100 | Rerank@100 |
|--------------------------------------------------------------------|:-----:|:----------:|:----------:|
| galactica-6.7b-SFT-warmup-GSM8k                                    | 48.37 |     -      |     -      |
| galactica-6.7b-SFT-GSM8k<br>(+galactica-6.7b-SFT-Rerank-GSM8k)     | 58.83 |    62.9    |    73.4    |
| galactica-6.7b-ReFT-GSM8k<br>(+galactica-6.7b-ReFT-Rerank-GSM8k)   | 68.91 |    71.9    |    76.4    |
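
All numbers are answer accuracy (%) on the GSM8k test set. Top-1 is the accuracy of a single decoded CoT; Voting@100 and Rerank@100 sample 100 CoTs per question and pick the final answer by majority vote or by the rerank model's score, respectively. Below is a minimal sketch of the two selection rules; the candidate answers and scores are hypothetical placeholders, not outputs of the released models.

```python
from collections import Counter

def voting_at_n(answers):
    """Voting@N: majority vote over the answers of N sampled CoTs."""
    return Counter(answers).most_common(1)[0][0]

def rerank_at_n(answers, scores):
    """Rerank@N: pick the answer whose CoT the rerank model scores highest."""
    best_idx = max(range(len(answers)), key=lambda i: scores[i])
    return answers[best_idx]

# Hypothetical example with 5 samples instead of 100:
answers = [10.0, 120.0, 10.0, 12.0, 10.0]   # answers extracted from sampled CoTs
scores  = [0.77, 0.31, 0.80, 0.91, 0.85]    # rerank model's correctness scores
print(voting_at_n(answers))          # 10.0 (most frequent answer)
print(rerank_at_n(answers, scores))  # 12.0 (highest-scored CoT)
```
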
## Training Data
The model is trained on GSM8k data in the Python SDP CoT format, which can be found [here](https://github.com/lqtrung1998/mwp_ReFT).

## Training Procedure
Check out our paper and repo for complete details.
#### ReFT model
The ReFT model is first warmed up via Supervised Fine-tuning on the GSM8k Python SDP training data for 2 epochs, then REinforced Fine-tuned for 300 epochs using the questions in the GSM8k training set.
#### Rerank model
The Rerank model is trained to classify whether an output CoT is correct, using samples drawn from the ReFT model after the 2-epoch warmup.
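
As a rough illustration of how such rerank training data can be constructed (a sketch, not the actual training script in the repo): each sampled CoT program is labelled correct or incorrect by executing its `solution()` function and comparing the result with the gold GSM8k answer. The names `sampled_cots` and `gold_answer` below are hypothetical placeholders.

```python
def execute_cot(cot_code: str):
    """Run a generated Python CoT and return the value of solution(),
    or None if the program raises an error."""
    env = {}
    try:
        exec(cot_code, env)
        return float(env["solution"]())
    except Exception:
        return None

def build_rerank_examples(question, sampled_cots, gold_answer):
    """Label each sampled CoT as correct (1) or incorrect (0) for
    training a binary rerank classifier."""
    examples = []
    for cot in sampled_cots:
        answer = execute_cot(cot)
        label = int(answer is not None and abs(answer - gold_answer) < 1e-4)
        examples.append({"question": question, "cot": cot, "label": label})
    return examples
```
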
## Evaluation Results
See the evaluation results of the models in Table 4 of the paper.

## Usage
You can use the models through Hugging Face's Transformers library or follow the scripts in our repo.

Prompt format:
```python
Question:
Weng earns $12 an hour for babysitting. Yesterday, she
just did 50 minutes of babysitting. How much did she earn?
Answer reasoning:
```
Expected response:
```python
def solution():
    """Weng earns $12 an hour for babysitting. Yesterday, she just did
    50 minutes of babysitting. How much did she earn?"""
    hourly_rate = 12
    minutes_worked = 50
    hours_worked = minutes_worked / 60
    earnings = hourly_rate * hours_worked
    result = earnings
    return result
```
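
A minimal end-to-end sketch with Hugging Face Transformers is shown below. The generation settings (greedy decoding, `max_new_tokens=256`) and the exact prompt whitespace are illustrative assumptions rather than the settings used in the paper, and model-generated code should only be executed in a sandboxed environment.

```python
# Minimal usage sketch (illustrative settings, not the paper's exact ones).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "lqtrung1998/galactica-6.7b-ReFT-GSM8k"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.float16, device_map="auto"
)

question = (
    "Weng earns $12 an hour for babysitting. Yesterday, she "
    "just did 50 minutes of babysitting. How much did she earn?"
)
prompt = f"Question:\n{question}\nAnswer reasoning:\n"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)
completion = tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
)
print(completion)  # should contain a solution() function like the one above

# Execute the generated program to obtain the numeric answer.
env = {}
exec(completion, env)     # only run generated code in a sandbox
print(env["solution"]())  # e.g. 10.0
```
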

## Citation
Please cite the paper if you use our data, models, or code.
```
@misc{luong2024reft,
      title={ReFT: Reasoning with Reinforced Fine-Tuning},
      author={Trung Quoc Luong and Xinbo Zhang and Zhanming Jie and Peng Sun and Xiaoran Jin and Hang Li},
      year={2024},
      eprint={2401.08967},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```