---
license: llama2
---
# ReFT: Reasoning with REinforced Fine-Tuning
Paper: https://arxiv.org/pdf/2401.08967.pdf

Repo: https://github.com/lqtrung1998/mwp_ReFT (under the [Apache 2.0 License](https://github.com/lqtrung1998/mwp_ReFT/blob/main/License.txt))

## Introduction
We introduce REinforced Fine-Tuning (ReFT), a method that enhances the generalizability of LLMs trained for reasoning.

This repository contains:
- A warmup supervised fine-tuned model on the GSM8k benchmark: [lqtrung1998/Codellama-7b-hf-SFT-warmup-GSM8k](https://huggingface.co/lqtrung1998/Codellama-7b-hf-SFT-warmup-GSM8k)
- A supervised fine-tuned model on the GSM8k benchmark: [lqtrung1998/Codellama-7b-hf-SFT-GSM8k](https://huggingface.co/lqtrung1998/Codellama-7b-hf-SFT-GSM8k)
- A rerank model that scores the fine-tuned SFT model's outputs: [lqtrung1998/Codellama-7b-hf-SFT-Rerank-GSM8k](https://huggingface.co/lqtrung1998/Codellama-7b-hf-SFT-Rerank-GSM8k)
- A REinforced Fine-Tuned model on the GSM8k benchmark: [lqtrung1998/Codellama-7b-hf-ReFT-GSM8k](https://huggingface.co/lqtrung1998/Codellama-7b-hf-ReFT-GSM8k)
- A rerank model that scores the fine-tuned ReFT model's outputs: [lqtrung1998/Codellama-7b-hf-ReFT-Rerank-GSM8k](https://huggingface.co/lqtrung1998/Codellama-7b-hf-ReFT-Rerank-GSM8k)

Note: Our models are fine-tuned from CodeLlama; therefore, licenses applicable to CodeLlama, such as the [Llama license](https://ai.meta.com/resources/models-and-libraries/llama-downloads/), also apply to these models.

| | Top-1 | Voting@100 | Rerank@100 |
|--------------------------------------------------------------------|:------:|:----------:|:----------:|
| Codellama-7b-hf-SFT-warmup-GSM8k | 63.00 | - | - |
| Codellama-7b-hf-SFT-GSM8k<br>(+Codellama-7b-hf-SFT-Rerank-GSM8k) | 63.68 | 68.0 | 77.0 |
| Codellama-7b-hf-ReFT-GSM8k<br>(+Codellama-7b-hf-ReFT-Rerank-GSM8k) | 75.28 | 78.0 | 81.2 |
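
The Voting@100 column refers to majority voting (self-consistency) over 100 sampled solutions: a final answer is extracted from each sample and the most frequent one is kept. A minimal sketch of the aggregation step, assuming the final answers have already been extracted from the sampled generations:

```python
from collections import Counter

def majority_vote(answers):
    """Return the most frequent final answer among sampled solutions."""
    # most_common(1) yields [(answer, count)] for the top answer
    return Counter(answers).most_common(1)[0][0]

# toy example: answers extracted from five sampled generations
sampled_answers = [10.0, 10.0, 12.0, 10.0, 8.0]
print(majority_vote(sampled_answers))  # -> 10.0
```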

## Training Data
The model is trained on GSM8k data in the Python SDP CoT format, which can be found [here](https://github.com/lqtrung1998/mwp_ReFT).

## Training Procedure
Check out our paper and repo for complete details.
#### ReFT model
The ReFT model is first warmed up via supervised fine-tuning on the GSM8k Python SDP training data for 2 epochs, then REinforced Fine-Tuned for 300 epochs using the questions in the GSM8k training set.
#### Rerank model
The rerank model is trained to classify whether an output CoT is correct, using data sampled from the ReFT model after the 2-epoch warm-up.
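
At inference time, reranking amounts to sampling several candidate CoTs with the fine-tuned model and keeping the one the rerank model scores highest. A minimal sketch of that selection step, where `score_cot` is a hypothetical stand-in for the rerank model's correctness score (the actual scoring interface is in the repo's scripts):

```python
def rerank(candidates, score_cot):
    """Return the candidate CoT with the highest rerank score.

    `candidates` is a list of generated CoT strings; `score_cot` is a
    hypothetical callable standing in for the rerank model's score.
    """
    return max(candidates, key=score_cot)

# toy usage with a stand-in scorer
toy_scores = {"cot_a": 0.2, "cot_b": 0.9}
best = rerank(["cot_a", "cot_b"], lambda cot: toy_scores[cot])
print(best)  # -> cot_b
```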

## Evaluation Results
See the evaluation results of the models in Table 4 of the paper.

## Usage
You can use the models through Hugging Face's Transformers library or follow the scripts in our repo.
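
As a hedged sketch of loading one of the checkpoints above and greedily generating a reasoning chain with Transformers (the generation settings here are illustrative, not the paper's decoding configuration):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

prompt = (
    "Question:\n"
    "Weng earns $12 an hour for babysitting. Yesterday, she\n"
    "just did 50 minutes of babysitting. How much did she earn?\n"
    "Answer reasoning:\n"
)

model_id = "lqtrung1998/Codellama-7b-hf-ReFT-GSM8k"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)
# decode only the newly generated tokens
completion = tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
)
print(completion)
```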

Prompt format:
```python
Question:
Weng earns $12 an hour for babysitting. Yesterday, she
just did 50 minutes of babysitting. How much did she earn?
Answer reasoning:
```
Expected response:
```python
def solution():
    """Weng earns $12 an hour for babysitting. Yesterday, she just did
    50 minutes of babysitting. How much did she earn?"""
    hourly_rate = 12
    minutes_worked = 50
    hours_worked = minutes_worked / 60
    earnings = hourly_rate * hours_worked
    result = earnings
    return result
```
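
Because completions follow the Python SDP format, the numeric answer is obtained by executing the generated program and calling `solution()`. A minimal sketch (model-generated code should be run in a sandbox in practice):

```python
def extract_answer(generated_code: str):
    """Execute a generated Python SDP program and return solution()'s result."""
    namespace = {}
    # caution: exec on untrusted model output should be sandboxed
    exec(generated_code, namespace)
    return namespace["solution"]()

code = """
def solution():
    hourly_rate = 12
    minutes_worked = 50
    hours_worked = minutes_worked / 60
    earnings = hourly_rate * hours_worked
    return earnings
"""
print(extract_answer(code))  # Weng earned $10
```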

## Citation
Please cite the paper if you use our data, model or code.
```
@misc{luong2024reft,
      title={ReFT: Reasoning with Reinforced Fine-Tuning},
      author={Trung Quoc Luong and Xinbo Zhang and Zhanming Jie and Peng Sun and Xiaoran Jin and Hang Li},
      year={2024},
      eprint={2401.08967},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```
## Intended Use
**Intended Use Cases:** Code Llama and its variants are intended for commercial and research use in English and relevant programming languages. The base model Code Llama can be adapted for a variety of code synthesis and understanding tasks; Code Llama - Python is designed specifically to handle the Python programming language; and Code Llama - Instruct is intended to be safer for code-assistant and generation applications.

**Out-of-Scope Uses:** Use in any manner that violates applicable laws or regulations (including trade compliance laws). Use in languages other than English. Use in any other way that is prohibited by the Acceptable Use Policy and Licensing Agreement for Code Llama and its variants.

## Ethical Considerations and Limitations
Code Llama and its variants are a new technology that carries risks with use. Testing conducted to date has been in English and has not covered, nor could it cover, all scenarios. For these reasons, as with all LLMs, Code Llama's potential outputs cannot be predicted in advance, and the model may in some instances produce inaccurate or objectionable responses to user prompts. Therefore, before deploying any applications of Code Llama, developers should perform safety testing and tuning tailored to their specific applications of the model.

Please see the Responsible Use Guide available at https://ai.meta.com/llama/responsible-use-guide.