---
license: cc-by-nc-4.0
---
# ReFT: Reasoning with REinforced Fine-Tuning
Paper: https://arxiv.org/pdf/2401.08967.pdf

Repo: https://github.com/lqtrung1998/mwp_ReFT (under the [Apache 2.0 License](https://github.com/lqtrung1998/mwp_ReFT/blob/main/License.txt))

## Introduction
We introduce REinforced Fine-Tuning (ReFT), a method that enhances the generalizability of LLMs trained for reasoning.

This repository contains:
- A warmup Supervised Fine-tuned model on the GSM8k benchmark: [lqtrung1998/galactica-6.7b-SFT-warmup-GSM8k](https://huggingface.co/lqtrung1998/galactica-6.7b-SFT-warmup-GSM8k)
- A Supervised Fine-tuned model on the GSM8k benchmark: [lqtrung1998/galactica-6.7b-SFT-GSM8k](https://huggingface.co/lqtrung1998/galactica-6.7b-SFT-GSM8k)
- A Rerank model that scores the outputs of the SFT model: [lqtrung1998/galactica-6.7b-SFT-Rerank-GSM8k](https://huggingface.co/lqtrung1998/galactica-6.7b-SFT-Rerank-GSM8k)
- A REinforced Fine-tuned model on the GSM8k benchmark: [lqtrung1998/galactica-6.7b-ReFT-GSM8k](https://huggingface.co/lqtrung1998/galactica-6.7b-ReFT-GSM8k)
- A Rerank model that scores the outputs of the ReFT model: [lqtrung1998/galactica-6.7b-ReFT-Rerank-GSM8k](https://huggingface.co/lqtrung1998/galactica-6.7b-ReFT-Rerank-GSM8k)

Note: Our models are fine-tuned from Galactica, so licenses that apply to Galactica, such as the non-commercial CC BY-NC 4.0 license, also apply to these models.

| Model                                                              | Top-1 | Voting@100 | Rerank@100 |
|--------------------------------------------------------------------|:-----:|:----------:|:----------:|
| galactica-6.7b-SFT-warmup-GSM8k                                    | 48.37 |     -      |     -      |
| galactica-6.7b-SFT-GSM8k<br>(+galactica-6.7b-SFT-Rerank-GSM8k)     | 58.83 |    62.9    |    73.4    |
| galactica-6.7b-ReFT-GSM8k<br>(+galactica-6.7b-ReFT-Rerank-GSM8k)   | 68.91 |    71.9    |    76.4    |
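
All numbers are answer accuracy (%) on the GSM8k test set. Top-1 is the accuracy of a single decoded CoT; Voting@100 and Rerank@100 sample 100 CoTs per question and pick the final answer by majority vote or by the rerank model's score, respectively. Below is a minimal sketch of the two selection rules; the candidate answers and scores are hypothetical placeholders, not outputs of the released models.

```python
from collections import Counter

def voting_at_n(answers):
    """Voting@N: majority vote over the answers of N sampled CoTs."""
    return Counter(answers).most_common(1)[0][0]

def rerank_at_n(answers, scores):
    """Rerank@N: pick the answer whose CoT the rerank model scores highest."""
    best_idx = max(range(len(answers)), key=lambda i: scores[i])
    return answers[best_idx]

# Hypothetical example with 5 samples instead of 100:
answers = [10.0, 120.0, 10.0, 12.0, 10.0]   # answers extracted from sampled CoTs
scores  = [0.77, 0.31, 0.80, 0.91, 0.85]    # rerank model's correctness scores
print(voting_at_n(answers))          # 10.0 (most frequent answer)
print(rerank_at_n(answers, scores))  # 12.0 (highest-scored CoT)
```
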
## Training Data
The model is trained on GSM8k data in the Python SDP CoT format, which can be found [here](https://github.com/lqtrung1998/mwp_ReFT).

## Training Procedure
Check out our paper and repo for complete details.
#### ReFT model
The ReFT model is first warmed up via Supervised Fine-tuning on the GSM8k Python SDP training data for 2 epochs, then REinforced Fine-tuned for 300 epochs using the questions in the GSM8k training set.
#### Rerank model
The Rerank model is trained to classify whether an output CoT is correct, using samples drawn from the ReFT model after the 2-epoch warmup.
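
As a rough illustration of how such rerank training data can be constructed (a sketch, not the actual training script in the repo): each sampled CoT program is labelled correct or incorrect by executing its `solution()` function and comparing the result with the gold GSM8k answer. The names `sampled_cots` and `gold_answer` below are hypothetical placeholders.

```python
def execute_cot(cot_code: str):
    """Run a generated Python CoT and return the value of solution(),
    or None if the program raises an error."""
    env = {}
    try:
        exec(cot_code, env)
        return float(env["solution"]())
    except Exception:
        return None

def build_rerank_examples(question, sampled_cots, gold_answer):
    """Label each sampled CoT as correct (1) or incorrect (0) for
    training a binary rerank classifier."""
    examples = []
    for cot in sampled_cots:
        answer = execute_cot(cot)
        label = int(answer is not None and abs(answer - gold_answer) < 1e-4)
        examples.append({"question": question, "cot": cot, "label": label})
    return examples
```
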
## Evaluation Results
See the evaluation results of the models in Table 4 of the paper.

## Usage
You can use the models through Hugging Face's Transformers library or follow the scripts in our repo.

Prompt format:
```python
Question:
Weng earns $12 an hour for babysitting. Yesterday, she
just did 50 minutes of babysitting. How much did she earn?
Answer reasoning:
```
Expected response:
```python
def solution():
    """Weng earns $12 an hour for babysitting. Yesterday, she just did
    50 minutes of babysitting. How much did she earn?"""
    hourly_rate = 12
    minutes_worked = 50
    hours_worked = minutes_worked / 60
    earnings = hourly_rate * hours_worked
    result = earnings
    return result
```
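
A minimal end-to-end sketch with Hugging Face Transformers is shown below. The generation settings (greedy decoding, `max_new_tokens=256`) and the exact prompt whitespace are illustrative assumptions rather than the settings used in the paper, and model-generated code should only be executed in a sandboxed environment.

```python
# Minimal usage sketch (illustrative settings, not the paper's exact ones).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "lqtrung1998/galactica-6.7b-ReFT-GSM8k"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.float16, device_map="auto"
)

question = (
    "Weng earns $12 an hour for babysitting. Yesterday, she "
    "just did 50 minutes of babysitting. How much did she earn?"
)
prompt = f"Question:\n{question}\nAnswer reasoning:\n"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)
completion = tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
)
print(completion)  # should contain a solution() function like the one above

# Execute the generated program to obtain the numeric answer.
env = {}
exec(completion, env)     # only run generated code in a sandbox
print(env["solution"]())  # e.g. 10.0
```
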

## Citation
Please cite the paper if you use our data, models, or code.
```
@misc{luong2024reft,
      title={ReFT: Reasoning with Reinforced Fine-Tuning},
      author={Trung Quoc Luong and Xinbo Zhang and Zhanming Jie and Peng Sun and Xiaoran Jin and Hang Li},
      year={2024},
      eprint={2401.08967},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```