Update README.md
README.md
CHANGED
@@ -16,8 +16,8 @@ should probably proofread and complete it, then remove this comment. -->
 [<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="200" height="32"/>](https://wandb.ai/hbin0701/DPO/runs/8aboqbe6)
 # mistral_7b_gsm8k_ep2_1e-5_dpo
 
-This model is a fine-tuned version of [/home/hyeonbin/self_train/Verifiers/models/mistral_7b_gsm8k_ep2_1e-5_rft_round1](https://huggingface.co//home/hyeonbin/self_train/Verifiers/models/mistral_7b_gsm8k_ep2_1e-5_rft_round1) on the
-It achieves the following results on the evaluation set:
+This model is a fine-tuned version of [/home/hyeonbin/self_train/Verifiers/models/mistral_7b_gsm8k_ep2_1e-5_rft_round1](https://huggingface.co//home/hyeonbin/self_train/Verifiers/models/mistral_7b_gsm8k_ep2_1e-5_rft_round1) on the GSM8K Train Set.
+It achieves the following results on the evaluation set (=GSM8K Train subset):
 - Loss: 0.0005
 - Rewards/chosen: -1.7120
 - Rewards/rejected: -14.3548