hbin0701/Mistral_VStar_iter1

Text Generation

alignment-handbook

Generated from Trainer

text-generation-inference

Inference Endpoints


hbin0701 committed on May 22

Commit 7af0563 • 1 Parent(s): 024ecb0

Update README.md

Files changed (1)

README.md +2 -2


@@ -16,8 +16,8 @@ should probably proofread and complete it, then remove this comment. -->
 [<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="200" height="32"/>](https://wandb.ai/hbin0701/DPO/runs/8aboqbe6)
 # mistral_7b_gsm8k_ep2_1e-5_dpo
-This model is a fine-tuned version of [/home/hyeonbin/self_train/Verifiers/models/mistral_7b_gsm8k_ep2_1e-5_rft_round1](https://huggingface.co//home/hyeonbin/self_train/Verifiers/models/mistral_7b_gsm8k_ep2_1e-5_rft_round1) on the HuggingFaceH4/ultrafeedback_binarized dataset.
-It achieves the following results on the evaluation set:
+This model is a fine-tuned version of [/home/hyeonbin/self_train/Verifiers/models/mistral_7b_gsm8k_ep2_1e-5_rft_round1](https://huggingface.co//home/hyeonbin/self_train/Verifiers/models/mistral_7b_gsm8k_ep2_1e-5_rft_round1) on the GSM8K Train Set.
+It achieves the following results on the evaluation set (=GSM8K Train subset):
 - Loss: 0.0005
 - Rewards/chosen: -1.7120
 - Rewards/rejected: -14.3548
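
As a reading aid for the metrics above, here is a minimal sketch of how the two reward values relate to each other. It assumes the run used the standard sigmoid DPO objective, which the card does not state explicitly. The variable names are illustrative and not taken from the training code.

```python
import math

# Values reported in the model card's evaluation results.
rewards_chosen = -1.7120
rewards_rejected = -14.3548

# Reward margin: how much higher the implicit reward is for the
# chosen response than for the rejected one, on average.
margin = rewards_chosen - rewards_rejected

# Assumption: sigmoid DPO loss, -log(sigmoid(margin)), evaluated at the
# average margin. This is NOT the reported loss: the reported value is an
# average of per-example losses, which is at least as large because the
# loss is convex in the margin.
loss_at_mean_margin = math.log1p(math.exp(-margin))

print(f"margin = {margin:.4f}")
print(f"loss at mean margin = {loss_at_mean_margin:.2e}")
```

Under that assumption, a large positive margin drives the loss toward zero, which is consistent with the very small reported loss. Since the card says the evaluation set is a subset of the training data, the small loss mainly reflects fit to the training distribution.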