Update README.md
README.md CHANGED
@@ -69,12 +69,11 @@ parameters:
 dtype: float16
 
 # Then, DPO Finetune
+# [jondurbin/gutenberg-dpo-v0.1](https://huggingface.co/datasets/jondurbin/gutenberg-dpo-v0.1)
 
 ```
 
-*
-
-*I used a higher learning rate and full dataset compared to "L3.1-Celestial-Stone-2x8B-DPO".*
+*I used a higher learning rate and the full dataset for this training run, compared to my "L3.1-Celestial-Stone-2x8B-DPO". This resulted in lower loss and better adaptation to the chosen style.*
 
 # Prompt Template:
 ```bash
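The commit only links the preference dataset; the training script itself is not part of the diff. Below is a minimal, hypothetical sketch of what a DPO finetune on jondurbin/gutenberg-dpo-v0.1 could look like with Hugging Face TRL. The base-model path, learning rate, batch settings, precision, and beta are illustrative placeholders, not the values used for this model.

```python
# Hypothetical DPO finetune sketch (not the author's actual script).
# Assumes a recent `trl` release; argument names differ in older versions.
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

BASE = "path/to/merged-2x8B-model"  # placeholder for the merged checkpoint

# bf16 here is an assumption for the training run; the merge itself used float16.
model = AutoModelForCausalLM.from_pretrained(BASE, torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained(BASE)

# gutenberg-dpo-v0.1 provides prompt / chosen / rejected columns,
# which is the preference format DPOTrainer expects. Full split, no subsampling.
train_ds = load_dataset("jondurbin/gutenberg-dpo-v0.1", split="train")

args = DPOConfig(
    output_dir="dpo-gutenberg",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    num_train_epochs=1,
    learning_rate=1e-5,   # placeholder; "higher" is relative to the earlier run
    beta=0.1,             # DPO KL-regularisation strength
    logging_steps=10,
    bf16=True,
)

trainer = DPOTrainer(
    model=model,
    args=args,
    train_dataset=train_ds,
    processing_class=tokenizer,  # `tokenizer=` in older trl releases
)
trainer.train()
trainer.save_model("dpo-gutenberg/final")
```

No explicit reference model is passed; when `ref_model` is omitted, DPOTrainer builds a frozen copy of the policy model to serve as the DPO reference.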