Update README.md
README.md CHANGED
@@ -69,12 +69,11 @@ parameters:
 dtype: float16
 
 # Then, DPO Finetune
+# [jondurbin/gutenberg-dpo-v0.1](https://huggingface.co/datasets/jondurbin/gutenberg-dpo-v0.1)
 
 ```
 
-*
-
-*I used a higher learning rate and full dataset compared to "L3.1-Celestial-Stone-2x8B-DPO".*
+*I used a higher learning rate and the full dataset for this training run, compared to my "L3.1-Celestial-Stone-2x8B-DPO". This resulted in lower loss and better adaptation to the chosen style.*
 
 # Prompt Template:
 ```bash
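The commit only links the preference dataset; the training script itself is not part of the diff. Below is a minimal, hypothetical sketch of what a DPO finetune on jondurbin/gutenberg-dpo-v0.1 could look like with Hugging Face TRL. The base-model path, learning rate, batch settings, precision, and beta are illustrative placeholders, not the values used for this model.

```python
# Hypothetical DPO finetune sketch (not the author's actual script).
# Assumes a recent `trl` release; argument names differ in older versions.
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

BASE = "path/to/merged-2x8B-model"  # placeholder for the merged checkpoint

# bf16 here is an assumption for the training run; the merge itself used float16.
model = AutoModelForCausalLM.from_pretrained(BASE, torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained(BASE)

# gutenberg-dpo-v0.1 provides prompt / chosen / rejected columns,
# which is the preference format DPOTrainer expects. Full split, no subsampling.
train_ds = load_dataset("jondurbin/gutenberg-dpo-v0.1", split="train")

args = DPOConfig(
    output_dir="dpo-gutenberg",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    num_train_epochs=1,
    learning_rate=1e-5,   # placeholder; "higher" is relative to the earlier run
    beta=0.1,             # DPO KL-regularisation strength
    logging_steps=10,
    bf16=True,
)

trainer = DPOTrainer(
    model=model,
    args=args,
    train_dataset=train_ds,
    processing_class=tokenizer,  # `tokenizer=` in older trl releases
)
trainer.train()
trainer.save_model("dpo-gutenberg/final")
```

No explicit reference model is passed; when `ref_model` is omitted, DPOTrainer builds a frozen copy of the policy model to serve as the DPO reference.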