Delta-Vector
committed on
Commit • 83d850a
1 Parent(s): 44e01ec
Update README.md
README.md
CHANGED
@@ -215,8 +215,4 @@ The training was done for 2 epochs. We used 8 x [H100s](https://www.nvidia.com/
 
 ## Safety
 
-
-
-## Musings
-
-One of the members of Anthracite had quite an interesting idea: finetune a smaller model for 4 epochs at a lower learning rate, on the reasoning that "smaller models learn slower". [Kalomaze]() provided access to 8 x A40s, and we finetuned what is now [Darkens-8B]() for 4 epochs (its 2.5-epoch version was released as [Tor-8B]()). The result was quite impressive: the 4-epoch model was not "overfit" at all and was rather pleasant to use. Lucy Knada then allowed me to do a full-parameter finetune with the same configuration as Darkens/Tor-8B (with some minor dataset tweaks) on 8 x H100s. We hosted and tested the models, and I ended up giving the green light to release the 4-epoch version as Magnum 9B V4, releasing the 2-epoch version as my own. I felt both were extremely good models, but in testing I preferred the 2-epoch one: it was not as "suggestive" as Magnum models (and Claude RP-log-trained models in general) tend to be, it would not dive into Claudeisms right out of the gate, and you could use it for both safe-for-work and "other" purposes.
+Nein.
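As an aside on the recipe described in the removed Musings paragraph: a minimal sketch of the "more epochs at a lower learning rate" idea, assuming a Hugging Face `transformers`-style trainer. The output path and all hyperparameter values are illustrative assumptions, not the actual Darkens/Tor-8B configuration.

```python
# Sketch of the "smaller models learn slower" recipe from the removed
# paragraph: give the smaller model more epochs at a lower learning rate.
# All names and values here are hypothetical, not the released config.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="darkens-8b-sketch",   # hypothetical output path
    num_train_epochs=4,               # smaller model: train longer...
    learning_rate=1e-5,               # ...at a lower learning rate
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    bf16=True,                        # typical on A40/H100 hardware
)
```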