Tensoic/TinyLlama-1.1B-3T-openhermes

adarshxs committed on Dec 29, 2023

Commit bce5b0e (1 parent: 465e398)

Update README.md

Files changed (1)

README.md +6 -0

@@ -96,6 +96,12 @@ special_tokens:
 The model achieves the following loss:
 - Loss: 1.3647
+The loss exploded after a couple hundred steps. As suggested by [winglian](https://x.com/winglian/status/1740776666744700941?s=20), we set the following values in the config file:
+```
+adam_epsilon: 0.00001
+max_grad_norm: 1.0
+```
 ### Training hyperparameters
 The following hyperparameters were used during training:
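
To illustrate why the two settings added in this commit can tame an exploding loss, here is a minimal pure-Python sketch. It is not the trainer's actual code: the helper names `clip_by_global_norm` and `adam_update_magnitude` and all numeric inputs are hypothetical, and the comparison value `1e-8` is assumed only because it is a common default for Adam's epsilon.

```python
import math


def clip_by_global_norm(grads, max_grad_norm):
    """Rescale gradients so their global L2 norm is at most max_grad_norm."""
    total_norm = math.sqrt(sum(g * g for g in grads))
    if total_norm <= max_grad_norm:
        return list(grads)  # already within the limit, leave unchanged
    scale = max_grad_norm / total_norm
    return [g * scale for g in grads]


def adam_update_magnitude(m_hat, v_hat, lr, eps):
    """Size of one Adam update: lr * |m_hat| / (sqrt(v_hat) + eps)."""
    return lr * abs(m_hat) / (math.sqrt(v_hat) + eps)


# max_grad_norm: 1.0 caps the size of a gradient spike before the optimizer step.
clipped = clip_by_global_norm([3.0, 4.0], max_grad_norm=1.0)

# adam_epsilon: when the second-moment estimate v_hat is tiny, a larger epsilon
# dominates the denominator and damps the update (illustrative values only).
small_eps = adam_update_magnitude(1e-7, 1e-14, lr=1.0, eps=1e-8)
large_eps = adam_update_magnitude(1e-7, 1e-14, lr=1.0, eps=1e-5)

print(clipped, small_eps, large_eps)
```

In this sketch the clipped gradient keeps its direction but has norm 1.0, and the update computed with `eps=1e-5` is much smaller than the one computed with `eps=1e-8`, which is the damping effect the config change relies on.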