Update README.md
README.md (changed)

@@ -39,7 +39,7 @@ Relevant Axolotl Configurations:
- I tried to find my own configs; after hours of tinkering, the one he used still worked best, so I stuck with it.
- A 2M RoPE theta gave the best loss results during training compared to other values (see the frequency sketch after this list).
- Leaving it at 500K RoPE theta wasn't much worse, but 4M and 8M theta made the grad_norm values worsen even though loss drops fast.
- Mixing in pretraining data was a PITA; it made the model's formatting a lot worse.
- Pretraining / noise data also made it worse on the needle-in-a-haystack eval: the results weren't all green, mostly oranges.
- An improper / bad RoPE theta shows up as grad_norm exploding into the thousands. Loss will still drop to low values, but it's a scarily fast drop even with gradient clipping (see the logging sketch below).
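The theta values above map directly onto the rotary frequency spectrum. A minimal sketch, assuming a Llama-style head dimension of 128 (an illustrative value, not taken from the config), of how raising rope_theta stretches the longest RoPE wavelength, which is why it gets bumped (e.g. 500K to 2M) when extending context:

```python
import math

def rope_wavelengths(theta: float, head_dim: int = 128):
    """Wavelength (in tokens) of each rotary frequency pair."""
    # Standard RoPE inverse frequencies: theta^(-2i/d) for i = 0 .. d/2 - 1.
    inv_freqs = [theta ** (-2 * i / head_dim) for i in range(head_dim // 2)]
    return [2 * math.pi / f for f in inv_freqs]

for theta in (500_000, 2_000_000, 8_000_000):
    longest = max(rope_wavelengths(theta))
    print(f"theta={theta:>9,}: longest wavelength ~ {longest:,.0f} tokens")
```

Larger theta gives the low-frequency pairs much longer wavelengths, so distant positions stay distinguishable; pushing it far past what the data needs (4M, 8M) is where the training instability showed up.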
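On the grad_norm point: a minimal sketch, assuming a plain PyTorch training loop with an HF-style model (the model, batch, and optimizer here are placeholders, not the actual Axolotl internals), of logging the pre-clip gradient norm so an exploding grad_norm stays visible even while loss keeps dropping:

```python
import torch

def train_step(model, batch, optimizer, max_grad_norm=1.0):
    loss = model(**batch).loss  # assumes an HF-style model that returns .loss
    loss.backward()
    # clip_grad_norm_ returns the total gradient norm computed *before* clipping;
    # this is the kind of value training logs report as grad_norm.
    grad_norm = torch.nn.utils.clip_grad_norm_(model.parameters(), max_grad_norm)
    optimizer.step()
    optimizer.zero_grad()
    return loss.item(), grad_norm.item()

# A healthy run keeps the logged grad_norm in the low single digits; a bad
# rope_theta shows up here as values in the hundreds or thousands, even
# though clipping keeps the applied update bounded and loss keeps falling.
```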