euclaise/gpt-neox-122m-minipile-digits · Awesome weights, thank you!

So, finetuning from your model:

Step	Training Loss	Validation Loss
1000	1.191700	0.750407
2000	0.567600	0.580200
3000	0.363700	0.517140
4000	0.275200	0.503491
5000	0.244300	0.501706

Finetuning from EleutherAI/pythia-160m, which is the same model in every layer except embed_in.weight (torch.Size([50304, 768]) there vs torch.Size([48262, 768]) in yours), gives a validation loss that only goes from 1.5 to 1.01. I'm sad that your modified model fails to run on llama.cpp... Still, 1.01 vs 0.50 is so fun...
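For anyone who wants to repeat the layer comparison, here is a minimal sketch of how the shape diff could be done. The helper name `diff_shapes` is made up for illustration, and apart from `embed_in.weight` (whose two shapes are the ones quoted above) the dictionary entries are placeholders, not the real parameter lists of either model.

```python
from typing import Dict, List, Optional, Tuple

Shape = Tuple[int, ...]


def diff_shapes(
    a: Dict[str, Shape], b: Dict[str, Shape]
) -> List[Tuple[str, Optional[Shape], Optional[Shape]]]:
    """Return (name, shape_in_a, shape_in_b) for every parameter whose
    shape differs, or that exists in only one of the two mappings."""
    diffs = []
    for name in sorted(set(a) | set(b)):
        shape_a, shape_b = a.get(name), b.get(name)
        if shape_a != shape_b:
            diffs.append((name, shape_a, shape_b))
    return diffs


# embed_in.weight shapes are taken from the post above;
# final_layer_norm.weight is an illustrative placeholder entry.
pythia = {"embed_in.weight": (50304, 768), "final_layer_norm.weight": (768,)}
digits = {"embed_in.weight": (48262, 768), "final_layer_norm.weight": (768,)}

for name, shape_a, shape_b in diff_shapes(pythia, digits):
    print(f"{name}: {shape_a} vs {shape_b}")
```

With real PyTorch models loaded, the input mappings can be built as `{k: tuple(v.shape) for k, v in model.state_dict().items()}`, so the same helper works on the actual checkpoints.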