Did you just compare a 7B 100K model to Claude2-100K?
https://github.com/lyogavin/Anima/tree/main/anima_100k
Since I only found 7B 100K models, I am assuming the evaluation above was done between 7B models and Claude2. I would bet Claude2 is some 200B+ MoE monster, in keeping with the scaling law of emergent abilities. If a 7B 100K model is already that good, I cannot wait to see what you can achieve with bigger models.
If QLoRA fine-tuning a 7B model only requires 800MB, a fine-tuned 70B-100K model should be on its way, I guess?
Can't wait to see what level a QLoRA fine-tuned 70B-100K model can reach.