What is this?
Is this a new instruction fine-tuned model? If so, could you provide some info on what it was trained on?
Thanks in advance
Your "contact us" should be higher up. Great work!
Wow, yeah, this looks really interesting. I'll do quantisations of it now so more people can run it and learn about it.
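(For anyone who wants to try a low-VRAM version before dedicated quantised uploads appear, here's a minimal sketch of loading the model in 4-bit via transformers + bitsandbytes. The repo id and the Alpaca-style prompt template are placeholders, not confirmed details of this model.)

```python
# Minimal sketch: load a model in 4-bit with transformers + bitsandbytes
# so it fits on a single consumer GPU. The repo id below is a placeholder.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "your-org/your-instruct-model"  # placeholder, not a real repo

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                     # quantise weights to 4-bit at load time
    bnb_4bit_quant_type="nf4",             # NF4 quantisation
    bnb_4bit_compute_dtype=torch.float16,  # do the matmuls in fp16
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",                     # spread layers across available devices
)

# Assumed Alpaca-style template; check the model card for the real one.
prompt = "### Instruction:\nExplain what instruction tuning is.\n\n### Response:\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```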
Now that Llama 2 is out, are you planning to bring out a llama-2-13b-instruct, and/or maybe a llama-2-70b-instruct? It's a shame there's no Llama 2 34B yet, but apparently it's coming fairly soon.
By the way, I suggest you put your full model card in all the variants. The 30B 2048 is definitely the most interesting, I think, but it only has a very short model card, so the user has to click elsewhere to learn what this is. I would copy the full model card to each model, with a brief line explaining what is different about each particular one. Less work for the user = more interest!
Invading this discussion a bit: I would like to know if we will ever get a 65B 2048. After all, it's clear that the 30B 2048 got much better results than the 30B 1024, so a 65B would probably follow the same trend.
@TheBloke Thank you for your interest in our model. Taking into account the number of GPUs available to us, we're planning to fine-tune the Llama2 model. We'll soon release a Llama2-70b model, which has been trained on 200k data samples. We appreciate your valuable suggestions. :)
@nxnhjrjtbjfzhrovwl Given that the Llama2-70b model is better than the 65b, we're planning to fine-tune the Llama2-70b-2048 model first.
Great to hear!
Ideally you would do a Llama2-70B-4096, given Llama 2's increased context length?