34B & 35B
Collection
5 items
•
Updated
•
2
Tokenizer is different from cohere - and chat template is ChatML - fully fine-tuned at 128K+ ~ 30M entries long, web crawl input, GPT-4-32k/3.5-16k output, synthetic dataset - 1 epoch
For another candidate version of 1 epoch - https://huggingface.co/CausalLM/35b-beta - somehow less overfitting?
No loras, no quants, no tricks.
This one is not "very 128k", use https://huggingface.co/CausalLM/35b-beta-long for long context. But better in general tasks, knowledge, coding and so on.
And, merge them if you want!