CausalLM/35b-beta2ep · Hugging Face

Tokenizer is different from cohere - and chat template is ChatML - fully fine-tuned at 128K+ ~ 30M entries long, web crawl input, GPT-4-32k/3.5-16k output, synthetic dataset - 1 epoch

For another candidate version of 1 epoch - https://huggingface.co/CausalLM/35b-beta - somehow less overfitting?

No loras, no quants, no tricks.

This one is not "very 128k", use https://huggingface.co/CausalLM/35b-beta-long for long context. But better in general tasks, knowledge, coding and so on.

And, merge them if you want!

CausalLM
/

35b-beta2ep

Datasets used to train CausalLM/35b-beta2ep

Collection including CausalLM/35b-beta2ep

34B & 35B