EOS fine-tuning

#1
by viktor-ferenczi - opened

Have the Yi 6B/34B models been fine-tuned to use EOS properly up to the 200k context length?

Did you use a fine-tuning dataset with long enough examples to achieve that?

How did you test the quality of the long-context responses and whether the model uses EOS properly up to 200k context length? Any results?

Did you use PEFT or full fine-tuning? Same for 6B and 34B?

Trelis org

Howdy!

Here is the video to check out: https://www.youtube.com/watch?v=71x8EMrB0Gc

I use PEFT training (bf16), but with the addition of making the embed and norm modules trainable as well. This allows the model to learn the chat format and emit the EOS token correctly. Alternatively, you could do full fine-tuning, but that is typically less stable and much slower to reach the same results.
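For reference, here is a minimal sketch of what that kind of setup can look like with the Hugging Face `peft` library. The module names (e.g. `embed_tokens`, `norm`) assume a Llama-style architecture like Yi, and the model name, LoRA ranks, and other hyperparameters are placeholders, not the exact values used here:

```python
# Sketch: LoRA fine-tuning with the embedding and norm modules made fully
# trainable, so the model can learn the chat template and the EOS token.
# Assumes a Llama-style architecture (which Yi follows); adjust module names
# and hyperparameters for your model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_name = "01-ai/Yi-34B-200K"  # placeholder: a 200k-context base checkpoint

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,  # bf16 training, as mentioned above
    device_map="auto",
)

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    # LoRA adapters on the attention and MLP projections
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    # Fully train (and save) the embedding and norm modules in addition
    # to the LoRA adapters.
    modules_to_save=[
        "embed_tokens", "lm_head",
        "input_layernorm", "post_attention_layernorm", "norm",
    ],
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```

The key design choice is `modules_to_save`: unlike `target_modules`, those layers are trained in full (not via low-rank adapters) and saved with the adapter, which is what lets the fine-tune reliably pick up the new chat format and EOS behaviour.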

As you will see in the video, the 6B model does not respond well beyond about 15,000 to 20,000 tokens of context. However, the larger 34B model does achieve good responses, even for 100K+ contexts. This is despite the fact that my fine-tuning used only a 4,000-token context.

RonanMcGovern changed discussion status to closed
