Good!
This LoRA works well! It makes Yi-34B-200K understand Alpaca syntax (even though that's not even the base model for this LoRA) and seems to keep it coherent.
Not sure how it works at extremely long context. We shall see.
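For anyone unfamiliar, by Alpaca syntax I mean the usual instruction/response prompt template, roughly like this (the exact wording varies a bit between finetunes, so treat it as a sketch):

```python
# Representative Alpaca-style prompt template (wording varies slightly
# between finetunes, so this is illustrative rather than exact).
ALPACA_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Response:\n"
)

prompt = ALPACA_TEMPLATE.format(
    instruction="Summarize the plot of Hamlet in two sentences."
)
```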
Thanks! If you have some good datasets or advice to improve the chat, let me know.
If you have time, maybe you can finetune it on this, or even just on a small part of this? https://huggingface.co/datasets/jondurbin/airoboros-2.2.1
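Even a small random slice would probably be enough to test it, something like this (depending on how the repo is laid out, you may need to point at the jsonl file explicitly):

```python
# Sketch: load airoboros 2.2.1 and take a small random subset for a quick
# finetune. The slice size is arbitrary; adjust it to your compute budget.
from datasets import load_dataset

airoboros = load_dataset("jondurbin/airoboros-2.2.1", split="train")
small_slice = airoboros.shuffle(seed=42).select(range(5000))
print(small_slice)
```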
Thank you very much
Thanks, I've seen that there are some newer datasets, such as https://huggingface.co/datasets/jondurbin/airoboros-3.1, and I can't tell the difference. Will it be better than the old ones?
That's an interesting question. In my opinion, I like 2.2.1 more, because it's in ShareGPT format and seems less buggy. Others prefer 3.1. It's up to you what you train on, but I would love to see a Vicuna/ShareGPT 2.2.1 finetune. If you do want to use 3.1, please use the no_mathjson version, because the mathjson version is known to make the model dumber.
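For reference, ShareGPT format is just a list of conversation turns per sample, roughly like this (a sketch of the common layout, not necessarily the exact 2.2.1 schema):

```python
# Rough sketch of a ShareGPT-style record as it commonly appears in training
# data; the exact keys in airoboros 2.2.1 may differ slightly.
sharegpt_example = {
    "conversations": [
        {"from": "system", "value": "You are a helpful, unbiased assistant."},
        {"from": "human", "value": "What does a LoRA adapter actually change in the model?"},
        {"from": "gpt", "value": "It adds small low-rank weight updates on top of the frozen base weights."},
    ]
}
```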
Kind regards and thanks
CoT is good!
CollectiveCognition seems to be very effective, and it's so small that you could match its format to another dataset and throw it in.
https://huggingface.co/CollectiveCognition
Or maybe UltraFeedback, if you just use the highest-rated (all 5) responses: https://huggingface.co/datasets/openbmb/UltraFeedback/viewer/default/train?p=1&row=118
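Filtering down to the top-rated responses could be as simple as something like this (the field names are my guess at the schema, so double-check them in the dataset viewer first):

```python
# Hypothetical sketch: keep only completions with a perfect overall score.
# "completions", "overall_score" and "response" are assumed field names --
# verify them against the dataset viewer before relying on this.
from datasets import load_dataset

ds = load_dataset("openbmb/UltraFeedback", split="train")

top_pairs = [
    {"instruction": row["instruction"], "response": comp["response"]}
    for row in ds
    for comp in row["completions"]
    if comp.get("overall_score", 0) >= 5
]
print(len(top_pairs))
```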
Really, the big thing would just be to use the Yi 200K model as a base model, and train it with a longish context.
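Something along these lines with transformers + peft, just with a long max sequence length for the training samples (the model id, rank, and lengths here are illustrative, not the settings used for this LoRA):

```python
# Minimal sketch of LoRA training on the 200K base with a long context.
# All hyperparameters below are placeholders, not recommendations.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base_id = "01-ai/Yi-34B-200K"  # long-context base instead of the 4K one
tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id, device_map="auto")

lora_cfg = LoraConfig(
    r=64,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_cfg)

# Pack or truncate training samples to a long max length (e.g. 16k-32k tokens)
# so the adapter actually sees long-context data during finetuning.
max_seq_len = 32768
```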