- **Agent abilities** I trained this model on agent datasets that teach it to do real-world tasks, such as picking up an object and even navigating a webpage based on its HTML.
- **Good Chili Recipe** The model gives a good chili recipe :)
- **32k Sequence Length** This model was trained with a 32k sequence length.
- **GUANACO PROMPT FORMAT** YOU MUST USE THE GUANACO PROMPT FORMAT SHOWN BELOW IN USAGE. Not using this prompt format will lead to suboptimal results.

### Experimental Nature

Please note that Mistral-22b is still a WIP. v0.3 has now started training with a different method than before, which will hopefully make the model more well-rounded in its internal knowledge. Through my testing, I found v0.2 to be a significant improvement over v0.1.

### Stay Updated

**V.3** coming soon! It is currently training and will be done in the next ~24 hours. 🌟Paper Coming Soon🌟
- There will be more of these 22b models: 5-6 siblings, until I find what gives the best results for MoE compression.
- However, I am very surprised at how good this v0.2 model is, based on my small amount of testing.
- I will be releasing a blog post soon on how I did this. I still plan to release a paper with testing and results, but I'm going to rush this post out beforehand so I can share the method. I'd just like to make sure the right people get credit for the work of theirs that I used, so I have to read up a bit and make sure everyone gets the credit they deserve (and I need quality sleep; my entire sleep schedule has been wrecked since Mixtral's drop). I appreciate your understanding.
- I have a bunch of other methods I have yet to try, and many of those methods required making this model and running these initial tests first, so these models are only going to get better from here. I appreciate feedback, thank you!

### Usage:

- This model requires a specific chat template; as the training format was Guanaco, this is what it looks like:
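As a rough illustration, a Guanaco-style prompt can be assembled in plain Python. Note that this sketch assumes the standard Guanaco turn markers (`### Human:` / `### Assistant:`); the `build_guanaco_prompt` helper is hypothetical, so defer to the exact template shown in this repo.

```python
def build_guanaco_prompt(user_message: str, history=None) -> str:
    """Format a conversation into a Guanaco-style chat layout.

    NOTE: assumes the common "### Human:" / "### Assistant:" markers;
    check this model's card for the exact template it was trained on.
    """
    turns = []
    # Replay any prior (human, assistant) exchanges in order.
    for human, assistant in (history or []):
        turns.append(f"### Human: {human}")
        turns.append(f"### Assistant: {assistant}")
    # Add the new user turn and leave the assistant turn open
    # so the model generates the completion from there.
    turns.append(f"### Human: {user_message}")
    turns.append("### Assistant:")
    return "\n".join(turns)

print(build_guanaco_prompt("Give me a good chili recipe."))
```

The important detail is that generation starts immediately after the trailing `### Assistant:` marker, with no extra text in between.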