InferenceIllusionist committed · Commit ef2b6fe · Parent(s): be66c03

Update README.md

First draft of model card.
README.md CHANGED @@ -1,3 +1,61 @@
---
license: cc-by-nc-4.0
tags:
- conversational
- mixtral
- merge
- mergekit
---

<img src="https://files.catbox.moe/zdxyzv.png" width="400"/>

## TeTO-MS-8x7b

<b>Te</b>soro + <b>T</b>yphon + <b>O</b>penGPT

Presenting a Model Stock experiment combining the unique strengths of the following 8x7b Mixtral models:
* Tess-2.0-Mixtral-8x7B-v0.2 / [migtissera](https://huggingface.co/migtissera) / General Purpose
* Typhon-Mixtral-v1 / [Sao10K](https://huggingface.co/Sao10K) / Creative & Story Completion
* Open_Gpt4_8x7B_v0.2 / [rombodawg](https://huggingface.co/rombodawg) / Conversational

## Methodology

> [I]nnovative layer-wise weight averaging technique surpasses state-of-the-art model methods such as Model Soup, utilizing only two fine-tuned models. This strategy can be aptly coined Model Stock, highlighting its reliance on selecting a minimal number of models to draw a more optimized-averaged model

<i>(From [arXiv:2403.19522](https://arxiv.org/pdf/2403.19522))</i>

* The methodology and merging process are based on the paper [Model Stock: All we need is just a few fine-tuned models](https://arxiv.org/abs/2403.19522); a simplified sketch of the core idea follows this list.
* Initial model selection focused on top-performing Mixtral-architecture models covering a variety of use cases and skills.
* The base model (Mixtral Instruct 8x7b v0.1) was chosen after it outperformed two other candidate base models on the MMLU benchmark.

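As a rough illustration (not mergekit's actual implementation), the sketch below interpolates the per-layer average of the fine-tuned weights back toward the base weights, using the interpolation ratio t = k·cosθ / ((k−1)·cosθ + 1) given in the paper. The function name and the pairwise-average treatment of cosθ for more than two models are this card's own simplifications.

```python
# Illustrative sketch only -- not mergekit's code. Assumes w0 is a base-model
# weight tensor and ws holds the corresponding fine-tuned tensors for one layer.
import torch
import torch.nn.functional as F


def model_stock_layer(w0: torch.Tensor, ws: list[torch.Tensor]) -> torch.Tensor:
    """Interpolate the mean of the fine-tuned weights toward the base weight w0."""
    k = len(ws)
    # Task vectors: each fine-tuned weight relative to the base.
    deltas = [(w - w0).flatten() for w in ws]

    # Average pairwise cosine similarity between task vectors (a simplification
    # for k > 2; with k == 2 this is the paper's single angle theta).
    pairs = [
        F.cosine_similarity(deltas[i], deltas[j], dim=0)
        for i in range(k)
        for j in range(i + 1, k)
    ]
    cos_theta = torch.stack(pairs).mean() if pairs else torch.tensor(1.0)

    # Interpolation ratio t = k*cos(theta) / ((k-1)*cos(theta) + 1), clamped
    # for numerical safety.
    t = ((k * cos_theta) / ((k - 1) * cos_theta + 1)).clamp(0.0, 1.0)

    w_avg = torch.stack(ws).mean(dim=0)
    return t * w_avg + (1 - t) * w0
```

Applied layer by layer, with Mixtral Instruct 8x7b v0.1 supplying w0 and the three models listed above supplying ws, this yields the merged weights described in the sections below.
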
## Output

<img src="https://files.catbox.moe/bw97yg.PNG" width="400"/>

This is a merge of pre-trained language models created using [mergekit](https://github.com/cg123/mergekit).

## Merge Details
### Merge Method

This model was merged with the [Model Stock](https://arxiv.org/abs/2403.19522) merge method, using models/Mixtral-8x7B-v0.1-Instruct as the base.

### Models Merged

The following models were included in the merge:
* models/migtissera_Tess-2.0-Mixtral-8x7B-v0.2
* models/rombodawg_Open_Gpt4_8x7B_v0.2
* models/Sao10K_Typhon-Mixtral-v1

### Configuration

The following YAML configuration was used to produce this model:

```yaml
models:
  - model: models/migtissera_Tess-2.0-Mixtral-8x7B-v0.2
  - model: models/Sao10K_Typhon-Mixtral-v1
  - model: models/rombodawg_Open_Gpt4_8x7B_v0.2
merge_method: model_stock
base_model: models/Mixtral-8x7B-v0.1-Instruct
dtype: float16
```

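As a usage note, a configuration like the one above is normally passed to mergekit's `mergekit-yaml` command-line entry point, and the resulting merge loads like any other Mixtral-style checkpoint. The snippet below is a hypothetical loading example: the repo id, prompt format, and generation settings are assumptions for illustration, not part of the original card.

```python
# Hypothetical usage sketch. The repo id below is an assumption; point it at
# wherever the merged weights are actually published.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "InferenceIllusionist/TeTO-MS-8x7b"  # assumed location of the merge

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype=torch.float16,  # matches the dtype used in the merge config
    device_map="auto",
)

# Mixtral-Instruct-style prompt format, assumed from the chosen base model.
prompt = "[INST] Write a short scene about a lighthouse keeper. [/INST]"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```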