InferenceIllusionist committed
Commit ef2b6fe
1 Parent(s): be66c03

Update README.md

First draft of model card.

Files changed (1)
  1. README.md +61 -3
README.md CHANGED
@@ -1,3 +1,61 @@
- ---
- license: cc-by-nc-4.0
- ---
+ ---
+ license: cc-by-nc-4.0
+ tags:
+ - conversational
+ - mixtral
+ - merge
+ - mergekit
+ ---
+
+ <img src="https://files.catbox.moe/zdxyzv.png" width="400"/>
+
+ ## TeTO-MS-8x7b
+
+ <b>Te</b>soro + <b>T</b>yphon + <b>O</b>penGPT
+
+ Presenting a Model Stock experiment combining the unique strengths of the following 8x7b Mixtral models:
+ * Tess-2.0-Mixtral-8x7B-v0.2 / [migtissera](https://huggingface.co/migtissera) / General Purpose
+ * Typhon-Mixtral-v1 / [Sao10K](https://huggingface.co/Sao10K) / Creative & Story Completion
+ * Open_Gpt4_8x7B_v0.2 / [rombodawg](https://huggingface.co/rombodawg) / Conversational
+
+ ## Methodology
+
+ > [I]nnovative layer-wise weight averaging technique surpasses state-of-the-art model methods such as Model Soup, utilizing only two fine-tuned models. This strategy can be aptly coined Model Stock, highlighting its reliance on selecting a minimal number of models to draw a more optimized-averaged model
+ <i> (From [arXiv:2403.19522](https://arxiv.org/pdf/2403.19522))</i>
+
+ * Methodology and merging process were based on the following paper: [Model Stock: All we need is just a few fine-tuned models](https://arxiv.org/abs/2403.19522) (see the illustrative sketch below)
+ * Initial model selection was based on top-performing models of the Mixtral architecture, covering a variety of use cases and skills
+ * Base model (Mixtral Instruct 8x7b v0.1) was chosen after it outperformed two other candidate base models on the MMLU benchmark.
+
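+ For intuition, here is a minimal sketch of the layer-wise interpolation described in the paper, applied to a single weight tensor. It is only an illustration, not the mergekit implementation that produced this merge: the function name is hypothetical, and the use of an average pairwise cosine to estimate the angle between fine-tuned weights (relative to the base) is an assumption about how the paper's ratio t = k*cos(theta) / (1 + (k-1)*cos(theta)) would be computed for more than two models.
+
+ ```python
+ import torch
+
+ def model_stock_layer(base: torch.Tensor, finetuned: list[torch.Tensor]) -> torch.Tensor:
+     """Illustrative Model Stock merge for one weight tensor (hypothetical helper)."""
+     k = len(finetuned)
+     # Task vectors: fine-tuned weights expressed relative to the pre-trained (base) weights.
+     deltas = [(w - base).flatten() for w in finetuned]
+     # Estimate cos(theta) as the average pairwise cosine similarity between task vectors.
+     cos_vals = [
+         torch.nn.functional.cosine_similarity(deltas[i], deltas[j], dim=0)
+         for i in range(k) for j in range(i + 1, k)
+     ]
+     cos_theta = torch.stack(cos_vals).mean()
+     # Interpolation ratio from the paper: t = k*cos / (1 + (k-1)*cos).
+     t = (k * cos_theta) / (1 + (k - 1) * cos_theta)
+     # Pull the average of the fine-tuned weights back toward the base model.
+     w_avg = torch.stack(finetuned).mean(dim=0)
+     return t * w_avg + (1 - t) * base
+ ```
+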
+ ## Output
+
+ <img src="https://files.catbox.moe/bw97yg.PNG" width="400"/>
+
+ This is a merge of pre-trained language models created using [mergekit](https://github.com/cg123/mergekit).
+
+ ## Merge Details
+ ### Merge Method
+
+ This model was merged using the [Model Stock](https://arxiv.org/abs/2403.19522) merge method, with models/Mixtral-8x7B-v0.1-Instruct as the base.
+
+ ### Models Merged
+
+ The following models were included in the merge:
+ * models/migtissera_Tess-2.0-Mixtral-8x7B-v0.2
+ * models/rombodawg_Open_Gpt4_8x7B_v0.2
+ * models/Sao10K_Typhon-Mixtral-v1
+
+ ### Configuration
+
+ The following YAML configuration was used to produce this model:
+
+ ```yaml
+ models:
+   - model: models/migtissera_Tess-2.0-Mixtral-8x7B-v0.2
+   - model: models/Sao10K_Typhon-Mixtral-v1
+   - model: models/rombodawg_Open_Gpt4_8x7B_v0.2
+ merge_method: model_stock
+ base_model: models/Mixtral-8x7B-v0.1-Instruct
+ dtype: float16
+ ```
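+
+ A config in this format is typically run through mergekit's `mergekit-yaml` entry point to produce the merged weights. Once the result is available locally or on the Hugging Face Hub, it loads like any other Mixtral model with transformers. The snippet below is an illustrative sketch rather than part of the original card: the repo id `InferenceIllusionist/TeTO-MS-8x7b` and the assumption that the Mixtral-Instruct chat template carries over from the base model are guesses.
+
+ ```python
+ # Illustrative usage only; the repo id and prompt handling below are assumptions.
+ import torch
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ model_id = "InferenceIllusionist/TeTO-MS-8x7b"  # assumed repo id for this merge
+
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
+ model = AutoModelForCausalLM.from_pretrained(
+     model_id,
+     torch_dtype=torch.float16,  # matches the dtype used for the merge
+     device_map="auto",          # requires `accelerate`; spreads the experts across available devices
+ )
+
+ # Assuming the Mixtral-Instruct [INST] chat template is inherited from the base model.
+ messages = [{"role": "user", "content": "Write a short scene set in a lighthouse during a storm."}]
+ inputs = tokenizer.apply_chat_template(
+     messages, add_generation_prompt=True, return_tensors="pt"
+ ).to(model.device)
+
+ outputs = model.generate(inputs, max_new_tokens=256, do_sample=True, temperature=0.8)
+ print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
+ ```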