InferenceIllusionist committed · Commit ef2b6fe · Parent(s): be66c03

Update README.md

First draft of model card.
README.md CHANGED @@ -1,3 +1,61 @@
---
license: cc-by-nc-4.0
tags:
- conversational
- mixtral
- merge
- mergekit
---

<img src="https://files.catbox.moe/zdxyzv.png" width="400"/>

## TeTO-MS-8x7b

<b>Te</b>soro + <b>T</b>yphon + <b>O</b>penGPT

Presenting a Model Stock experiment combining the unique strengths of the following 8x7b Mixtral models:
* Tess-2.0-Mixtral-8x7B-v0.2 / [migtissera](https://huggingface.co/migtissera) / General Purpose
* Typhon-Mixtral-v1 / [Sao10K](https://huggingface.co/Sao10K) / Creative & Story Completion
* Open_Gpt4_8x7B_v0.2 / [rombodawg](https://huggingface.co/rombodawg) / Conversational

## Methodology

> [I]nnovative layer-wise weight averaging technique surpasses state-of-the-art model methods such as Model Soup, utilizing only two fine-tuned models. This strategy can be aptly coined Model Stock, highlighting its reliance on selecting a minimal number of models to draw a more optimized-averaged model

<i>(From [arXiv:2403.19522](https://arxiv.org/pdf/2403.19522))</i>

* The methodology and merging process are based on the paper [Model Stock: All we need is just a few fine-tuned models](https://arxiv.org/abs/2403.19522); a simplified sketch of the core idea follows this list.
* Initial model selection focused on top-performing Mixtral-architecture models covering a variety of use cases and skills.
* The base model (Mixtral Instruct 8x7b v0.1) was chosen after it outperformed two other candidate base models on the MMLU benchmark.

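As a rough illustration (not mergekit's actual implementation), the sketch below interpolates the per-layer average of the fine-tuned weights back toward the base weights, using the interpolation ratio t = k·cosθ / ((k−1)·cosθ + 1) given in the paper. The function name and the pairwise-average treatment of cosθ for more than two models are this card's own simplifications.

```python
# Illustrative sketch only -- not mergekit's code. Assumes w0 is a base-model
# weight tensor and ws holds the corresponding fine-tuned tensors for one layer.
import torch
import torch.nn.functional as F


def model_stock_layer(w0: torch.Tensor, ws: list[torch.Tensor]) -> torch.Tensor:
    """Interpolate the mean of the fine-tuned weights toward the base weight w0."""
    k = len(ws)
    # Task vectors: each fine-tuned weight relative to the base.
    deltas = [(w - w0).flatten() for w in ws]

    # Average pairwise cosine similarity between task vectors (a simplification
    # for k > 2; with k == 2 this is the paper's single angle theta).
    pairs = [
        F.cosine_similarity(deltas[i], deltas[j], dim=0)
        for i in range(k)
        for j in range(i + 1, k)
    ]
    cos_theta = torch.stack(pairs).mean() if pairs else torch.tensor(1.0)

    # Interpolation ratio t = k*cos(theta) / ((k-1)*cos(theta) + 1), clamped
    # for numerical safety.
    t = ((k * cos_theta) / ((k - 1) * cos_theta + 1)).clamp(0.0, 1.0)

    w_avg = torch.stack(ws).mean(dim=0)
    return t * w_avg + (1 - t) * w0
```

Applied layer by layer, with Mixtral Instruct 8x7b v0.1 supplying w0 and the three models listed above supplying ws, this yields the merged weights described in the sections below.
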
## Output

<img src="https://files.catbox.moe/bw97yg.PNG" width="400"/>

This is a merge of pre-trained language models created using [mergekit](https://github.com/cg123/mergekit).

## Merge Details
### Merge Method

This model was merged with the [Model Stock](https://arxiv.org/abs/2403.19522) merge method, using models/Mixtral-8x7B-v0.1-Instruct as the base.

### Models Merged

The following models were included in the merge:
* models/migtissera_Tess-2.0-Mixtral-8x7B-v0.2
* models/rombodawg_Open_Gpt4_8x7B_v0.2
* models/Sao10K_Typhon-Mixtral-v1

### Configuration

The following YAML configuration was used to produce this model:

```yaml
models:
  - model: models/migtissera_Tess-2.0-Mixtral-8x7B-v0.2
  - model: models/Sao10K_Typhon-Mixtral-v1
  - model: models/rombodawg_Open_Gpt4_8x7B_v0.2
merge_method: model_stock
base_model: models/Mixtral-8x7B-v0.1-Instruct
dtype: float16
```

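As a usage note, a configuration like the one above is normally passed to mergekit's `mergekit-yaml` command-line entry point, and the resulting merge loads like any other Mixtral-style checkpoint. The snippet below is a hypothetical loading example: the repo id, prompt format, and generation settings are assumptions for illustration, not part of the original card.

```python
# Hypothetical usage sketch. The repo id below is an assumption; point it at
# wherever the merged weights are actually published.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "InferenceIllusionist/TeTO-MS-8x7b"  # assumed location of the merge

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype=torch.float16,  # matches the dtype used in the merge config
    device_map="auto",
)

# Mixtral-Instruct-style prompt format, assumed from the chosen base model.
prompt = "[INST] Write a short scene about a lighthouse keeper. [/INST]"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```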