Marcel Bischoff committed · Commit b3c4e71 · 1 Parent(s): 7730b56 · README
README.md CHANGED
@@ -17,10 +17,16 @@ tags:

![](https://i.imgur.com/UOb2fvh.jpg)

-# phixtral-4x2_8
+# phixtral-4x2_8-gates-poc
+phixtral-4x2_8-gates-poc is [phixtral-4x2_8](https://huggingface.co/mlabonne/phixtral-4x2_8)
+with fine-tuned gates for better selection of experts and to break the symmetry.
+As a proof of concept, we used only 400 shorter samples
+from [openhermes](https://huggingface.co/datasets/teknium/openhermes).

phixtral-4x2_8 is the first Mixture of Experts (MoE) made with four [microsoft/phi-2](https://huggingface.co/microsoft/phi-2) models, inspired by the [mistralai/Mixtral-8x7B-v0.1](https://huggingface.co/mistralai/Mixtral-8x7B-v0.1) architecture. It performs better than each individual expert.

+
+
## 🏆 Evaluation

| Model |AGIEval|GPT4All|TruthfulQA|Bigbench|Average|
@@ -109,4 +115,4 @@ A special thanks to [vince62s](https://huggingface.co/vince62s) for the inferenc

Thanks to [Charles Goddard](https://github.com/cg123) for the [mergekit](https://github.com/cg123/mergekit) library and the implementation of the [MoE for clowns](https://goddard.blog/posts/clown-moe/).

-Thanks to [ehartford](https://huggingface.co/ehartford), [lxuechen](https://huggingface.co/lxuechen), [Yhyu13](https://huggingface.co/Yhyu13), and [mrm8488](https://huggingface.co/mrm8488) for their fine-tuned phi-2 models.
+Thanks to [ehartford](https://huggingface.co/ehartford), [lxuechen](https://huggingface.co/lxuechen), [Yhyu13](https://huggingface.co/Yhyu13), and [mrm8488](https://huggingface.co/mrm8488) for their fine-tuned phi-2 models.
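The diff above describes phixtral as a Mixtral-inspired mixture of four phi-2 experts chosen by learned gates. For readers unfamiliar with that routing, here is a minimal illustrative sketch, not the repository's actual modeling code: a linear gate scores the four experts per token, the top-2 are kept, and their outputs are mixed. The hidden width, the top-2 choice, and all class and parameter names are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyMoE(nn.Module):
    """Illustrative Mixtral-style layer: a learned gate routes each token
    to the top-k of four expert MLPs and mixes their outputs."""

    def __init__(self, hidden_size=2560, num_experts=4, top_k=2):
        super().__init__()
        self.top_k = top_k
        # The "gate": one small linear layer scoring every expert per token.
        # hidden_size defaults to 2560, matching phi-2's hidden width.
        self.gate = nn.Linear(hidden_size, num_experts, bias=False)
        # Stand-ins for the four phi-2-derived expert MLPs.
        self.experts = nn.ModuleList([
            nn.Sequential(
                nn.Linear(hidden_size, 4 * hidden_size),
                nn.GELU(),
                nn.Linear(4 * hidden_size, hidden_size),
            )
            for _ in range(num_experts)
        ])

    def forward(self, x):                                    # x: (batch, seq, hidden)
        scores = self.gate(x)                                # (B, S, E)
        weights, chosen = torch.topk(scores, self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)                 # renormalize the top-k scores
        # Run every expert, then keep only the chosen ones per token.
        expert_out = torch.stack([e(x) for e in self.experts], dim=2)  # (B, S, E, H)
        idx = chosen.unsqueeze(-1).expand(*chosen.shape, x.size(-1))   # (B, S, k, H)
        picked = torch.gather(expert_out, dim=2, index=idx)            # (B, S, k, H)
        return (weights.unsqueeze(-1) * picked).sum(dim=2)             # (B, S, H)

# Tiny smoke test with a small hidden size.
moe = TinyMoE(hidden_size=64)
print(moe(torch.randn(2, 8, 64)).shape)  # torch.Size([2, 8, 64])
```

If all four experts start from closely related phi-2 fine-tunes and the gate is freshly initialized, every expert initially receives nearly the same weight, which is the symmetry this commit's gate fine-tuning is meant to break.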
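The added lines say only the gates were fine-tuned, on roughly 400 short samples from openhermes, so that the router learns to prefer different experts for different inputs. Below is a minimal sketch of such a gate-only training loop; it assumes an HF-style causal LM that returns a loss when labels are supplied and whose router weights contain "gate" in their parameter names, plus a small pre-tokenized dataset. The function name, the name filter, and the hyperparameters are hypothetical rather than taken from this repository.

```python
import torch
from torch.utils.data import DataLoader

def finetune_gates(model, train_set, lr=1e-4, epochs=1, device="cuda"):
    """Fine-tune only the MoE routing gates; all other weights stay frozen."""
    model.to(device)
    # Freeze everything, then re-enable gradients only for gate parameters.
    # The `"gate" in name` filter is an assumption about the module naming.
    for name, param in model.named_parameters():
        param.requires_grad = "gate" in name

    trainable = [p for p in model.parameters() if p.requires_grad]
    optimizer = torch.optim.AdamW(trainable, lr=lr)
    loader = DataLoader(train_set, batch_size=1, shuffle=True)

    model.train()
    for _ in range(epochs):
        for batch in loader:
            input_ids = batch["input_ids"].to(device)
            # Standard causal-LM objective: the labels are the inputs themselves.
            loss = model(input_ids=input_ids, labels=input_ids).loss
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()
    return model
```

Training only the gate parameters keeps memory and compute costs small compared to full fine-tuning, which is what makes a 400-sample proof of concept practical.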