Marcel Bischoff committed · Commit b3c4e71 · 1 Parent(s): 7730b56 · README
README.md CHANGED
@@ -17,10 +17,16 @@ tags:

![](https://i.imgur.com/UOb2fvh.jpg)

-# phixtral-4x2_8
+# phixtral-4x2_8-gates-poc
+phixtral-4x2_8-gates-poc is [phixtral-4x2_8](https://huggingface.co/mlabonne/phixtral-4x2_8)
+with fine-tuned gates for better selection of experts and to break the symmetry.
+As a proof of concept, we used only 400 shorter samples
+from [openhermes](https://huggingface.co/datasets/teknium/openhermes).

phixtral-4x2_8 is the first Mixture of Experts (MoE) made with four [microsoft/phi-2](https://huggingface.co/microsoft/phi-2) models, inspired by the [mistralai/Mixtral-8x7B-v0.1](https://huggingface.co/mistralai/Mixtral-8x7B-v0.1) architecture. It performs better than each individual expert.

+
+
## 🏆 Evaluation

| Model |AGIEval|GPT4All|TruthfulQA|Bigbench|Average|
@@ -109,4 +115,4 @@ A special thanks to [vince62s](https://huggingface.co/vince62s) for the inferenc

Thanks to [Charles Goddard](https://github.com/cg123) for the [mergekit](https://github.com/cg123/mergekit) library and the implementation of the [MoE for clowns](https://goddard.blog/posts/clown-moe/).

-Thanks to [ehartford](https://huggingface.co/ehartford), [lxuechen](https://huggingface.co/lxuechen), [Yhyu13](https://huggingface.co/Yhyu13), and [mrm8488](https://huggingface.co/mrm8488) for their fine-tuned phi-2 models.
+Thanks to [ehartford](https://huggingface.co/ehartford), [lxuechen](https://huggingface.co/lxuechen), [Yhyu13](https://huggingface.co/Yhyu13), and [mrm8488](https://huggingface.co/mrm8488) for their fine-tuned phi-2 models.
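The diff above describes phixtral as a Mixtral-inspired mixture of four phi-2 experts chosen by learned gates. For readers unfamiliar with that routing, here is a minimal illustrative sketch, not the repository's actual modeling code: a linear gate scores the four experts per token, the top-2 are kept, and their outputs are mixed. The hidden width, the top-2 choice, and all class and parameter names are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyMoE(nn.Module):
    """Illustrative Mixtral-style layer: a learned gate routes each token
    to the top-k of four expert MLPs and mixes their outputs."""

    def __init__(self, hidden_size=2560, num_experts=4, top_k=2):
        super().__init__()
        self.top_k = top_k
        # The "gate": one small linear layer scoring every expert per token.
        # hidden_size defaults to 2560, matching phi-2's hidden width.
        self.gate = nn.Linear(hidden_size, num_experts, bias=False)
        # Stand-ins for the four phi-2-derived expert MLPs.
        self.experts = nn.ModuleList([
            nn.Sequential(
                nn.Linear(hidden_size, 4 * hidden_size),
                nn.GELU(),
                nn.Linear(4 * hidden_size, hidden_size),
            )
            for _ in range(num_experts)
        ])

    def forward(self, x):                                    # x: (batch, seq, hidden)
        scores = self.gate(x)                                # (B, S, E)
        weights, chosen = torch.topk(scores, self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)                 # renormalize the top-k scores
        # Run every expert, then keep only the chosen ones per token.
        expert_out = torch.stack([e(x) for e in self.experts], dim=2)  # (B, S, E, H)
        idx = chosen.unsqueeze(-1).expand(*chosen.shape, x.size(-1))   # (B, S, k, H)
        picked = torch.gather(expert_out, dim=2, index=idx)            # (B, S, k, H)
        return (weights.unsqueeze(-1) * picked).sum(dim=2)             # (B, S, H)

# Tiny smoke test with a small hidden size.
moe = TinyMoE(hidden_size=64)
print(moe(torch.randn(2, 8, 64)).shape)  # torch.Size([2, 8, 64])
```

If all four experts start from closely related phi-2 fine-tunes and the gate is freshly initialized, every expert initially receives nearly the same weight, which is the symmetry this commit's gate fine-tuning is meant to break.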
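The added lines say only the gates were fine-tuned, on roughly 400 short samples from openhermes, so that the router learns to prefer different experts for different inputs. Below is a minimal sketch of such a gate-only training loop; it assumes an HF-style causal LM that returns a loss when labels are supplied and whose router weights contain "gate" in their parameter names, plus a small pre-tokenized dataset. The function name, the name filter, and the hyperparameters are hypothetical rather than taken from this repository.

```python
import torch
from torch.utils.data import DataLoader

def finetune_gates(model, train_set, lr=1e-4, epochs=1, device="cuda"):
    """Fine-tune only the MoE routing gates; all other weights stay frozen."""
    model.to(device)
    # Freeze everything, then re-enable gradients only for gate parameters.
    # The `"gate" in name` filter is an assumption about the module naming.
    for name, param in model.named_parameters():
        param.requires_grad = "gate" in name

    trainable = [p for p in model.parameters() if p.requires_grad]
    optimizer = torch.optim.AdamW(trainable, lr=lr)
    loader = DataLoader(train_set, batch_size=1, shuffle=True)

    model.train()
    for _ in range(epochs):
        for batch in loader:
            input_ids = batch["input_ids"].to(device)
            # Standard causal-LM objective: the labels are the inputs themselves.
            loss = model(input_ids=input_ids, labels=input_ids).loss
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()
    return model
```

Training only the gate parameters keeps memory and compute costs small compared to full fine-tuning, which is what makes a 400-sample proof of concept practical.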