kenshinn committed on
Commit e2a0278 • Parent: d5578dc

Update README.md

Files changed (1)
  1. README.md +57 -57
README.md CHANGED
@@ -1,58 +1,58 @@
- - 🚀 Our DynMoE-StableLM-1.6B has totally 3.2B parameters, but **only 1.8B are activated!** (averge top-k = 1.25)
 
+ ---
+ license: mit
+ ---
+
+ <h2 align="center"> <a href="https://arxiv.org/abs/2405.14297">Dynamic Mixture of Experts: An Auto-Tuning Approach for Efficient Transformer Models</a></h2>
+ <h5 align="center"> If our project helps you, please give us a star ⭐ on <a href="https://github.com/LINs-lab/DynMoE">GitHub</a> and cite our paper!</h5>
+
+ ## 📰 News
+
+ - **[2024.05.25]** 🔥 Our **checkpoints** are available now!
+ - **[2024.05.23]** 🔥 Our [paper](https://arxiv.org/abs/2405.14297) is released!
+
+ ## 😎 What's Interesting?
+
+ **Dynamic Mixture of Experts (DynMoE)** incorporates (1) a novel gating method that lets each token automatically determine the number of experts to activate, and (2) an adaptive process that automatically adjusts the number of experts during training.
+
+ ### Top-Any Gating
+
+ <video controls src="https://i.imgur.com/bLgNaoH.mp4" title="Top-Any Gating"></video>
+
+ ### Adaptive Training Process
+
+ ![Adaptive training process](https://cdn.jsdelivr.net/gh/QAQdev/Pics@master/uPic/adaptive.png)
+
+ ## 💡 Model Details
+
+ - 🤔 DynMoE-StableLM is an MoE model with **dynamic top-k gating**, fine-tuned from [LanguageBind/MoE-LLaVA-StableLM-Stage2](https://huggingface.co/LanguageBind/MoE-LLaVA-StableLM-Stage2).
+ - 🚀 Our DynMoE-StableLM-1.6B has 2.9B parameters in total, but **only 1.8B are activated!** (average top-k = 1.25)
+ - ⌛ With the DynMoE tuning stage, training completes on 8 A100 GPUs **within 40 hours.**
+
+ ## 👍 Acknowledgement
+
+ We are grateful for the following awesome projects:
+
+ - [tutel](https://github.com/microsoft/tutel)
+ - [DeepSpeed](https://github.com/microsoft/DeepSpeed)
+ - [GMoE](https://github.com/Luodian/Generalizable-Mixture-of-Experts)
+ - [EMoE](https://github.com/qiuzh20/EMoE)
+ - [MoE-LLaVA](https://github.com/PKU-YuanGroup/MoE-LLaVA)
+ - [GLUE-X](https://github.com/YangLinyi/GLUE-X)
+
+ ## 🔒 License
+
+ This project is released under the MIT license as found in the [LICENSE](https://huggingface.co/datasets/choosealicense/licenses/blob/main/markdown/mit.md) file.
+
+ ## ✏️ Citation
+
+ ```tex
+ @misc{guo2024dynamic,
+       title={Dynamic Mixture of Experts: An Auto-Tuning Approach for Efficient Transformer Models},
+       author={Yongxin Guo and Zhenglin Cheng and Xiaoying Tang and Tao Lin},
+       year={2024},
+       eprint={2405.14297},
+       archivePrefix={arXiv},
+       primaryClass={cs.LG}
+ }
+ ```
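
The "Top-Any Gating" section of the README above links only a video, so here is a minimal, hedged sketch of what threshold-based top-any routing looks like: each token activates every expert whose gate score clears that expert's learnable threshold, so different tokens use different numbers of experts. The module name, the sigmoid score function, and the top-1 fallback are illustrative assumptions for this sketch, not the DynMoE repository's actual implementation.

```python
# Minimal sketch of "top-any" gating, assuming a sigmoid score per expert and a
# learnable per-expert threshold; not the official DynMoE implementation.
import torch
import torch.nn as nn


class TopAnyGate(nn.Module):
    def __init__(self, hidden_dim: int, num_experts: int):
        super().__init__()
        self.router = nn.Linear(hidden_dim, num_experts, bias=False)
        # One routing threshold per expert, learned jointly with the model.
        self.threshold = nn.Parameter(torch.full((num_experts,), 0.5))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_tokens, hidden_dim) -> scores: (num_tokens, num_experts) in (0, 1)
        scores = torch.sigmoid(self.router(x))
        # Each token keeps every expert whose score clears that expert's threshold,
        # so the number of active experts varies per token (hence "top-any").
        mask = (scores > self.threshold).float()
        # Fallback: a token that selects no expert is routed to its top-1 expert.
        empty = mask.sum(dim=-1) == 0
        if empty.any():
            top1 = scores[empty].argmax(dim=-1)
            mask[empty] = nn.functional.one_hot(top1, scores.shape[-1]).float()
        # Renormalize the kept scores into combination weights; a zero weight
        # means the corresponding expert is not activated for that token.
        weights = mask * scores
        return weights / weights.sum(dim=-1, keepdim=True)
```

In a DynMoE-style layer, the expert outputs would then be mixed with these per-token weights, skipping any expert whose weight is zero; the average number of non-zero weights per token is the "average top-k" quoted in the Model Details.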
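
Similarly, the "Adaptive Training Process" section above is only a figure; the idea, in a hedged sketch, is that the expert pool is periodically resized during training: experts that receive no traffic are removed, and a new expert is added when some tokens clear no threshold at all. The statistics, trigger conditions, and function name below are illustrative assumptions, not the repository's exact procedure.

```python
# Illustrative sketch of the adaptive expert-count adjustment; the trigger
# conditions and update interval are assumptions, not DynMoE's exact rules.
import torch


def adjust_expert_pool(activation_counts: torch.Tensor,
                       tokens_with_no_expert: int) -> dict:
    """Decide how the expert pool changes for the next training interval.

    activation_counts: (num_experts,) tokens routed to each expert since the
        last adjustment.
    tokens_with_no_expert: tokens whose scores cleared no threshold (i.e. the
        top-1 fallback fired), suggesting the current experts are insufficient.
    """
    return {
        # Prune experts that received no tokens at all.
        "remove": (activation_counts == 0).nonzero(as_tuple=True)[0].tolist(),
        # Grow the pool by one expert if some tokens matched no expert.
        "add": 1 if tokens_with_no_expert > 0 else 0,
    }


# Example: experts 0 and 3 were never activated and 42 tokens matched no
# expert, so the next interval removes two experts and initializes a new one.
print(adjust_expert_pool(torch.tensor([0, 120, 37, 0]), tokens_with_no_expert=42))
# -> {'remove': [0, 3], 'add': 1}
```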