4x1.8B MoE Qwen Ckpt 50000
This project builds a Mixture-of-Experts (MoE) model on top of Qwen 1.8B: we concatenated four copies of the original model into a 4x1.8B MoE and trained them with special training methods.
This model is a checkpoint from the continued-pretraining stage.
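The card does not say how the four copies are combined or how tokens are routed between them. One widely used way to turn a dense model into an MoE is sparse upcycling: every expert starts as a copy of the same dense feed-forward block, and a learned router sends each token to its top-k experts. The sketch below illustrates only that general idea, under those assumptions. The value `top_k=2`, the toy layer sizes, and the names `TopKMoE`, `make_ffn` and `upcycle` are hypothetical and do not describe this model's actual implementation.

```python
# Illustrative sketch only: assumes sparse upcycling with top-k routing.
# This is NOT the code used to build this model; all names are hypothetical.
import copy
import math
import random
from typing import Callable, List

Vector = List[float]
Matrix = List[Vector]


def matvec(weight: Matrix, x: Vector) -> Vector:
    """Multiply a (rows x cols) matrix by a length-cols vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weight]


def softmax(logits: Vector) -> Vector:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def make_ffn(w_in: Matrix, w_out: Matrix) -> Callable[[Vector], Vector]:
    """A toy dense feed-forward block: ReLU(W_in x), then W_out."""
    def ffn(x: Vector) -> Vector:
        hidden = [max(0.0, h) for h in matvec(w_in, x)]
        return matvec(w_out, hidden)
    return ffn


class TopKMoE:
    """Hypothetical sparse MoE layer: a linear router picks top_k experts per token."""

    def __init__(self, experts: List[Callable[[Vector], Vector]], router: Matrix, top_k: int = 2):
        if len(router) != len(experts) or not 1 <= top_k <= len(experts):
            raise ValueError("need one router row per expert and 1 <= top_k <= n_experts")
        self.experts = experts
        self.router = router  # one row of router weights per expert
        self.top_k = top_k

    def __call__(self, x: Vector) -> Vector:
        probs = softmax(matvec(self.router, x))
        # Indices of the top_k experts by router probability.
        chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[: self.top_k]
        norm = sum(probs[i] for i in chosen)  # renormalize over the chosen experts
        out = [0.0] * len(x)
        for i in chosen:
            gate = probs[i] / norm
            out = [o + gate * y for o, y in zip(out, self.experts[i](x))]
        return out


def upcycle(w_in: Matrix, w_out: Matrix, n_experts: int, top_k: int, seed: int = 0) -> TopKMoE:
    """Build an MoE whose experts are independent copies of one dense FFN."""
    rng = random.Random(seed)
    d_model = len(w_in[0])
    experts = [make_ffn(copy.deepcopy(w_in), copy.deepcopy(w_out)) for _ in range(n_experts)]
    # Small random router weights; in training these would be learned.
    router = [[rng.gauss(0.0, 0.02) for _ in range(d_model)] for _ in range(n_experts)]
    return TopKMoE(experts, router, top_k)


if __name__ == "__main__":
    rng = random.Random(42)
    d_model, d_hidden = 4, 8
    w_in = [[rng.gauss(0.0, 1.0) for _ in range(d_model)] for _ in range(d_hidden)]
    w_out = [[rng.gauss(0.0, 1.0) for _ in range(d_hidden)] for _ in range(d_model)]
    x = [0.5, -1.0, 2.0, 0.1]
    moe = upcycle(w_in, w_out, n_experts=4, top_k=2)
    print(moe(x))
    print(make_ffn(w_in, w_out)(x))  # same values up to float rounding
```

Because the selected gate weights are renormalized to sum to 1 and all experts start out identical, the upcycled layer initially reproduces the dense block's output (up to float rounding); in this construction, further training is what lets the experts diverge.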
Evaluations
| Groups | n-shot | Metric | Value | | Stderr |
|---|---|---|---|---|---|
| boolq | 0 | acc | 0.6508 | ± | 0.0083 |
| ceval-valid | 0 | acc | 0.5290 | ± | 0.1912 |
| | 0 | acc_norm | 0.5290 | ± | 0.1912 |
| cmmlu | 0 | acc | 0.5087 | ± | 0.1237 |
| | 0 | acc_norm | 0.5087 | ± | 0.1237 |
| mathqa | 0 | acc | 0.2647 | ± | 0.0081 |
| | 0 | acc_norm | 0.2693 | ± | 0.0081 |
| mmlu | 0 | acc | 0.4353 | ± | 0.0830 |
| - stem | 0 | acc | 0.3809 | ± | 0.0659 |
| - social_sciences | 0 | acc | 0.4959 | ± | 0.0708 |
| - other | 0 | acc | 0.4844 | ± | 0.0744 |
| - humanities | 0 | acc | 0.3998 | ± | 0.0849 |
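For the single-task rows, the Stderr column is consistent with the standard error of a mean of per-example 0/1 scores, sqrt(acc * (1 - acc) / (n - 1)). A minimal check, assuming BoolQ was scored on its 3,270-example validation split and MathQA on its 2,985-example test split (the card states neither split size):

```python
import math


def binomial_stderr(acc: float, n: int) -> float:
    """Standard error of the mean of n per-example 0/1 scores with mean `acc`.

    Uses the sample (n - 1) variance, giving sqrt(acc * (1 - acc) / (n - 1)).
    """
    if not 0.0 <= acc <= 1.0 or n < 2:
        raise ValueError("acc must be in [0, 1] and n >= 2")
    return math.sqrt(acc * (1.0 - acc) / (n - 1))


# Assumed evaluation-set sizes (not stated in the card):
# BoolQ validation = 3,270 examples, MathQA test = 2,985 examples.
rows = {
    "boolq acc": (0.6508, 3270),
    "mathqa acc": (0.2647, 2985),
    "mathqa acc_norm": (0.2693, 2985),
}
for name, (acc, n) in rows.items():
    print(f"{name}: {binomial_stderr(acc, n):.4f}")
```

Under these assumptions the values round to the reported 0.0083 and 0.0081. The much larger values on aggregate rows (ceval-valid, cmmlu, mmlu and its categories) are far too large to come from this formula applied to the full question pool; they depend on how scores are combined across subtasks, which the card does not document.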
Acknowledgements
License Agreement
This project is released under the Tongyi Qianwen Research License Agreement. The full agreement is available at https://github.com/QwenLM/Qwen/blob/main/Tongyi%20Qianwen%20RESEARCH%20LICENSE%20AGREEMENT.
When using this project, please ensure that your use complies with the terms and conditions of the license agreement.