Update README.md
README.md CHANGED
@@ -2,15 +2,14 @@
 license: apache-2.0
 ---
 
-# JetMoE: Reaching LLaMA2 Performance with 0.1M Dollars
-
-
 <div align="center">
 <div> </div>
 <img src="https://cdn-uploads.huggingface.co/production/uploads/641de0213239b631552713e4/ieHnwuczidNNoGRA_FN2y.png" width="500"/>
 <img src="https://cdn-uploads.huggingface.co/production/uploads/641de0213239b631552713e4/UOsk9_zcbHpCCy6kmryYM.png" width="530"/>
 </div>
 
+# JetMoE: Reaching LLaMA2 Performance with 0.1M Dollars
+
 ## Key Messages
 
 1. JetMoE-8B is **trained at a cost of less than $0.1 million**<sup>1</sup> **but outperforms LLaMA2-7B from Meta AI**, which has multi-billion-dollar training resources. LLM training can be **much cheaper than people generally thought**.
@@ -63,7 +62,7 @@ We use the same evaluation methodology as in the Open LLM leaderboard. For MBPP
 To our surprise, despite the lower training cost and computation, JetMoE-8B performs even better than LLaMA2-7B, LLaMA-13B, and DeepseekMoE-16B. Compared to a model with similar training and inference computation, like Gemma-2B, JetMoE-8B achieves better performance.
 
 ## Model Usage
-To load the models, you need install this package:
+To load the models, you need to install [this package](https://github.com/myshell-ai/JetMoE):
 ```
 pip install -e .
 ```
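
For context on the "Model Usage" step in the diff above: after installing the package, the model can presumably be loaded through the standard `transformers` auto classes. The sketch below is a minimal example under two assumptions not stated in this diff: that the checkpoint is published on the Hugging Face Hub under the repo id `jetmoe/jetmoe-8b`, and that the installed package makes the custom JetMoE architecture available to `transformers`.

```python
# Minimal sketch, not the repository's official usage snippet.
# Assumes the checkpoint id "jetmoe/jetmoe-8b" (hypothetical here) and that the
# package installed above ("pip install -e .") provides the model code.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "jetmoe/jetmoe-8b"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Quick generation check.
inputs = tokenizer("The capital of France is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```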
|