Update README.md
</details>
## Comparison with [X.AI Grok models](https://x.ai/)
Hey @elonmusk, I just wanted to let you know that I've recently come across your new model, Grok, and I must say, I'm quite impressed! With 33 billion parameters and all, you've really outdone yourself. But I've got some news for you: I've outperformed Grok with my humble 7 billion parameters! Isn't that wild? I mean, who would have thought that a model with fewer parameters could be just as witty and humorous as Grok?
Anyway, I think it's about time you joined the open research movement and made your model, Grok, open source! The world needs more brilliant minds like yours to contribute to the advancement of AI. Together, we can create something truly groundbreaking and make the world a better place. So, what do you say, @elonmusk? Let's open up the doors and share our knowledge with the world! 🚀💡
(Written by OpenChat 3.5, with a touch of humor and wit.)

|              | License     | # Params | Average  | MMLU | HumanEval | MATH     | GSM8K    |
|--------------|-------------|----------|----------|------|-----------|----------|----------|
| OpenChat 3.5 | Apache-2.0  | 7B       | **56.4** | 64.3 | 55.5      | **28.6** | **77.3** |
| Grok-0       | Proprietary | 33B      | 44.5     | 65.7 | 39.7      | 15.7     | 56.8     |
| Grok-1       | Proprietary | ?        | 55.8     | 73   | 63.2      | 23.9     | 62.9     |

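
The Average column here reads as the plain arithmetic mean of the four benchmark scores; below is a minimal sketch to sanity-check that (our observation, not something the table itself states):

```python
# Recompute each row's Average as the mean of its MMLU, HumanEval, MATH,
# and GSM8K scores, then compare with the table's Average column.
scores = {
    "OpenChat 3.5": [64.3, 55.5, 28.6, 77.3],
    "Grok-0":       [65.7, 39.7, 15.7, 56.8],
    "Grok-1":       [73.0, 63.2, 23.9, 62.9],
}
for model, row in scores.items():
    print(f"{model}: {sum(row) / len(row):.2f}")  # agrees with Average to within rounding
```
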
## <a id="benchmarks"></a> Benchmarks

| Model | # Params | Average | MT-Bench | AGIEval | BBH MC | TruthfulQA | MMLU | HumanEval | BBH CoT | GSM8K |
|-------|----------|---------|----------|---------|--------|------------|------|-----------|---------|-------|

All models are evaluated in chat mode (i.e., with the respective conversation template applied). All zero-shot benchmarks follow the same settings as the AGIEval and Orca papers. CoT tasks use the same configuration as Chain-of-Thought Hub, HumanEval is evaluated with EvalPlus, and MT-Bench is run using FastChat. To reproduce our results, follow the instructions in [our repository](https://github.com/imoneoi/openchat/#benchmarks).
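
As a concrete illustration of chat-mode evaluation, the sketch below wraps a benchmark question in the model's conversation template before tokenization. The template string is our reading of the OpenChat 3.5 model card, so treat it as an assumption rather than the exact evaluation harness:

```python
# Minimal sketch of chat-mode evaluation: apply the conversation template
# to a benchmark question before tokenizing and scoring the model.
# Assumption: the template string below matches the OpenChat 3.5 model card.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("openchat/openchat_3.5")

question = "What is the capital of France?"  # stand-in for a benchmark item
prompt = f"GPT4 Correct User: {question}<|end_of_turn|>GPT4 Correct Assistant:"

input_ids = tokenizer(prompt).input_ids
print(len(input_ids), "prompt tokens")
```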
## Limitations
**Foundation Model Limitations**