Update README.md
README.md
CHANGED
@@ -1,5 +1,7 @@
 ---
 license: mit
+library_name: transformers
+pipeline_tag: text-generation
 model-index:
 - name: Rubra-Phi-3-mini-128k-instruct
   results:
@@ -10,7 +12,7 @@ model-index:
       name: MMLU
     metrics:
     - type: 5-shot
-      value:
+      value: 67.87
       verified: false
   - task:
       type: text-generation
@@ -19,7 +21,7 @@ model-index:
       name: GPQA
     metrics:
     - type: 0-shot
-      value: 29.
+      value: 29.69
       verified: false
   - task:
       type: text-generation
@@ -28,7 +30,7 @@ model-index:
       name: GSM-8K
     metrics:
     - type: 8-shot, CoT
-      value:
+      value: 79.45
       verified: false
   - task:
       type: text-generation
@@ -37,7 +39,7 @@ model-index:
       name: MATH
     metrics:
     - type: 4-shot, CoT
-      value:
+      value: 30.80
       verified: false
   - task:
       type: text-generation
@@ -46,7 +48,7 @@ model-index:
       name: MT-bench
     metrics:
     - type: GPT-4 as Judge
-      value:
+      value: 8.21
       verified: false
 tags:
 - function-calling
@@ -65,6 +67,15 @@ Original model: [rubra-ai/Phi-3-mini-128k-instruct](https://huggingface.co/rubra
 ## Model description
 The model is the result of further post-training [microsoft/Phi-3-mini-128k-instruct](https://huggingface.co/microsoft/Phi-3-mini-128k-instruct). This model is designed for high performance in various instruction-following tasks and complex interactions, including multi-turn function calling and detailed conversations.
 
+| Model | Function Calling | MMLU | GPQA | GSM-8K | MATH | MT-bench | Win | Loss | Tie | Win Rate | Loss Rate | Adjusted Win Rate |
+|-------|------------------|------|------|--------|------|----------|-----|------|-----|----------|-----------|-------------------|
+| Phi-3 Mini 128k Instruct (June) | - | 69.36 | 27.01 | 83.7 | 32.92 | 8.02 | 21 | 72 | 67 | 0.13125 | 0.45000 | 0.340625 |
+| Rubra Enhanced Phi-3 Mini 128k Instruct (June) | 70.00% | 67.87 | 29.69 | 79.45 | 30.80 | 8.21 | 72 | 21 | 67 | 0.45000 | 0.13125 | **0.659375** |
+| Phi-3 Mini 128k Instruct (April) | - | 68.17 | 25.90 | 80.44 | 28.12 | 7.92 | 51 | 45 | 64 | 0.31875 | 0.28125 | 0.51875 |
+| Rubra Enhanced Phi-3 Mini 128k Instruct (April) | 65.71% | 66.66 | 29.24 | 74.09 | 26.84 | 7.45 | 45 | 51 | 64 | 0.28125 | 0.31875 | 0.48125 |
+* Commit `e2ecb24bd9dae689bb30dafcf13cbbc9dbddead5` is the last commit to have the April-based Phi-3 model. The latest in main is built off the June model.
+
+
 ## Training Data
 The model underwent additional training on a proprietary dataset encompassing diverse instruction-following, chat, and function calling data. This post-training process enhances the model's ability to integrate tools and manage complex interaction scenarios effectively.

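For reference, the Win Rate, Loss Rate, and Adjusted Win Rate columns added in the table above can be reproduced from the raw Win/Loss/Tie counts (the mirrored counts suggest a head-to-head comparison between each base model and its Rubra-enhanced counterpart). A minimal sketch follows, assuming a tie counts as half a win in the adjusted rate; that reading is inferred from the published numbers rather than stated in the README.

```python
# Minimal sketch: reproduce the Win Rate / Loss Rate / Adjusted Win Rate columns
# from the raw win/loss/tie counts in the table above. The adjusted-rate formula
# (tie counts as half a win) is inferred from the published numbers, not stated
# explicitly in the README.

def head_to_head_rates(wins: int, losses: int, ties: int) -> dict[str, float]:
    total = wins + losses + ties
    return {
        "win_rate": wins / total,
        "loss_rate": losses / total,
        "adjusted_win_rate": (wins + 0.5 * ties) / total,
    }

# Rubra Enhanced Phi-3 Mini 128k Instruct (June): 72 wins, 21 losses, 67 ties
print(head_to_head_rates(72, 21, 67))
# -> {'win_rate': 0.45, 'loss_rate': 0.13125, 'adjusted_win_rate': 0.659375}

# Phi-3 Mini 128k Instruct (April): 51 wins, 45 losses, 64 ties
print(head_to_head_rates(51, 45, 64))
# -> {'win_rate': 0.31875, 'loss_rate': 0.28125, 'adjusted_win_rate': 0.51875}
```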
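Since the front matter now declares `library_name: transformers` and `pipeline_tag: text-generation`, a plain chat-style generation call is the simplest way to exercise the model described above. This is a hedged sketch, not the project's documented interface: the repository id is the original model referenced in the diff, the generation settings are illustrative, and Rubra's multi-turn function-calling message format is deliberately not reproduced here.

```python
# Minimal usage sketch, assuming the standard transformers chat interface implied
# by the new `library_name: transformers` / `pipeline_tag: text-generation` front
# matter. Settings below are illustrative, not prescriptive.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "rubra-ai/Phi-3-mini-128k-instruct"  # original (non-quantized) model from the diff header

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",      # pick a dtype appropriate for the hardware
    device_map="auto",       # requires the `accelerate` package
    trust_remote_code=True,  # assumption: the 128k Phi-3 variant ships custom modeling code
)

# Single-turn chat; the model's dedicated function-calling format is documented
# by the Rubra project and is not shown here.
messages = [{"role": "user", "content": "Summarize in one sentence what GSM-8K measures."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```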