sanjay920 committed
Commit 7e65c15
1 Parent(s): 50003d9

Update README.md

Files changed (1)
  1. README.md +16 -5
README.md CHANGED
@@ -1,5 +1,7 @@
 ---
 license: mit
+library_name: transformers
+pipeline_tag: text-generation
 model-index:
 - name: Rubra-Phi-3-mini-128k-instruct
   results:
@@ -10,7 +12,7 @@ model-index:
       name: MMLU
     metrics:
     - type: 5-shot
-      value: 66.66
+      value: 67.87
       verified: false
   - task:
       type: text-generation
@@ -19,7 +21,7 @@ model-index:
       name: GPQA
     metrics:
     - type: 0-shot
-      value: 29.24
+      value: 29.69
       verified: false
   - task:
       type: text-generation
@@ -28,7 +30,7 @@ model-index:
       name: GSM-8K
     metrics:
     - type: 8-shot, CoT
-      value: 74.09
+      value: 79.45
       verified: false
   - task:
       type: text-generation
@@ -37,7 +39,7 @@ model-index:
       name: MATH
     metrics:
     - type: 4-shot, CoT
-      value: 26.84
+      value: 30.80
       verified: false
   - task:
       type: text-generation
@@ -46,7 +48,7 @@ model-index:
       name: MT-bench
     metrics:
     - type: GPT-4 as Judge
-      value: 7.45
+      value: 8.21
       verified: false
 tags:
 - function-calling
@@ -65,6 +67,15 @@ Original model: [rubra-ai/Phi-3-mini-128k-instruct](https://huggingface.co/rubra-ai/Phi-3-mini-128k-instruct)
 ## Model description
 The model is the result of further post-training [microsoft/Phi-3-mini-128k-instruct](https://huggingface.co/microsoft/Phi-3-mini-128k-instruct). This model is designed for high performance in various instruction-following tasks and complex interactions, including multi-turn function calling and detailed conversations.
 
+| Model                                           | Function Calling | MMLU  | GPQA  | GSM-8K | MATH  | MT-bench | Win | Loss | Tie | Win Rate | Loss Rate | Adjusted Win Rate |
+|-------------------------------------------------|------------------|-------|-------|--------|-------|----------|-----|------|-----|----------|-----------|-------------------|
+| Phi-3 Mini 128k Instruct (June)                 | -                | 69.36 | 27.01 | 83.7   | 32.92 | 8.02     | 21  | 72   | 67  | 0.13125  | 0.45000   | 0.340625          |
+| Rubra Enhanced Phi-3 Mini 128k Instruct (June)  | 70.00%           | 67.87 | 29.69 | 79.45  | 30.80 | 8.21     | 72  | 21   | 67  | 0.45000  | 0.13125   | **0.659375**      |
+| Phi-3 Mini 128k Instruct (April)                | -                | 68.17 | 25.90 | 80.44  | 28.12 | 7.92     | 51  | 45   | 64  | 0.31875  | 0.28125   | 0.51875           |
+| Rubra Enhanced Phi-3 Mini 128k Instruct (April) | 65.71%           | 66.66 | 29.24 | 74.09  | 26.84 | 7.45     | 45  | 51   | 64  | 0.28125  | 0.31875   | 0.48125           |
+* Commit `e2ecb24bd9dae689bb30dafcf13cbbc9dbddead5` is the last commit to have the April-based Phi-3 model. The latest on main is built off the June model.
+
+
 ## Training Data
 The model underwent additional training on a proprietary dataset encompassing diverse instruction-following, chat, and function calling data. This post-training process enhances the model's ability to integrate tools and manage complex interaction scenarios effectively.
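
In the new table, each pairing's Win, Loss, and Tie counts sum to 160 head-to-head comparisons, Win Rate is wins over that total, and Adjusted Win Rate credits each tie as half a win. A minimal sketch of that arithmetic (the formula is inferred from the table values; the commit itself does not state it):

```python
# Inferred from the table: wins + losses + ties = 160 per pairing,
# and a tie counts as half a win toward the adjusted rate.
def adjusted_win_rate(wins: int, losses: int, ties: int) -> float:
    total = wins + losses + ties
    return (wins + 0.5 * ties) / total

# June pairing: Rubra-enhanced model vs. base Phi-3 Mini 128k Instruct
print(adjusted_win_rate(72, 21, 67))  # 0.659375, the bolded table entry
print(adjusted_win_rate(21, 72, 67))  # 0.340625

# April pairing
print(adjusted_win_rate(45, 51, 64))  # 0.48125
print(adjusted_win_rate(51, 45, 64))  # 0.51875
```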
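
The commit also adds `library_name: transformers` and `pipeline_tag: text-generation` to the card metadata, which declares that the model loads through the standard transformers text-generation path. A minimal sketch under that assumption (the repo id comes from the card; the dtype, `trust_remote_code`, and the example prompt are illustrative choices, and the model's own chat template governs the actual multi-turn and function-calling format):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Repo id taken from the card; dtype and device placement are illustrative.
model_id = "rubra-ai/Phi-3-mini-128k-instruct"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

# Format the conversation with the tokenizer's own chat template.
messages = [{"role": "user", "content": "Summarize what post-training adds to Phi-3."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```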