sanjay920 committed
Commit 7e65c15
1 Parent(s): 50003d9

Update README.md

Files changed (1)
  1. README.md +16 -5
README.md CHANGED
@@ -1,5 +1,7 @@
 ---
 license: mit
+library_name: transformers
+pipeline_tag: text-generation
 model-index:
 - name: Rubra-Phi-3-mini-128k-instruct
   results:
@@ -10,7 +12,7 @@ model-index:
       name: MMLU
     metrics:
     - type: 5-shot
-      value: 66.66
+      value: 67.87
       verified: false
   - task:
       type: text-generation
@@ -19,7 +21,7 @@ model-index:
       name: GPQA
     metrics:
     - type: 0-shot
-      value: 29.24
+      value: 29.69
       verified: false
   - task:
       type: text-generation
@@ -28,7 +30,7 @@ model-index:
       name: GSM-8K
     metrics:
     - type: 8-shot, CoT
-      value: 74.09
+      value: 79.45
       verified: false
   - task:
       type: text-generation
@@ -37,7 +39,7 @@ model-index:
       name: MATH
     metrics:
     - type: 4-shot, CoT
-      value: 26.84
+      value: 30.80
       verified: false
   - task:
       type: text-generation
@@ -46,7 +48,7 @@ model-index:
       name: MT-bench
     metrics:
     - type: GPT-4 as Judge
-      value: 7.45
+      value: 8.21
       verified: false
 tags:
 - function-calling
@@ -65,6 +67,15 @@ Original model: [rubra-ai/Phi-3-mini-128k-instruct](https://huggingface.co/rubra-ai/Phi-3-mini-128k-instruct)
 ## Model description
 The model is the result of further post-training [microsoft/Phi-3-mini-128k-instruct](https://huggingface.co/microsoft/Phi-3-mini-128k-instruct). This model is designed for high performance in various instruction-following tasks and complex interactions, including multi-turn function calling and detailed conversations.
 
+| Model                                           | Function Calling | MMLU  | GPQA  | GSM-8K | MATH  | MT-bench | Win | Loss | Tie | Win Rate | Loss Rate | Adjusted Win Rate |
+|-------------------------------------------------|------------------|-------|-------|--------|-------|----------|-----|------|-----|----------|-----------|-------------------|
+| Phi-3 Mini 128k Instruct (June)                 | -                | 69.36 | 27.01 | 83.7   | 32.92 | 8.02     | 21  | 72   | 67  | 0.13125  | 0.45000   | 0.340625          |
+| Rubra Enhanced Phi-3 Mini 128k Instruct (June)  | 70.00%           | 67.87 | 29.69 | 79.45  | 30.80 | 8.21     | 72  | 21   | 67  | 0.45000  | 0.13125   | **0.659375**      |
+| Phi-3 Mini 128k Instruct (April)                | -                | 68.17 | 25.90 | 80.44  | 28.12 | 7.92     | 51  | 45   | 64  | 0.31875  | 0.28125   | 0.51875           |
+| Rubra Enhanced Phi-3 Mini 128k Instruct (April) | 65.71%           | 66.66 | 29.24 | 74.09  | 26.84 | 7.45     | 45  | 51   | 64  | 0.28125  | 0.31875   | 0.48125           |
+* Commit `e2ecb24bd9dae689bb30dafcf13cbbc9dbddead5` is the last commit to have the April-based Phi-3 model. The latest on main is built off the June model.
+
+
 ## Training Data
 The model underwent additional training on a proprietary dataset encompassing diverse instruction-following, chat, and function calling data. This post-training process enhances the model's ability to integrate tools and manage complex interaction scenarios effectively.
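
In the new table, each pairing's Win, Loss, and Tie counts sum to 160 head-to-head comparisons, Win Rate is wins over that total, and Adjusted Win Rate credits each tie as half a win. A minimal sketch of that arithmetic (the formula is inferred from the table values; the commit itself does not state it):

```python
# Inferred from the table: wins + losses + ties = 160 per pairing,
# and a tie counts as half a win toward the adjusted rate.
def adjusted_win_rate(wins: int, losses: int, ties: int) -> float:
    total = wins + losses + ties
    return (wins + 0.5 * ties) / total

# June pairing: Rubra-enhanced model vs. base Phi-3 Mini 128k Instruct
print(adjusted_win_rate(72, 21, 67))  # 0.659375, the bolded table entry
print(adjusted_win_rate(21, 72, 67))  # 0.340625

# April pairing
print(adjusted_win_rate(45, 51, 64))  # 0.48125
print(adjusted_win_rate(51, 45, 64))  # 0.51875
```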
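
The commit also adds `library_name: transformers` and `pipeline_tag: text-generation` to the card metadata, which declares that the model loads through the standard transformers text-generation path. A minimal sketch under that assumption (the repo id comes from the card; the dtype, `trust_remote_code`, and the example prompt are illustrative choices, and the model's own chat template governs the actual multi-turn and function-calling format):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Repo id taken from the card; dtype and device placement are illustrative.
model_id = "rubra-ai/Phi-3-mini-128k-instruct"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

# Format the conversation with the tokenizer's own chat template.
messages = [{"role": "user", "content": "Summarize what post-training adds to Phi-3."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```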