Adding Evaluation Results

#1
opened by leaderboard-pr-bot
Files changed (1)
  1. README.md +111 -3
README.md CHANGED
@@ -1,10 +1,105 @@
 ---
 license: other
-license_name: nvidia-community-model-license
-license_link: https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-community-models-license/
 library_name: transformers
 base_model:
 - nvidia/Mistral-NeMo-Minitron-8B-Base
+license_name: nvidia-community-model-license
+license_link: https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-community-models-license/
+model-index:
+- name: Mistral-NeMo-Minitron-8B-Instruct
+  results:
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: IFEval (0-Shot)
+      type: HuggingFaceH4/ifeval
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: inst_level_strict_acc and prompt_level_strict_acc
+      value: 50.04
+      name: strict accuracy
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=nvidia/Mistral-NeMo-Minitron-8B-Instruct
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: BBH (3-Shot)
+      type: BBH
+      args:
+        num_few_shot: 3
+    metrics:
+    - type: acc_norm
+      value: 34.13
+      name: normalized accuracy
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=nvidia/Mistral-NeMo-Minitron-8B-Instruct
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MATH Lvl 5 (4-Shot)
+      type: hendrycks/competition_math
+      args:
+        num_few_shot: 4
+    metrics:
+    - type: exact_match
+      value: 0.45
+      name: exact match
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=nvidia/Mistral-NeMo-Minitron-8B-Instruct
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: GPQA (0-shot)
+      type: Idavidrein/gpqa
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: acc_norm
+      value: 5.03
+      name: acc_norm
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=nvidia/Mistral-NeMo-Minitron-8B-Instruct
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MuSR (0-shot)
+      type: TAUR-Lab/MuSR
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: acc_norm
+      value: 7.37
+      name: acc_norm
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=nvidia/Mistral-NeMo-Minitron-8B-Instruct
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MMLU-PRO (5-shot)
+      type: TIGER-Lab/MMLU-Pro
+      config: main
+      split: test
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      value: 33.23
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=nvidia/Mistral-NeMo-Minitron-8B-Instruct
+      name: Open LLM Leaderboard
 ---
 
 # Mistral-NeMo-Minitron-8B-Instruct
@@ -117,4 +212,17 @@ The model was trained on data that contains toxic language and societal biases o
 
 ## Ethical Considerations
 
-NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse. For more detailed information on ethical considerations for this model, please see the [Model Card++](https://build.nvidia.com/nvidia/mistral-nemo-minitron-8b-8k-instruct/modelcard). Please report security vulnerabilities or NVIDIA AI Concerns [here](https://www.nvidia.com/en-us/support/submit-security-vulnerability/).
+NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse. For more detailed information on ethical considerations for this model, please see the [Model Card++](https://build.nvidia.com/nvidia/mistral-nemo-minitron-8b-8k-instruct/modelcard). Please report security vulnerabilities or NVIDIA AI Concerns [here](https://www.nvidia.com/en-us/support/submit-security-vulnerability/).
+# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
+Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_nvidia__Mistral-NeMo-Minitron-8B-Instruct)
+
+| Metric             |Value|
+|--------------------|----:|
+|Avg.                |21.71|
+|IFEval (0-Shot)     |50.04|
+|BBH (3-Shot)        |34.13|
+|MATH Lvl 5 (4-Shot) | 0.45|
+|GPQA (0-shot)       | 5.03|
+|MuSR (0-shot)       | 7.37|
+|MMLU-PRO (5-shot)   |33.23|
+
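
For reference, the `model-index` block added to the front matter above is machine-readable card metadata, and the table's `Avg.` row is the unweighted mean of the six benchmark scores. A minimal sketch of reading it back, assuming PyYAML is installed and `README.md` is a local copy of the merged file from this PR:

```python
# Minimal sketch: parse the model-index front matter added by this PR and
# recompute the "Avg." value shown in the results table.
# Assumptions: PyYAML is installed; README.md is the merged file from this PR.
import yaml

with open("README.md", encoding="utf-8") as f:
    text = f.read()

# The card metadata sits between the first two "---" fences of the README.
front_matter = text.split("---", 2)[1]
card = yaml.safe_load(front_matter)

results = card["model-index"][0]["results"]
scores = [r["metrics"][0]["value"] for r in results]
print(scores)                               # [50.04, 34.13, 0.45, 5.03, 7.37, 33.23]
print(round(sum(scores) / len(scores), 2))  # 21.71, the "Avg." row in the table
```

Each entry in `results` carries exactly one metric here, so a single list comprehension suffices; a card reporting several metrics per task would need a nested loop.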