leaderboard-pr-bot committed on
Commit
c530d8f
1 Parent(s): 92c6619

Adding Evaluation Results


This is an automated PR created with https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr

The purpose of this PR is to add evaluation results from the Open LLM Leaderboard to your model card.

If you encounter any issues, please report them to https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr/discussions

Files changed (1): README.md (+181 -63)
README.md CHANGED
@@ -1,79 +1,184 @@
 ---
 library_name: transformers
 tags:
 - medical
 - trl
 - trainer
-license: apache-2.0
-thumbnail: https://huggingface.co/ShieldX/manovyadh-1.1B-v1-chat/blob/main/manovyadh.png
 datasets:
 - ShieldX/manovyadh-3.5k
-language:
-- en
 metrics:
 - accuracy
 pipeline_tag: text-generation
 base_model: TinyLlama/TinyLlama-1.1B-Chat-v1.0
 widget:
-- text: >
-    ###SYSTEM: You are an AI assistant that helps people cope with stress and improve their mental health. User will tell you about their feelings and challenges. Your task is to listen empathetically and offer helpful suggestions. While responding, think about the user’s needs and goals and show compassion and support
-
-
-    ###USER: I don't know how to tell someone how I feel about them. How can I get better at expressing how I feel??
-
-
-    ###ASSISTANT:
 model-index:
-- name: manovyadh-1.1B-v1-chat
-  results:
-  - task:
-      type: text-generation
-    dataset:
-      name: ai2_arc
-      type: arc
-    metrics:
-    - name: pass@1
-      type: pass@1
-      value: 35.92
-    source:
-      name: Open LLM Leaderboard
-      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard
-  - task:
-      type: text-generation
-    dataset:
-      name: hellaswag
-      type: hellaswag
-    metrics:
-    - name: pass@1
-      type: pass@1
-      value: 60.03
-    source:
-      name: Open LLM Leaderboard
-      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard
-  - task:
-      type: text-generation
-    dataset:
-      name: truthful_qa
-      type: truthful_qa
-    metrics:
-    - name: pass@1
-      type: pass@1
-      value: 39.17
-    source:
-      name: Open LLM Leaderboard
-      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard
-  - task:
-      type: text-generation
-    dataset:
-      name: winogrande
-      type: winogrande
-    metrics:
-    - name: pass@1
-      type: pass@1
-      value: 61.09
-    source:
-      name: Open LLM Leaderboard
-      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard
 ---
 
 # Uploaded model
@@ -346,4 +451,17 @@ ShieldX a.k.a Rohan Shaw
 
 # Model Card Contact
 
-email : [email protected]
 ---
+language:
+- en
+license: apache-2.0
 library_name: transformers
 tags:
 - medical
 - trl
 - trainer
 datasets:
 - ShieldX/manovyadh-3.5k
 metrics:
 - accuracy
+thumbnail: https://huggingface.co/ShieldX/manovyadh-1.1B-v1-chat/blob/main/manovyadh.png
 pipeline_tag: text-generation
 base_model: TinyLlama/TinyLlama-1.1B-Chat-v1.0
 widget:
+- text: '###SYSTEM: You are an AI assistant that helps people cope with stress and
+    improve their mental health. User will tell you about their feelings and challenges.
+    Your task is to listen empathetically and offer helpful suggestions. While responding,
+    think about the user’s needs and goals and show compassion and support
+
+
+    ###USER: I don''t know how to tell someone how I feel about them. How can I get
+    better at expressing how I feel??
+
+
+    ###ASSISTANT:
+
+    '
 model-index:
+- name: manovyadh-1.1B-v1-chat
+  results:
+  - task:
+      type: text-generation
+    dataset:
+      name: ai2_arc
+      type: arc
+    metrics:
+    - type: pass@1
+      value: 35.92
+      name: pass@1
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+    dataset:
+      name: hellaswag
+      type: hellaswag
+    metrics:
+    - type: pass@1
+      value: 60.03
+      name: pass@1
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+    dataset:
+      name: truthful_qa
+      type: truthful_qa
+    metrics:
+    - type: pass@1
+      value: 39.17
+      name: pass@1
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+    dataset:
+      name: winogrande
+      type: winogrande
+    metrics:
+    - type: pass@1
+      value: 61.09
+      name: pass@1
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: AI2 Reasoning Challenge (25-Shot)
+      type: ai2_arc
+      config: ARC-Challenge
+      split: test
+      args:
+        num_few_shot: 25
+    metrics:
+    - type: acc_norm
+      value: 35.92
+      name: normalized accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=ShieldX/manovyadh-1.1B-v1-chat
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: HellaSwag (10-Shot)
+      type: hellaswag
+      split: validation
+      args:
+        num_few_shot: 10
+    metrics:
+    - type: acc_norm
+      value: 60.03
+      name: normalized accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=ShieldX/manovyadh-1.1B-v1-chat
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MMLU (5-Shot)
+      type: cais/mmlu
+      config: all
+      split: test
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      value: 25.82
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=ShieldX/manovyadh-1.1B-v1-chat
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: TruthfulQA (0-shot)
+      type: truthful_qa
+      config: multiple_choice
+      split: validation
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: mc2
+      value: 39.17
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=ShieldX/manovyadh-1.1B-v1-chat
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: Winogrande (5-shot)
+      type: winogrande
+      config: winogrande_xl
+      split: validation
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      value: 61.09
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=ShieldX/manovyadh-1.1B-v1-chat
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: GSM8k (5-shot)
+      type: gsm8k
+      config: main
+      split: test
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      value: 1.74
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=ShieldX/manovyadh-1.1B-v1-chat
+      name: Open LLM Leaderboard
 ---
 
 # Uploaded model
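The widget entry in the front matter doubles as documentation of the chat layout the model expects. As a minimal sketch (the helper name `build_prompt` is hypothetical; the `###SYSTEM`/`###USER`/`###ASSISTANT` layout is inferred from the widget text):

```python
# Hypothetical helper: the ###-tagged layout below is inferred from the
# widget example in the card's front matter, not from an official template.
def build_prompt(system: str, user: str) -> str:
    """Assemble a single-turn prompt in the card's ###-tagged chat format."""
    return f"###SYSTEM: {system}\n\n###USER: {user}\n\n###ASSISTANT:"

prompt = build_prompt(
    "You are an AI assistant that helps people cope with stress and "
    "improve their mental health.",
    "I don't know how to tell someone how I feel about them.",
)
```

The resulting string can then be passed to a `transformers` text-generation pipeline for the model; the card does not specify generation settings.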
 
 
 # Model Card Contact
 
+email : [email protected]
+# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
+Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_ShieldX__manovyadh-1.1B-v1-chat)
+
+| Metric                            | Value |
+|-----------------------------------|------:|
+| Avg.                              | 37.30 |
+| AI2 Reasoning Challenge (25-Shot) | 35.92 |
+| HellaSwag (10-Shot)               | 60.03 |
+| MMLU (5-Shot)                     | 25.82 |
+| TruthfulQA (0-shot)               | 39.17 |
+| Winogrande (5-shot)               | 61.09 |
+| GSM8k (5-shot)                    |  1.74 |
+
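The Avg. row is the unweighted arithmetic mean of the six benchmark scores; a quick sanity check:

```python
# Per-benchmark scores copied from the leaderboard table above.
scores = {
    "AI2 Reasoning Challenge (25-Shot)": 35.92,
    "HellaSwag (10-Shot)": 60.03,
    "MMLU (5-Shot)": 25.82,
    "TruthfulQA (0-shot)": 39.17,
    "Winogrande (5-shot)": 61.09,
    "GSM8k (5-shot)": 1.74,
}

# Unweighted mean across the six benchmarks, as reported on the leaderboard.
avg = sum(scores.values()) / len(scores)  # ~37.30
```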