leaderboard-pt-pr-bot commited on
Commit
63f01ae
•
1 Parent(s): bc3ab00

Adding the Open Portuguese LLM Leaderboard Evaluation Results

Browse files

This is an automated PR created with https://huggingface.co/spaces/eduagarcia-temp/portuguese-leaderboard-results-to-modelcard

The purpose of this PR is to add evaluation results from the [🚀 Open Portuguese LLM Leaderboard](https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard) to your model card.

If you encounter any issues, please report them to https://huggingface.co/spaces/eduagarcia-temp/portuguese-leaderboard-results-to-modelcard/discussions

Files changed (1) hide show
  1. README.md +166 -0
README.md CHANGED
@@ -1,6 +1,153 @@
1
  ---
2
  library_name: peft
3
  base_model: recogna-nlp/internlm-chatbode-7b
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
4
  ---
5
 
6
  # Model Card for Model ID
@@ -200,3 +347,22 @@ Carbon emissions can be estimated using the [Machine Learning Impact calculator]
200
  ### Framework versions
201
 
202
  - PEFT 0.10.0
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
  ---
2
  library_name: peft
3
  base_model: recogna-nlp/internlm-chatbode-7b
4
+ model-index:
5
+ - name: internlm-chatbode2-7b-alpha
6
+ results:
7
+ - task:
8
+ type: text-generation
9
+ name: Text Generation
10
+ dataset:
11
+ name: ENEM Challenge (No Images)
12
+ type: eduagarcia/enem_challenge
13
+ split: train
14
+ args:
15
+ num_few_shot: 3
16
+ metrics:
17
+ - type: acc
18
+ value: 57.94
19
+ name: accuracy
20
+ source:
21
+ url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=recogna-nlp/internlm-chatbode2-7b-alpha
22
+ name: Open Portuguese LLM Leaderboard
23
+ - task:
24
+ type: text-generation
25
+ name: Text Generation
26
+ dataset:
27
+ name: BLUEX (No Images)
28
+ type: eduagarcia-temp/BLUEX_without_images
29
+ split: train
30
+ args:
31
+ num_few_shot: 3
32
+ metrics:
33
+ - type: acc
34
+ value: 48.68
35
+ name: accuracy
36
+ source:
37
+ url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=recogna-nlp/internlm-chatbode2-7b-alpha
38
+ name: Open Portuguese LLM Leaderboard
39
+ - task:
40
+ type: text-generation
41
+ name: Text Generation
42
+ dataset:
43
+ name: OAB Exams
44
+ type: eduagarcia/oab_exams
45
+ split: train
46
+ args:
47
+ num_few_shot: 3
48
+ metrics:
49
+ - type: acc
50
+ value: 38.22
51
+ name: accuracy
52
+ source:
53
+ url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=recogna-nlp/internlm-chatbode2-7b-alpha
54
+ name: Open Portuguese LLM Leaderboard
55
+ - task:
56
+ type: text-generation
57
+ name: Text Generation
58
+ dataset:
59
+ name: Assin2 RTE
60
+ type: assin2
61
+ split: test
62
+ args:
63
+ num_few_shot: 15
64
+ metrics:
65
+ - type: f1_macro
66
+ value: 91.55
67
+ name: f1-macro
68
+ source:
69
+ url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=recogna-nlp/internlm-chatbode2-7b-alpha
70
+ name: Open Portuguese LLM Leaderboard
71
+ - task:
72
+ type: text-generation
73
+ name: Text Generation
74
+ dataset:
75
+ name: Assin2 STS
76
+ type: eduagarcia/portuguese_benchmark
77
+ split: test
78
+ args:
79
+ num_few_shot: 15
80
+ metrics:
81
+ - type: pearson
82
+ value: 81.51
83
+ name: pearson
84
+ source:
85
+ url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=recogna-nlp/internlm-chatbode2-7b-alpha
86
+ name: Open Portuguese LLM Leaderboard
87
+ - task:
88
+ type: text-generation
89
+ name: Text Generation
90
+ dataset:
91
+ name: FaQuAD NLI
92
+ type: ruanchaves/faquad-nli
93
+ split: test
94
+ args:
95
+ num_few_shot: 15
96
+ metrics:
97
+ - type: f1_macro
98
+ value: 79.09
99
+ name: f1-macro
100
+ source:
101
+ url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=recogna-nlp/internlm-chatbode2-7b-alpha
102
+ name: Open Portuguese LLM Leaderboard
103
+ - task:
104
+ type: text-generation
105
+ name: Text Generation
106
+ dataset:
107
+ name: HateBR Binary
108
+ type: ruanchaves/hatebr
109
+ split: test
110
+ args:
111
+ num_few_shot: 25
112
+ metrics:
113
+ - type: f1_macro
114
+ value: 85.85
115
+ name: f1-macro
116
+ source:
117
+ url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=recogna-nlp/internlm-chatbode2-7b-alpha
118
+ name: Open Portuguese LLM Leaderboard
119
+ - task:
120
+ type: text-generation
121
+ name: Text Generation
122
+ dataset:
123
+ name: PT Hate Speech Binary
124
+ type: hate_speech_portuguese
125
+ split: test
126
+ args:
127
+ num_few_shot: 25
128
+ metrics:
129
+ - type: f1_macro
130
+ value: 71.79
131
+ name: f1-macro
132
+ source:
133
+ url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=recogna-nlp/internlm-chatbode2-7b-alpha
134
+ name: Open Portuguese LLM Leaderboard
135
+ - task:
136
+ type: text-generation
137
+ name: Text Generation
138
+ dataset:
139
+ name: tweetSentBR
140
+ type: eduagarcia/tweetsentbr_fewshot
141
+ split: test
142
+ args:
143
+ num_few_shot: 25
144
+ metrics:
145
+ - type: f1_macro
146
+ value: 65.41
147
+ name: f1-macro
148
+ source:
149
+ url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=recogna-nlp/internlm-chatbode2-7b-alpha
150
+ name: Open Portuguese LLM Leaderboard
151
  ---
152
 
153
  # Model Card for Model ID
 
347
  ### Framework versions
348
 
349
  - PEFT 0.10.0
350
+
351
+
352
+ # Open Portuguese LLM Leaderboard Evaluation Results
353
+
354
+ Detailed results can be found [here](https://huggingface.co/datasets/eduagarcia-temp/llm_pt_leaderboard_raw_results/tree/main/recogna-nlp/internlm-chatbode2-7b-alpha) and on the [🚀 Open Portuguese LLM Leaderboard](https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard)
355
+
356
+ | Metric | Value |
357
+ |--------------------------|---------|
358
+ |Average |**68.89**|
359
+ |ENEM Challenge (No Images)| 57.94|
360
+ |BLUEX (No Images) | 48.68|
361
+ |OAB Exams | 38.22|
362
+ |Assin2 RTE | 91.55|
363
+ |Assin2 STS | 81.51|
364
+ |FaQuAD NLI | 79.09|
365
+ |HateBR Binary | 85.85|
366
+ |PT Hate Speech Binary | 71.79|
367
+ |tweetSentBR | 65.41|
368
+