Commit c7f5560
Parent(s): a5390a4

Adding the Open Portuguese LLM Leaderboard Evaluation Results (#1)

- Adding the Open Portuguese LLM Leaderboard Evaluation Results (6c294787208832c22d52b45b0604133686731858)


Co-authored-by: Open PT LLM Leaderboard PR Bot <[email protected]>

Files changed (1)
  1. README.md +142 -5
README.md CHANGED
@@ -1,4 +1,8 @@
 ---
+language:
+- pt
+- en
+license: mit
 library_name: peft
 tags:
 - Phi-2B
@@ -6,16 +10,133 @@ tags:
 - Bode
 - LLM
 - Alpaca
-license: mit
-language:
-- pt
-- en
 metrics:
 - accuracy
 - f1
 - precision
 - recall
 pipeline_tag: text-generation
+model-index:
+- name: Phi-Bode
+  results:
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: ENEM Challenge (No Images)
+      type: eduagarcia/enem_challenge
+      split: train
+      args:
+        num_few_shot: 3
+    metrics:
+    - type: acc
+      value: 33.94
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=recogna-nlp/Phi-Bode
+      name: Open Portuguese LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: BLUEX (No Images)
+      type: eduagarcia-temp/BLUEX_without_images
+      split: train
+      args:
+        num_few_shot: 3
+    metrics:
+    - type: acc
+      value: 25.31
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=recogna-nlp/Phi-Bode
+      name: Open Portuguese LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: OAB Exams
+      type: eduagarcia/oab_exams
+      split: train
+      args:
+        num_few_shot: 3
+    metrics:
+    - type: acc
+      value: 28.56
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=recogna-nlp/Phi-Bode
+      name: Open Portuguese LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: Assin2 RTE
+      type: assin2
+      split: test
+      args:
+        num_few_shot: 15
+    metrics:
+    - type: f1_macro
+      value: 68.1
+      name: f1-macro
+    - type: pearson
+      value: 30.57
+      name: pearson
+    source:
+      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=recogna-nlp/Phi-Bode
+      name: Open Portuguese LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: FaQuAD NLI
+      type: ruanchaves/faquad-nli
+      split: test
+      args:
+        num_few_shot: 15
+    metrics:
+    - type: f1_macro
+      value: 43.97
+      name: f1-macro
+    source:
+      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=recogna-nlp/Phi-Bode
+      name: Open Portuguese LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: HateBR Binary
+      type: eduagarcia/portuguese_benchmark
+      split: test
+      args:
+        num_few_shot: 25
+    metrics:
+    - type: f1_macro
+      value: 60.51
+      name: f1-macro
+    - type: f1_macro
+      value: 54.6
+      name: f1-macro
+    source:
+      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=recogna-nlp/Phi-Bode
+      name: Open Portuguese LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: tweetSentBR
+      type: eduagarcia-temp/tweetsentbr
+      split: test
+      args:
+        num_few_shot: 25
+    metrics:
+    - type: f1_macro
+      value: 46.78
+      name: f1-macro
+    source:
+      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=recogna-nlp/Phi-Bode
+      name: Open Portuguese LLM Leaderboard
 ---
 
 # Phi-Bode
@@ -110,4 +231,20 @@ Se você deseja utilizar o Phi-Bode em sua pesquisa, cite-o da seguinte maneira:
   doi = { 10.57967/hf/1880 },
   publisher = { Hugging Face }
 }
-```
+```
+# [Open Portuguese LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard)
+Detailed results can be found [here](https://huggingface.co/datasets/eduagarcia-temp/llm_pt_leaderboard_raw_results/tree/main/recogna-nlp/Phi-Bode)
+
+| Metric | Value |
+|--------------------------|---------|
+|Average |**43.59**|
+|ENEM Challenge (No Images)| 33.94|
+|BLUEX (No Images) | 25.31|
+|OAB Exams | 28.56|
+|Assin2 RTE | 68.10|
+|Assin2 STS | 30.57|
+|FaQuAD NLI | 43.97|
+|HateBR Binary | 60.51|
+|PT Hate Speech Binary | 54.60|
+|tweetSentBR | 46.78|
+
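Usage note (not part of the commit): the `model-index` block added above is the machine-readable counterpart of the results table, so the leaderboard numbers can be read back from the published card. A minimal sketch with the `huggingface_hub` library, assuming the updated README.md is live on the Hub at `recogna-nlp/Phi-Bode`:

```python
# Minimal sketch (illustration only, not from the commit): read the eval
# results exposed by the `model-index` YAML block added in this diff.
from huggingface_hub import ModelCard

# Assumes the updated card has been pushed to recogna-nlp/Phi-Bode.
card = ModelCard.load("recogna-nlp/Phi-Bode")

# `eval_results` is parsed from the `model-index` front matter; each entry
# carries the dataset name, metric type and metric value shown in the table.
for result in card.data.eval_results or []:
    print(f"{result.dataset_name}: {result.metric_type} = {result.metric_value}")
```

The raw per-task result files referenced by the "Detailed results" link live in the `eduagarcia-temp/llm_pt_leaderboard_raw_results` dataset, so they can also be fetched directly from that repository if the aggregated card metadata is not enough.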