nicolasdec committed on
Commit
d48f2bf
1 Parent(s): f1b31c5

Create README.md

Files changed (1)
  1. README.md +152 -0
README.md ADDED
@@ -0,0 +1,152 @@
+ ---
+ language:
+ - pt
+ - en
+ license: cc
+ tags:
+ - text-generation-inference
+ - transformers
+ - unsloth
+ - mistral
+ - gguf
+ - brazil
+ - brasil
+ - portuguese
+ base_model: mistralai/Mistral-7B-Instruct-v0.2
+ pipeline_tag: text-generation
+ ---
+ # Cabra Mistral 7b v2
+ <img src="https://media.discordapp.net/attachments/1060891441724932096/1219303427000242316/blackpantera_cute_goat_with_red_M_in_the_background_brazil_flag_3b448f3a-d500-4f01-877f-2e469aba7dfc.png?ex=660acfce&is=65f85ace&hm=28ee401f092b558b11df54951270189641fe7d1173bfc4a5d633e53fb03c2d6d&=&format=webp&quality=lossless&width=350&height=350" width="400" height="400">
+
+ This model is a fine-tune of [Mistral 7b Instruct 0.2](https://huggingface.co/mistralai/mistral-7b-instruct-v0.2) on the internal Cabra 5k dataset. It is optimized for Portuguese and responds in Portuguese.
+
+ **Try our demo here: [CabraChat](https://huggingface.co/spaces/nicolasdec/CabraChat).**
+
+ **See the other models fine-tuned for Portuguese: [Cabra](https://huggingface.co/collections/nicolasdec/cabra-65d12286c4d2b2e4029c0c63).**
+
+ ## Model Details
+
+ ### Model: Mistral 7b Instruct 0.2
+
+ Mistral-7B-v0.1 is a transformer model with the following architectural choices:
+
+ - Grouped-Query Attention
+ - Sliding-Window Attention
+ - Byte-fallback BPE tokenizer
+
+ ### Dataset: Cabra 5k
+
+ Internal dataset for fine-tuning. We will release it soon.
+
+ ### Example
+
+ ```
+ <s> [INST] who is Elon Musk? [/INST]Elon Musk é um empreendedor, inventor e capitalista americano. Ele é o fundador, CEO e CTO da SpaceX, CEO da Neuralink e fundador do The Boring Company. Musk também é o proprietário do Twitter.</s>
+
+ ```
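
The example above wraps the user message in `[INST] ... [/INST]` after a leading `<s>`. A minimal sketch of building that string for a single-turn prompt follows. The helper name `build_prompt` and the `include_bos` flag are hypothetical; the flag exists because many tokenizers add the beginning-of-sequence token themselves, in which case the literal `<s>` should probably be left out.

```python
def build_prompt(user_message: str, include_bos: bool = True) -> str:
    """Format a single-turn prompt in the layout shown in the example above."""
    # Assumption: set include_bos=False when the tokenizer already prepends <s>.
    prefix = "<s> " if include_bos else ""
    return f"{prefix}[INST] {user_message.strip()} [/INST]"


prompt = build_prompt("who is Elon Musk?")
print(prompt)
# <s> [INST] who is Elon Musk? [/INST]
```

The model's completion is then expected to follow the closing `[/INST]` and end with `</s>`, as in the example.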
+
+ ### Training parameters
+
+ ```
+ - learning_rate: 1e-05
+ - train_batch_size: 4
+ - eval_batch_size: 4
+ - seed: 42
+ - distributed_type: multi-GPU
+ - num_devices: 2
+ - gradient_accumulation_steps: 8
+ - total_train_batch_size: 64
+ - total_eval_batch_size: 8
+ - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+ - lr_scheduler_type: cosine
+ - lr_scheduler_warmup_ratio: 0.01
+ - num_epochs: 3
+ ```
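
The two total batch sizes listed above are consistent with the per-device values: the training total multiplies the per-device batch by the number of devices and the gradient accumulation steps, while evaluation uses no accumulation. A short check of that arithmetic, using only the numbers from the list:

```python
# Values copied from the training parameter list above.
train_batch_size = 4             # per device
eval_batch_size = 4              # per device
num_devices = 2
gradient_accumulation_steps = 8

# Effective number of examples per optimizer step.
total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps
# Evaluation does not accumulate gradients, so only the device count matters.
total_eval_batch_size = eval_batch_size * num_devices

print(total_train_batch_size, total_eval_batch_size)  # 64 8
```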
+
+ ### Framework
+
+ - Transformers 4.39.0.dev0
+ - PyTorch 2.1.2+cu118
+ - Datasets 2.14.6
+ - Tokenizers 0.15.2
+
+ ## Usage
+ For now, the model is intended for research purposes. Possible research areas and tasks include:
+
+ - Research on generative models.
+ - Investigating and understanding the limitations and biases of generative models.
+
+ **Commercial use is prohibited. Research only.**
+
+ ### Evals
+
+ | Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
+ |-----------------------------|---------|----------------------|--------|----------|--------|---------|
+ | assin2_rte | 1.1 | all | 15 | f1_macro | 0.9013 | ± 0.0043 |
+ | | | all | 15 | acc | 0.9016 | ± 0.0043 |
+ | assin2_sts | 1.1 | all | 15 | pearson | 0.7151 | ± 0.0074 |
+ | | | all | 15 | mse | 0.6803 | ± N/A |
+ | bluex | 1.1 | all | 3 | acc | 0.4798 | ± 0.0107 |
+ | | | exam_id__USP_2019 | 3 | acc | 0.375 | ± 0.044 |
+ | | | exam_id__USP_2021 | 3 | acc | 0.3462 | ± 0.0382 |
+ | | | exam_id__USP_2020 | 3 | acc | 0.4107 | ± 0.0379 |
+ | | | exam_id__UNICAMP_2018| 3 | acc | 0.4815 | ± 0.0392 |
+ | | | exam_id__UNICAMP_2020| 3 | acc | 0.4727 | ± 0.0389 |
+ | | | exam_id__UNICAMP_2021_1| 3 | acc | 0.413 | ± 0.0418 |
+ | | | exam_id__UNICAMP_2019| 3 | acc | 0.42 | ± 0.0404 |
+ | | | exam_id__UNICAMP_2022| 3 | acc | 0.5897 | ± 0.0456 |
+ | | | exam_id__USP_2022 | 3 | acc | 0.449 | ± 0.041 |
+ | | | exam_id__USP_2024 | 3 | acc | 0.6341 | ± 0.0434 |
+ | | | exam_id__UNICAMP_2024| 3 | acc | 0.6 | ± 0.0422 |
+ | | | exam_id__USP_2023 | 3 | acc | 0.5455 | ± 0.0433 |
+ | | | exam_id__UNICAMP_2023| 3 | acc | 0.5349 | ± 0.044 |
+ | | | exam_id__USP_2018 | 3 | acc | 0.4815 | ± 0.0393 |
+ | | | exam_id__UNICAMP_2021_2| 3 | acc | 0.5098 | ± 0.0403 |
+ | enem | 1.1 | all | 3 | acc | 0.5843 | ± 0.0075 |
+ | | | exam_id__2010 | 3 | acc | 0.5726 | ± 0.0264 |
+ | | | exam_id__2009 | 3 | acc | 0.6 | ± 0.0264 |
+ | | | exam_id__2014 | 3 | acc | 0.633 | ± 0.0268 |
+ | | | exam_id__2022 | 3 | acc | 0.6165 | ± 0.0243 |
+ | | | exam_id__2012 | 3 | acc | 0.569 | ± 0.0265 |
+ | | | exam_id__2013 | 3 | acc | 0.5833 | ± 0.0274 |
+ | | | exam_id__2016_2 | 3 | acc | 0.5203 | ± 0.026 |
+ | | | exam_id__2011 | 3 | acc | 0.6325 | ± 0.0257 |
+ | | | exam_id__2023 | 3 | acc | 0.5778 | ± 0.0246 |
+ | | | exam_id__2016 | 3 | acc | 0.595 | ± 0.0258 |
+ | | | exam_id__2017 | 3 | acc | 0.5517 | ± 0.0267 |
+ | | | exam_id__2015 | 3 | acc | 0.563 | ± 0.0261 |
+ | faquad_nli | 1.1 | all | 15 | f1_macro | 0.6424 | ± 0.0138 |
+ | | | all | 15 | acc | 0.6769 | ± 0.013 |
+ | hatebr_offensive_binary | 1 | all | 25 | f1_macro | 0.8361 | ± 0.007 |
+ | | | all | 25 | acc | 0.8371 | ± 0.007 |
+ | oab_exams | 1.5 | all | 3 | acc | 0.3841 | ± 0.006 |
+ | | | exam_id__2011-03 | 3 | acc | 0.3636 | ± 0.0279 |
+ | | | exam_id__2014-14 | 3 | acc | 0.475 | ± 0.0323 |
+ | | | exam_id__2016-21 | 3 | acc | 0.4125 | ± 0.0318 |
+ | | | exam_id__2012-06a | 3 | acc | 0.3875 | ± 0.0313 |
+ | | | exam_id__2014-13 | 3 | acc | 0.325 | ± 0.0303 |
+ | | | exam_id__2015-16 | 3 | acc | 0.425 | ± 0.032 |
+ | | | exam_id__2010-02 | 3 | acc | 0.4 | ± 0.0283 |
+ | | | exam_id__2012-08 | 3 | acc | 0.3875 | ± 0.0314 |
+ | | | exam_id__2011-05 | 3 | acc | 0.375 | ± 0.0312 |
+ | | | exam_id__2017-22 | 3 | acc | 0.4 | ± 0.0316 |
+ | | | exam_id__2018-25 | 3 | acc | 0.4125 | ± 0.0318 |
+ | | | exam_id__2012-09 | 3 | acc | 0.3636 | ± 0.0317 |
+ | | | exam_id__2017-24 | 3 | acc | 0.3375 | ± 0.0304 |
+ | | | exam_id__2016-20a | 3 | acc | 0.3125 | ± 0.0299 |
+ | | | exam_id__2012-06 | 3 | acc | 0.425 | ± 0.0318 |
+ | | | exam_id__2013-12 | 3 | acc | 0.4375 | ± 0.0321 |
+ | | | exam_id__2016-20 | 3 | acc | 0.45 | ± 0.0322 |
+ | | | exam_id__2013-11 | 3 | acc | 0.4 | ± 0.0316 |
+ | | | exam_id__2015-17 | 3 | acc | 0.4231 | ± 0.0323 |
+ | | | exam_id__2015-18 | 3 | acc | 0.4 | ± 0.0316 |
+ | | | exam_id__2017-23 | 3 | acc | 0.35 | ± 0.0308 |
+ | | | exam_id__2010-01 | 3 | acc | 0.2471 | ± 0.0271 |
+ | | | exam_id__2011-04 | 3 | acc | 0.375 | ± 0.0313 |
+ | | | exam_id__2016-19 | 3 | acc | 0.4103 | ± 0.0321 |
+ | | | exam_id__2013-10 | 3 | acc | 0.3375 | ± 0.0305 |
+ | | | exam_id__2012-07 | 3 | acc | 0.3625 | ± 0.031 |
+ | | | exam_id__2014-15 | 3 | acc | 0.3846 | ± 0.0318 |
+ | portuguese_hate_speech_binary | 1 | all | 25 | f1_macro | 0.6187 | ± 0.0119 |
+ | | | all | 25 | acc | 0.6322 | ± 0.0117 |
+
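
To read the `Value` and `Stderr` columns together, one option is an approximate 95% confidence interval. The sketch below assumes that `Stderr` is the standard error of the reported metric and that a normal approximation is acceptable; neither assumption is stated in the table itself. The helper name `confidence_interval` is hypothetical, and the input numbers are the `enem` / `all` row.

```python
def confidence_interval(value: float, stderr: float, z: float = 1.96) -> tuple[float, float]:
    """Return (low, high) for value ± z * stderr (normal approximation, assumed)."""
    margin = z * stderr
    return value - margin, value + margin


# enem, filter "all": acc 0.5843 ± 0.0075 (taken from the table above).
low, high = confidence_interval(0.5843, 0.0075)
print(f"enem acc: {low:.4f} to {high:.4f}")
```

Rows whose standard error is listed as `N/A`, such as the `mse` row for `assin2_sts`, cannot be treated this way.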