---
language:
- pt
- en
license: cc
tags:
- text-generation-inference
- transformers
- mistral
- gguf
- brazil
- brasil
- portuguese
base_model: mistralai/Mistral-7B-Instruct-v0.2
pipeline_tag: text-generation
metrics:
- name: assin2_rte f1_macro
  type: assin2_rte
  value: 90.13
- name: assin2_rte acc
  type: assin2_rte
  value: 90.16
- name: assin2_sts pearson
  type: assin2_sts
  value: 71.51
- name: assin2_sts mse
  type: assin2_sts
  value: 68.03
- name: bluex acc
  type: bluex
  value: 47.98
- name: enem acc
  type: enem
  value: 58.43
- name: faquad_nli f1_macro
  type: faquad_nli
  value: 64.24
- name: faquad_nli acc
  type: faquad_nli
  value: 67.69
- name: hatebr_offensive_binary f1_macro
  type: hatebr_offensive_binary
  value: 83.61
- name: hatebr_offensive_binary acc
  type: hatebr_offensive_binary
  value: 83.71
- name: oab_exams acc
  type: oab_exams
  value: 38.41
- name: portuguese_hate_speech_binary f1_macro
  type: portuguese_hate_speech_binary
  value: 61.87
- name: portuguese_hate_speech_binary acc
  type: portuguese_hate_speech_binary
  value: 63.22
---

# Cabra Mistral 7b v2

This model is a fine-tune of [Mistral 7b Instruct 0.2](https://huggingface.co/mistralai/mistral-7b-instruct-v0.2) on the internal Cabra 10k dataset. It is optimized for Portuguese and responds natively in Portuguese. It improves on the base model across several Brazilian benchmarks.

**Try our demo here: [CabraChat](https://huggingface.co/spaces/nicolasdec/CabraChat).**

**See our other models: [Cabra](https://huggingface.co/collections/botbot-ai/models-6604c2069ceef04f834ba99b).**

## Model Details

### Model: Mistral 7b Instruct 0.2

Mistral-7B-v0.1 is a transformer model with the following architectural choices:

- Grouped-Query Attention
- Sliding-Window Attention
- Byte-fallback BPE tokenizer

### Dataset: Cabra 10k

Internal dataset used for fine-tuning. We plan to release it soon.

### Quantization / GGUF

We have published several quantized (GGUF) versions in the "quantanization" branch.

### Example

```
[INST] who is Elon Musk? [/INST]Elon Musk é um empreendedor, inventor e capitalista americano. Ele é o fundador, CEO e CTO da SpaceX, CEO da Neuralink e fundador do The Boring Company. Musk também é o proprietário do Twitter.
```

### Training Parameters

```
- learning_rate: 1e-05
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- num_devices: 2
- gradient_accumulation_steps: 8
- total_train_batch_size: 64
- total_eval_batch_size: 8
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.01
- num_epochs: 3
```

### Framework

- Transformers 4.39.0.dev0
- Pytorch 2.1.2+cu118
- Datasets 2.14.6
- Tokenizers 0.15.2

## Usage

For now, the model is intended for research purposes. Possible research areas and tasks include:

- Research on generative models.
- Investigating and understanding the limitations and biases of generative models.

**Commercial use is prohibited.
Research only.**

### Evals

| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|-------------------------------|---------|-------------------------|--------|----------|--------|----------|
| assin2_rte | 1.1 | all | 15 | f1_macro | 0.9013 | ± 0.0043 |
| | | all | 15 | acc | 0.9016 | ± 0.0043 |
| assin2_sts | 1.1 | all | 15 | pearson | 0.7151 | ± 0.0074 |
| | | all | 15 | mse | 0.6803 | ± N/A |
| bluex | 1.1 | all | 3 | acc | 0.4798 | ± 0.0107 |
| | | exam_id__USP_2019 | 3 | acc | 0.375 | ± 0.044 |
| | | exam_id__USP_2021 | 3 | acc | 0.3462 | ± 0.0382 |
| | | exam_id__USP_2020 | 3 | acc | 0.4107 | ± 0.0379 |
| | | exam_id__UNICAMP_2018 | 3 | acc | 0.4815 | ± 0.0392 |
| | | exam_id__UNICAMP_2020 | 3 | acc | 0.4727 | ± 0.0389 |
| | | exam_id__UNICAMP_2021_1 | 3 | acc | 0.413 | ± 0.0418 |
| | | exam_id__UNICAMP_2019 | 3 | acc | 0.42 | ± 0.0404 |
| | | exam_id__UNICAMP_2022 | 3 | acc | 0.5897 | ± 0.0456 |
| | | exam_id__USP_2022 | 3 | acc | 0.449 | ± 0.041 |
| | | exam_id__USP_2024 | 3 | acc | 0.6341 | ± 0.0434 |
| | | exam_id__UNICAMP_2024 | 3 | acc | 0.6 | ± 0.0422 |
| | | exam_id__USP_2023 | 3 | acc | 0.5455 | ± 0.0433 |
| | | exam_id__UNICAMP_2023 | 3 | acc | 0.5349 | ± 0.044 |
| | | exam_id__USP_2018 | 3 | acc | 0.4815 | ± 0.0393 |
| | | exam_id__UNICAMP_2021_2 | 3 | acc | 0.5098 | ± 0.0403 |
| enem | 1.1 | all | 3 | acc | 0.5843 | ± 0.0075 |
| | | exam_id__2010 | 3 | acc | 0.5726 | ± 0.0264 |
| | | exam_id__2009 | 3 | acc | 0.6 | ± 0.0264 |
| | | exam_id__2014 | 3 | acc | 0.633 | ± 0.0268 |
| | | exam_id__2022 | 3 | acc | 0.6165 | ± 0.0243 |
| | | exam_id__2012 | 3 | acc | 0.569 | ± 0.0265 |
| | | exam_id__2013 | 3 | acc | 0.5833 | ± 0.0274 |
| | | exam_id__2016_2 | 3 | acc | 0.5203 | ± 0.026 |
| | | exam_id__2011 | 3 | acc | 0.6325 | ± 0.0257 |
| | | exam_id__2023 | 3 | acc | 0.5778 | ± 0.0246 |
| | | exam_id__2016 | 3 | acc | 0.595 | ± 0.0258 |
| | | exam_id__2017 | 3 | acc | 0.5517 | ± 0.0267 |
| | | exam_id__2015 | 3 | acc | 0.563 | ± 0.0261 |
| faquad_nli | 1.1 | all | 15 | f1_macro | 0.6424 | ± 0.0138 |
| | | all | 15 | acc | 0.6769 | ± 0.013 |
| hatebr_offensive_binary | 1 | all | 25 | f1_macro | 0.8361 | ± 0.007 |
| | | all | 25 | acc | 0.8371 | ± 0.007 |
| oab_exams | 1.5 | all | 3 | acc | 0.3841 | ± 0.006 |
| | | exam_id__2011-03 | 3 | acc | 0.3636 | ± 0.0279 |
| | | exam_id__2014-14 | 3 | acc | 0.475 | ± 0.0323 |
| | | exam_id__2016-21 | 3 | acc | 0.4125 | ± 0.0318 |
| | | exam_id__2012-06a | 3 | acc | 0.3875 | ± 0.0313 |
| | | exam_id__2014-13 | 3 | acc | 0.325 | ± 0.0303 |
| | | exam_id__2015-16 | 3 | acc | 0.425 | ± 0.032 |
| | | exam_id__2010-02 | 3 | acc | 0.4 | ± 0.0283 |
| | | exam_id__2012-08 | 3 | acc | 0.3875 | ± 0.0314 |
| | | exam_id__2011-05 | 3 | acc | 0.375 | ± 0.0312 |
| | | exam_id__2017-22 | 3 | acc | 0.4 | ± 0.0316 |
| | | exam_id__2018-25 | 3 | acc | 0.4125 | ± 0.0318 |
| | | exam_id__2012-09 | 3 | acc | 0.3636 | ± 0.0317 |
| | | exam_id__2017-24 | 3 | acc | 0.3375 | ± 0.0304 |
| | | exam_id__2016-20a | 3 | acc | 0.3125 | ± 0.0299 |
| | | exam_id__2012-06 | 3 | acc | 0.425 | ± 0.0318 |
| | | exam_id__2013-12 | 3 | acc | 0.4375 | ± 0.0321 |
| | | exam_id__2016-20 | 3 | acc | 0.45 | ± 0.0322 |
| | | exam_id__2013-11 | 3 | acc | 0.4 | ± 0.0316 |
| | | exam_id__2015-17 | 3 | acc | 0.4231 | ± 0.0323 |
| | | exam_id__2015-18 | 3 | acc | 0.4 | ± 0.0316 |
| | | exam_id__2017-23 | 3 | acc | 0.35 | ± 0.0308 |
| | | exam_id__2010-01 | 3 | acc | 0.2471 | ± 0.0271 |
| | | exam_id__2011-04 | 3 | acc | 0.375 | ± 0.0313 |
| | | exam_id__2016-19 | 3 | acc | 0.4103 | ± 0.0321 |
| | | exam_id__2013-10 | 3 | acc | 0.3375 | ± 0.0305 |
| | | exam_id__2012-07 | 3 | acc | 0.3625 | ± 0.031 |
| | | exam_id__2014-15 | 3 | acc | 0.3846 | ± 0.0318 |
| portuguese_hate_speech_binary | 1 | all | 25 | f1_macro | 0.6187 | ± 0.0119 |
| | | all | 25 | acc | 0.6322 | ± 0.0117 |
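
The table reports scores as fractions, while the card metadata lists the same aggregate scores as percentages (for example, 0.9013 and 90.13 for assin2_rte f1_macro). The sketch below shows one way to read a row: it converts a value to a percentage and derives a rough 95% interval from the standard error. The helper names `to_percent` and `approx_ci95` are hypothetical, and the interval assumes a normal approximation (z = 1.96), which the card itself does not state.

```python
from typing import Optional, Tuple

def to_percent(value: float) -> float:
    """Convert a fractional score (0..1) to a percentage with two decimals."""
    return round(value * 100, 2)

def approx_ci95(value: float, stderr: Optional[float],
                z: float = 1.96) -> Optional[Tuple[float, float]]:
    """Rough 95% interval, assuming a normal approximation (an assumption).

    Returns None when no standard error is reported (shown as N/A in the table).
    """
    if stderr is None:
        return None
    half_width = z * stderr
    return (value - half_width, value + half_width)

# A few aggregate rows copied from the table: (task, metric, value, stderr).
rows = [
    ("assin2_rte", "f1_macro", 0.9013, 0.0043),
    ("assin2_sts", "mse", 0.6803, None),  # stderr is N/A in the table
    ("enem", "acc", 0.5843, 0.0075),
    ("oab_exams", "acc", 0.3841, 0.006),
]

for task, metric, value, stderr in rows:
    interval = approx_ci95(value, stderr)
    if interval is None:
        print(f"{task} {metric}: {to_percent(value)} (no stderr reported)")
    else:
        low, high = interval
        print(f"{task} {metric}: {to_percent(value)} "
              f"(approx. 95% interval {low:.4f} to {high:.4f})")
```

The per-exam rows have much larger standard errors than the aggregate rows, so differences between individual exams should be read with more caution than differences between aggregate scores.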