Model beto_sentiment_analysis_es
A fine-tuned model for sentiment analysis in Spanish
This model was trained using Amazon SageMaker and the new Hugging Face Deep Learning container. The base model is BETO, a BERT-base model pre-trained on a Spanish corpus. BETO is similar in size to BERT-Base and was trained with the Whole Word Masking technique.
BETO Citation
Spanish Pre-Trained BERT Model and Evaluation Data
@inproceedings{CaneteCFP2020,
title={Spanish Pre-Trained BERT Model and Evaluation Data},
author={Cañete, José and Chaperon, Gabriel and Fuentes, Rodrigo and Ho, Jou-Hui and Kang, Hojin and Pérez, Jorge},
booktitle={PML4DC at ICLR 2020},
year={2020}
}
Dataset
The dataset is a collection of about 50,000 movie reviews in Spanish. It is balanced and provides each review in English and in Spanish, along with the label in both languages.
Sizes of datasets:
- Train dataset: 42,500
- Validation dataset: 3,750
- Test dataset: 3,750
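As a quick sanity check on the split sizes above, the following sketch confirms that they add up to the roughly 50,000 reviews in the dataset and works out each split's share. The variable names are illustrative and are not taken from the training script.

```python
# Split sizes as reported in this model card.
split_sizes = {"train": 42_500, "validation": 3_750, "test": 3_750}

total = sum(split_sizes.values())

# Share of the full dataset that each split represents.
split_shares = {name: size / total for name, size in split_sizes.items()}

print(total)
for name, share in split_shares.items():
    print(f"{name}: {share:.1%}")
```

This corresponds to an 85% / 7.5% / 7.5% split between training, validation, and test data.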
Intended uses & limitations
This model is intended for sentiment analysis of Spanish text. It was fine-tuned specifically on movie reviews, but it can be applied to other kinds of reviews.
Hyperparameters
{
"epochs": "4",
"train_batch_size": "32",
"eval_batch_size": "8",
"fp16": "true",
"learning_rate": "3e-05",
"model_name": "\"dccuchile/bert-base-spanish-wwm-uncased\"",
"sagemaker_container_log_level": "20",
"sagemaker_program": "\"train.py\""
}
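The values above are all strings, and `model_name` and `sagemaker_program` carry an extra layer of escaped quotes. That is consistent with each value having been JSON-encoded before it was recorded. Assuming that is the case, a minimal sketch for recovering typed values could look like this. The `raw_hyperparameters` dictionary copies the listing above, and `decode_hyperparameters` is a hypothetical helper, not part of SageMaker or of the original `train.py`.

```python
import json

# Hyperparameters copied from the listing above; every value is a string.
raw_hyperparameters = {
    "epochs": "4",
    "train_batch_size": "32",
    "eval_batch_size": "8",
    "fp16": "true",
    "learning_rate": "3e-05",
    "model_name": '"dccuchile/bert-base-spanish-wwm-uncased"',
    "sagemaker_container_log_level": "20",
    "sagemaker_program": '"train.py"',
}


def decode_hyperparameters(raw):
    """Decode each JSON-encoded string value into a Python object."""
    return {key: json.loads(value) for key, value in raw.items()}


hyperparameters = decode_hyperparameters(raw_hyperparameters)
for key, value in hyperparameters.items():
    print(f"{key}: {value!r} ({type(value).__name__})")
```

After decoding, numeric values become `int` or `float`, `fp16` becomes a `bool`, and the doubly quoted entries become plain strings.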
Evaluation results
Accuracy = 0.9101333333333333
F1 Score = 0.9088450094671354
Precision = 0.9105691056910569
Recall = 0.9071274298056156
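The F1 score is the harmonic mean of precision and recall, so the reported figures can be cross-checked against each other. A short sketch using the values above:

```python
# Metrics as reported in this model card.
precision = 0.9105691056910569
recall = 0.9071274298056156
reported_f1 = 0.9088450094671354

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)

print(f"recomputed F1: {f1:.6f}")
print(f"difference from reported F1: {abs(f1 - reported_f1):.2e}")
```

The recomputed value should agree with the reported F1 score up to floating-point rounding.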
Test results
Model in action
Usage for Sentiment Analysis
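import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "edumunozsala/beto_sentiment_analysis_es"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
model.eval()  # disable dropout for inference
text = "Se trata de una película interesante, con un sólido argumento y una gran interpretación de su actor principal"
# Tokenize into a batch of one, truncating to the model's maximum length
inputs = tokenizer(text, return_tensors="pt", truncation=True)
with torch.no_grad():  # gradients are not needed for prediction
    outputs = model(**inputs)
# Index of the highest-scoring class; model.config.id2label maps it to a label name
output = outputs.logits.argmax(dim=1)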
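The snippet above returns only the index of the highest-scoring class. If a confidence score is also wanted, the logits can be passed through a softmax. The sketch below uses only the standard library and made-up logits for a two-class output. The values are hypothetical and were not produced by this model, and this card does not state which index corresponds to which sentiment (the model's `config.id2label` attribute is the usual place to look).

```python
import math


def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    # Subtract the maximum for numerical stability before exponentiating.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits for a single review (not real model output).
example_logits = [-1.2, 2.3]

probabilities = softmax(example_logits)
predicted_index = probabilities.index(max(probabilities))
print(predicted_index, probabilities)
```

With PyTorch tensors, the equivalent step is `torch.softmax(outputs.logits, dim=1)`.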
Created by Eduardo Muñoz/@edumunozsala
Evaluation results
- Accuracy on IMDb Reviews in Spanish (self-reported): 0.910
- F1 Score on IMDb Reviews in Spanish (self-reported): 0.909
- Precision on IMDb Reviews in Spanish (self-reported): 0.911
- Recall on IMDb Reviews in Spanish (self-reported): 0.907