metadata

license: apache-2.0
tags:
  - Speech Emotion Recognition
  - generated_from_trainer
metrics:
  - accuracy
model-index:
  - name: Wav2vec2-xlsr-Shemo
    results: []

Wav2vec2-xlsr-Shemo

This model is a fine-tuned version of ehcalabres/wav2vec2-lg-xlsr-en-speech-emotion-recognition on the minoosh/shEMO dataset for speech emotion recognition. After the final training epoch (epoch 30, step 4500), it achieves the following results on the evaluation set:

Loss: 0.9168
Accuracy: 0.7267

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

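The model was fine-tuned and evaluated on the minoosh/shEMO dataset. Validation loss and accuracy were logged once per epoch, every 150 steps. More information is needed on the split sizes and preprocessing.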

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.003
train_batch_size: 4
eval_batch_size: 4
seed: 42
gradient_accumulation_steps: 4
total_train_batch_size: 16
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
lr_scheduler_warmup_ratio: 0.1
num_epochs: 30
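
As a rough cross-check of the values above, the sketch below recomputes the effective batch size and the warmup length, then evaluates a linear warmup and decay schedule at a few steps. It assumes a single training device and takes 150 steps per epoch from the Training results table. The helper name `linear_lr` is illustrative, and the formula follows the usual linear-with-warmup shape rather than anything taken from the actual training script.

```python
import math

# Values copied from the hyperparameter list above.
learning_rate = 0.003
train_batch_size = 4
gradient_accumulation_steps = 4
warmup_ratio = 0.1
num_epochs = 30

# Assumption: one device, so the effective batch is batch size x accumulation.
num_devices = 1
total_train_batch_size = train_batch_size * gradient_accumulation_steps * num_devices

# Assumption: 150 optimizer steps per epoch, read off the Training results table.
steps_per_epoch = 150
total_steps = steps_per_epoch * num_epochs
warmup_steps = math.ceil(total_steps * warmup_ratio)


def linear_lr(step: int) -> float:
    """Learning rate at a given optimizer step (illustrative helper)."""
    if step < warmup_steps:
        # Linear ramp from 0 up to the peak learning rate.
        return learning_rate * step / max(1, warmup_steps)
    # Linear decay from the peak down to 0 at the final step.
    remaining = max(0, total_steps - step)
    return learning_rate * remaining / max(1, total_steps - warmup_steps)


print(total_train_batch_size, total_steps, warmup_steps)
for step in (0, 225, 450, 2475, 4500):
    print(step, linear_lr(step))
```

Under these assumptions the learning rate peaks at 0.003 once warmup ends and falls back to zero at the last step. In the Transformers `Trainer`, the listed values would typically be passed as `per_device_train_batch_size`, `gradient_accumulation_steps`, `warmup_ratio` and `lr_scheduler_type`.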

Training results

Training Loss	Epoch	Step	Validation Loss	Accuracy
1.1825	1.0	150	1.1383	0.6267
1.3392	2.0	300	1.4398	0.5533
1.2058	3.0	450	1.1194	0.6300
1.0984	4.0	600	1.2049	0.6200
1.0033	5.0	750	1.0080	0.6500
0.9694	6.0	900	0.9878	0.6367
0.8506	7.0	1050	0.8965	0.7033
0.8068	8.0	1200	0.9359	0.6833
0.7674	9.0	1350	1.1235	0.6333
0.7817	10.0	1500	0.8682	0.6900
0.7172	11.0	1650	0.8289	0.7067
0.6989	12.0	1800	0.9318	0.7000
0.6127	13.0	1950	0.8712	0.6967
0.6311	14.0	2100	0.8965	0.7133
0.5901	15.0	2250	0.9008	0.7267
0.5667	16.0	2400	1.0093	0.7200
0.5652	17.0	2550	0.9032	0.7300
0.5650	18.0	2700	0.9317	0.7267
0.5705	19.0	2850	1.0134	0.7133
0.4984	20.0	3000	0.9432	0.7367
0.5207	21.0	3150	0.9368	0.6933
0.5005	22.0	3300	0.9746	0.7033
0.5055	23.0	3450	1.0437	0.7133
0.4867	24.0	3600	1.0052	0.7067
0.5315	25.0	3750	0.9689	0.7200
0.4755	26.0	3900	0.8962	0.7367
0.5083	27.0	4050	0.9319	0.7300
0.4661	28.0	4200	0.9301	0.7233
0.4536	29.0	4350	0.9370	0.7267
0.4693	30.0	4500	0.9168	0.7267
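
The validation metrics in the table fluctuate from epoch to epoch, so the final checkpoint is not necessarily the strongest one. The following sketch copies the epoch, validation loss and accuracy columns into a list and picks out the best epochs by each metric. The variable names are illustrative, and the card does not state which checkpoint was published.

```python
# (epoch, validation_loss, accuracy) copied from the table above.
eval_log = [
    (1, 1.1383, 0.6267), (2, 1.4398, 0.5533), (3, 1.1194, 0.6300),
    (4, 1.2049, 0.6200), (5, 1.0080, 0.6500), (6, 0.9878, 0.6367),
    (7, 0.8965, 0.7033), (8, 0.9359, 0.6833), (9, 1.1235, 0.6333),
    (10, 0.8682, 0.6900), (11, 0.8289, 0.7067), (12, 0.9318, 0.7000),
    (13, 0.8712, 0.6967), (14, 0.8965, 0.7133), (15, 0.9008, 0.7267),
    (16, 1.0093, 0.7200), (17, 0.9032, 0.7300), (18, 0.9317, 0.7267),
    (19, 1.0134, 0.7133), (20, 0.9432, 0.7367), (21, 0.9368, 0.6933),
    (22, 0.9746, 0.7033), (23, 1.0437, 0.7133), (24, 1.0052, 0.7067),
    (25, 0.9689, 0.7200), (26, 0.8962, 0.7367), (27, 0.9319, 0.7300),
    (28, 0.9301, 0.7233), (29, 0.9370, 0.7267), (30, 0.9168, 0.7267),
]

# Highest validation accuracy and every epoch that reached it.
best_acc = max(acc for _, _, acc in eval_log)
best_acc_epochs = [epoch for epoch, _, acc in eval_log if acc == best_acc]

# Epoch with the lowest validation loss.
best_loss_epoch, best_loss, acc_at_best_loss = min(eval_log, key=lambda row: row[1])

# Metrics at the last epoch, which match the headline numbers of this card.
final_epoch, final_loss, final_acc = eval_log[-1]

print("best accuracy:", best_acc, "at epochs", best_acc_epochs)
print("lowest validation loss:", best_loss, "at epoch", best_loss_epoch)
print("final epoch:", final_epoch, final_loss, final_acc)
```

If the best checkpoint is wanted rather than the last one, `Trainer` can be configured to keep it with `load_best_model_at_end` and `metric_for_best_model`. Whether that was done for this run is not documented here.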

Framework versions

Transformers 4.29.2
PyTorch 2.0.1+cu117
Datasets 2.12.0
Tokenizers 0.13.3
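
To compare a local environment against the versions listed above, a small check along these lines may help. It is a sketch that assumes the PyPI distribution names `transformers`, `torch`, `datasets` and `tokenizers`; the function name `find_mismatches` is hypothetical.

```python
from importlib import metadata

# Versions listed above; the keys are assumed PyPI distribution names.
expected = {
    "transformers": "4.29.2",
    "torch": "2.0.1+cu117",
    "datasets": "2.12.0",
    "tokenizers": "0.13.3",
}


def find_mismatches(expected, get_version=metadata.version):
    """Return {package: (wanted, installed)} for packages that differ."""
    mismatches = {}
    for name, wanted in expected.items():
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            # Package is not installed in this environment.
            installed = None
        if installed != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches


for name, (wanted, installed) in find_mismatches(expected).items():
    print(f"{name}: card lists {wanted}, found {installed}")
```

A mismatch does not necessarily prevent the model from loading; the check only reports where the local setup differs from the one recorded in this card.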