license: cc-by-sa-4.0
datasets:
- Ar4ikov/iemocap_audio_text_splitted
language:
- en
- zh
metrics:
- f1
library_name: transformers
pipeline_tag: audio-classification
tags:
- speech-emotion-recognition
Cross-Lingual Cross-Age Group Adaptation for Low-Resource Elderly Speech Emotion Recognition
This model is facebook/wav2vec2-large-xlsr-53 fine-tuned on English and Chinese speech from speakers of all age groups. It is trained on the training sets of CREMA-D, CSED, ElderReact, ESD, IEMOCAP, and TESS. When using this model, make sure that your speech input is sampled at 16 kHz; resample audio recorded at any other rate before inference.
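The 16 kHz requirement can be handled with a small preprocessing step before inference. The following is a minimal sketch, not the authors' pipeline: `MODEL_ID` is a hypothetical placeholder for this model's Hub repository ID, `resample_linear` and `classify_emotion` are illustrative helper names, and the sketch assumes the checkpoint loads with the standard `transformers` audio-classification pipeline (the card's `pipeline_tag` suggests this, but it is not confirmed here). Linear interpolation is used only to keep the example dependency-light; a band-limited resampler from an audio library is preferable in practice.

```python
import numpy as np

TARGET_SR = 16_000  # sampling rate the model expects, per this model card

# Hypothetical placeholder: replace with this model's actual Hub repository ID.
MODEL_ID = "your-namespace/your-elderly-ser-checkpoint"


def resample_linear(waveform, orig_sr, target_sr=TARGET_SR):
    """Resample a mono waveform to target_sr by linear interpolation.

    This is a simple stand-in for a proper band-limited resampler.
    """
    waveform = np.asarray(waveform, dtype=np.float32)
    if orig_sr == target_sr:
        return waveform
    # Keep the clip duration fixed and compute the new sample count.
    n_target = int(round(len(waveform) / orig_sr * target_sr))
    old_t = np.arange(len(waveform)) / orig_sr
    new_t = np.arange(n_target) / target_sr
    return np.interp(new_t, old_t, waveform).astype(np.float32)


def classify_emotion(waveform, orig_sr, model_id=MODEL_ID):
    """Resample to 16 kHz, then run the audio-classification pipeline.

    Assumption: the checkpoint is compatible with the stock pipeline.
    """
    # Imported lazily so the resampling helper works without transformers.
    from transformers import pipeline

    classifier = pipeline("audio-classification", model=model_id)
    audio = resample_linear(waveform, orig_sr)
    return classifier({"raw": audio, "sampling_rate": TARGET_SR})
```

Calling `classify_emotion(waveform, 44_100)` on a mono NumPy array recorded at 44.1 kHz would resample it first and then return the pipeline's list of label and score dictionaries, provided the placeholder ID has been replaced with a real checkpoint.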
The scripts used for training and evaluation can be found here: https://github.com/HLTCHKUST/elderly_ser/tree/main
Evaluation Results
For details (e.g., the statistics of the train, valid, and test data), please refer to our paper on arXiv.
The paper also reports the model's speech emotion recognition performance on six evaluation groups: English-All, Chinese-All, English-Elderly, Chinese-Elderly, English-Adults, and Chinese-Adults.
Citation
Our paper will be published at INTERSPEECH 2023. In the meantime, it is available on arXiv. If you find our work useful, please consider citing it as follows:
@misc{cahyawijaya2023crosslingual,
title={Cross-Lingual Cross-Age Group Adaptation for Low-Resource Elderly Speech Emotion Recognition},
author={Samuel Cahyawijaya and Holy Lovenia and Willy Chung and Rita Frieske and Zihan Liu and Pascale Fung},
year={2023},
eprint={2306.14517},
archivePrefix={arXiv},
primaryClass={cs.CL}
}