---
license: cc-by-nc-sa-4.0
language:
- en
- de
- zh
- fr
- nl
- el
- it
- es
- my
- he
- sv
- fa
- tr
- ur
library_name: transformers
pipeline_tag: audio-classification
tags:
- Speech Emotion Recognition
- SER
- Transformer
- HuBERT
- PyTorch
---

# **ExHuBERT: Enhancing HuBERT Through Block Extension and Fine-Tuning on 37 Emotion Datasets**

Authors: Shahin Amiriparian, Filip Packań, Maurice Gerczuk, Björn W. Schuller

[**HuBERT Large**](https://huggingface.co/facebook/hubert-large-ls960-ft) fine-tuned on EmoSet++, a collection of 37 datasets totaling 150,907 samples with a cumulative duration of 119.5 hours.
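
The totals above imply an average of roughly 2.85 s per sample, which is close to the model's 3-second input window:

$$
\frac{119.5\,\text{h} \times 3600\,\text{s/h}}{150{,}907\ \text{samples}} = \frac{430{,}200\,\text{s}}{150{,}907\ \text{samples}} \approx 2.85\,\text{s per sample}
$$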

The model expects a 3-second raw waveform resampled to 16 kHz. The original six output classes are the combinations of low/high arousal and negative/neutral/positive valence (2 × 3 = 6).
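
Below is a minimal sketch of preparing input to match that description, using only NumPy. The helper name `prepare_waveform`, keeping the first 3 s of longer clips, and zero-padding shorter clips at the end are illustrative assumptions, not the authors' preprocessing. The linear-interpolation resampler is a crude stand-in without anti-aliasing. For real audio, a proper resampler such as `torchaudio.functional.resample` is preferable.

```python
import numpy as np

TARGET_SR = 16_000                              # sampling rate stated in this card
CLIP_SECONDS = 3.0                              # input length stated in this card
CLIP_SAMPLES = int(TARGET_SR * CLIP_SECONDS)    # 48,000 samples


def prepare_waveform(waveform, orig_sr: int) -> np.ndarray:
    """Resample a mono waveform to 16 kHz and pad/crop it to exactly 3 seconds.

    Hypothetical helper: resampling uses plain linear interpolation (np.interp),
    which applies no anti-aliasing filter; cropping keeps the first 3 s and
    padding appends zeros. Both choices are assumptions.
    """
    waveform = np.asarray(waveform, dtype=np.float32)
    if waveform.ndim != 1:
        raise ValueError("expected a mono (1-D) waveform")

    if orig_sr != TARGET_SR:
        duration = len(waveform) / orig_sr
        n_out = int(round(duration * TARGET_SR))
        t_in = np.arange(len(waveform)) / orig_sr    # input sample times (s)
        t_out = np.arange(n_out) / TARGET_SR         # output sample times (s)
        waveform = np.interp(t_out, t_in, waveform).astype(np.float32)

    if len(waveform) >= CLIP_SAMPLES:
        return waveform[:CLIP_SAMPLES]               # crop: keep the first 3 s
    return np.pad(waveform, (0, CLIP_SAMPLES - len(waveform)))  # zero-pad the end
```

For example, a 2-second clip recorded at 8 kHz becomes 32,000 samples at 16 kHz, followed by 16,000 zeros.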

Further details are available in the corresponding [**paper**](https://arxiv.org/).

**Note**: This model is for research purposes only.

### EmoSet++ subsets used for fine-tuning the model

| | | | | |
| :---: | :---: | :---: | :---: | :---: |
| ABC | AD | BES | CASIA | CVE |
| Crema-D | DES | DEMoS | EA-ACT | EA-BMW |
| EA-WSJ | EMO-DB | EmoFilm | EmotiW-2014 | EMOVO |
| eNTERFACE | ESD | EU-EmoSS | EU-EV | FAU Aibo |
| GEMEP | GVESS | IEMOCAP | MES | MESD |
| MELD | PPMMK | RAVDESS | SAVEE | ShEMO |
| SmartKom | SIMIS | SUSAS | SUBSECO | TESS |
| TurkishEmo | Urdu | | | |

### Usage

```python
import torch
import torch.nn as nn
from transformers import HubertForSequenceClassification, Wav2Vec2FeatureExtractor

# CONFIG and MODEL SETUP
model_name = '.../HuBERT-EmoSet++'
feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained("facebook/hubert-base-ls960")
model = HubertForSequenceClassification.from_pretrained(model_name)

# Replace the classification head with a 6-way linear layer that takes the
# 256-dimensional output of the model's projector layer. Note: this layer is
# freshly initialized, so it must be trained before its predictions are meaningful.
model.classifier = nn.Linear(in_features=256, out_features=6)

sampling_rate = 16000  # the model expects 16 kHz input
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)
```
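
The following is a hedged sketch of single-clip inference with the objects created above. The helper `predict_emotion` and the `CLASS_NAMES` ordering are hypothetical. This card does not state which output index corresponds to which arousal/valence combination, so check `model.config.id2label` (if the checkpoint sets it) or the paper before relying on the names. Because the setup code installs a freshly initialized `nn.Linear` head, the probabilities are only meaningful once that head has been trained.

```python
import itertools

import torch

# HYPOTHETICAL label order: this card only says the six classes combine
# low/high arousal with negative/neutral/positive valence. Replace this list
# with the checkpoint's real index order before use.
CLASS_NAMES = [
    f"{arousal} arousal / {valence} valence"
    for arousal, valence in itertools.product(
        ("low", "high"), ("negative", "neutral", "positive")
    )
]


@torch.no_grad()
def predict_emotion(model, feature_extractor, waveform, device, sampling_rate=16000):
    """Return (class probabilities, predicted class name) for one 3 s clip."""
    model.eval()  # disable dropout for inference
    inputs = feature_extractor(
        waveform, sampling_rate=sampling_rate, return_tensors="pt"
    )
    # Move every tensor the extractor produced onto the model's device.
    inputs = {name: tensor.to(device) for name, tensor in inputs.items()}
    logits = model(**inputs).logits                 # shape: (1, num_classes)
    probs = torch.softmax(logits, dim=-1)[0].cpu()  # shape: (num_classes,)
    return probs, CLASS_NAMES[int(probs.argmax())]
```

For example, `probs, label = predict_emotion(model, feature_extractor, waveform, device)`, where `waveform` is a 1-D float array of 48,000 samples (3 s at 16 kHz).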

### Citation Info

```
@inproceedings{Amiriparian24-EEH,
  author = {Shahin Amiriparian and Filip Packan and Maurice Gerczuk and Bj\"orn W.\ Schuller},
  title = {{ExHuBERT: Enhancing HuBERT Through Block Extension and Fine-Tuning on 37 Emotion Datasets}},
  booktitle = {{Proc. INTERSPEECH}},
  year = {2024},
  editor = {},
  volume = {},
  series = {},
  address = {Kos Island, Greece},
  month = {September},
  publisher = {ISCA},
}
```