---
license: cc-by-nc-sa-4.0
language:
- en
- de
- zh
- fr
- nl
- el
- it
library_name: transformers
pipeline_tag: audio-classification
tags:
- HuBERT
- Speech Emotion Recognition
- SER
- PyTorch
---

# **ExHuBERT: Enhancing HuBERT Through Block Extension and Fine-Tuning on 37 Emotion Datasets**
Authors: Shahin Amiriparian, Filip Packań, Maurice Gerczuk, Björn W. Schuller

[**HuBERT Large**](https://huggingface.co/facebook/hubert-large-ls960-ft) fine-tuned on EmoSet++, a collection of 37 datasets totaling 150,907 samples with a cumulative duration of 119.5 hours.
The model expects a 3-second raw waveform resampled to 16 kHz. The original 6 output classes are combinations of low/high arousal and negative/neutral/positive valence.
Further details are available in the corresponding [**paper**](https://arxiv.org/).

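The six classes described above can be enumerated as the product of the two arousal levels and the three valence levels. The following is a minimal sketch only: the arousal-major ordering, the label strings, and the helper `decode_prediction` are assumptions made for illustration, not something this card specifies, so check the model's configuration for the actual index order before relying on it.

```python
from itertools import product

# Assumption: arousal-major ordering (low first), valence from negative to positive.
# The real index order of the model's outputs may differ.
AROUSAL = ("low", "high")
VALENCE = ("negative", "neutral", "positive")
CLASS_LABELS = [f"{a} arousal / {v} valence" for a, v in product(AROUSAL, VALENCE)]


def decode_prediction(scores):
    """Return the (assumed) label of the highest of six class scores."""
    if len(scores) != len(CLASS_LABELS):
        raise ValueError(f"expected {len(CLASS_LABELS)} scores, got {len(scores)}")
    best = max(range(len(scores)), key=lambda i: scores[i])
    return CLASS_LABELS[best]
```

Under these assumptions, `decode_prediction` takes a plain list of six scores (for example, logits converted with `.tolist()`) and returns a readable label.
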
**Note**: This model is for research purposes only.

### EmoSet++ subsets used for fine-tuning the model:

| | | | | |
| :---: | :---: | :---: | :---: | :---: |
| ABC | AD | BES | CASIA | CVE |
| Crema-D | DES | DEMoS | EA-ACT | EA-BMW |
| EA-WSJ | EMO-DB | EmoFilm | EmotiW-2014 | EMOVO |
| eNTERFACE | ESD | EU-EmoSS | EU-EV | FAU Aibo |
| GEMEP | GVESS | IEMOCAP | MES | MESD |
| MELD | PPMMK | RAVDESS | SAVEE | ShEMO |
| SmartKom | SIMIS | SUSAS | SUBSECO | TESS |
| TurkishEmo | Urdu | | | |

### Usage

```python
import torch
import torch.nn as nn
from transformers import HubertForSequenceClassification, Wav2Vec2FeatureExtractor

# CONFIG and MODEL SETUP
model_name = '.../HuBERT-EmoSet++'
feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained("facebook/hubert-base-ls960")
model = HubertForSequenceClassification.from_pretrained(model_name)
# Attach a 6-class linear head; a new nn.Linear starts with freshly initialized weights.
model.classifier = nn.Linear(in_features=256, out_features=6)

sampling_rate = 16000
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)
```

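Because the model expects a 3-second waveform at 16 kHz, clips of other lengths need to be trimmed or padded first. The sketch below shows one way to do that with NumPy; the helper name `fit_to_clip` and the choice of trailing zero-padding are assumptions for illustration, not necessarily the preprocessing used during fine-tuning. It also assumes the audio is mono and has already been resampled to 16 kHz.

```python
import numpy as np

SAMPLING_RATE = 16_000  # Hz, as stated in this card
CLIP_SECONDS = 3.0      # expected input duration


def fit_to_clip(waveform, sampling_rate=SAMPLING_RATE, seconds=CLIP_SECONDS):
    """Trim or zero-pad a mono waveform to exactly `seconds` of audio.

    Hypothetical helper: it does not resample, so `waveform` must already
    be sampled at `sampling_rate`.
    """
    waveform = np.asarray(waveform, dtype=np.float32)
    if waveform.ndim != 1:
        raise ValueError("expected a mono (1-D) waveform")
    target = int(round(sampling_rate * seconds))
    if waveform.shape[0] >= target:
        # Too long: keep only the first `target` samples.
        return waveform[:target]
    # Too short: append zeros (silence) at the end.
    return np.pad(waveform, (0, target - waveform.shape[0]))
```

The resulting array can then be passed to the feature extractor, for example `feature_extractor(clip, sampling_rate=16000, return_tensors="pt")`, and the returned tensors moved to `device` before calling the model.
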
### Citation Info

```
@inproceedings{Amiriparian24-EEH,
  author = {Shahin Amiriparian and Filip Packan and Maurice Gerczuk and Bj\"orn W.\ Schuller},
  title = {{ExHuBERT: Enhancing HuBERT Through Block Extension and Fine-Tuning on 37 Emotion Datasets}},
  booktitle = {{Proc. INTERSPEECH}},
  year = {2024},
  editor = {},
  volume = {},
  series = {},
  address = {Kos Island, Greece},
  month = {September},
  publisher = {ISCA},
}
```