---
license: cc-by-nc-sa-4.0
language:
- en
- de
- zh
- fr
- nl
- el
- it
- es
- my
- he
- sv
- fa
- tr
- ur
library_name: transformers
pipeline_tag: audio-classification
tags:
- Speech Emotion Recognition
- SER
- Transformer
- HuBERT
- PyTorch
---

# **ExHuBERT: Enhancing HuBERT Through Block Extension and Fine-Tuning on 37 Emotion Datasets**
Authors: Shahin Amiriparian, Filip Packań, Maurice Gerczuk, Björn W. Schuller

[**HuBERT Large**](https://huggingface.co/facebook/hubert-large-ls960-ft) fine-tuned on EmoSet++, a collection of 37 datasets totaling 150,907 samples with a cumulative duration of 119.5 hours.
The model expects a 3-second raw waveform resampled to 16 kHz. The 6 original output classes are combinations of low/high arousal and negative/neutral/positive valence.
Further details are available in the corresponding [**paper**](https://arxiv.org/).
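
The input requirement above (3 seconds at 16 kHz, i.e. 48,000 samples) can be met with a small preprocessing step. The following is a minimal sketch, not the authors' pipeline: the helper name `fit_to_three_seconds` is hypothetical, the audio is assumed to be mono and already resampled to 16 kHz, and keeping the first 3 seconds of long clips and zero-padding short ones are assumptions rather than documented behaviour.

```python
import numpy as np

SAMPLING_RATE = 16_000                      # Hz, as stated in the model description
CLIP_SECONDS = 3                            # expected input duration
TARGET_LEN = SAMPLING_RATE * CLIP_SECONDS   # 48,000 samples


def fit_to_three_seconds(waveform: np.ndarray) -> np.ndarray:
    """Return a mono float32 waveform of exactly TARGET_LEN samples.

    ASSUMPTIONS (not from the model card): the input is 1-D and already at
    16 kHz; long clips keep only their first 3 s; short clips are
    zero-padded at the end.
    """
    waveform = np.asarray(waveform, dtype=np.float32)
    if waveform.ndim != 1:
        raise ValueError("expected a 1-D (mono) waveform")
    if len(waveform) >= TARGET_LEN:
        return waveform[:TARGET_LEN]
    return np.pad(waveform, (0, TARGET_LEN - len(waveform)))


short_clip = np.ones(SAMPLING_RATE, dtype=np.float32)      # 1 s of audio
long_clip = np.ones(5 * SAMPLING_RATE, dtype=np.float32)   # 5 s of audio
print(fit_to_three_seconds(short_clip).shape)  # (48000,)
print(fit_to_three_seconds(long_clip).shape)   # (48000,)
```

Audio recorded at another sampling rate would need to be resampled before this step; that is not covered here.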

**Note**: This model is for research purposes only.

### EmoSet++ subsets used for fine-tuning the model:

| | | | | |
| :---: | :---: | :---: | :---: | :---: |
| ABC | AD | BES | CASIA | CVE |
| Crema-D | DES | DEMoS | EA-ACT | EA-BMW |
| EA-WSJ | EMO-DB | EmoFilm | EmotiW-2014 | EMOVO |
| eNTERFACE | ESD | EU-EmoSS | EU-EV | FAU Aibo |
| GEMEP | GVESS | IEMOCAP | MES | MESD |
| MELD | PPMMK | RAVDESS | SAVEE | ShEMO |
| SmartKom | SIMIS | SUSAS | SUBSECO | TESS |
| TurkishEmo | Urdu | | | |



### Usage

```python
import torch
import torch.nn as nn
from transformers import HubertForSequenceClassification, Wav2Vec2FeatureExtractor

# CONFIG and MODEL SETUP
model_name = '.../HuBERT-EmoSet++'
# The feature extractor normalises and pads raw waveforms before they reach the model
feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained("facebook/hubert-base-ls960")
model = HubertForSequenceClassification.from_pretrained(model_name)
# Classification head: 256-dimensional projection -> 6 arousal/valence classes
model.classifier = nn.Linear(in_features=256, out_features=6)

sampling_rate = 16000  # Hz
# Use the GPU when available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)
```
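
A forward pass through a Transformers sequence-classification model returns an output object whose `.logits` field holds one score per class, here six. The sketch below shows one way to turn such scores into a readable arousal/valence label using only the standard library. The index-to-label order (arousal first, then valence) is an assumption and is not stated in this model card; the helper names `softmax` and `decode` and the example logits are made up for illustration.

```python
import math
from itertools import product

# ASSUMPTION: the class order below is hypothetical (low arousal first, then
# high arousal, each with negative/neutral/positive valence). Verify it
# against the paper or the model config before relying on it.
AROUSAL = ("low", "high")
VALENCE = ("negative", "neutral", "positive")
LABELS = [f"{a} arousal / {v} valence" for a, v in product(AROUSAL, VALENCE)]


def softmax(logits):
    """Convert raw scores to probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode(logits):
    """Return (label, probability) for the highest-scoring class."""
    if len(logits) != len(LABELS):
        raise ValueError(f"expected {len(LABELS)} logits, got {len(logits)}")
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs[best]


# Made-up scores for illustration only, not real model output.
example_logits = [0.1, -1.2, 0.3, 4.0, 0.5, -0.7]
label, prob = decode(example_logits)
print(label, round(prob, 2))
```

With a PyTorch model, the list passed to `decode` could be obtained from a single-item batch via `output.logits[0].tolist()`.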

### Citation Info


```
@inproceedings{Amiriparian24-EEH,
  author = {Shahin Amiriparian and Filip Packan and Maurice Gerczuk and Bj\"orn W.\ Schuller},
  title = {{ExHuBERT: Enhancing HuBERT Through Block Extension and Fine-Tuning on 37 Emotion Datasets}},
  booktitle = {{Proc. INTERSPEECH}},
  year = {2024},
  address = {Kos Island, Greece},
  month = {September},
  publisher = {ISCA},
}
```