	---
	license: cc-by-nc-sa-4.0
	language:
	- en
	- de
	- zh
	- fr
	- nl
	- el
	- it
	- es
	- my
	- he
	- sv
	- fa
	- tr
	- ur
	library_name: transformers
	pipeline_tag: audio-classification
	tags:
	- Speech Emotion Recognition
	- SER
	- Transformer
	- HuBERT
	- PyTorch
	---

	# ExHuBERT: Enhancing HuBERT Through Block Extension and Fine-Tuning on 37 Emotion Datasets
	Authors: Shahin Amiriparian, Filip Packań, Maurice Gerczuk, Björn W. Schuller

	ExHuBERT is a version of [HuBERT Large](https://huggingface.co/facebook/hubert-large-ls960-ft) fine-tuned on EmoSet++, a collection of 37 datasets totaling 150,907 samples with a cumulative duration of 119.5 hours.
	The model expects a 3-second raw waveform resampled to 16 kHz. The 6 original output classes are the combinations of low/high arousal and negative/neutral/positive valence.
	Further details are available in the corresponding [paper](https://arxiv.org/).

	Note: This model is for research purposes only.
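
The input requirement above (a 3-second raw waveform at 16 kHz) corresponds to exactly 48,000 samples per clip. A minimal sketch of bringing a clip to that length, assuming the audio is already mono-compatible and resampled to 16 kHz (for example by loading it with `librosa.load(path, sr=16000)`). The helper `to_fixed_length` and the choices to keep the first 3 seconds and to zero-pad shorter clips are assumptions for illustration, not steps documented by the authors:

```python
import numpy as np

TARGET_SR = 16_000                      # sampling rate stated in the model card
CLIP_SECONDS = 3                        # clip length stated in the model card
TARGET_LEN = TARGET_SR * CLIP_SECONDS   # 48,000 samples


def to_fixed_length(waveform, target_len=TARGET_LEN):
    """Return a mono float32 clip with exactly `target_len` samples."""
    wav = np.asarray(waveform, dtype=np.float32)
    if wav.ndim == 2:
        # Assumption: 2-D input has shape (channels, samples); average to mono.
        wav = wav.mean(axis=0)
    if wav.shape[0] >= target_len:
        # Assumption: keep the first 3 s; another cropping strategy may suit your data.
        return wav[:target_len]
    # Assumption: zero-pad short clips at the end.
    return np.pad(wav, (0, target_len - wav.shape[0]))
```

The returned array can be passed to the feature extractor together with `sampling_rate=16000`.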

	### EmoSet++ subsets used for fine-tuning the model:

	| | | | | |
	| :---: | :---: | :---: | :---: | :---: |
	| ABC | AD | BES | CASIA | CVE |
	| Crema-D | DES | DEMoS | EA-ACT | EA-BMW |
	| EA-WSJ | EMO-DB | EmoFilm | EmotiW-2014 | EMOVO |
	| eNTERFACE | ESD | EU-EmoSS | EU-EV | FAU Aibo |
	| GEMEP | GVESS | IEMOCAP | MES | MESD |
	| MELD | PPMMK | RAVDESS | SAVEE | ShEMO |
	| SmartKom | SIMIS | SUSAS | SUBSECO | TESS |
	| TurkishEmo | Urdu | | | |



	### Usage

	```python
	import torch
	import torch.nn as nn
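	from transformers import HubertForSequenceClassification, Wav2Vec2FeatureExtractor

	# Config and model setup
	# The checkpoint path is elided here; point it at your copy of the model.
	model_name = '.../HuBERT-EmoSet++'
	feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained("facebook/hubert-base-ls960")
	model = HubertForSequenceClassification.from_pretrained(model_name)
	# Attach a 6-class linear head (256 input features, one output per class).
	# Note: a layer constructed here starts with newly initialized weights.
	model.classifier = nn.Linear(in_features=256, out_features=6)

	sampling_rate = 16000  # the model expects 16 kHz audio
	device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
	model = model.to(device)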
	```
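
The setup above stops once the model is on the target device. A minimal sketch of how a prediction could then be obtained, assuming the model returns an output with a `.logits` field (as `HubertForSequenceClassification` does) and the feature extractor returns `input_values`. The `classify` helper is hypothetical and not part of the original card:

```python
import torch


def classify(model, feature_extractor, waveform, sampling_rate=16000, device="cpu"):
    """Return (predicted class index, class probabilities) for one mono clip.

    Hypothetical helper: `waveform` is assumed to be a 1-D float array holding
    about 3 s of audio that is already resampled to `sampling_rate`.
    """
    # Turn the raw waveform into a batch of model inputs with shape (1, samples).
    inputs = feature_extractor(waveform, sampling_rate=sampling_rate, return_tensors="pt")
    input_values = inputs.input_values.to(device)

    model.eval()  # disable dropout for inference
    with torch.no_grad():
        logits = model(input_values).logits  # shape (1, num_classes)

    # Convert logits to probabilities and pick the most likely class.
    probs = torch.softmax(logits, dim=-1).squeeze(0).cpu()
    return int(probs.argmax()), probs
```

With the objects from the setup block, a call might look like `idx, probs = classify(model, feature_extractor, clip, sampling_rate, device)`. This card does not state which class index corresponds to which arousal/valence combination, so check the checkpoint's `config.id2label` rather than assuming an order. Predictions are only meaningful when the classification head carries trained weights; a newly constructed `nn.Linear` does not.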

	### Citation Info


	```
	@inproceedings{Amiriparian24-EEH,
	author = {Shahin Amiriparian and Filip Packa{\'n} and Maurice Gerczuk and Bj\"orn W.\ Schuller},
	title = {{ExHuBERT: Enhancing HuBERT Through Block Extension and Fine-Tuning on 37 Emotion Datasets}},
	booktitle = {{Proc. INTERSPEECH}},
	year = {2024},
	editor = {},
	volume = {},
	series = {},
	address = {Kos Island, Greece},
	month = {September},
	publisher = {ISCA},
	}
	```