---
license: cc-by-nc-sa-4.0
language:
- en
- de
- zh
- fr
- nl
- el
- it
library_name: transformers
pipeline_tag: audio-classification
tags:
- HuBERT
- Speech Emotion Recognition
- SER
- PyTorch
---

# **ExHuBERT: Enhancing HuBERT Through Block Extension and Fine-Tuning on 37 Emotion Datasets**

Authors: Shahin Amiriparian, Filip Packań, Maurice Gerczuk, Björn W. Schuller

Fine-tuned [**HuBERT Large**](https://huggingface.co/facebook/hubert-large-ls960-ft) on EmoSet++, which comprises 37 datasets totaling 150,907 samples with a cumulative duration of 119.5 hours.

The model expects a 3-second raw waveform resampled to 16 kHz. The original 6 output classes are combinations of low/high arousal and negative/neutral/positive valence.

Further details are available in the corresponding [**paper**](https://arxiv.org/).

**Note**: This model is for research purposes only.

### EmoSet++ subsets used for fine-tuning the model:

|  |  |  |  |  |
| :---: | :---: | :---: | :---: | :---: |
| ABC | AD | BES | CASIA | CVE |
| Crema-D | DES | DEMoS | EA-ACT | EA-BMW |
| EA-WSJ | EMO-DB | EmoFilm | EmotiW-2014 | EMOVO |
| eNTERFACE | ESD | EU-EmoSS | EU-EV | FAU Aibo |
| GEMEP | GVESS | IEMOCAP | MES | MESD |
| MELD | PPMMK | RAVDESS | SAVEE | ShEMO |
| SmartKom | SIMIS | SUSAS | SUBSECO | TESS |
| TurkishEmo | Urdu |  |  |  |

### Usage

```python
import torch
import torch.nn as nn
from transformers import HubertForSequenceClassification, Wav2Vec2FeatureExtractor

# CONFIG and MODEL SETUP
model_name = '.../HuBERT-EmoSet++'
# Preprocessing for raw waveforms, taken from the HuBERT base checkpoint
feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained("facebook/hubert-base-ls960")
model = HubertForSequenceClassification.from_pretrained(model_name)

# Classification head: 256-dim projector output -> 6 arousal/valence classes.
# nn.Linear creates a new, randomly initialized layer, so this line discards any
# classifier weights loaded from the checkpoint (useful when fine-tuning further).
model.classifier = nn.Linear(in_features=256, out_features=6)

sampling_rate = 16000  # input audio must be sampled at 16 kHz
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)
```

### Citation Info

```
@inproceedings{Amiriparian24-EEH,
  author = {Shahin Amiriparian and Filip Packan and Maurice Gerczuk and Bj\"orn W.\ Schuller},
  title = {{ExHuBERT: Enhancing HuBERT Through Block Extension and Fine-Tuning on 37 Emotion Datasets}},
  booktitle = {{Proc. INTERSPEECH}},
  year = {2024},
  editor = {},
  volume = {},
  series = {},
  address = {Kos Island, Greece},
  month = {September},
  publisher = {ISCA},
}
```
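### Preparing the input waveform (sketch)

The model card states only that the model expects a 3-second raw waveform at 16 kHz, and the Usage snippet stops before building that input. The sketch below shows one way to produce such a clip with NumPy. The helper name `prepare_clip`, the channel down-mixing, the linear-interpolation resampler, and the zero-pad/truncate policy are all assumptions for illustration, not the authors' preprocessing.

```python
import numpy as np

TARGET_SR = 16_000                     # sampling rate stated in the model card
CLIP_SECONDS = 3                       # clip length stated in the model card
TARGET_LEN = TARGET_SR * CLIP_SECONDS  # 48,000 samples


def prepare_clip(waveform, orig_sr: int) -> np.ndarray:
    """Hypothetical helper: return a mono float32 clip of exactly TARGET_LEN samples.

    Assumptions (not from the model card): 2-D input has shape
    (channels, samples) and is averaged to mono, resampling uses linear
    interpolation, short clips are zero-padded at the end, and long clips
    keep only their first 3 seconds.
    """
    x = np.asarray(waveform, dtype=np.float32)
    if x.ndim == 2:
        # Assumed layout (channels, samples); average channels to mono.
        x = x.mean(axis=0)
    if orig_sr != TARGET_SR:
        # Naive linear-interpolation resampling; a band-limited resampler
        # (e.g. from torchaudio or librosa) is preferable in practice.
        n_out = int(round(x.shape[0] * TARGET_SR / orig_sr))
        t_in = np.arange(x.shape[0]) / orig_sr
        t_out = np.arange(n_out) / TARGET_SR
        x = np.interp(t_out, t_in, x).astype(np.float32)
    if x.shape[0] >= TARGET_LEN:
        # Truncate long inputs to the first 3 seconds.
        return x[:TARGET_LEN]
    # Zero-pad short inputs at the end up to 3 seconds.
    return np.pad(x, (0, TARGET_LEN - x.shape[0]))


# Hypothetical input: 2 s of a 440 Hz tone sampled at 44.1 kHz.
sr = 44_100
tone = np.sin(2 * np.pi * 440 * np.arange(2 * sr) / sr)
clip = prepare_clip(tone, sr)
print(clip.shape, clip.dtype)  # (48000,) float32
```

The resulting array can then be passed to the `feature_extractor` from the Usage snippet, e.g. `feature_extractor(clip, sampling_rate=16000, return_tensors="pt")`, and the returned `input_values` (moved to `device`) fed to `model`, whose `logits` contain one score per output class. The order of the six arousal/valence classes is not given in this card, so check the model config's `id2label` mapping, if populated, before interpreting them.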