amiriparian committed
Commit 919bfb2 (1 parent: f4f5ec5)

Update README.md

Files changed (1)
  1. README.md +213 -10
README.md CHANGED
@@ -39,14 +39,14 @@ Further details are available in the corresponding [**paper**](https://arxiv.org
 
  | | | | | |
  | :---: | :---: | :---: | :---: | :---: |
- | ABC | AD | BES | CASIA | CVE |
- | Crema-D | DES | DEMoS | EA-ACT | EA-BMW |
- | EA-WSJ | EMO-DB | EmoFilm | EmotiW-2014 | EMOVO |
- | eNTERFACE | ESD | EU-EmoSS | EU-EV | FAU Aibo |
- | GEMEP | GVESS | IEMOCAP | MES | MESD |
- | MELD | PPMMK | RAVDESS | SAVEE | ShEMO |
- | SmartKom | SIMIS | SUSAS | SUBESCO | TESS |
- | TurkishEmo | Urdu | | | |
+ | ABC [[1]](#1) | AD [[2]](#2) | BES [[3]](#3) | CASIA [[4]](#4) | CVE [[5]](#5) |
+ | Crema-D [[6]](#6) | DES [[7]](#7) | DEMoS [[8]](#8) | EA-ACT [[9]](#9) | EA-BMW [[9]](#9) |
+ | EA-WSJ [[9]](#9) | EMO-DB [[10]](#10) | EmoFilm [[11]](#11) | EmotiW-2014 [[12]](#12) | EMOVO [[13]](#13) |
+ | eNTERFACE [[14]](#14) | ESD [[15]](#15) | EU-EmoSS [[16]](#16) | EU-EV [[17]](#17) | FAU Aibo [[18]](#18) |
+ | GEMEP [[19]](#19) | GVESS [[20]](#20) | IEMOCAP [[21]](#21) | MES [[3]](#3) | MESD [[22]](#22) |
+ | MELD [[23]](#23) | PPMMK [[2]](#2) | RAVDESS [[24]](#24) | SAVEE [[25]](#25) | ShEMO [[26]](#26) |
+ | SmartKom [[27]](#27) | SIMIS [[28]](#28) | SUSAS [[29]](#29) | SUBESCO [[30]](#30) | TESS [[31]](#31) |
+ | TurkishEmo [[2]](#2) | Urdu [[32]](#32) | | | |
 
 
 
@@ -60,9 +60,11 @@ from transformers import AutoModelForAudioClassification, Wav2Vec2FeatureExtract
 
 
  # CONFIG and MODEL SETUP
- model_name = 'amiriparian/HuBERT-EmoSet'
+ model_name = 'amiriparian/ExHuBERT'
  feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained("facebook/hubert-base-ls960")
  model = AutoModelForAudioClassification.from_pretrained(model_name, trust_remote_code=True, revision="b158d45ed8578432468f3ab8d46cbe5974380812")
+
+ # Freezing half of the encoder
  model.freeze_og_encoder()
 
  sampling_rate=16000
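
For readers who want to try the setup shown in this hunk end to end, here is a minimal inference sketch. It reuses only what the diff itself contains (`amiriparian/ExHuBERT`, `facebook/hubert-base-ls960`, the pinned `revision`, the 16 kHz sampling rate); the dummy waveform, the device handling, and the guess that the custom classification head exposes its scores as `logits` (or returns a bare tensor) are illustrative assumptions, not details confirmed by this commit.

```python
import numpy as np
import torch
import torch.nn.functional as F
from transformers import AutoModelForAudioClassification, Wav2Vec2FeatureExtractor

# Same identifiers as in the README snippet above; pinning the revision keeps the remote code reproducible.
model_name = "amiriparian/ExHuBERT"
feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained("facebook/hubert-base-ls960")
model = AutoModelForAudioClassification.from_pretrained(
    model_name, trust_remote_code=True, revision="b158d45ed8578432468f3ab8d46cbe5974380812"
)

sampling_rate = 16000
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device).eval()

# One second of silence stands in for a real mono 16 kHz recording (load yours with torchaudio or librosa).
waveform = np.zeros(sampling_rate, dtype=np.float32)
inputs = feature_extractor(waveform, sampling_rate=sampling_rate, return_tensors="pt")

with torch.no_grad():
    output = model(inputs["input_values"].to(device))

# The repository's custom code may return either a plain tensor or a standard HF output object with .logits.
logits = output.logits if hasattr(output, "logits") else output
print(F.softmax(logits, dim=-1))
```

For fine-tuning rather than inference, the `model.freeze_og_encoder()` call shown in the diff (freezing half of the encoder, per the added comment) would be applied before training on your own data.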
 
@@ -88,4 +90,205 @@ model = model.to(device)
  month = {September},
  publisher = {ISCA},
  }
- ```
+
+
+ ```
+
+ ### References
+
+ <a id="1">[1]</a>
+ B. Schuller, D. Arsic, G. Rigoll, M. Wimmer, and B. Radig. Audiovisual Behavior Modeling by Combined Feature Spaces. In 2007 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '07), volume 2, pages II-733–II-736, Apr. 2007.
+
+ <a id="2">[2]</a>
+ M. Gerczuk, S. Amiriparian, S. Ottl, and B. W. Schuller. EmoNet: A Transfer Learning Framework for Multi-Corpus Speech Emotion Recognition. IEEE Transactions on Affective Computing, 14(2):1472–1487, Apr. 2023.
+
+ <a id="3">[3]</a>
+ T. L. Nwe, S. W. Foo, and L. C. De Silva. Speech emotion recognition using hidden Markov models. Speech Communication, 41(4):603–623, Nov. 2003.
+
+ <a id="4">[4]</a>
+ The selected speech emotion database of the Institute of Automation, Chinese Academy of Sciences (CASIA). http://www.chineseldc.org/resource_info.php?rid=76. Accessed March 2024.
+
+ <a id="5">[5]</a>
+ P. Liu and M. D. Pell. Recognizing vocal emotions in Mandarin Chinese: A validated database of Chinese vocal emotional stimuli. Behavior Research Methods, 44(4):1042–1051, Dec. 2012.
+
+ <a id="6">[6]</a>
+ H. Cao, D. G. Cooper, M. K. Keutmann, R. C. Gur, A. Nenkova, and R. Verma. CREMA-D: Crowd-sourced Emotional Multimodal Actors Dataset. IEEE Transactions on Affective Computing, 5(4):377–390, 2014.
+
+ <a id="7">[7]</a>
+ I. S. Engberg, A. V. Hansen, O. K. Andersen, and P. Dalsgaard. Design, Recording and Verification of a Danish Emotional Speech Database. In EUROSPEECH'97: 5th European Conference on Speech Communication and Technology, Patras, Rhodes, Greece, 22-25 September 1997, vol. 4, pages 1695–1698.
+
+ <a id="8">[8]</a>
+ E. Parada-Cabaleiro, G. Costantini, A. Batliner, M. Schmitt, and B. W. Schuller. DEMoS: An Italian emotional speech corpus. Language Resources and Evaluation, 54(2):341–383, June 2020.
+
+ <a id="9">[9]</a>
+ B. Schuller. Automatische Emotionserkennung aus sprachlicher und manueller Interaktion. PhD thesis, Technische Universität München, 2006.
+
+ <a id="10">[10]</a>
+ F. Burkhardt, A. Paeschke, M. Rolfes, W. F. Sendlmeier, and B. Weiss. A database of German emotional speech. In Interspeech 2005, pages 1517–1520. ISCA, Sept. 2005.
+
+ <a id="11">[11]</a>
+ E. Parada-Cabaleiro, G. Costantini, A. Batliner, A. Baird, and B. Schuller. Categorical vs Dimensional Perception of Italian Emotional Speech. In Interspeech 2018, pages 3638–3642. ISCA, Sept. 2018.
+
+ <a id="12">[12]</a>
+ A. Dhall, R. Goecke, J. Joshi, K. Sikka, and T. Gedeon. Emotion Recognition In The Wild Challenge 2014: Baseline, Data and Protocol. In Proceedings of the 16th International Conference on Multimodal Interaction (ICMI '14), pages 461–466, New York, NY, USA, Nov. 2014. Association for Computing Machinery.
+
+ <a id="13">[13]</a>
+ G. Costantini, I. Iaderola, A. Paoloni, and M. Todisco. EMOVO Corpus: An Italian Emotional Speech Database. In N. Calzolari, K. Choukri, T. Declerck, H. Loftsson, B. Maegaard, J. Mariani, A. Moreno, J. Odijk, and S. Piperidis, editors, Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14), pages 3501–3504, Reykjavik, Iceland, May 2014. European Language Resources Association (ELRA).
+
+ <a id="14">[14]</a>
+ O. Martin, I. Kotsia, B. Macq, and I. Pitas. The eNTERFACE'05 Audio-Visual Emotion Database. In 22nd International Conference on Data Engineering Workshops (ICDEW'06), pages 8–8, Apr. 2006.
+
+ <a id="15">[15]</a>
+ K. Zhou, B. Sisman, R. Liu, and H. Li. Seen and Unseen emotional style transfer for voice conversion with a new emotional speech dataset, Feb. 2021.
+
+ <a id="16">[16]</a>
+ H. O'Reilly, D. Pigat, S. Fridenson, S. Berggren, S. Tal, O. Golan, S. Bölte, S. Baron-Cohen, and D. Lundqvist. The EU-Emotion Stimulus Set: A validation study. Behavior Research Methods, 48(2):567–576, June 2016.
+
+ <a id="17">[17]</a>
+ A. Lassalle, D. Pigat, H. O'Reilly, S. Berggren, S. Fridenson-Hayo, S. Tal, S. Elfström, A. Råde, O. Golan, S. Bölte, S. Baron-Cohen, and D. Lundqvist. The EU-Emotion Voice Database. Behavior Research Methods, 51(2):493–506, Apr. 2019.
+
+ <a id="18">[18]</a>
+ A. Batliner, S. Steidl, and E. Nöth. Releasing a thoroughly annotated and processed spontaneous emotional database: The FAU Aibo Emotion Corpus. 2008.
+
+ <a id="19">[19]</a>
+ K. R. Scherer, T. Bänziger, and E. Roesch. A Blueprint for Affective Computing: A Sourcebook and Manual. OUP Oxford, Sept. 2010.
+
+ <a id="20">[20]</a>
+ R. Banse and K. R. Scherer. Acoustic profiles in vocal emotion expression. Journal of Personality and Social Psychology, 70(3):614–636, 1996.
+
+ <a id="21">[21]</a>
+ C. Busso, M. Bulut, C.-C. Lee, A. Kazemzadeh, E. Mower, S. Kim, J. N. Chang, S. Lee, and S. S. Narayanan. IEMOCAP: Interactive emotional dyadic motion capture database. Language Resources and Evaluation, 42(4):335–359, Dec. 2008.
+
+ <a id="22">[22]</a>
+ M. M. Duville, L. M. Alonso-Valerdi, and D. I. Ibarra-Zarate. The Mexican Emotional Speech Database (MESD): Elaboration and assessment based on machine learning. In 2021 Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), pages 1644–1647, Nov. 2021.
+
+ <a id="23">[23]</a>
+ S. Poria, D. Hazarika, N. Majumder, G. Naik, E. Cambria, and R. Mihalcea. MELD: A Multimodal Multi-Party Dataset for Emotion Recognition in Conversations, June 2019.
+
+ <a id="24">[24]</a>
+ S. R. Livingstone and F. A. Russo. The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS): A dynamic, multimodal set of facial and vocal expressions in North American English. PLOS ONE, 13(5):e0196391, May 2018.
+
+ <a id="25">[25]</a>
+ S. Haq and P. J. B. Jackson. Speaker-dependent audio-visual emotion recognition. In Proc. AVSP 2009, pages 53–58, 2009.
+
+ <a id="26">[26]</a>
+ O. Mohamad Nezami, P. Jamshid Lou, and M. Karami. ShEMO: A large-scale validated database for Persian speech emotion detection. Language Resources and Evaluation, 53(1):1–16, Mar. 2019.
+
+ <a id="27">[27]</a>
+ F. Schiel, S. Steininger, and U. Türk. The SmartKom Multimodal Corpus at BAS. In M. González Rodríguez and C. P. Suarez Araujo, editors, Proceedings of the Third International Conference on Language Resources and Evaluation (LREC'02), Las Palmas, Canary Islands, Spain, May 2002. European Language Resources Association (ELRA).
+
+ <a id="28">[28]</a>
+ B. Schuller, F. Eyben, S. Can, and H. Feußner. Speech in Minimal Invasive Surgery - Towards an Affective Language Resource of Real-life Medical Operations. 2010.
+
+ <a id="29">[29]</a>
+ J. H. L. Hansen and S. E. Bou-Ghazale. Getting started with SUSAS: A speech under simulated and actual stress database. In Proc. Eurospeech 1997, pages 1743–1746, 1997.
+
+ <a id="30">[30]</a>
+ S. Sultana, M. S. Rahman, M. R. Selim, and M. Z. Iqbal. SUST Bangla Emotional Speech Corpus (SUBESCO): An audio-only emotional speech corpus for Bangla. PLOS ONE, 16(4):e0250173, Apr. 2021.
+
+ <a id="31">[31]</a>
+ M. K. Pichora-Fuller and K. Dupuis. Toronto emotional speech set (TESS), Feb. 2020.
+
+ <a id="32">[32]</a>
+ S. Latif, A. Qayyum, M. Usman, and J. Qadir. Cross Lingual Speech Emotion Recognition: Urdu vs. Western Languages. In 2018 International Conference on Frontiers of Information Technology (FIT), pages 88–93, Dec. 2018.