max audio model input length

by arubittu - opened May 21

May 21

What is the maximum audio input length I can classify, assuming a sampling rate of 16 kHz? I have tried inference with inputs of up to 100 seconds (an array of 100 * 16k samples) and the model returns an output. What input size was this model trained to accept? Will it perform equally well on longer inputs?

felixbur

audEERING GmbH org May 21

There is no official max length; it's limited by your RAM. But we trained with segmented audio, about 2-6 seconds.
It showed that performance doesn't drop until 3 seconds.

arubittu

May 21

I want to classify longer audio clips, around 1 min. The performance should get better, right, since I am providing the model with more data to classify?

felixbur

audEERING GmbH org May 21

edited May 21

I guess the best performance would come from segmenting them and then pooling the predictions per speaker, but you could try both and compare.
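A minimal sketch of the segment-then-pool idea, assuming 16 kHz mono audio and a clip that contains a single speaker (so pooling per clip equals pooling per speaker). `predict_fn`, `dummy_predict`, the 4-second segment length, and the length-weighted mean are all assumptions for illustration, not something the model's API or training setup prescribes:

```python
import numpy as np

SAMPLING_RATE = 16_000  # Hz, as stated in the question


def segment(signal, segment_seconds=4.0, sampling_rate=SAMPLING_RATE):
    """Split a 1-D signal into consecutive non-overlapping segments.

    The final segment may be shorter than the others.
    """
    step = int(segment_seconds * sampling_rate)
    return [signal[start:start + step] for start in range(0, len(signal), step)]


def pooled_prediction(signal, predict_fn, segment_seconds=4.0):
    """Run predict_fn on each segment and average the outputs.

    predict_fn is a hypothetical stand-in for the model call: it takes a
    1-D float array and returns a 1-D array of scores.
    """
    segments = segment(signal, segment_seconds)
    scores = np.stack([predict_fn(s) for s in segments])
    # Weight by segment length so a short trailing segment counts less.
    weights = np.array([len(s) for s in segments], dtype=float)
    return np.average(scores, axis=0, weights=weights)


def dummy_predict(x):
    # Placeholder for a real model: returns two made-up "scores".
    return np.array([float(np.mean(x)), float(np.mean(np.abs(x)))])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clip = rng.standard_normal(60 * SAMPLING_RATE).astype(np.float32)
    print(len(segment(clip)))  # a 1-minute clip gives 15 segments of 4 s
    print(pooled_prediction(clip, dummy_predict))
```

To compare both options as suggested, you could run the real model once on the whole clip and once through `pooled_prediction`, then check which agrees better with your labels.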

arubittu

May 21

There is no official max length; it's limited by your RAM. But we trained with segmented audio, about 2-6 seconds.
It showed that performance doesn't drop until 3 seconds.

Did you use dynamic padding for batches? Is that why the range is 2 to 6 s?
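For context, "dynamic padding" here means padding each batch only up to its longest clip rather than to a fixed global length. The thread does not say whether the model was trained this way; the following is just a sketch of the technique with a hypothetical `pad_batch` helper:

```python
import numpy as np


def pad_batch(clips):
    """Zero-pad clips to the length of the longest clip in this batch.

    Returns the padded batch and a boolean mask marking real samples.
    """
    max_len = max(len(c) for c in clips)
    batch = np.zeros((len(clips), max_len), dtype=np.float32)
    mask = np.zeros((len(clips), max_len), dtype=bool)
    for i, c in enumerate(clips):
        batch[i, :len(c)] = c
        mask[i, :len(c)] = True
    return batch, mask


if __name__ == "__main__":
    sr = 16_000
    # One 2 s clip and one 6 s clip: the batch is padded to 6 s, not more.
    clips = [np.ones(2 * sr, dtype=np.float32), np.ones(6 * sr, dtype=np.float32)]
    batch, mask = pad_batch(clips)
    print(batch.shape)
```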
