Podlodka int8 version

#1
by saintman - opened

Hi, Alexey!
Can you please make an "int8_float32" version of this Podlodka model? I just want to try it in my environment.
What is your opinion on Podlodka vs. the Antony66 fine-tuned model?

Hi, Max.
Can you please explain what you mean by an "int8_float32" version? I can pass a --quantization param to the ct2 converter, and it's either int8 or float32.
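Roughly, my conversion looks like this (a minimal sketch using the CTranslate2 Python API, equivalent to the ct2-transformers-converter CLI; the model id and output directory are placeholders):

```python
from ctranslate2.converters import TransformersConverter

# "<hf_model_id>" and "whisper-ct2" are placeholders for the source
# Hugging Face model and the output directory.
converter = TransformersConverter("<hf_model_id>")
converter.convert("whisper-ct2", quantization="int8")
```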
In my experience, Antony66's model simply performs better than vanilla Whisper. Podlodka's model, by contrast, generates different output with missing punctuation in some cases; it can improve WER, but it also introduces new mistakes. So my personal opinion is to use Antony66's model.

Thanks for the comparison.

You can check quantization option here: https://github.com/OpenNMT/CTranslate2/blob/master/docs/quantization.md
For instance, your conversion of Antony66's model with the pure "int8" option gives me the following warning when loading the model:
"[ctranslate2] [thread 158] [warning] The compute type inferred from the saved model is int8_bfloat16, but the target device or backend do not support efficient int8_bfloat16 computation. The model weights have been automatically converted to use the int8_float32 compute type instead."
It still loads and works OK.
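If you want to avoid the automatic fallback (and the warning) entirely, the compute type can be pinned at load time. A minimal sketch, assuming faster-whisper as the inference frontend (the model path is a placeholder):

```python
from faster_whisper import WhisperModel

# "whisper-ct2" is a placeholder path to the converted model directory.
# Pinning compute_type avoids the runtime fallback described above.
model = WhisperModel("whisper-ct2", device="cpu", compute_type="int8_float32")

segments, info = model.transcribe("audio.wav", language="ru")
for segment in segments:
    print(segment.text)
```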

Thank you, now I see how it works here. Model conversion is on its way.

Thanks a lot! Will try it.

At first glance it works like a charm, thanks!
