ONNX Conversion script #10
opened by ha1772007
Can you provide the script used to convert this model to q4?
I believe he uses quantize.py; in particular, these lines appear to handle the q4 quantization: https://github.com/xenova/transformers.js/blob/v3/scripts/quantize.py#L188-L208
P.S. Are you getting good results with that quantization?
Yes, quantization gives a good speed boost, especially on CPU.
comparison between float32 and float16 -> 99% similarity
comparison between float32 and int8 -> 97% similarity
I computed the similarity with cosine similarity over 80+ text pieces of ~2000 characters each.
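For reference, a minimal sketch of this kind of comparison in numpy. The arrays here are random stand-ins for real model outputs (the author's embedding model and texts are not shown in this thread): in practice you would run the same texts through the float32 model and the quantized model and compare the resulting embedding pairs.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical stand-ins for model outputs over 80 texts:
# emb_fp32 plays the role of float32-model embeddings, and emb_quant
# simulates quantized-model embeddings as the same vectors plus small noise.
rng = np.random.default_rng(0)
emb_fp32 = rng.normal(size=(80, 384)).astype(np.float32)
emb_quant = emb_fp32 + rng.normal(scale=0.01, size=emb_fp32.shape).astype(np.float32)

# Average the per-text cosine similarity, as described above.
scores = [cosine_similarity(a, b) for a, b in zip(emb_fp32, emb_quant)]
mean_similarity = float(np.mean(scores))
print(f"mean cosine similarity: {mean_similarity:.4f}")
```

With real embeddings, a mean near 0.97–0.99 (as reported above) indicates the quantized model's outputs stay close in direction to the float32 baseline.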
ha1772007 changed discussion status to closed
ha1772007 changed discussion status to open