ONNX Conversion script #10
opened by ha1772007
Can you provide the script used to convert this model to q4?
I believe he uses quantize.py; in particular, these lines appear to handle the q4 quantization: https://github.com/xenova/transformers.js/blob/v3/scripts/quantize.py#L188-L208
P.S. Are you getting good results with that quantization?
Yes, quantization gives a good speed boost, especially on CPU.
comparison between float32 and float16 -> 99% similarity
comparison between float32 and int8 -> 97% similarity
I computed the similarity with cosine similarity over 80+ text pieces of ~2000 characters each.
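For reference, a minimal sketch of this kind of comparison in numpy. The arrays here are random stand-ins for real model outputs (the author's embedding model and texts are not shown in this thread): in practice you would run the same texts through the float32 model and the quantized model and compare the resulting embedding pairs.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical stand-ins for model outputs over 80 texts:
# emb_fp32 plays the role of float32-model embeddings, and emb_quant
# simulates quantized-model embeddings as the same vectors plus small noise.
rng = np.random.default_rng(0)
emb_fp32 = rng.normal(size=(80, 384)).astype(np.float32)
emb_quant = emb_fp32 + rng.normal(scale=0.01, size=emb_fp32.shape).astype(np.float32)

# Average the per-text cosine similarity, as described above.
scores = [cosine_similarity(a, b) for a, b in zip(emb_fp32, emb_quant)]
mean_similarity = float(np.mean(scores))
print(f"mean cosine similarity: {mean_similarity:.4f}")
```

With real embeddings, a mean near 0.97–0.99 (as reported above) indicates the quantized model's outputs stay close in direction to the float32 baseline.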
ha1772007 changed discussion status to closed
ha1772007 changed discussion status to open