---
pipeline_tag: sentence-similarity
tags:
  - sentence-similarity
  - sentence-transformers
license: mit
language:
  - multilingual
  - af
  - am
  - ar
  - as
  - az
  - be
  - bg
  - bn
  - br
  - bs
  - ca
  - cs
  - cy
  - da
  - de
  - el
  - en
  - eo
  - es
  - et
  - eu
  - fa
  - fi
  - fr
  - fy
  - ga
  - gd
  - gl
  - gu
  - ha
  - he
  - hi
  - hr
  - hu
  - hy
  - id
  - is
  - it
  - ja
  - jv
  - ka
  - kk
  - km
  - kn
  - ko
  - ku
  - ky
  - la
  - lo
  - lt
  - lv
  - mg
  - mk
  - ml
  - mn
  - mr
  - ms
  - my
  - ne
  - nl
  - 'no'
  - om
  - or
  - pa
  - pl
  - ps
  - pt
  - ro
  - ru
  - sa
  - sd
  - si
  - sk
  - sl
  - so
  - sq
  - sr
  - su
  - sv
  - sw
  - ta
  - te
  - th
  - tl
  - tr
  - ug
  - uk
  - ur
  - uz
  - vi
  - xh
  - yi
  - zh
---

This is a quantized version of [multilingual-e5-small](https://huggingface.co/intfloat/multilingual-e5-small). Quantization was performed per layer under the same conditions as our ELSERv2 model, as described here.
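
For reference, below is a minimal retrieval sketch using the sentence-transformers API. The repository ID `elastic/multilingual-e5-small-optimized` and the sample texts are assumptions for illustration; like the base multilingual-e5 models, inputs should carry `query: ` / `passage: ` prefixes.

```python
from sentence_transformers import SentenceTransformer

# Assumed repository ID; substitute the actual model name if it differs.
model = SentenceTransformer("elastic/multilingual-e5-small-optimized")

# e5-family models expect "query: " / "passage: " prefixes on their inputs.
queries = ["query: how do I renew my passport"]
passages = [
    "passage: Passport renewal applications can be submitted by mail or in person.",
    "passage: The weather in Berlin is usually mild in October.",
]

query_emb = model.encode(queries, normalize_embeddings=True)
passage_emb = model.encode(passages, normalize_embeddings=True)

# With normalized embeddings, cosine similarity reduces to a dot product.
scores = query_emb @ passage_emb.T
print(scores)  # higher score = more relevant passage
```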

*Text Embeddings by Weakly-Supervised Contrastive Pre-training*. Liang Wang, Nan Yang, Xiaolong Huang, Binxing Jiao, Linjun Yang, Daxin Jiang, Rangan Majumder, Furu Wei. arXiv, 2022.

## Benchmarks

We ran a number of small benchmarks to assess both the change in quality and the inference latency of the optimized model against the original baseline.

### Quality

Measuring NDCG@10 on the dev split of the MIRACL datasets for selected languages, we see mostly marginal changes in quality for the quantized model; the notable exception is Yoruba (yo), which shows a larger drop.

| NDCG@10 | de | yo | ru | ar | es | th |
|---|---|---|---|---|---|---|
| multilingual-e5-small | 0.75862 | 0.56193 | 0.80309 | 0.82778 | 0.81672 | 0.85072 |
| multilingual-e5-small-optimized | 0.75992 | 0.48934 | 0.79668 | 0.82017 | 0.8135 | 0.84316 |
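
As context for these numbers, NDCG@10 rewards relevant documents retrieved near the top of the ranking and normalizes by the best possible ordering. Below is a minimal sketch of the per-query metric; the toy ranking and relevance judgments are illustrative, not taken from MIRACL.

```python
import math

def ndcg_at_k(ranked_ids, relevance, k=10):
    """NDCG@k for one query: ranked_ids is the retrieved doc order,
    relevance maps doc id -> graded relevance (0 if absent)."""
    dcg = sum(
        relevance.get(doc, 0) / math.log2(rank + 2)
        for rank, doc in enumerate(ranked_ids[:k])
    )
    ideal = sorted(relevance.values(), reverse=True)[:k]
    idcg = sum(rel / math.log2(rank + 2) for rank, rel in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0

# Toy example: one relevant doc retrieved at rank 1, one relevant doc missed.
print(ndcg_at_k(["d1", "d3", "d4"], {"d1": 1, "d2": 1}))  # ~0.61
```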

To test English out-of-domain performance, we used the test splits of several datasets from the BEIR evaluation. Measuring NDCG@10, we see a larger change on SCIFACT but only marginal changes on the other datasets evaluated.

| NDCG@10 | FIQA | SCIFACT | nfcorpus |
|---|---|---|---|
| multilingual-e5-small | 0.33126 | 0.677 | 0.31004 |
| multilingual-e5-small-optimized | 0.31734 | 0.65484 | 0.30126 |

### Performance

Using a PyTorch model traced for Linux and Intel CPUs, we benchmarked inference across a range of input lengths. Overall, we see a 20-54% performance improvement with the optimized model, with larger gains for shorter inputs.

| input length (characters) | multilingual-e5-small | multilingual-e5-small-optimized | speedup |
|---|---|---|---|
| 0 - 50 | 0.0181 | 0.00826 | 54.36% |
| 50 - 100 | 0.0275 | 0.0164 | 40.36% |
| 100 - 150 | 0.0366 | 0.0237 | 35.25% |
| 150 - 200 | 0.0435 | 0.0301 | 30.80% |
| 200 - 250 | 0.0514 | 0.0379 | 26.26% |
| 250 - 300 | 0.0569 | 0.043 | 24.43% |
| 300 - 350 | 0.0663 | 0.0513 | 22.62% |
| 350 - 400 | 0.0737 | 0.0576 | 21.85% |
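
For a sense of how such a comparison can be reproduced, here is a rough timing-loop sketch. The optimized repository ID is an assumption, the sample text is illustrative, and absolute numbers will vary by hardware.

```python
import time

from sentence_transformers import SentenceTransformer

# Baseline is the public intfloat repo; the optimized ID is assumed.
baseline = SentenceTransformer("intfloat/multilingual-e5-small")
optimized = SentenceTransformer("elastic/multilingual-e5-small-optimized")

text = "passage: " + "benchmark text " * 20  # roughly a 300-character input

def mean_latency(model, text, n_warmup=3, n_runs=20):
    for _ in range(n_warmup):   # warm-up runs to exclude one-time setup costs
        model.encode(text)
    start = time.perf_counter()
    for _ in range(n_runs):
        model.encode(text)
    return (time.perf_counter() - start) / n_runs

t_base = mean_latency(baseline, text)
t_opt = mean_latency(optimized, text)
# Speedup convention matching the table: (baseline - optimized) / baseline.
print(f"baseline {t_base:.4f}s  optimized {t_opt:.4f}s  "
      f"speedup {(t_base - t_opt) / t_base:.1%}")
```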

## Disclaimer

This e5 model, as defined, hosted, integrated, and used in conjunction with our other Elastic Software, is covered by our standard warranty.