---
pipeline_tag: sentence-similarity
tags:
  - sentence-similarity
  - sentence-transformers
license: mit
language:
  - multilingual
  - af
  - am
  - ar
  - as
  - az
  - be
  - bg
  - bn
  - br
  - bs
  - ca
  - cs
  - cy
  - da
  - de
  - el
  - en
  - eo
  - es
  - et
  - eu
  - fa
  - fi
  - fr
  - fy
  - ga
  - gd
  - gl
  - gu
  - ha
  - he
  - hi
  - hr
  - hu
  - hy
  - id
  - is
  - it
  - ja
  - jv
  - ka
  - kk
  - km
  - kn
  - ko
  - ku
  - ky
  - la
  - lo
  - lt
  - lv
  - mg
  - mk
  - ml
  - mn
  - mr
  - ms
  - my
  - ne
  - nl
  - no
  - om
  - or
  - pa
  - pl
  - ps
  - pt
  - ro
  - ru
  - sa
  - sd
  - si
  - sk
  - sl
  - so
  - sq
  - sr
  - su
  - sv
  - sw
  - ta
  - te
  - th
  - tl
  - tr
  - ug
  - uk
  - ur
  - uz
  - vi
  - xh
  - yi
  - zh
---

A quantized version of [multilingual-e5-small](https://huggingface.co/intfloat/multilingual-e5-small). Quantization was performed per-layer under the same conditions as our ELSERv2 model, as described [here](https://www.elastic.co/search-labs/blog/articles/introducing-elser-v2-part-1#quantization).

[Text Embeddings by Weakly-Supervised Contrastive Pre-training](https://arxiv.org/pdf/2212.03533.pdf).
Liang Wang, Nan Yang, Xiaolong Huang, Binxing Jiao, Linjun Yang, Daxin Jiang, Rangan Majumder, Furu Wei, arXiv 2022
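
For reference, the snippet below sketches typical usage with the sentence-transformers library: E5-style models expect a `query: ` or `passage: ` prefix on each input, and relevance is scored with cosine similarity. The model id and the assumption that the quantized weights load through the standard `SentenceTransformer` API (as the base model does) are illustrative, not guaranteed.

```python
# Minimal usage sketch. Assumption: the quantized model loads through the standard
# sentence-transformers API, like the base intfloat/multilingual-e5-small it derives from.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("elastic/multilingual-e5-small-optimized")  # illustrative model id

# E5 models expect task prefixes: "query: " for search queries, "passage: " for documents.
query = "query: how much protein should a female eat"
passages = [
    "passage: As a general guideline, the CDC's average requirement of protein "
    "for women ages 19 to 70 is 46 grams per day.",
    "passage: Definition of summit: the highest point of a mountain.",
]

# Embeddings are L2-normalized, so cosine similarity reduces to a dot product.
query_emb = model.encode(query, normalize_embeddings=True)
passage_embs = model.encode(passages, normalize_embeddings=True)

print(util.cos_sim(query_emb, passage_embs))  # higher score = more relevant passage
```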

## Benchmarks

We ran a number of small benchmarks to assess both the change in quality and the inference latency of the optimized model against the original baseline model.

### Quality

Measuring NDCG@10 on the dev split of the MIRACL datasets for selected languages, we see mostly marginal changes in quality for the quantized model, with Yoruba (yo) showing the largest drop.

| Model | de | yo | ru | ar | es | th |
| --- | --- | ---| --- | --- | --- | --- |
| multilingual-e5-small | 0.75862 | 0.56193 | 0.80309 | 0.82778 | 0.81672 | 0.85072 |
| multilingual-e5-small-optimized | 0.75992 | 0.48934 | 0.79668 | 0.82017 | 0.8135 | 0.84316 |
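
For clarity, NDCG@10 rewards rankings that place relevant documents near the top of the first ten results; per-query scores are then averaged over a dataset's queries. The self-contained sketch below (toy data, not MIRACL) illustrates the computation for a single query.

```python
import numpy as np

def ndcg_at_k(ranked_relevance, k=10):
    """NDCG@k for one query, given graded relevance of the retrieved docs in ranked order.

    Note: for simplicity the ideal DCG is computed from the same retrieved list;
    a full evaluation derives it from all judged relevant documents for the query.
    """
    rel = np.asarray(ranked_relevance, dtype=float)[:k]
    if rel.size == 0:
        return 0.0
    discounts = 1.0 / np.log2(np.arange(2, rel.size + 2))  # 1 / log2(rank + 1)
    dcg = float(np.sum((2.0 ** rel - 1.0) * discounts))
    ideal = np.sort(np.asarray(ranked_relevance, dtype=float))[::-1][:k]
    idcg = float(np.sum((2.0 ** ideal - 1.0) / np.log2(np.arange(2, ideal.size + 2))))
    return dcg / idcg if idcg > 0 else 0.0

# Toy example: binary relevance of the ten retrieved documents, in ranked order.
print(ndcg_at_k([1, 0, 1, 0, 0, 0, 0, 0, 0, 0]))  # ~0.92
```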

To test English out-of-domain performance, we used the test splits of several datasets from the BEIR evaluation suite. Measuring NDCG@10, we see a larger change on SCIFACT but only marginal changes on the other datasets evaluated.

| Model | FIQA | SCIFACT | nfcorpus |
| --- | --- | --- | --- |
| multilingual-e5-small | 0.33126 | 0.677 | 0.31004 |
| multilingual-e5-small-optimized | 0.31734 | 0.65484 | 0.30126 |
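
Evaluations like MIRACL and BEIR score retrieval runs: each query and corpus passage is embedded, passages are ranked per query by similarity, and the top-ranked list is compared against the relevance judgments with NDCG@10. The sketch below illustrates the ranking step on a toy corpus; the model id and loading path are the same assumptions as in the usage sketch above.

```python
# Sketch of the dense-retrieval step behind a BEIR/MIRACL-style evaluation (toy corpus).
# Assumption: the model loads via sentence-transformers, as in the usage sketch above.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("elastic/multilingual-e5-small-optimized")  # illustrative model id

corpus = {
    "d1": "passage: The Eiffel Tower is located in Paris, France.",
    "d2": "passage: SciFact contains expert-written scientific claims and evidence abstracts.",
    "d3": "passage: Paris is the capital and most populous city of France.",
}
queries = {"q1": "query: where is the eiffel tower"}

doc_ids = list(corpus)
doc_embs = model.encode([corpus[d] for d in doc_ids], normalize_embeddings=True)
query_embs = model.encode(list(queries.values()), normalize_embeddings=True)

# Rank the corpus for each query by cosine similarity (dot product of normalized vectors)
# and keep the top 10; NDCG@10 is then computed over this run against the qrels.
run = {}
for qid, q_emb in zip(queries, query_embs):
    scores = doc_embs @ q_emb
    top = np.argsort(-scores)[:10]
    run[qid] = {doc_ids[i]: float(scores[i]) for i in top}

print(run)
```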

### Performance

Using a PyTorch model traced for Linux and Intel CPUs, we benchmarked inference latency across various input lengths. Overall, the optimized model is roughly 20-55% faster, with the speedup decreasing as input length grows. The table below reports the measured inference latency for each model by input-length bucket, along with the relative speedup.

| input length (characters) | multilingual-e5-small latency | multilingual-e5-small-optimized latency | speedup |
| --- | --- | --- | --- |
| 0 - 50 | 0.0181 | 0.00826 | 54.36% |
| 50 - 100 | 0.0275 | 0.0164 | 40.36% |
| 100 - 150 | 0.0366 | 0.0237 | 35.25% |
| 150 - 200 | 0.0435 | 0.0301 | 30.80% |
| 200 - 250 | 0.0514 | 0.0379 | 26.26% |
| 250 - 300 | 0.0569 | 0.043 | 24.43% |
| 300 - 350 | 0.0663 | 0.0513 | 22.62% |
| 350 - 400 | 0.0737 | 0.0576 | 21.85% |
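
A rough way to reproduce this kind of comparison is to time `encode` for both models on the same inputs, bucketed by character length. The sketch below is illustrative only: the model ids and loading path are assumptions (see the usage sketch above), a real benchmark needs many inputs per bucket, and absolute numbers depend heavily on hardware, threading, and how the model is traced.

```python
# Illustrative latency comparison by input-length bucket (assumed model ids, CPU timing).
import time
from sentence_transformers import SentenceTransformer

baseline = SentenceTransformer("intfloat/multilingual-e5-small")
optimized = SentenceTransformer("elastic/multilingual-e5-small-optimized")  # assumed to load this way

# One example text per character-length bucket; a real benchmark uses many texts per bucket.
buckets = {
    "0 - 50": "query: what is the capital of France",
    "50 - 100": "query: how long does it take to fly from London to New York City",
    "200 - 250": "passage: " + "The quick brown fox jumps over the lazy dog. " * 5,
}

def mean_latency(model, text, warmup=3, runs=20):
    for _ in range(warmup):          # warm-up runs exclude one-off initialization costs
        model.encode(text)
    start = time.perf_counter()
    for _ in range(runs):
        model.encode(text)
    return (time.perf_counter() - start) / runs

for bucket, text in buckets.items():
    base = mean_latency(baseline, text)
    opt = mean_latency(optimized, text)
    print(f"{bucket:>9}: baseline {base:.4f}s  optimized {opt:.4f}s  "
          f"speedup {100 * (base - opt) / base:.2f}%")
```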

### Disclaimer

This e5 model, as defined, hosted, integrated and used in conjunction with our other Elastic Software, is covered by our standard warranty.