joshdevins committed on
Commit d6ab683 • 1 Parent(s): 9342c22
Adds benchmarking section
README.md CHANGED
@@ -108,6 +108,34 @@ Please note that the PyTorch traced model is runnable *only* on Linux with Intel
[Text Embeddings by Weakly-Supervised Contrastive Pre-training](https://arxiv.org/pdf/2212.03533.pdf).
Liang Wang, Nan Yang, Xiaolong Huang, Binxing Jiao, Linjun Yang, Daxin Jiang, Rangan Majumder, Furu Wei, arXiv 2022

## Benchmarks

We ran a number of small benchmarks to assess how the optimized model's quality and inference latency compare with the original baseline model.

### Quality

Measuring NDCG@10 on the dev split of the MIRACL datasets for select languages, we see a mostly marginal change in quality for the quantized model. Yoruba (yo) is the exception, with a noticeably larger drop.

| | de | yo | ru | ar | es | th |
| --- | --- | --- | --- | --- | --- | --- |
| multilingual-e5-small | 0.75862 | 0.56193 | 0.80309 | 0.82778 | 0.81672 | 0.85072 |
| multilingual-e5-small-optimized | 0.75992 | 0.48934 | 0.79668 | 0.82017 | 0.8135 | 0.84316 |

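As a rough way to read the table above, the relative change per language can be computed directly from the reported scores. The snippet below is a minimal sketch: the dictionary and function names are illustrative and are not part of the original evaluation code.

```python
# NDCG@10 scores copied from the MIRACL table above.
baseline = {"de": 0.75862, "yo": 0.56193, "ru": 0.80309,
            "ar": 0.82778, "es": 0.81672, "th": 0.85072}
optimized = {"de": 0.75992, "yo": 0.48934, "ru": 0.79668,
             "ar": 0.82017, "es": 0.8135, "th": 0.84316}


def relative_change(before: float, after: float) -> float:
    """Signed change of `after` relative to `before` (negative means a drop)."""
    return (after - before) / before


changes = {lang: relative_change(baseline[lang], optimized[lang]) for lang in baseline}

# Print languages from the largest drop to the largest gain.
for lang, change in sorted(changes.items(), key=lambda item: item[1]):
    print(f"{lang}: {change:+.2%}")
```

Under this calculation, every language except Yoruba moves by less than 1% relative to the baseline, while Yoruba drops by roughly 13%.
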
To test English out-of-domain performance, we used the test split of several datasets from the BEIR evaluation. Measuring NDCG@10, we see a larger change in absolute terms on SCIFACT, and only marginal changes on the other datasets evaluated.

| | FIQA | SCIFACT | nfcorpus |
| --- | --- | --- | --- |
| multilingual-e5-small | 0.33126 | 0.677 | 0.31004 |
| multilingual-e5-small-optimized | 0.31734 | 0.65484 | 0.30126 |

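For readers unfamiliar with the metric used in this section, the following is a minimal sketch of NDCG@10 for a single query. It assumes the linear-gain form of DCG with a log2 rank discount. Some tools use an exponential gain instead, and this is not necessarily the exact implementation behind the numbers above. The function names are illustrative.

```python
import math


def dcg_at_k(relevances, k=10):
    """Discounted cumulative gain over the top-k results (linear gain)."""
    # Rank positions start at 1, so the discount for position i is log2(i + 1).
    return sum(rel / math.log2(pos + 1)
               for pos, rel in enumerate(relevances[:k], start=1))


def ndcg_at_k(ranked_relevances, all_relevances=None, k=10):
    """NDCG@k: DCG of the system ranking divided by DCG of the ideal ranking.

    ranked_relevances: relevance labels in the order the system returned them.
    all_relevances: labels of every judged document for the query; defaults to
        the returned labels when the full judgment list is not available.
    """
    pool = ranked_relevances if all_relevances is None else all_relevances
    ideal_dcg = dcg_at_k(sorted(pool, reverse=True), k)
    if ideal_dcg == 0:
        return 0.0  # no relevant documents for this query
    return dcg_at_k(ranked_relevances, k) / ideal_dcg


# A single relevant document returned at rank 2 instead of rank 1.
print(ndcg_at_k([0, 1, 0]))
```

A benchmark score such as the ones reported here would then be the mean of this per-query value over all queries in the split.
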
### Performance

Using a PyTorch model traced for Linux and Intel CPUs, we benchmarked inference with inputs of various lengths. Overall, the optimized model reduces latency by roughly 22-54%, with the largest gains on the shortest inputs.

| input length (characters) | multilingual-e5-small | multilingual-e5-small-optimized | latency reduction |
| --- | --- | --- | --- |
| 0 - 50 | 0.0181 | 0.00826 | 54.36% |
| 50 - 100 | 0.0275 | 0.0164 | 40.36% |
| 100 - 150 | 0.0366 | 0.0237 | 35.25% |
| 150 - 200 | 0.0435 | 0.0301 | 30.80% |
| 200 - 250 | 0.0514 | 0.0379 | 26.26% |
| 250 - 300 | 0.0569 | 0.043 | 24.43% |
| 300 - 350 | 0.0663 | 0.0513 | 22.62% |
| 350 - 400 | 0.0737 | 0.0576 | 21.85% |
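
The percentage column in the table above appears to be the relative reduction in latency, `(baseline - optimized) / baseline`, rather than a speedup factor. The sketch below recomputes it from the two timing columns under that assumption. The timing unit is not stated in the source, and the variable and function names are illustrative.

```python
# (input length bucket, baseline timing, optimized timing, reported percentage)
rows = [
    ("0 - 50", 0.0181, 0.00826, 54.36),
    ("50 - 100", 0.0275, 0.0164, 40.36),
    ("100 - 150", 0.0366, 0.0237, 35.25),
    ("150 - 200", 0.0435, 0.0301, 30.80),
    ("200 - 250", 0.0514, 0.0379, 26.26),
    ("250 - 300", 0.0569, 0.043, 24.43),
    ("300 - 350", 0.0663, 0.0513, 22.62),
    ("350 - 400", 0.0737, 0.0576, 21.85),
]


def latency_reduction(baseline: float, optimized: float) -> float:
    """Fraction of the baseline latency that the optimized model saves."""
    return (baseline - optimized) / baseline


def speedup_factor(baseline: float, optimized: float) -> float:
    """How many times faster the optimized model is than the baseline."""
    return baseline / optimized


for bucket, base, opt, reported in rows:
    print(f"{bucket:>9}: reduction {latency_reduction(base, opt):.2%} "
          f"(reported {reported:.2f}%), factor {speedup_factor(base, opt):.2f}x")
```

Expressed as a factor, the shortest inputs run a little over twice as fast, while the longest bucket is closer to 1.3 times as fast.
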