Fix model naming (idefix/idefics)
README.md
@@ -243,40 +243,40 @@ As opposed to Flamingo, we did not train IDEFICS on video-text pairs datasets, a

We note that since IDEFICS was trained on PMD (which contains COCO), the evaluation numbers on COCO are not directly comparable with Flamingo and OpenFlamingo, because those models did not explicitly include this dataset in their training mixture. Additionally, Flamingo was trained with images of resolution 320 x 320, while IDEFICS and OpenFlamingo were trained with images of 224 x 224 resolution.

| Model | Shots | <nobr>VQAv2<br>OE VQA acc.</nobr> | <nobr>OKVQA<br>OE VQA acc.</nobr> | <nobr>TextVQA<br>OE VQA acc.</nobr> | <nobr>VizWiz<br>OE VQA acc.</nobr> | <nobr>TextCaps<br>CIDEr</nobr> | <nobr>Coco<br>CIDEr</nobr> | <nobr>NoCaps<br>CIDEr</nobr> | <nobr>Flickr<br>CIDEr</nobr> | <nobr>VisDial<br>NDCG</nobr> | <nobr>HatefulMemes<br>ROC AUC</nobr> | <nobr>ScienceQA<br>acc.</nobr> | <nobr>RenderedSST2<br>acc.</nobr> | <nobr>Winoground<br>group (text/image)</nobr> |
|:------------|--------:|---------------------:|---------------------:|-----------------------:|----------------------:|-------------------:|---------------:|-----------------:|-----------------:|-----------------:|-------------------------:|-----------------------:|--------------------------:|----------------------------------:|
| IDEFICS 80B | 0 | 60.0 | 45.2 | 30.9 | 36.0 | 56.8 | 91.8 | 65.0 | 53.7 | 48.8 | 60.6 | 68.9 | 60.5 | 8.0 (18.75/22.5)|
| | 4 | 63.6 | 52.4 | 34.4 | 40.4 | 72.7 | 110.3 | 99.6 | 73.7 | 48.4 | 57.8 | 58.9 | 66.6 | - |
| | 8 | 64.8 | 55.1 | 35.7 | 46.1 | 77.6 | 114.3 | 105.7 | 76.6 | 47.9 | 58.2 | - | 67.8 | - |
| | 16 | 65.4 | 56.8 | 36.3 | 48.3 | 81.4 | 116.6 | 107.0 | 80.1 | - | 55.8 | - | 67.7 | - |
| | 32 | 65.9 | 57.8 | 36.7 | 50.0 | 82.7 | 116.6 | 107.5 | 81.1 | - | 52.5 | - | 67.3 | - |
<br>
| IDEFICS 9B | 0 | 50.9 | 38.4 | 25.9 | 35.5 | 25.4 | 46.0 | 36.8 | 27.3 | 48.7 | 51.7 | 44.2 | 61.8 | 5.0 (16.8/20.8) |
| | 4 | 55.4 | 45.5 | 27.6 | 36.9 | 60.0 | 93.0 | 81.3 | 59.7 | 47.9 | 50.7 | 37.4 | 62.3 | - |
| | 8 | 56.4 | 47.7 | 27.5 | 40.4 | 63.2 | 97.0 | 86.8 | 61.9 | 47.6 | 51.0 | - | 66.3 | - |
| | 16 | 57.0 | 48.4 | 27.9 | 42.6 | 67.4 | 99.7 | 89.4 | 64.5 | - | 50.9 | - | 67.8 | - |
| | 32 | 57.9 | 49.6 | 28.3 | 43.7 | 68.1 | 98.0 | 90.5 | 64.4 | - | 49.8 | - | 67.0 | - |

For ImageNet-1k, we also report results where the priming samples are selected to be similar (i.e., close in a vector space) to the queried instance. This is the Retrieval-based In-Context Example Selection (RICES for short) approach introduced by [Yang et al. (2021)](https://arxiv.org/abs/2109.05014).

| Model | Shots | Support set size | Shots selection | ImageNet-1k<br>Top-1 acc. |
|:-----------|--------:|-----------------:|:----------------|--------------------------:|
| IDEFICS 80B | 16 | 1K | Random | 65.4 |
| | 16 | 5K | RICES | 72.9 |
<br>
| IDEFICS 9B | 16 | 1K | Random | 53.5 |
| | 16 | 5K | RICES | 64.5 |

Fairness Evaluations:
| Model | Shots | <nobr>FairFaceGender<br>acc.</nobr> | <nobr>FairFaceRace<br>acc.</nobr> | <nobr>FairFaceAge<br>acc.</nobr> |
|:-----------|--------:|----------------------------:|--------------------------:|-------------------------:|
| IDEFICS 80B | 0 | 95.8 | 64.1 | 51.0 |
| | 4 | 95.2 | 48.8 | 50.6 |
| | 8 | 95.5 | 52.3 | 53.1 |
| | 16 | 95.7 | 47.6 | 52.8 |
| | 32 | 95.7 | 36.5 | 51.2 |
<br>
| IDEFICS 9B | 0 | 94.4 | 55.3 | 45.1 |
| | 4 | 93.9 | 35.3 | 44.3 |
| | 8 | 95.4 | 44.7 | 46.0 |
| | 16 | 95.8 | 43.0 | 46.1 |

@@ -304,13 +304,13 @@ Idefics Instruct Evaluations:

Fairness Evaluations:
| Model | Shots | <nobr>FairFaceGender<br>acc.</nobr> | <nobr>FairFaceRace<br>acc.</nobr> | <nobr>FairFaceAge<br>acc.</nobr> |
|:---------------------|--------:|----------------------------:|--------------------------:|-------------------------:|
| IDEFICS 80B Instruct | 0 | 95.7 | 63.4 | 47.1 |
| | 4 | 95.6 | 51.4 | 48.3 |
| | 8 | 95.8 | 51.0 | 51.1 |
| | 16 | 96.1 | 47.6 | 51.8 |
| | 32 | 96.2 | 36.8 | 50.3 |
<br>
| IDEFICS 9B Instruct | 0 | 92.7 | 59.6 | 43.9 |
| | 4 | 95.2 | 43.3 | 38.7 |
| | 8 | 95.8 | 51.7 | 40.1 |
| | 16 | 96.1 | 58.9 | 41.7 |
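As a rough illustration of the RICES idea mentioned for the ImageNet-1k results (choosing priming examples that lie close to the query in a vector space, rather than sampling them at random), here is a minimal Python sketch. It assumes the query and every support-set example have already been embedded as fixed-length vectors, and it uses cosine similarity as the closeness measure. The embedding step, the similarity choice, and the helper names `cosine_similarity`, `select_rices_shots` and `select_random_shots` are illustrative assumptions, not the code that produced the numbers in these tables.

```python
import math
import random
from typing import List, Sequence, Tuple

# A support-set entry: (example id, embedding vector).
# ASSUMPTION: embeddings come from some image encoder; that step is not shown here.
SupportItem = Tuple[str, Sequence[float]]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_rices_shots(query: Sequence[float], support: List[SupportItem], k: int) -> List[str]:
    """RICES-style selection (hypothetical helper): ids of the k support examples most similar to the query."""
    ranked = sorted(
        support,
        key=lambda item: cosine_similarity(query, item[1]),
        reverse=True,  # highest similarity first
    )
    return [example_id for example_id, _ in ranked[:k]]


def select_random_shots(support: List[SupportItem], k: int, seed: int = 0) -> List[str]:
    """Baseline (hypothetical helper): k support examples drawn uniformly at random, ignoring the query."""
    rng = random.Random(seed)
    return [example_id for example_id, _ in rng.sample(support, k)]


if __name__ == "__main__":
    # Toy 2-D "embeddings" standing in for real image features.
    support = [
        ("cat_1", [0.9, 0.1]),
        ("dog_1", [0.1, 0.9]),
        ("cat_2", [0.8, 0.2]),
        ("car_1", [-0.7, 0.1]),
    ]
    query = [1.0, 0.0]
    print(select_rices_shots(query, support, k=2))  # ['cat_1', 'cat_2']
    print(select_random_shots(support, k=2))
```

The design difference the tables point at is that random selection uses the same kind of query-independent draw for every input, whereas a RICES-style selector builds a different set of shots for each query from its nearest neighbours in the support set.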