---
license: apache-2.0
datasets:
- kakaobrain/coyo-700m
---

[[Paper]](https://arxiv.org/abs/2407.17331) [[GitHub]](https://github.com/deepglint/unicom)  

This model was trained on the COYO-700M dataset. The table below reports linear-probe accuracy on standard image-classification benchmarks; red values mark where MLCD outperforms the CLIP baseline.

| Dataset   | CLIP | MLCD |
|-----------|------|------|
| Food101   | 88.8 | <span style="color:red">90.2</span> |
| CIFAR10   | 95.1 | <span style="color:red">96.9</span> |
| CIFAR100  | 80.5 | <span style="color:red">86.8</span> |
| Birdsnap  | 58.5 | <span style="color:red">72.1</span> |
| SUN397    | 76.6 | <span style="color:red">77.4</span> |
| Cars      | 81.8 | <span style="color:red">93.5</span> |
| Aircraft  | 52.0 | <span style="color:red">74.7</span> |
| VOC2007   | 87.7 | <span style="color:red">90.4</span> |
| DTD       | 76.5 | <span style="color:red">83.5</span> |
| Pets      | 90.0 | <span style="color:red">93.6</span> |
| Cal101    | 93.0 | <span style="color:red">97.7</span> |
| Flowers   | 96.9 | <span style="color:red">98.8</span> |
| ImageNet  | 76.1 | <span style="color:red">79.1</span> |
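In a linear-probe evaluation, the pretrained backbone is frozen and only a linear classifier is trained on its extracted features. The sketch below illustrates the procedure with synthetic features standing in for the vision model's embeddings (all names, dimensions, and the least-squares classifier are illustrative assumptions, not taken from the released code):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for frozen backbone features:
# 600 samples, 64-dim embeddings, 3 classes.
n, d, k = 600, 64, 3
labels = rng.integers(0, k, size=n)
centers = rng.normal(size=(k, d))
feats = centers[labels] + 0.5 * rng.normal(size=(n, d))

# Train / eval split (backbone stays frozen; only the probe is fit).
train, test = slice(0, 500), slice(500, n)

# Linear probe: one-vs-all least-squares classifier on the frozen features.
Y = np.eye(k)[labels[train]]                       # one-hot targets
X = np.hstack([feats[train], np.ones((500, 1))])   # append a bias column
W, *_ = np.linalg.lstsq(X, Y, rcond=None)

# Accuracy on the held-out split.
Xt = np.hstack([feats[test], np.ones((n - 500, 1))])
pred = (Xt @ W).argmax(axis=1)
acc = (pred == labels[test]).mean()
print(f"linear-probe accuracy: {acc:.3f}")
```

In practice the features would come from the frozen CLIP/MLCD image encoder, and the probe is often a logistic-regression classifier rather than least squares; the key point is that only the linear head is trained, so the score reflects the quality of the frozen representation.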