---
license: apache-2.0
datasets:
- kakaobrain/coyo-700m
---
[[Paper]](https://arxiv.org/abs/2407.17331) [[GitHub]](https://github.com/deepglint/unicom)

This model was trained on the COYO-700M dataset. The results below are linear-probe scores (%) comparing MLCD against CLIP on standard classification benchmarks; the better score in each row is shown in red. A minimal probe sketch follows the table.
| Dataset | CLIP | MLCD |
|-----------|------|------|
| Food101 | 88.8 | <span style="color:red">90.2</span> |
| CIFAR10 | 95.1 | <span style="color:red">96.9</span> |
| CIFAR100 | 80.5 | <span style="color:red">86.8</span> |
| Birdsnap | 58.5 | <span style="color:red">72.1</span> |
| SUN397 | 76.6 | <span style="color:red">77.4</span> |
| Cars | 81.8 | <span style="color:red">93.5</span> |
| Aircraft | 52.0 | <span style="color:red">74.7</span> |
| VOC2007 | 87.7 | <span style="color:red">90.4</span> |
| DTD | 76.5 | <span style="color:red">83.5</span> |
| Pets | 90.0 | <span style="color:red">93.6</span> |
| Caltech101 | 93.0 | <span style="color:red">97.7</span> |
| Flowers | 96.9 | <span style="color:red">98.8</span> |
| ImageNet | 76.1 | <span style="color:red">79.1</span> |
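
For reference, here is a minimal sketch of a linear-probe evaluation on frozen features. It is not the authors' evaluation code: the repo id `deepglint/mlcd-vit` is a placeholder (substitute this model's actual Hugging Face repo id), it assumes the checkpoint loads with `transformers`' `CLIPVisionModel`, and probing the pooled output (rather than some other layer) is also an assumption. CIFAR-10 is used only because it is small and appears in the table above.

```python
# Minimal linear-probe sketch, not the authors' evaluation code.
# Assumptions: the checkpoint loads with transformers' CLIPVisionModel,
# and "deepglint/mlcd-vit" is a placeholder repo id -- replace it with
# the actual Hugging Face repo id of this model.
import numpy as np
import torch
from sklearn.linear_model import LogisticRegression
from torchvision.datasets import CIFAR10
from transformers import CLIPImageProcessor, CLIPVisionModel

repo_id = "deepglint/mlcd-vit"  # placeholder; substitute the real repo id
model = CLIPVisionModel.from_pretrained(repo_id).eval()
processor = CLIPImageProcessor.from_pretrained(repo_id)

@torch.no_grad()
def embed(images, batch_size=64):
    """Frozen pooled features for a list of PIL images."""
    feats = []
    for i in range(0, len(images), batch_size):
        inputs = processor(images=images[i : i + batch_size], return_tensors="pt")
        # Probing pooler_output is an assumption; the paper may probe a
        # different layer or projection.
        feats.append(model(**inputs).pooler_output)
    return torch.cat(feats).numpy()

# Small CIFAR-10 subsets keep the sketch fast; the numbers in the table
# above come from evaluations on full dataset splits.
train = CIFAR10(root="data", train=True, download=True)
test = CIFAR10(root="data", train=False, download=True)
Xtr = embed([train[i][0] for i in range(2000)])
ytr = np.array([train[i][1] for i in range(2000)])
Xte = embed([test[i][0] for i in range(1000)])
yte = np.array([test[i][1] for i in range(1000)])

# Linear probe: a logistic-regression classifier fit on frozen features.
clf = LogisticRegression(max_iter=1000).fit(Xtr, ytr)
print("linear-probe accuracy:", clf.score(Xte, yte))
```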