---
license: apache-2.0
tags:
- llava
datasets:
- Ejafa/ye-pop
pipeline_tag: image-text-to-text
---
A ViT-B/32 CLIP model trained for 4 epochs on the [ye-pop](https://huggingface.co/datasets/Ejafa/ye-pop) dataset (491,520 images with detailed captions generated by [LLaVA 1.5](https://github.com/haotian-liu/LLaVA)). It is a research artifact of [clip-synthetic-captions](https://github.com/nopperl/clip-synthetic-captions). It outperforms the CLIP model trained using the original alt-texts on the [DataComp benchmark suite](https://datacomp.ai) (38 image classification and retrieval tasks).
Note: this model is likely not directly useful, as it is severely undertrained.
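
To give a rough sense of scale for "undertrained", the sketch below computes the number of training samples seen from the figures stated above (491,520 images, 4 epochs). The comparison figures (about 400M image-text pairs for 32 epochs, as reported for the original OpenAI CLIP) are an assumption brought in for illustration and are not part of this model card; the names used are hypothetical.

```python
# Figures stated in this model card.
NUM_IMAGES = 491_520
NUM_EPOCHS = 4

# Reference figures for the original OpenAI CLIP.
# Assumption: taken from the CLIP paper, not from this model card.
REF_NUM_PAIRS = 400_000_000
REF_NUM_EPOCHS = 32


def samples_seen(num_examples: int, num_epochs: int) -> int:
    """Total number of training samples processed over all epochs."""
    return num_examples * num_epochs


ours = samples_seen(NUM_IMAGES, NUM_EPOCHS)
ref = samples_seen(REF_NUM_PAIRS, REF_NUM_EPOCHS)

print(f"samples seen by this model:     {ours:,}")
print(f"samples seen by the reference:  {ref:,}")
print(f"reference / this model:         {ref / ours:,.0f}x")
```

Under these assumptions the gap is several thousand-fold, which is consistent with the caveat above.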