---
license: apache-2.0
tags:
  - llava
datasets:
  - Ejafa/ye-pop
pipeline_tag: image-text-to-text
---

A ViT-B/32 CLIP model trained for 4 epochs on the ye-pop dataset, which pairs 491,520 images with detailed captions generated by LLaVA 1.5. It is a research artifact of clip-synthetic-captions. On the DataComp benchmark suite (38 image classification and retrieval tasks), it outperforms a CLIP model trained on the original alt-texts.
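To make "trained on images and captions" concrete, here is a minimal sketch of the standard symmetric contrastive (InfoNCE) objective that CLIP models optimize over a batch of matched image-caption pairs. This card does not state the exact training setup (batch size, temperature, framework), so the function names, the fixed temperature of 0.07, and the toy 2-D embeddings are illustrative assumptions, not the recipe used for this checkpoint. CLIP itself learns the temperature; it is fixed here for simplicity.

```python
import math
from typing import List

Vector = List[float]


def l2_normalize(v: Vector) -> Vector:
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def logsumexp(xs: List[float]) -> float:
    """Numerically stable log(sum(exp(x)))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def clip_contrastive_loss(image_emb: List[Vector], text_emb: List[Vector],
                          temperature: float = 0.07) -> float:
    """Symmetric InfoNCE loss over N matched (image, caption) pairs.

    Pair i is the positive for row i and column i of the similarity matrix;
    the other N - 1 entries in that row or column act as negatives.
    The temperature of 0.07 is an assumed, fixed value for this sketch.
    """
    imgs = [l2_normalize(v) for v in image_emb]
    txts = [l2_normalize(v) for v in text_emb]
    n = len(imgs)
    # logits[i][j] = cosine(image_i, caption_j) / temperature
    logits = [[sum(a * b for a, b in zip(imgs[i], txts[j])) / temperature
               for j in range(n)] for i in range(n)]
    # Image-to-text: cross-entropy over each row, target on the diagonal.
    loss_i2t = sum(logsumexp(logits[i]) - logits[i][i] for i in range(n)) / n
    # Text-to-image: cross-entropy over each column, target on the diagonal.
    cols = [[logits[i][j] for i in range(n)] for j in range(n)]
    loss_t2i = sum(logsumexp(cols[j]) - cols[j][j] for j in range(n)) / n
    return (loss_i2t + loss_t2i) / 2


if __name__ == "__main__":
    # Hypothetical toy embeddings: caption i roughly points the same way as image i.
    images = [[1.0, 0.0], [0.0, 1.0]]
    captions = [[0.9, 0.1], [0.1, 0.9]]
    print(clip_contrastive_loss(images, captions))        # aligned pairs: small loss
    print(clip_contrastive_loss(images, captions[::-1]))  # swapped pairs: large loss
```

Swapping the captions makes each image most similar to the wrong caption, so the loss rises sharply; minimizing this loss is what pulls matching image and caption embeddings together, which is what the zero-shot classification and retrieval tasks in DataComp then exploit.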

Note: this model is likely not directly useful, as it is severely undertrained (4 epochs over 491,520 images amounts to only about 2M samples seen).