update
README.md CHANGED
@@ -1,4 +1,12 @@
-
+---
+license: mit
+datasets:
+- mlfoundations/datacomp_1b
+pipeline_tag: feature-extraction
+---
+
+# Model card for ViTamin-XL-336px
+
 Official huggingface models of **ViTamin**, from the following paper:
 
 [ViTamin: Designing Scalable Vision Models in the Vision-language Era](https://arxiv.org/pdf/2404.02132.pdf).\
@@ -6,7 +14,7 @@ Official huggingface models of **ViTamin**, from the following paper:
 🏠  Johns Hopkins University, Bytedance
 
 
-Load from HuggingFace:
+Load from HuggingFace with transformers.AutoModel:
 ```python
 import torch
 import open_clip
@@ -31,8 +39,7 @@ with torch.no_grad(), torch.cuda.amp.autocast():
     image_features, text_features, logit_scale = model(pixel_values, text)
     text_probs = (100.0 * image_features @ text_features.to(torch.float).T).softmax(dim=-1)
 
-print("Label probs:", text_probs)
-
+print("Label probs:", text_probs)
 ```
 
 ## Main Results with CLIP Pre-training on DataComp-1B
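Note on the snippet: the hunks above show only the first and last lines of the card's usage example; the model-loading and preprocessing code between them is outside the diff context and unchanged by this commit. As rough orientation only, a minimal end-to-end sketch of such a zero-shot inference flow might look like the following, assuming the standard `AutoModel.from_pretrained(..., trust_remote_code=True)` path, a CLIP-style image processor shipped with the checkpoint, and OpenCLIP tokenization. The repo id `jienengchen/ViTamin-XL-336px`, the image path, and the captions are placeholders, not taken from this commit.

```python
# Sketch only: the repo id, file name, captions, and tokenizer choice below are
# illustrative assumptions, not copied from the card.
import torch
import open_clip
from PIL import Image
from transformers import AutoModel, CLIPImageProcessor

model_id = "jienengchen/ViTamin-XL-336px"  # assumed repo id for this card
device = "cuda" if torch.cuda.is_available() else "cpu"

# Custom modeling code on the Hub requires trust_remote_code=True.
model = AutoModel.from_pretrained(model_id, trust_remote_code=True).to(device).eval()

# Image preprocessing via a CLIP-style processor (assumes a preprocessor_config.json
# is shipped with the checkpoint).
image_processor = CLIPImageProcessor.from_pretrained(model_id)
image = Image.open("image.png").convert("RGB")
pixel_values = image_processor(images=image, return_tensors="pt").pixel_values.to(device)

# Text tokenization with OpenCLIP's standard CLIP tokenizer (assumption: the text
# tower follows OpenCLIP's 77-token convention).
tokenizer = open_clip.get_tokenizer("ViT-L-14")
text = tokenizer(["a photo of a cat", "a photo of a dog"]).to(device)

# Same inference block as the snippet visible in the diff above.
with torch.no_grad(), torch.cuda.amp.autocast():
    image_features, text_features, logit_scale = model(pixel_values, text)
    text_probs = (100.0 * image_features @ text_features.to(torch.float).T).softmax(dim=-1)

print("Label probs:", text_probs)
```

Only the final inference block mirrors lines actually shown in the diff; everything before it is an assumption about how the hidden middle of the snippet is structured.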