update
README.md CHANGED
@@ -1,4 +1,12 @@
-
+---
+license: mit
+datasets:
+- mlfoundations/datacomp_1b
+pipeline_tag: feature-extraction
+---
+
+# Model card for ViTamin-XL-336px
+
 Official huggingface models of **ViTamin**, from the following paper:
 
 [ViTamin: Designing Scalable Vision Models in the Vision-language Era](https://arxiv.org/pdf/2404.02132.pdf).\
@@ -6,7 +14,7 @@ Official huggingface models of **ViTamin**, from the following paper:
 🏠  Johns Hopkins University, Bytedance
 
 
-Load from HuggingFace:
+Load from HuggingFace with transformers.AutoModel:
 ```python
 import torch
 import open_clip
@@ -31,8 +39,7 @@ with torch.no_grad(), torch.cuda.amp.autocast():
     image_features, text_features, logit_scale = model(pixel_values, text)
     text_probs = (100.0 * image_features @ text_features.to(torch.float).T).softmax(dim=-1)
 
-print("Label probs:", text_probs)
-
+print("Label probs:", text_probs)
 ```
 
 ## Main Results with CLIP Pre-training on DataComp-1B
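Note on the snippet: the hunks above show only the first and last lines of the card's usage example; the model-loading and preprocessing code between them is outside the diff context and unchanged by this commit. As rough orientation only, a minimal end-to-end sketch of such a zero-shot inference flow might look like the following, assuming the standard `AutoModel.from_pretrained(..., trust_remote_code=True)` path, a CLIP-style image processor shipped with the checkpoint, and OpenCLIP tokenization. The repo id `jienengchen/ViTamin-XL-336px`, the image path, and the captions are placeholders, not taken from this commit.

```python
# Sketch only: the repo id, file name, captions, and tokenizer choice below are
# illustrative assumptions, not copied from the card.
import torch
import open_clip
from PIL import Image
from transformers import AutoModel, CLIPImageProcessor

model_id = "jienengchen/ViTamin-XL-336px"  # assumed repo id for this card
device = "cuda" if torch.cuda.is_available() else "cpu"

# Custom modeling code on the Hub requires trust_remote_code=True.
model = AutoModel.from_pretrained(model_id, trust_remote_code=True).to(device).eval()

# Image preprocessing via a CLIP-style processor (assumes a preprocessor_config.json
# is shipped with the checkpoint).
image_processor = CLIPImageProcessor.from_pretrained(model_id)
image = Image.open("image.png").convert("RGB")
pixel_values = image_processor(images=image, return_tensors="pt").pixel_values.to(device)

# Text tokenization with OpenCLIP's standard CLIP tokenizer (assumption: the text
# tower follows OpenCLIP's 77-token convention).
tokenizer = open_clip.get_tokenizer("ViT-L-14")
text = tokenizer(["a photo of a cat", "a photo of a dog"]).to(device)

# Same inference block as the snippet visible in the diff above.
with torch.no_grad(), torch.cuda.amp.autocast():
    image_features, text_features, logit_scale = model(pixel_values, text)
    text_probs = (100.0 * image_features @ text_features.to(torch.float).T).softmax(dim=-1)

print("Label probs:", text_probs)
```

Only the final inference block mirrors lines actually shown in the diff; everything before it is an assumption about how the hidden middle of the snippet is structured.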