Ross Wightman committed
Commit e0a996a • Parent(s): 923c8d7

Update README add tokenizer/vocab/preprocessor cfg

Files changed:
- README.md +19 -8
- preprocessor_config.json +19 -0
- special_tokens_map.json +1 -0
- tokenizer.json +0 -0
- tokenizer_config.json +1 -0
- vocab.json +0 -0
README.md
CHANGED

@@ -6,11 +6,12 @@ license: mit
 # Table of Contents

 1. [Model Details](#model-details)
-
-
-
-
-
+2. [Uses](#uses)
+3. [Training Details](#training-details)
+4. [Evaluation](#evaluation)
+5. [Acknowledgements](#acknowledgements)
+6. [Citation](#citation)
+7. [How To Get Started With the Model](#how-to-get-started-with-the-model)


 # Model Details
@@ -19,9 +20,11 @@ license: mit

 A CLIP ViT-g/14 model trained with the LAION-2B English subset of LAION-5B (https://laion.ai/blog/laion-5b/) using OpenCLIP (https://github.com/mlfoundations/open_clip).

+Model training done by Romain Beaumont on the [stability.ai](https://stability.ai/) cluster.
+
 # Uses

-As per the original OpenAI CLIP
+As per the original [OpenAI CLIP model card](https://github.com/openai/CLIP/blob/d50d76daa670286dd6cacf3bcd80b5e4823fc8e1/model-card.md), this model is intended as a research output for research communities. We hope that this model will enable researchers to better understand and explore zero-shot, arbitrary image classification. We also hope it can be used for interdisciplinary studies of the potential impact of such models.

 The OpenAI CLIP paper includes a discussion of potential downstream impacts to provide an example for this sort of analysis. Additionally, the LAION-5B blog (https://laion.ai/blog/laion-5b/) and upcoming paper include additional discussion as it relates specifically to the training dataset.

@@ -55,7 +58,7 @@ This model was trained with the 2 Billion sample English subset of LAION-5B (htt

 ## Training Procedure

-
+Please see [training notes](https://docs.google.com/document/d/1EFbMLRWSSV0LUf9Du1pWzWqgeiIRPwEWX2s1C6mAk5c) and [wandb logs](https://wandb.ai/rom1504/eval_openclip/reports/slow-g-14--VmlldzoyNTMwMjg5).

 # Evaluation

@@ -71,7 +74,15 @@ The testing is performed with VTAB+ (A combination of VTAB (https://arxiv.org/ab

 ## Results

-
+The model achieves a 76.6% zero-shot top-1 accuracy on ImageNet-1k.
+
+An initial round of benchmarks has been performed on a wider range of datasets, currently viewable at https://github.com/LAION-AI/CLIP_benchmark/blob/main/benchmark/results.ipynb
+
+**TODO** - create table for just this model's metrics.
+
+# Acknowledgements
+
+Acknowledging [stability.ai](https://stability.ai/) for the compute used to train this model.

 # Citation

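Because this commit adds Hugging Face preprocessor and tokenizer configuration files alongside the README update, the checkpoint can in principle be used through the `transformers` CLIP classes. Below is a minimal zero-shot classification sketch, assuming the repository also contains HF-format CLIP weights; the repo id, image path, and prompt strings are placeholders, not values taken from this commit.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Placeholder: substitute the actual Hub repo id of this checkpoint.
repo_id = "<hub-repo-id-of-this-checkpoint>"

model = CLIPModel.from_pretrained(repo_id)
# CLIPProcessor bundles the image preprocessor and tokenizer configured
# by the files added in this commit.
processor = CLIPProcessor.from_pretrained(repo_id)

image = Image.open("example.jpg")  # placeholder image path
texts = ["a photo of a cat", "a photo of a dog"]

inputs = processor(text=texts, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)

# Image-to-text similarity scores, softmaxed over the candidate prompts.
probs = outputs.logits_per_image.softmax(dim=-1)
print(probs)
```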
preprocessor_config.json
ADDED

@@ -0,0 +1,19 @@
+{
+  "crop_size": 224,
+  "do_center_crop": true,
+  "do_normalize": true,
+  "do_resize": true,
+  "feature_extractor_type": "CLIPFeatureExtractor",
+  "image_mean": [
+    0.48145466,
+    0.4578275,
+    0.40821073
+  ],
+  "image_std": [
+    0.26862954,
+    0.26130258,
+    0.27577711
+  ],
+  "resample": 3,
+  "size": 224
+}
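For reference, the preprocessing described by `preprocessor_config.json` (resize to 224, center-crop to 224×224, and normalize with the listed mean/std; `"resample": 3` is PIL's bicubic filter) corresponds roughly to the torchvision pipeline below. This is an illustrative sketch, not code from the commit, and assumes `torchvision` and `Pillow` are installed.

```python
from PIL import Image
from torchvision import transforms

# Values copied from preprocessor_config.json above; resample=3 maps to bicubic.
preprocess = transforms.Compose([
    transforms.Resize(224, interpolation=transforms.InterpolationMode.BICUBIC),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(
        mean=[0.48145466, 0.4578275, 0.40821073],
        std=[0.26862954, 0.26130258, 0.27577711],
    ),
])

# Placeholder image path; the output tensor has shape (3, 224, 224).
pixel_values = preprocess(Image.open("example.jpg").convert("RGB"))
```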
special_tokens_map.json
ADDED

@@ -0,0 +1 @@
+{"bos_token": {"content": "<|startoftext|>", "single_word": false, "lstrip": false, "rstrip": false, "normalized": true}, "eos_token": {"content": "<|endoftext|>", "single_word": false, "lstrip": false, "rstrip": false, "normalized": true}, "unk_token": {"content": "<|endoftext|>", "single_word": false, "lstrip": false, "rstrip": false, "normalized": true}, "pad_token": "<|endoftext|>"}
tokenizer.json
ADDED

The diff for this file is too large to render. See raw diff.
tokenizer_config.json
ADDED

@@ -0,0 +1 @@
+{"unk_token": {"content": "<|endoftext|>", "single_word": false, "lstrip": false, "rstrip": false, "normalized": true, "__type": "AddedToken"}, "bos_token": {"content": "<|startoftext|>", "single_word": false, "lstrip": false, "rstrip": false, "normalized": true, "__type": "AddedToken"}, "eos_token": {"content": "<|endoftext|>", "single_word": false, "lstrip": false, "rstrip": false, "normalized": true, "__type": "AddedToken"}, "pad_token": "<|endoftext|>", "add_prefix_space": false, "errors": "replace", "do_lower_case": true, "name_or_path": "./clip_ViT_B_32/"}
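The tokenizer files above (the fast-tokenizer file, vocab, special-token map, and tokenizer settings) can be sanity-checked by loading them with the `transformers` CLIP tokenizer. A small sketch, assuming a local checkout of this repository; the local path is an assumption, and a Hub repo id would work equally well with `from_pretrained`.

```python
from transformers import CLIPTokenizerFast

# Assumes the tokenizer files added in this commit are in the current directory;
# a Hub repo id could be passed instead.
tokenizer = CLIPTokenizerFast.from_pretrained(".")

# Expected per special_tokens_map.json / tokenizer_config.json above:
# bos=<|startoftext|>, eos=<|endoftext|>, pad=<|endoftext|>, lower-cased input.
print(tokenizer.bos_token, tokenizer.eos_token, tokenizer.pad_token)

enc = tokenizer(
    ["a photo of a cat", "a photo of a dog"],
    padding="max_length",
    max_length=77,  # CLIP text towers conventionally use a 77-token context
    return_tensors="pt",
)
print(enc["input_ids"].shape)  # (2, 77)
```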
vocab.json
ADDED

The diff for this file is too large to render. See raw diff.