Muennighoff committed on
Commit 4494b51
2 Parent(s): 7251f46 8a83ff8

Merge branch 'main' of https://huggingface.co/allenai/MolmoE-1B-0924 into main

Files changed (1):
  1. README.md +79 -19
README.md CHANGED
@@ -2,35 +2,95 @@
  license: apache-2.0
  language:
  - en
- tags:
- - moe
- - olmo
- - olmoe
- - molmo
- - molmoe
- co2_eq_emissions: 1
  datasets:
  - allenai/OLMoE-mix-0924
- library_name: transformers
  ---

- <img alt="Molmo Logo." src="molmo_logo.png" width="250px">

- # Model Summary

- > MolmoE-1B is a multimodal Mixture-of-Experts LLM with 1.5B active and 7.2B total parameters released in September 2024 (0924) based on [OLMoE-1B-7B-0924](https://huggingface.co/allenai/OLMoE-1B-7B-0924). It yields state-of-the-art performance among multimodal models with a similar size while being fully open-source.

- - **Paper:** WIP
- - **Code:** WIP

- # Use

- WIP

- # Evaluation Snapshot

- WIP

- # Citation

- WIP
  license: apache-2.0
  language:
  - en
+ base_model:
+ - openai/clip-vit-large-patch14-336
+ - allenai/OLMoE-1B-7B-0924
  datasets:
  - allenai/OLMoE-mix-0924
+ pipeline_tag: image-text-to-text
+ tags:
+ - multimodal
+ - moe
+ - olmo
+ - olmoe
+ - molmo
+ - molmoe
  ---

+ <img src="molmo_logo.png" alt="Logo for the Molmo Project" style="width: auto; height: 50px;">
+
+ # MolmoE 1B
+
+ Molmo is an open vision-language model developed by the Allen Institute for AI. Molmo models are trained on PixMo, a dataset of 1 million highly curated image-text pairs. They achieve state-of-the-art performance among multimodal models of similar size while being fully open-source. You can find all models in the Molmo family [here](https://huggingface.co/collections/allenai/molmo-66f379e6fe3b8ef090a8ca19).
+
+ MolmoE-1B is a multimodal Mixture-of-Experts LLM with 1.5B active and 7.2B total parameters, released in September 2024 (0924) and based on [OLMoE-1B-7B-0924](https://huggingface.co/allenai/OLMoE-1B-7B-0924).
+ It nearly matches the performance of GPT-4V on both academic benchmarks and human evaluation, and achieves state-of-the-art performance among similarly sized open multimodal models.
+
+ This checkpoint is a **preview** of the Molmo release. All artifacts used in creating Molmo (PixMo dataset, training code, evaluations, intermediate checkpoints) will be made available at a later date, furthering our commitment to open-source AI development and reproducibility.
+
+ **[Sign up here](https://docs.google.com/forms/d/e/1FAIpQLSdML1MhNNBDsCHpgWG65Oydg2SjZzVasyqlP08nBrWjZp_c7A/viewform)** to be the first to know when artifacts are released.
+
+ ## Quick Start
+
+ To run MolmoE, first install dependencies (the example below also assumes `transformers`, `Pillow`, and `requests` are installed):
+
+ ```bash
+ pip install einops tensorflow torchvision
+ ```
+
+ Then, follow these steps:
+
+ ```python
+ from transformers import AutoModelForCausalLM, AutoProcessor, GenerationConfig
+ from PIL import Image
+ import requests
+
+ # load the processor
+ processor = AutoProcessor.from_pretrained(
+     'allenai/MolmoE-1B-0924',
+     trust_remote_code=True,
+     torch_dtype='auto',
+     device_map='auto'
+ )
+
+ # load the model
+ model = AutoModelForCausalLM.from_pretrained(
+     'allenai/MolmoE-1B-0924',
+     trust_remote_code=True,
+     torch_dtype='auto',
+     device_map='auto'
+ )
+
+ # process the image and text
+ inputs = processor.process(
+     images=[Image.open(requests.get("https://picsum.photos/id/237/536/354", stream=True).raw)],
+     text="Describe this image."
+ )
+
+ # move inputs to the correct device and make a batch of size 1
+ inputs = {k: v.to(model.device).unsqueeze(0) for k, v in inputs.items()}
+
+ # generate output; maximum 200 new tokens; stop generation when <|endoftext|> is generated
+ output = model.generate_from_batch(
+     inputs,
+     GenerationConfig(max_new_tokens=200, stop_strings="<|endoftext|>"),
+     tokenizer=processor.tokenizer
+ )
+
+ # only get generated tokens; decode them to text
+ generated_tokens = output[0, inputs['input_ids'].size(1):]
+ generated_text = processor.tokenizer.decode(generated_tokens, skip_special_tokens=True)
+
+ # print the generated text
+ print(generated_text)
+
+ # >>> This photograph captures an adorable black Labrador puppy sitting on a weathered
+ #     wooden deck. The deck's planks, which are a mix of light and dark brown with ...
+ ```
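+
+ The steps above can be wrapped into a small helper for repeated queries. This is a minimal sketch, not part of the official API: the `describe_image` helper and its defaults are illustrative, and it assumes the `processor` and `model` from the snippet above are already loaded.
+
+ ```python
+ def describe_image(image, prompt="Describe this image.", max_new_tokens=200):
+     # preprocess a single image-prompt pair
+     inputs = processor.process(images=[image], text=prompt)
+     # move inputs to the model's device and add a batch dimension
+     inputs = {k: v.to(model.device).unsqueeze(0) for k, v in inputs.items()}
+     # generate, then decode only the newly generated tokens
+     output = model.generate_from_batch(
+         inputs,
+         GenerationConfig(max_new_tokens=max_new_tokens, stop_strings="<|endoftext|>"),
+         tokenizer=processor.tokenizer,
+     )
+     generated_tokens = output[0, inputs['input_ids'].size(1):]
+     return processor.tokenizer.decode(generated_tokens, skip_special_tokens=True)
+
+ # e.g. re-describe the same sample image with a different prompt
+ image = Image.open(requests.get("https://picsum.photos/id/237/536/354", stream=True).raw)
+ print(describe_image(image, prompt="What breed of dog is shown?"))
+ ```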
+
+ ## License and Use
+
+ This model is licensed under Apache 2.0. It is intended for research and educational use.
+ For more information, please see our [Responsible Use Guidelines](https://allenai.org/responsible-use).