Update README.md
README.md CHANGED
@@ -92,6 +92,27 @@ print(generated_text)
 # The puppy is positioned in the center of the frame, looking up at the camera...
 ```
 
+To make inference more efficient, run with autocast:
+
+with torch.autocast(device_type="cuda", enabled=True, dtype=torch.bfloat16):
+    output = model.generate_from_batch(
+        inputs,
+        GenerationConfig(max_new_tokens=200, stop_strings="<|endoftext|>"),
+        tokenizer=processor.tokenizer
+    )
+We did most of our evaluation in this setting (autocast on, but float32 weights).
+
+To further reduce the memory requirements, the model can be run with bfloat16 weights:
+
+model.to(dtype=torch.bfloat16)
+inputs["images"] = inputs["images"].to(torch.bfloat16)
+output = model.generate_from_batch(
+    inputs,
+    GenerationConfig(max_new_tokens=200, stop_strings="<|endoftext|>"),
+    tokenizer=processor.tokenizer
+)
+Note that we have observed that this can change the output of the model compared to running with float32 weights.
+
 ## Evaluations
 
 | Model | Average Score on 11 Academic Benchmarks | Human Preference Elo Rating |
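
For reference, here is a self-contained sketch that runs the two settings added above (autocast over float32 weights vs. bfloat16 weights) on a single prompt and compares the outputs. It is an illustration, not part of the diff: the model id `allenai/Molmo-7B-D-0924`, the image URL, and the prompt are assumptions, and the loading/preprocessing lines mirror the quick-start that presumably appears earlier in the README. Only the `generate_from_batch`, `GenerationConfig`, and `processor.tokenizer` usage is taken from the change above.

```python
# Hedged sketch: compare autocast-with-float32-weights against bfloat16 weights.
# ASSUMPTIONS: the model id, image URL, and prompt are illustrative only; the
# loading/preprocessing mirrors the quick-start presumably shown earlier in the README.
import requests
import torch
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor, GenerationConfig

MODEL_ID = "allenai/Molmo-7B-D-0924"  # assumption: the diff does not name the model

processor = AutoProcessor.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, trust_remote_code=True, device_map="auto"  # float32 weights by default
)

# Build a single-example batch (image + prompt) on the model's device.
image = Image.open(requests.get("https://picsum.photos/id/237/536/354", stream=True).raw)
inputs = processor.process(images=[image], text="Describe this image.")
inputs = {k: v.to(model.device).unsqueeze(0) for k, v in inputs.items()}

config = GenerationConfig(max_new_tokens=200, stop_strings="<|endoftext|>")

def decode(output):
    # Drop the prompt tokens and decode only the newly generated text.
    generated = output[0, inputs["input_ids"].size(1):]
    return processor.tokenizer.decode(generated, skip_special_tokens=True)

# 1) float32 weights under autocast (the setting used for most of the evaluation).
with torch.autocast(device_type="cuda", enabled=True, dtype=torch.bfloat16):
    text_autocast = decode(
        model.generate_from_batch(inputs, config, tokenizer=processor.tokenizer)
    )

# 2) bfloat16 weights (lower memory; output may differ from the float32 run).
model.to(dtype=torch.bfloat16)
inputs["images"] = inputs["images"].to(torch.bfloat16)
text_bf16 = decode(model.generate_from_batch(inputs, config, tokenizer=processor.tokenizer))

print("autocast/float32:", text_autocast)
print("bfloat16 weights:", text_bf16)
print("identical output:", text_autocast == text_bf16)
```

Because bfloat16 keeps far fewer mantissa bits than float32, small numeric differences can flip individual token choices during generation, which is why the two decoded strings may not match exactly.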