The documentation says this model supports visual grounding (object detection and segmentation). What is the best way to use that capability with this model, given that (as I understand it) Llama only outputs text tokens?
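For context, here is the kind of thing I would expect: a minimal sketch assuming the model emits boxes as inline text tokens in a Qwen-VL-style `<box>` format with coordinates normalized to a 0-1000 grid. That tag format is my assumption, not this model's confirmed output convention.

```python
import re

# Assumed convention: some grounded VLMs encode detections as plain text,
# e.g. "<box>(x1,y1),(x2,y2)</box>" with coordinates on a 0-1000 grid.
# This format is hypothetical for this model.
BOX_PATTERN = re.compile(r"<box>\((\d+),(\d+)\),\((\d+),(\d+)\)</box>")

def parse_boxes(text: str, img_w: int, img_h: int):
    """Extract pixel-space bounding boxes from grounded text output."""
    boxes = []
    for m in BOX_PATTERN.finditer(text):
        x1, y1, x2, y2 = (int(g) for g in m.groups())
        # Rescale from the normalized 0-1000 grid to image pixels.
        boxes.append((x1 * img_w / 1000, y1 * img_h / 1000,
                      x2 * img_w / 1000, y2 * img_h / 1000))
    return boxes

# e.g. model output: "The dog <box>(120,340),(560,910)</box> is on the left."
print(parse_boxes("The dog <box>(120,340),(560,910)</box>", 1024, 768))
```

Is this roughly how it works here, i.e. grounding is encoded in the text and parsed afterwards, or does the model expose detections some other way?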
Same question here.