How do I use visual grounding with this model?

#25
by r4hul77 - opened

The documentation says this model supports visual grounding (object detection and segmentation). What is the best way to use that capability, given that, as I understand it, the Llama backbone only outputs text tokens?
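For reference, this is roughly the kind of decoding I would expect to need if the grounding information is emitted as special text tokens. This is only a minimal sketch: the model id, the prompt wording, and the PaliGemma-style `<locXXXX>` box-token convention below are assumptions on my part, not something taken from this model's documentation.

```python
import re
from PIL import Image
import torch
from transformers import AutoProcessor, AutoModelForVision2Seq

# Hypothetical checkpoint id -- substitute the actual repo name.
MODEL_ID = "your-org/your-grounding-vlm"

processor = AutoProcessor.from_pretrained(MODEL_ID)
model = AutoModelForVision2Seq.from_pretrained(
    MODEL_ID, torch_dtype=torch.bfloat16, device_map="auto"
)

image = Image.open("example.jpg")
# Grounding is usually triggered by a task-style prompt; exact wording is model-specific.
prompt = "detect cat"

inputs = processor(images=image, text=prompt, return_tensors="pt").to(model.device)
generated = model.generate(**inputs, max_new_tokens=128)
# Keep special tokens so any location tokens survive decoding.
text = processor.batch_decode(generated, skip_special_tokens=False)[0]

# Assumption: boxes are encoded as four <locXXXX> tokens (y1, x1, y2, x2),
# each value in [0, 1023] and normalized to the image size.
loc_values = [int(v) for v in re.findall(r"<loc(\d{4})>", text)]
boxes = []
for i in range(0, len(loc_values) - 3, 4):
    y1, x1, y2, x2 = loc_values[i : i + 4]
    boxes.append((
        x1 / 1023 * image.width, y1 / 1023 * image.height,
        x2 / 1023 * image.width, y2 / 1023 * image.height,
    ))
print(boxes)
```

Is this the right general approach here, or does the model use a different output format for boxes and masks?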
