Visual Question Answering Model
This model is a fine-tuned version of microsoft/Florence-2-base-ft designed for Visual Question Answering (VQA). It has been optimized for tasks where the model interprets images and responds to questions about the visual content.
Model Details
- Finetuned by: prithivMLmods
- Model type: Visual Question Answering (VQA)
- Language(s): English (NLP component)
- License: None specified
- Finetuned from model: microsoft/Florence-2-base-ft
Usage
This model can be used to perform VQA tasks, where it takes an image and a question about the image as input, and returns an answer based on the visual content.
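The card does not include an inference snippet, so the following is a minimal sketch of how such a call might look with the `transformers` library, not the author's documented method. It assumes the checkpoint loads like other Florence-2 models (which ship custom modelling code, hence `trust_remote_code=True`) and that the repository also provides a processor. The task token `<DocVQA>` is an assumption borrowed from common Florence-2 fine-tuning recipes; this card does not confirm which prompt format the model was trained with. `build_prompt` and `answer_question` are hypothetical helper names.

```python
MODEL_ID = "prithivMLmods/Florence-2-VLM-Doc-VQA"  # repo name as listed on this card
TASK_TOKEN = "<DocVQA>"  # ASSUMPTION: prompt prefix not confirmed by the card


def build_prompt(question: str, task_token: str = TASK_TOKEN) -> str:
    """Prefix the question with the (assumed) task token."""
    return f"{task_token}{question.strip()}"


def answer_question(image_path: str, question: str, model_id: str = MODEL_ID) -> str:
    """Hypothetical helper: load the model and answer one question about one image."""
    # Heavy dependencies are imported lazily so build_prompt works without them.
    import torch
    from PIL import Image
    from transformers import AutoModelForCausalLM, AutoProcessor

    device = "cuda" if torch.cuda.is_available() else "cpu"

    # trust_remote_code=True is needed because the repo contains custom code.
    model = AutoModelForCausalLM.from_pretrained(
        model_id, trust_remote_code=True
    ).to(device).eval()
    processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

    image = Image.open(image_path).convert("RGB")
    inputs = processor(
        text=build_prompt(question), images=image, return_tensors="pt"
    ).to(device)

    with torch.no_grad():
        generated_ids = model.generate(
            input_ids=inputs["input_ids"],
            pixel_values=inputs["pixel_values"],
            max_new_tokens=128,  # illustrative value, not tuned
            num_beams=3,         # illustrative value, not tuned
        )

    # Decode the generated token ids into the answer text.
    answer = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
    return answer.strip()
```

A call might look like `answer_question("invoice.png", "What is the invoice date?")`, where `invoice.png` is a placeholder path. If the answers look malformed, the prompt prefix is the first assumption to revisit.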
Model tree for prithivMLmods/Florence-2-VLM-Doc-VQA
- Base model: microsoft/Florence-2-base-ft