Update README.md
README.md CHANGED
@@ -17,7 +17,7 @@ This is a multimodal implementation of [Phi2](https://huggingface.co/microsoft/p
 2. Vision Tower: [clip-vit-large-patch14-336](https://huggingface.co/openai/clip-vit-large-patch14-336)
 4. Pretraining Dataset: [LAION-CC-SBU dataset with BLIP captions (200k samples)](https://huggingface.co/datasets/liuhaotian/LLaVA-Pretrain)
 5. Finetuning Dataset: [Instruct 150k dataset based on COCO](https://huggingface.co/datasets/liuhaotian/LLaVA-Instruct-150K)
-6. Finetuned Model: [
+6. Finetuned Model: [Navyabhat/Llava-Phi2](https://huggingface.co/Navyabhat/Llava-Phi2)
 
 
 ### Model Sources
@@ -26,7 +26,7 @@ This is a multimodal implementation of [Phi2](https://huggingface.co/microsoft/p
 
 - **Original Repository:** [Llava-Phi](https://github.com/zhuyiche/llava-phi)
 - **Paper [optional]:** [LLaVA-Phi: Efficient Multi-Modal Assistant with Small Language Model](https://arxiv.org/pdf/2401.02330)
-- **Demo [optional]:** [Demo Link](https://huggingface.co/spaces/
+- **Demo [optional]:** [Demo Link](https://huggingface.co/spaces/Navyabhat/MultiModal-Phi2)
 
 
 ## How to Get Started with the Model
@@ -47,7 +47,7 @@ pip install -e .
 3. Run the Model
 ```bash
 python llava_phi/eval/run_llava_phi.py --model-path="RaviNaik/Llava-Phi2" \
---image-file="https://huggingface.co/
+--image-file="https://huggingface.co/Navyabhat/Llava-Phi2/resolve/main/people.jpg?download=true" \
 --query="How many people are there in the image?"
 ```
 
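Applied to a fresh checkout, the install and run steps this change touches can be collected into one small script. This is a minimal sketch assuming the `zhuyiche/llava-phi` repo layout named in the card; the `DRY_RUN` guard and the variable names are illustrative additions, not part of the README:

```shell
#!/bin/sh
# Sketch of the card's get-started flow. Script path, model path, image URL,
# and query come from the README diff; DRY_RUN is our hypothetical addition
# so the command can be inspected without downloading model weights.
set -e

MODEL_PATH="RaviNaik/Llava-Phi2"
IMAGE_FILE="https://huggingface.co/Navyabhat/Llava-Phi2/resolve/main/people.jpg?download=true"
QUERY="How many people are there in the image?"

CMD="python llava_phi/eval/run_llava_phi.py --model-path=$MODEL_PATH --image-file=$IMAGE_FILE --query=\"$QUERY\""

if [ "${DRY_RUN:-1}" = "1" ]; then
  # Default: just print the command for inspection.
  echo "$CMD"
else
  # Full run: clone the repo, install it in editable mode, then run inference.
  git clone https://github.com/zhuyiche/llava-phi
  cd llava-phi
  pip install -e .
  eval "$CMD"
fi
```

By default the script only prints the assembled command; setting `DRY_RUN=0` performs the actual clone, install, and inference run, which downloads the checkpoint on first use.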