# LLaVa3-Med

We train our model in three stages:

1. Pretraining: We pretrain on 600k image-text pairs from PMC and 60k medical references based on Mayo Clinic guidelines.
2. Instruction fine-tuning: We perform instruction learning on 60k LLaVA_Med instruction fine-tuning examples and the PMC-VQA dataset (a sketch of a typical record format follows this list).
3. Fine-tuning: The model is then fine-tuned on various downstream VQA datasets.
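
For reference, below is a minimal sketch of what a single instruction fine-tuning record typically looks like in a LLaVA-style conversation format. The field names and the sample conversation are illustrative assumptions, not the exact schema used by this repo's data loader.

```python
import json

# Hypothetical instruction fine-tuning record in the common LLaVA conversation
# format; the field names and the example conversation are assumptions, not the
# exact schema used by this repo.
record = {
    "id": "example_000001",            # unique example id (assumed)
    "image": "example_000001.jpg",     # image file under the image folder
    "conversations": [
        {"from": "human", "value": "<image>\nWhat abnormality is visible in this chest X-ray?"},
        {"from": "gpt", "value": "There is a right-sided pleural effusion."},
    ],
}

# Instruction data is usually stored as a JSON list of such records.
with open("instruct_data.json", "w") as f:
    json.dump([record], f, indent=2)
```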
# Inference

```bash
CUDA_VISIBLE_DEVICES=0 python -m evaluation \
    --model-path model_path \
    --question-file data_path \
    --image-folder image_path \
    --answers-file result.jsonl \
    --temperature 0.7 \
    --conv-mode llama3
```
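
The script writes its predictions to the `--answers-file` as JSONL (one JSON object per line). A minimal sketch for inspecting that file; the `question_id` and `text` keys are assumptions based on the common LLaVA answers-file format, not confirmed here.

```python
import json

# Read the generated answers from result.jsonl (one JSON object per line).
# The "question_id" and "text" keys are assumptions based on the common
# LLaVA answers-file format; adjust them to the actual evaluation output.
answers = []
with open("result.jsonl") as f:
    for line in f:
        line = line.strip()
        if line:
            answers.append(json.loads(line))

# Print a few predictions for a quick sanity check.
for ans in answers[:3]:
    print(ans.get("question_id"), "->", ans.get("text"))
```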
# Results

| Dataset   | Metric   | Med-Gemini | Med-PaLM-540B | LLaVa3-Med |
|-----------|----------|------------|---------------|------------|
| Slake-VQA | Token F1 | 87.5       | 89.3          | 89.8†      |
| Path-VQA  | Token F1 | 64.7       | 62.7          | 64.9†      |

Table 1 | Multimodal evaluation. Performance comparison of LLaVa3-Med versus state-of-the-art (SoTA) methods.
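
Token F1 is the token-overlap F1 between the generated answer and the reference answer. A minimal sketch of the usual computation, assuming simple lowercased whitespace tokenization (the exact normalization behind the numbers above is not specified here):

```python
from collections import Counter

def token_f1(prediction: str, reference: str) -> float:
    """Token-overlap F1 between a predicted and a reference answer."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        return float(pred_tokens == ref_tokens)
    common = Counter(pred_tokens) & Counter(ref_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)

print(token_f1("in the left lung", "left lung"))  # ~0.667
```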