Commit c4e4674 (parent: bd9e1a6) by akemiH

Create README.md

Files changed (1): README.md (+30 -0)

# LLaVa3-Med

We train our model in three stages (a schematic sketch follows the list):

1. Pretraining: we use a dataset of 600k image-text pairs from PMC and 60k medical references based on Mayo Clinic guidelines.
2. Instruction fine-tuning: we perform instruction tuning on 60k LLaVA_Med instruction-following examples together with the PMC-VQA dataset.
3. Fine-tuning: the model is then fine-tuned on several downstream VQA datasets.
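
Below is a minimal, hypothetical outline of this three-stage schedule. The stage names, dataset labels, the `run_stage` helper, and the choice of trainable components are illustrative assumptions, not the released training code.

```python
# Hypothetical sketch of the three-stage schedule described above.
# run_stage() is a placeholder; the "trainable" entries are assumptions.
STAGES = [
    {
        "name": "pretraining",
        "datasets": ["PMC image-text pairs (600k)", "Mayo Clinic references (60k)"],
        "trainable": ["projector"],  # assumption: first align vision features to the LLM
    },
    {
        "name": "instruction_finetuning",
        "datasets": ["LLaVA_Med instructions (60k)", "PMC-VQA"],
        "trainable": ["projector", "language_model"],
    },
    {
        "name": "vqa_finetuning",
        "datasets": ["downstream VQA datasets (e.g., Slake-VQA, Path-VQA)"],
        "trainable": ["projector", "language_model"],
    },
]


def run_stage(stage: dict) -> None:
    """Placeholder for one training stage; replace with the actual training loop."""
    print(f"stage={stage['name']} datasets={stage['datasets']} trainable={stage['trainable']}")


for stage in STAGES:
    run_stage(stage)
```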

# Inference

```bash
CUDA_VISIBLE_DEVICES=0 python -m evaluation \
    --model-path model_path \
    --question-file data_path \
    --image-folder image_path \
    --answers-file result.jsonl \
    --temperature 0.7 \
    --conv-mode llama3
```
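
The command writes one JSON object per line to the `--answers-file`. A minimal sketch for inspecting that file (the per-record field names are an assumption; adjust to the script's actual output):

```python
import json

# Read the answers file produced by the evaluation command above.
with open("result.jsonl", "r", encoding="utf-8") as f:
    for line in f:
        record = json.loads(line)
        print(record)  # e.g. record.get("question_id"), record.get("text")
```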

# Results

| Dataset   | Metric   | Med-Gemini | Med-PaLM-540B | LLaVa3-Med |
|-----------|----------|------------|---------------|------------|
| Slake-VQA | Token F1 | 87.5       | 89.3          | 89.8†      |
| Path-VQA  | Token F1 | 64.7       | 62.7          | 64.9†      |

Table 1 | Multimodal evaluation. Performance comparison of LLaVa3-Med versus state-of-the-art (SoTA) methods.
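
Token F1 measures the token overlap between the predicted and reference answers. A minimal sketch of such a metric (whitespace tokenization and lowercasing are illustrative assumptions; the official evaluation may normalize answers differently):

```python
from collections import Counter


def token_f1(prediction: str, reference: str) -> float:
    """Token-level F1 between a predicted answer and a reference answer."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        # Both empty counts as a perfect match; otherwise no overlap is possible.
        return float(pred_tokens == ref_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


print(token_f1("in the left lung", "left lung"))  # ≈ 0.67
```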