Update README.md
## Data

Our model was trained on publicly available data from the [LLaVA-Pretrain](https://huggingface.co/datasets/liuhaotian/LLaVA-Pretrain) and [LLaVA-NeXT-Data](https://huggingface.co/datasets/lmms-lab/LLaVA-NeXT-Data) datasets.
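Both datasets can be pulled from the Hugging Face Hub. The snippet below is a minimal sketch using `huggingface-cli`; the local target directories are placeholders and are not part of the original training setup.

```shell
# Fetch the two training datasets from the Hugging Face Hub.
# The --local-dir paths are placeholders; point them wherever you stage training data.
pip install -U "huggingface_hub[cli]"

huggingface-cli download liuhaotian/LLaVA-Pretrain \
    --repo-type dataset --local-dir ./data/LLaVA-Pretrain

huggingface-cli download lmms-lab/LLaVA-NeXT-Data \
    --repo-type dataset --local-dir ./data/LLaVA-NeXT-Data
```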
## How to eval

```shell
# Install the evaluation harness.
pip install lmms-eval==0.2.0

# Evaluate the LLaVA-MLCD checkpoint across the benchmark suite on 8 GPUs.
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
python -m accelerate.commands.launch \
    --main_process_port=12581 \
    --num_processes=8 \
    -m lmms_eval \
    --model llava \
    --model_args pretrained=DeepGlint-AI/llava-mlcd-qwen2.5-7b,conv_template=qwen_1_5 \
    --tasks mmbench,mme,mmmu,ocrbench,scienceqa,scienceqa_img,seedbench,gqa,pope,textvqa_val,ai2d,chartqa,docvqa_val,infovqa_val,mmstar \
    --batch_size 1 \
    --log_samples \
    --log_samples_suffix mlcd_llava_qwen2_7b \
    --output_path ./log_mlcd_llava_qwen2_7b/
```
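The command assumes eight visible GPUs; adjust `CUDA_VISIBLE_DEVICES` and `--num_processes` to match your hardware. Per-sample logs and aggregated scores are written under `./log_mlcd_llava_qwen2_7b/`.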
## Performance and Limitations

In our experiments, we replaced the CLIP vision encoder in [LLaVA-NeXT](https://github.com/LLaVA-VL/LLaVA-NeXT) with MLCD to demonstrate its effectiveness in Multimodal Large Language Models (MLLMs), using [Qwen2.5-7B](https://huggingface.co/Qwen/Qwen2.5-7B) as the language model. The evaluation results show that the modified model performs strongly across multiple benchmarks, validating MLCD as a vision encoder for MLLMs.
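For reference, the swap amounts to pointing the vision tower at the MLCD checkpoint when launching LLaVA-NeXT training. The sketch below is hypothetical: the entry point, flag set, data paths, and the `DeepGlint-AI/mlcd-vit-large-patch14-336` repo id follow the standard LLaVA-style training interface rather than the exact recipe used for this model.

```shell
# Hypothetical sketch of the vision-encoder swap, not the exact training recipe:
# only --vision_tower changes relative to a stock LLaVA-NeXT run.
deepspeed llava/train/train_mem.py \
    --deepspeed scripts/zero3.json \
    --model_name_or_path Qwen/Qwen2.5-7B \
    --version qwen_1_5 \
    --vision_tower DeepGlint-AI/mlcd-vit-large-patch14-336 \
    --mm_projector_type mlp2x_gelu \
    --data_path /path/to/llava_next_data.json \
    --image_folder /path/to/images \
    --bf16 True \
    --output_dir ./checkpoints/llava-mlcd-qwen2.5-7b
```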