Update README.md
## Data

Our model was trained on publicly available data from the [LLaVA-Pretrain](https://huggingface.co/datasets/liuhaotian/LLaVA-Pretrain) and [LLaVA-NeXT-Data](https://huggingface.co/datasets/lmms-lab/LLaVA-NeXT-Data) datasets.
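Both datasets can be pulled from the Hugging Face Hub. The snippet below is a minimal sketch using `huggingface-cli`; the local target directories are placeholders and are not part of the original training setup.

```shell
# Fetch the two training datasets from the Hugging Face Hub.
# The --local-dir paths are placeholders; point them wherever you stage training data.
pip install -U "huggingface_hub[cli]"

huggingface-cli download liuhaotian/LLaVA-Pretrain \
    --repo-type dataset --local-dir ./data/LLaVA-Pretrain

huggingface-cli download lmms-lab/LLaVA-NeXT-Data \
    --repo-type dataset --local-dir ./data/LLaVA-NeXT-Data
```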
## How to eval

```shell
# Install the evaluation harness.
pip install lmms-eval==0.2.0

# Evaluate the LLaVA-MLCD checkpoint across the benchmark suite on 8 GPUs.
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
python -m accelerate.commands.launch \
    --main_process_port=12581 \
    --num_processes=8 \
    -m lmms_eval \
    --model llava \
    --model_args pretrained=DeepGlint-AI/llava-mlcd-qwen2.5-7b,conv_template=qwen_1_5 \
    --tasks mmbench,mme,mmmu,ocrbench,scienceqa,scienceqa_img,seedbench,gqa,pope,textvqa_val,ai2d,chartqa,docvqa_val,infovqa_val,mmstar \
    --batch_size 1 \
    --log_samples \
    --log_samples_suffix mlcd_llava_qwen2_7b \
    --output_path ./log_mlcd_llava_qwen2_7b/
```
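The command assumes eight visible GPUs; adjust `CUDA_VISIBLE_DEVICES` and `--num_processes` to match your hardware. Per-sample logs and aggregated scores are written under `./log_mlcd_llava_qwen2_7b/`.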
## Performance and Limitations

In our experiments, we replaced the CLIP vision encoder in [LLaVA-NeXT](https://github.com/LLaVA-VL/LLaVA-NeXT) with MLCD to demonstrate its effectiveness in Multimodal Large Language Models (MLLMs), using [Qwen2.5-7B](https://huggingface.co/Qwen/Qwen2.5-7B) as the language model. The evaluation results show that the modified model performs strongly across multiple benchmarks, validating MLCD as a vision encoder for MLLMs.
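For reference, the swap amounts to pointing the vision tower at the MLCD checkpoint when launching LLaVA-NeXT training. The sketch below is hypothetical: the entry point, flag set, data paths, and the `DeepGlint-AI/mlcd-vit-large-patch14-336` repo id follow the standard LLaVA-style training interface rather than the exact recipe used for this model.

```shell
# Hypothetical sketch of the vision-encoder swap, not the exact training recipe:
# only --vision_tower changes relative to a stock LLaVA-NeXT run.
deepspeed llava/train/train_mem.py \
    --deepspeed scripts/zero3.json \
    --model_name_or_path Qwen/Qwen2.5-7B \
    --version qwen_1_5 \
    --vision_tower DeepGlint-AI/mlcd-vit-large-patch14-336 \
    --mm_projector_type mlp2x_gelu \
    --data_path /path/to/llava_next_data.json \
    --image_folder /path/to/images \
    --bf16 True \
    --output_dir ./checkpoints/llava-mlcd-qwen2.5-7b
```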