Safetensors
qwen2
xiangan committed
Commit 56cca97
1 Parent(s): 8ed49f1

Update README.md

Files changed (1)
  1. README.md +19 -0
README.md CHANGED
@@ -16,6 +16,25 @@ We used [**MLCD**](https://huggingface.co/DeepGlint-AI/mlcd-vit-large-patch14-33
  ## Data
  Our model was trained on publicly available data from the [LLaVA-Pretrain](https://huggingface.co/datasets/liuhaotian/LLaVA-Pretrain) and [LLaVA-NeXT-Data](https://huggingface.co/datasets/lmms-lab/LLaVA-NeXT-Data) datasets.
 
+ ## How to eval
+ ```shell
+ pip install lmms-eval==0.2.0
+
+ CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
+ python -m accelerate.commands.launch \
+ --main_process_port=12581 \
+ --num_processes=8 \
+ -m lmms_eval \
+ --model llava \
+ --model_args pretrained=DeepGlint-AI/llava-mlcd-qwen2.5-7b,conv_template=qwen_1_5 \
+ --tasks mmbench,mme,mmmu,ocrbench,scienceqa,scienceqa_img,seedbench,gqa,pope,textvqa_val,ai2d,chartqa,docvqa_val,infovqa_val,mmstar \
+ --batch_size 1 \
+ --log_samples \
+ --log_samples_suffix mlcd_llava_qwen2_7b \
+ --output_path ./log_mlcd_llava_qwen2_7b/
+ ```
+
+
  ## Performance and Limitations
 
  In our experiments, we replaced the CLIP model in [LLaVA-NeXT](https://github.com/LLaVA-VL/LLaVA-NeXT) with the MLCD model to demonstrate the performance of the MLCD model in Multimodal Large Language Models (MLLMs). For the language model, we used [Qwen2.5-7B](https://huggingface.co/Qwen/Qwen2.5-7B). The evaluation results show that the modified model performs exceptionally well across multiple benchmarks, validating the effectiveness of the MLCD model within MLLMs.
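
Below is a minimal, illustrative sketch (not part of the commit above) of what the vision-tower swap described in this paragraph amounts to in code: LLaVA-style models load their CLIP vision tower through the standard `transformers` CLIP classes, and MLCD is meant as a drop-in replacement for that encoder. The checkpoint id `DeepGlint-AI/mlcd-vit-large-patch14-336` and its compatibility with `CLIPVisionModel` are assumptions here, not something this commit documents.

```python
# Illustrative sketch only: load the MLCD ViT the way LLaVA-style code loads its
# CLIP vision tower. Assumes the checkpoint id below is correct and that it is
# compatible with transformers' standard CLIP vision classes.
import torch
from PIL import Image
from transformers import CLIPImageProcessor, CLIPVisionModel

MLCD_ID = "DeepGlint-AI/mlcd-vit-large-patch14-336"  # assumed checkpoint id

processor = CLIPImageProcessor.from_pretrained(MLCD_ID)
vision_tower = CLIPVisionModel.from_pretrained(MLCD_ID)
vision_tower.eval()

# Placeholder image, just to check the encoder's output shape.
image = Image.new("RGB", (336, 336))
inputs = processor(images=image, return_tensors="pt")

with torch.no_grad():
    outputs = vision_tower(**inputs, output_hidden_states=True)

# LLaVA-style projectors typically consume the patch tokens from a late hidden
# layer (here the second-to-last), dropping the leading CLS token.
patch_tokens = outputs.hidden_states[-2][:, 1:]
print(patch_tokens.shape)  # expected (1, 576, 1024) for a ViT-L/14 at 336 px
```

If that loads, the rest of the swap is a configuration change in the LLaVA-NeXT training scripts (pointing the vision-tower setting at the MLCD checkpoint instead of the OpenAI CLIP one), which the evaluation command above then exercises end to end.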