tanhuajie2001 committed
Commit 064fcb5 · Parent(s): 2703754
Update README.md

README.md CHANGED
@@ -13,6 +13,45 @@ base_model:
 [[Paper]](https://arxiv.org/abs/2407.17331) [[GitHub]](https://github.com/deepglint/unicom)
 
+## Embodied Ability Evaluation: Performance on RoboVQA and OpenEQA
+
+| Benchmark | Metric | MLCD<br>Embodied-7B | LLaVA<br>OneVision-7B | GPT-4V | RoboMamba |
+| :-- | :-- | :-: | :-: | :-: | :-: |
+| RoboVQA | BLEU1 | <span style="color:red">73.16</span> | 38.12 | - | 54.9 |
+| | BLEU2 | <span style="color:red">66.39</span> | 33.56 | - | 44.2 |
+| | BLEU3 | <span style="color:red">60.61</span> | 31.76 | - | 39.5 |
+| | BLEU4 | <span style="color:red">56.56</span> | 30.97 | - | 36.3 |
+| OpenEQA | Object State Recognition | <span style="color:red">71.83</span> | - | 63.2 | - |
+| | Object Recognition | <span style="color:red">49.46</span> | - | 43.4 | - |
+| | Functional Reasoning | 54.38 | - | <span style="color:red">57.4</span> | - |
+| | Spatial Understanding | <span style="color:red">48.64</span> | - | 33.6 | - |
+| | Attribute Recognition | <span style="color:red">67.08</span> | - | 57.2 | - |
+| | World Knowledge | <span style="color:red">53.87</span> | - | 50.7 | - |
+| | Object Localization | <span style="color:red">43.06</span> | - | 42.0 | - |
+
+## General Ability Evaluation: Comparison with LLaVA OneVision-7B and GPT-4
+
+| Dataset | Split | MLCD<br>Embodied-7B | LLaVA<br>OneVision-7B | GPT-4V | GPT-4o |
+| :-- | :-: | :-: | :-: | :-: | :-: |
+| AI2D | test | 79.9 | 81.4 | 78.2 | 94.2 |
+| ChartQA | test | 83.0 | 80.0 | 78.5 | 85.7 |
+| DocVQA | test | 91.6 | 87.5 | 88.4 | 92.8 |
+| InfoVQA | val | 73.9 | 70.7 | - | - |
+| InfoVQA | test | 70.0 | 68.8 | - | - |
+| MMMU | val | 47.3 | 48.8 | 56.8 | 69.1 |
+| MMStar | test | 58.5 | 61.7 | 57.1 | 63.9 |
+| OCRBench | - | 749.0 | 697.0 | 656.0 | 805.0 |
+| RealWorldQA | test | 68.9 | 66.3 | 61.4 | 58.6 |
+| SeedBench | image | 74.9 | 75.4 | 49.9 | 76.2 |
+| MMBench | en-dev | 81.1 | 83.2 | 81.3 | 83.4 |
+| MMBench | en-test | 80.1 | 80.8 | 75.0 | - |
+| MME | test | 578/1603 | 418/1580 | 517/1409 | - |
+
 ## Usage
 
 ### A. Installation
@@ -96,45 +135,4 @@ pip install lmms-eval==0.2.0
 bash eval.sh
 ```
 
-## Embodied Ability Evaluation: Performance on RoboVQA and OpenEQA
-
-| Benchmark | Metric | MLCD<br>Embodied-7B | LLaVA<br>OneVision-7B | GPT-4V | RoboMamba |
-| :-- | :-- | :-: | :-: | :-: | :-: |
-| RoboVQA | BLEU1 | <span style="color:red">73.16</span> | 38.12 | - | 54.9 |
-| | BLEU2 | <span style="color:red">66.39</span> | 33.56 | - | 44.2 |
-| | BLEU3 | <span style="color:red">60.61</span> | 31.76 | - | 39.5 |
-| | BLEU4 | <span style="color:red">56.56</span> | 30.97 | - | 36.3 |
-| OpenEQA | Object State Recognition | <span style="color:red">71.83</span> | - | 63.2 | - |
-| | Object Recognition | <span style="color:red">49.46</span> | - | 43.4 | - |
-| | Functional Reasoning | 54.38 | - | <span style="color:red">57.4</span> | - |
-| | Spatial Understanding | <span style="color:red">48.64</span> | - | 33.6 | - |
-| | Attribute Recognition | <span style="color:red">67.08</span> | - | 57.2 | - |
-| | World Knowledge | <span style="color:red">53.87</span> | - | 50.7 | - |
-| | Object Localization | <span style="color:red">43.06</span> | - | 42.0 | - |
-
-## General Ability Evaluation: Comparison with LLaVA OneVision-7B and GPT-4
-
-| Dataset | Split | MLCD<br>Embodied-7B | LLaVA<br>OneVision-7B | GPT-4V | GPT-4o |
-| :-- | :-: | :-: | :-: | :-: | :-: |
-| AI2D | test | 79.9 | 81.4 | 78.2 | 94.2 |
-| ChartQA | test | 83.0 | 80.0 | 78.5 | 85.7 |
-| DocVQA | test | 91.6 | 87.5 | 88.4 | 92.8 |
-| InfoVQA | val | 73.9 | 70.7 | - | - |
-| InfoVQA | test | 70.0 | 68.8 | - | - |
-| MMMU | val | 47.3 | 48.8 | 56.8 | 69.1 |
-| MMStar | test | 58.5 | 61.7 | 57.1 | 63.9 |
-| OCRBench | - | 749.0 | 697.0 | 656.0 | 805.0 |
-| RealWorldQA | test | 68.9 | 66.3 | 61.4 | 58.6 |
-| SeedBench | image | 74.9 | 75.4 | 49.9 | 76.2 |
-| MMBench | en-dev | 81.1 | 83.2 | 81.3 | 83.4 |
-| MMBench | en-test | 80.1 | 80.8 | 75.0 | - |
-| MME | test | 578/1603 | 418/1580 | 517/1409 | - |
-
 We would like to express our gratitude to [Huajie Tan](https://huggingface.co/tanhuajie2001), [Yumeng Wang](https://huggingface.co/devymex), [Yin Xie](https://huggingface.co/Yin-Xie) for their significant contributions to the experimental validation of MLLMs.
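For readers reconstructing the elided evaluation step: the second hunk's context shows that evaluation is driven by lmms-eval (`pip install lmms-eval==0.2.0` followed by `bash eval.sh`). Below is a minimal sketch of what such an `eval.sh` might contain using the lmms-eval 0.2.0 CLI; the checkpoint path, `--model llava` backend name, and task list are illustrative assumptions, not taken from this repository.

```bash
#!/usr/bin/env bash
# Hypothetical eval.sh sketch -- NOT from this repo. Assumes lmms-eval==0.2.0
# (installed above) and a LLaVA-compatible checkpoint; the checkpoint path,
# backend name, and task list below are placeholders.

CKPT="/path/to/MLCD-Embodied-7B"   # placeholder: point at your local checkpoint

# lmms-eval is launched as a Python module; accelerate handles multi-GPU.
accelerate launch --num_processes=8 -m lmms_eval \
    --model llava \
    --model_args pretrained="${CKPT}" \
    --tasks mme,mmbench_en_dev,ocrbench \
    --batch_size 1 \
    --log_samples \
    --log_samples_suffix mlcd_embodied_7b \
    --output_path ./logs/
```

With `--log_samples`, each task writes its scores and per-sample outputs under `./logs/`, which is typically where entries like the MME and MMBench rows in the tables above would be collected from.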