OpenGVLab / InternVL2-4B (Image-Text-to-Text, feature-extraction)

Weiyun1025 committed on Jul 3

Commit f41e149 (1 parent: 5e68c9e)

Upload folder using huggingface_hub

Files changed (1)

README.md +5 -1


@@ -3,7 +3,7 @@ license: mit
 pipeline_tag: visual-question-answering
 ---
-# Model Card for InternVL2-4B
+# InternVL2-4B
 [\[🆕 Blog\]](https://internvl.github.io/blog/)  [\[📜 InternVL 1.0 Paper\]](https://arxiv.org/abs/2312.14238)  [\[📜 InternVL 1.5 Report\]](https://arxiv.org/abs/2404.16821)  [\[🗨️ Chat Demo\]](https://internvl.opengvlab.com/)
@@ -17,6 +17,10 @@ Compared to the state-of-the-art open-source multimodal large language models, I
 InternVL 2.0 is trained with an 8k context window and utilizes training data consisting of long texts, multiple images, and videos, significantly improving its ability to handle these types of inputs compared to InternVL 1.5. For more details, please refer to our blog and GitHub.
+## Model Details
+InternVL2 is a multimodal large language model series, featuring models of various sizes. For each size, we release instruction-tuned models optimized for multimodal tasks. InternVL2-4B consists of [InternViT-300M-448px](https://huggingface.co/OpenGVLab/InternViT-300M-448px), an MLP projector, and [Phi-3-mini-128k-instruct](https://huggingface.co/microsoft/Phi-3-mini-128k-instruct).
 ## Performance
 |          Benchmark           | PaliGemma-3B | Phi-3-Vision | Mini-InternVL-4B-1.5 | InternVL2-4B |
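The "Model Details" text added in this commit describes InternVL2-4B as three parts: a vision encoder (InternViT-300M-448px), an MLP projector, and a language model (Phi-3-mini-128k-instruct). The Python sketch below illustrates only the projector's role in that pipeline, and it rests on loudly stated assumptions. The dimensions are toy values. The two-layer MLP with a GELU is a common projector design assumed here, not taken from the model card. The weights are random. Every name (`MLPProjector`, `build_llm_input`, `VISION_DIM`, `LLM_DIM`) is hypothetical. It is not InternVL's code, and it omits how the real vision encoder produces its tokens.

```python
import math
import random

# ASSUMPTION: toy sizes for illustration only; these are NOT the real
# InternViT-300M-448px or Phi-3-mini-128k-instruct dimensions.
VISION_DIM = 8  # width of each vision-encoder output token (hypothetical)
LLM_DIM = 6     # width of the language model's token embeddings (hypothetical)


def gelu(x: float) -> float:
    """Exact GELU computed with the error function."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def linear(vec, weight, bias):
    """y = W x + b, with W stored as a list of rows (one row per output)."""
    return [sum(w * v for w, v in zip(row, vec)) + b for row, b in zip(weight, bias)]


def make_layer(in_dim, out_dim, rng):
    """Random weights standing in for trained parameters."""
    weight = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]
    bias = [0.0] * out_dim
    return weight, bias


class MLPProjector:
    """Hypothetical projector: Linear -> GELU -> Linear, mapping a
    VISION_DIM-wide feature into the LLM's LLM_DIM-wide embedding space."""

    def __init__(self, in_dim, out_dim, seed=0):
        rng = random.Random(seed)  # fixed seed keeps the sketch deterministic
        self.fc1 = make_layer(in_dim, out_dim, rng)
        self.fc2 = make_layer(out_dim, out_dim, rng)

    def __call__(self, token):
        hidden = [gelu(h) for h in linear(token, *self.fc1)]
        return linear(hidden, *self.fc2)


def build_llm_input(vision_tokens, text_embeddings, projector):
    """Project every vision token, then place the results before the text
    embeddings so the LLM receives one sequence of LLM_DIM-wide vectors."""
    image_embeddings = [projector(tok) for tok in vision_tokens]
    return image_embeddings + text_embeddings


if __name__ == "__main__":
    rng = random.Random(42)
    # Stand-ins: 4 "vision tokens" and 3 already-embedded prompt tokens.
    vision_tokens = [[rng.gauss(0.0, 1.0) for _ in range(VISION_DIM)] for _ in range(4)]
    text_embeddings = [[0.0] * LLM_DIM for _ in range(3)]
    sequence = build_llm_input(vision_tokens, text_embeddings, MLPProjector(VISION_DIM, LLM_DIM))
    print(len(sequence), len(sequence[0]))  # → 7 6
```

The point of the projector is the width change: vision features and text embeddings only become one sequence once both have the LLM's embedding width. This sketch simply puts the image tokens first. The real model may position them differently within the prompt.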