---
title: README
emoji: 🏒
colorFrom: blue
colorTo: gray
sdk: gradio
pinned: false
---

# MMIE: Massive Multimodal Interleaved Comprehension Benchmark for Large Vision-Language Models

[📖 Project] [📄 Paper] [💻 Code] [📝 Dataset] [🤖 Evaluation Model] [🏆 Leaderboard]

We introduce MMIE, a robust, knowledge-intensive benchmark for evaluating interleaved multimodal comprehension and generation in large vision-language models (LVLMs). With more than 20K examples spanning 12 fields and 102 subfields, MMIE sets a new standard for testing the depth of multimodal understanding.
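If you want to explore the benchmark programmatically, below is a minimal sketch using the Hugging Face `datasets` library. The repository id `MMIE/MMIE`, the split name, and the column access are assumptions made for illustration; check the dataset card for the actual id and schema.

```python
# Minimal sketch for browsing MMIE examples.
# Assumption: the benchmark is published on the Hugging Face Hub under "MMIE/MMIE"
# with a "test" split; adjust to the ids listed on the dataset card.
from datasets import load_dataset

dataset = load_dataset("MMIE/MMIE", split="test")  # hypothetical repo id / split name

example = dataset[0]       # one interleaved multimodal question
print(example.keys())      # column names depend on the actual schema
print(len(dataset))        # should be on the order of 20K examples
```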

## 🔑 Key Features

- 🗂 Comprehensive Dataset: With 20,103 interleaved multimodal questions, MMIE provides a rich foundation for evaluating models across diverse domains.
- 🔍 Ground Truth Reference: Each query includes a reliable reference answer, so model outputs can be measured accurately.
- ⚙ Automated Scoring with MMIE-Score: Our scoring model correlates strongly with human judgments, surpassing prior automated judges such as GPT-4o on multimodal tasks (see the scoring sketch after this list).
- 🔎 Bias Mitigation: The scoring model is fine-tuned to reduce scoring bias, enabling fairer, more objective model evaluations.
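As a rough illustration of how judge-model scoring can be wired up, here is a minimal text-only sketch. The checkpoint id `MMIE/MMIE-Score` and the prompt template are assumptions; the actual evaluation model is multimodal, so refer to the official evaluation code for real scoring.

```python
# Illustrative-only sketch of judge-model scoring; NOT the official MMIE pipeline.
# The checkpoint id and prompt format are assumptions made for this example.
from transformers import pipeline

judge = pipeline("text-generation", model="MMIE/MMIE-Score")  # hypothetical checkpoint id

prompt = (
    "You are grading an interleaved multimodal answer.\n"
    "Question: Describe the trend shown in the chart.\n"
    "Reference answer: Sales rise steadily from January to June.\n"
    "Model answer: The chart shows sales increasing over the first half of the year.\n"
    "Give a numeric score and a one-sentence justification."
)

result = judge(prompt, max_new_tokens=128)
print(result[0]["generated_text"])
```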

## 🔍 Key Insights

1. 🧠 In-depth Evaluation: Covering 12 major fields (mathematics, coding, literature, and more) with 102 subfields for a comprehensive test across competencies.
2. 📈 Challenging the Best: Even top models such as GPT-4o + SDXL peak at 65.47%, highlighting substantial room for growth in LVLMs.
3. 🌐 Designed for Interleaved Tasks: The benchmark evaluates interleaved text and image comprehension in both multiple-choice and open-ended formats (an illustrative record layout is sketched below).
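To make the two question formats concrete, below is a purely illustrative sketch of how a multiple-choice and an open-ended interleaved record might look. The field names and `<image_k>` placeholders are assumptions, not the actual dataset schema; consult the dataset card for the real structure.

```python
# Purely illustrative record layouts; field names and image placeholders are assumptions.
multiple_choice_example = {
    "id": "mc-0001",
    "question": ["What data structure does the diagram below illustrate?", "<image_1>"],
    "options": ["A. Binary search tree", "B. Hash table", "C. Linked list", "D. Stack"],
    "answer": "A",
}

open_ended_example = {
    "id": "oe-0001",
    "question": ["Continue this illustrated story with one paragraph and one matching image.", "<image_1>"],
    "reference": ["The fox finally reaches the river at dusk...", "<image_2>"],
}
```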