Join the conversation

Join the community of Machine Learners and AI enthusiasts.

Sign Up
clefourrier 
posted an update Mar 5
Post
🔥 New multimodal leaderboard on the hub: ConTextual!

Many situations require models to parse images containing text: maps, web pages, real world pictures, memes, ... 🖼️
So how do you evaluate performance on this task?

The ConTextual team introduced a brand new dataset of instructions and images, to test LMM (large multimodal models) reasoning capabilities, and an associated leaderboard (with a private test set).

This is super exciting imo because it has the potential to be a good benchmark both for multimodal models and for assistants' vision capabilities, thanks to the instructions in the dataset.

Congrats to @rohan598 , @hbXNov , @kaiweichang and @violetpeng !!

Learn more in the blog: https://huggingface.co/blog/leaderboard-contextual
Leaderboard: ucla-contextual/contextual_leaderboard