@merve on Hugging Face: "A great vision language benchmark: MM-UPD evaluates how model responds to…"

merve

posted an update Apr 19

A great vision-language benchmark: MM-UPD evaluates how models respond to unsolvable problems 🤓

As of now, most VLMs, including GPT-4V and LLaVA-Next-34B, struggle to refuse to answer when a problem is unsolvable.
Dataset: MM-UPD/MM-UPD
Paper: Unsolvable Problem Detection: Evaluating Trustworthiness of Vision Language Models (2403.20331)

merve Merve Noyan