A great vision-language benchmark: MM-UPD evaluates how models respond to unsolvable problems 🤓
As of now, most VLMs, including GPT-4V and LLaVA-Next-34B, struggle to refuse to answer when a problem is unsolvable.
Dataset: MM-UPD/MM-UPD
Paper: Unsolvable Problem Detection: Evaluating Trustworthiness of Vision Language Models (2403.20331)