microsoft/Phi-3.5-mini-instruct · Why not include MedQA in your benchmarks?

Aug 20

MedQA is one of the better reasoning benchmarks, built on USMLE questions. It was included in the benchmarks for Phi-3 and its June update, so it would make sense to include it in the Phi-3.5 benchmarks too, no?

Thanks for the model and all your work too!

nguyenbh

Microsoft org Aug 20

edited Aug 20

Thank you for your interest in the Phi-3.5 models! We did benchmark MedQA 🩺, but we will let the community run this benchmark themselves (hint: we think the Phi-3.5 MoE and Mini are very competitive 🌞).

Hugman2345

Aug 21

edited Aug 21

It's great and competes with much bigger models on USMLE/medical questions, information, and reasoning. In this area, Phi-3.5 is better than other 7B, 8B, and 9B competitors, and its bigger context size is a plus. Sadly, it feels like it doesn't beat Phi-3-small-8k and Phi-3-medium-4k in this particular area. This is just a first impression and needs to be confirmed by others. It is definitely far better than other tiny models; it's not even remotely close.

Thanks for Phi-3.5. I don't know how such a small model gets even close to the level of big models.

nguyenbh

Microsoft org Aug 22

edited Aug 25

@Hugman2345 Thank you for your effort in independently benchmarking the Phi-3.5 models on MedQA. It is great to see that the models perform within our expectations.