Submit SWE-bench result

by EwoutH - opened Sep 9

Discussion

EwoutH

Sep 9 • edited Sep 9

A SWE-bench Verified result of 16.8 was noted in the model card. Congratulations!

It would be great if that could be submitted to https://github.com/swe-bench/experiments, so that it appears on the official leaderboard and can be verified.

xvweirong

Sep 11

It appears that the SWE-bench Verified score of 16.8 is lower than the DeepSeek-Coder-V2-0724 score of 19. Does this indicate a significant decline in programming capability?
