Submit SWE-bench result
#4, opened by EwoutH
The model card reports a SWE-bench Verified score of 16.8. Congratulations!
It would be great if this result could be submitted to https://github.com/swe-bench/experiments so that it appears on the official leaderboard and can be independently verified.
The SWE-bench Verified score of 16.8 appears to be lower than DeepSeek-Coder-V2-0724's score of 19. Does this indicate a significant decline in programming capability?