- A Survey on Language Models for Code
  Paper • 2311.07989 • Published • 21
- SWE-bench: Can Language Models Resolve Real-World GitHub Issues?
  Paper • 2310.06770 • Published • 4
- CRUXEval: A Benchmark for Code Reasoning, Understanding and Execution
  Paper • 2401.03065 • Published • 10
- Copilot Evaluation Harness: Evaluating LLM-Guided Software Programming
  Paper • 2402.14261 • Published • 10
walterShen
AI & ML interests: None yet
Organizations: None yet
Collections: 8
Models: None public yet
Datasets: None public yet