
walterShen


AI & ML interests

None yet

Organizations

None yet

Collections 8

Code LMs Evaluation

A Survey on Language Models for Code

Paper • 2311.07989 • Published Nov 14, 2023 • 21
SWE-bench: Can Language Models Resolve Real-World GitHub Issues?

Paper • 2310.06770 • Published Oct 10, 2023 • 4
CRUXEval: A Benchmark for Code Reasoning, Understanding and Execution

Paper • 2401.03065 • Published Jan 5 • 10
Copilot Evaluation Harness: Evaluating LLM-Guided Software Programming

Paper • 2402.14261 • Published Feb 22 • 10

Code LMs Benchmark

Big Code Models Leaderboard

Space • Running • 976
Can Ai Code Results

Space • Running • 413
openai/openai_humaneval

Viewer • Updated Jan 4 • 164 • 140k • 246
google-research-datasets/mbpp

Viewer • Updated Jan 4 • 1.4k • 152k • 143

models

None public yet

datasets

None public yet
