davidberenstein1957's Collections
LLM evals and benchmark datasets
Updated Aug 17
allenai/reward-bench • Updated Sep 9 • 8.11k rows • 7.26k downloads • 73 likes
openai/openai_humaneval • Updated Jan 4 • 164 rows • 140k downloads • 246 likes
google/IFEval • Updated Aug 14 • 541 rows • 5.75k downloads • 34 likes
allenai/ai2_arc • Updated Dec 21, 2023 • 7.79k rows • 116k downloads • 142 likes
allenai/winogrande • Updated Jan 18 • 82.5k downloads • 57 likes
TIGER-Lab/MMLU-Pro • Updated 22 days ago • 12.1k rows • 29k downloads • 281 likes
cais/mmlu • Updated Mar 8 • 231k rows • 66.7k downloads • 321 likes
truthfulqa/truthful_qa • Updated Jan 4 • 1.63k rows • 29.2k downloads • 199 likes
openai/gsm8k • Updated Jan 4 • 17.6k rows • 200k downloads • 409 likes
Rowan/hellaswag • Updated Sep 28, 2023 • 60k rows • 99.3k downloads • 94 likes
tatsu-lab/alpaca_eval • Updated Aug 16 • 22.1k downloads • 50 likes
HuggingFaceH4/mt_bench_prompts • Updated Jul 3, 2023 • 80 rows • 603 downloads • 16 likes
nvidia/ChatRAG-Bench • Updated May 24 • 34.6k rows • 1.42k downloads • 98 likes
rungalileo/ragbench • Updated Jun 11 • 95.4k rows • 1.45k downloads • 12 likes