Collections
Collections including paper arxiv:2311.12983
- CRUXEval: A Benchmark for Code Reasoning, Understanding and Execution
  Paper • 2401.03065 • Published • 10
- Is Your Code Generated by ChatGPT Really Correct? Rigorous Evaluation of Large Language Models for Code Generation
  Paper • 2305.01210 • Published • 4
- AGIBench: A Multi-granularity, Multimodal, Human-referenced, Auto-scoring Benchmark for Large Language Models
  Paper • 2309.06495 • Published • 1
- MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI
  Paper • 2311.16502 • Published • 35

- Improving Text Embeddings with Large Language Models
  Paper • 2401.00368 • Published • 79
- LLaMA Beyond English: An Empirical Study on Language Capability Transfer
  Paper • 2401.01055 • Published • 54
- DocLLM: A layout-aware generative language model for multimodal document understanding
  Paper • 2401.00908 • Published • 181
- LLM in a flash: Efficient Large Language Model Inference with Limited Memory
  Paper • 2312.11514 • Published • 258

- Alpha-CLIP: A CLIP Model Focusing on Wherever You Want
  Paper • 2312.03818 • Published • 32
- Scaling Laws of Synthetic Images for Model Training ... for Now
  Paper • 2312.04567 • Published • 7
- Large Language Models for Mathematicians
  Paper • 2312.04556 • Published • 11
- LooseControl: Lifting ControlNet for Generalized Depth Conditioning
  Paper • 2312.03079 • Published • 12

- Instruction-Following Evaluation for Large Language Models
  Paper • 2311.07911 • Published • 19
- HuggingFaceH4/mt_bench_prompts
  Viewer • Updated • 80 • 603 • 16
- vectara/hallucination_evaluation_model
  Text Classification • Updated • 732k • 227
- GAIA: a benchmark for General AI Assistants
  Paper • 2311.12983 • Published • 183

- Levels of AGI: Operationalizing Progress on the Path to AGI
  Paper • 2311.02462 • Published • 33
- Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models
  Paper • 2206.04615 • Published • 5
- A Survey on Evaluation of Large Language Models
  Paper • 2307.03109 • Published • 42
- Bring Your Own Data! Self-Supervised Evaluation for Large Language Models
  Paper • 2306.13651 • Published • 15