LIBMoE: A Library for comprehensive benchmarking Mixture of Experts in Large Language Models • Paper • 2411.00918 • Published 7 days ago • 8
LLaMA-Berry: Pairwise Optimization for O1-like Olympiad-Level Mathematical Reasoning • Paper • 2410.02884 • Published Oct 3 • 48
CodeMMLU: A Multi-Task Benchmark for Assessing Code Understanding Capabilities of CodeLLMs • Paper • 2410.01999 • Published Oct 2 • 10