Qwen2.5-Coder Collection: Code-specific model series based on Qwen2.5 • 40 items • Updated 1 day ago • 162
Training-free Regional Prompting for Diffusion Transformers Paper • 2411.02395 • Published 8 days ago • 23
InstantIR: Blind Image Restoration with Instant Generative Reference Paper • 2410.06551 • Published Oct 9 • 6
Hunyuan-Large: An Open-Source MoE Model with 52 Billion Activated Parameters by Tencent Paper • 2411.02265 • Published 8 days ago • 22
Zero-shot Model-based Reinforcement Learning using Large Language Models Paper • 2410.11711 • Published 28 days ago • 8
CompassJudger-1: All-in-one Judge Model Helps Model Evaluation and Evolution Paper • 2410.16256 • Published 22 days ago • 58
Pangea Collection: A Fully Open Multilingual Multimodal LLM for 39 Languages • 18 items • Updated 11 days ago • 17
Pangea: A Fully Open Multilingual Multimodal LLM for 39 Languages Paper • 2410.16153 • Published 22 days ago • 42
AutoTrain: No-code training for state-of-the-art models Paper • 2410.15735 • Published 23 days ago • 56
DPLM-2: A Multimodal Diffusion Protein Language Model Paper • 2410.13782 • Published 26 days ago • 19
WorldCuisines: A Massive-Scale Benchmark for Multilingual and Multicultural Visual Question Answering on Global Cuisines Paper • 2410.12705 • Published 27 days ago • 29
Mini-Omni2: Towards Open-source GPT-4o with Vision, Speech and Duplex Capabilities Paper • 2410.11190 • Published 29 days ago • 20
Can MLLMs Understand the Deep Implication Behind Chinese Images? Paper • 2410.13854 • Published 26 days ago • 8
Janus: Decoupling Visual Encoding for Unified Multimodal Understanding and Generation Paper • 2410.13848 • Published 26 days ago • 27
VidEgoThink: Assessing Egocentric Video Understanding Capabilities for Embodied AI Paper • 2410.11623 • Published 28 days ago • 46