Introducing IDEFICS: An Open Reproduction of State-of-the-art Visual Language Model • Aug 22, 2023 • 26
Mixture-of-Transformers: A Sparse and Scalable Architecture for Multi-Modal Foundation Models Paper • 2411.04996 • Published 2 days ago • 28
Unleashing Reasoning Capability of LLMs via Scalable Question Synthesis from Scratch Paper • 2410.18693 • Published 16 days ago • 40
WAFFLE: Multi-Modal Model for Automated Front-End Development Paper • 2410.18362 • Published 17 days ago • 11
MoH: Multi-Head Attention as Mixture-of-Head Attention Paper • 2410.11842 • Published 25 days ago • 20
Efficient Diffusion Models: A Comprehensive Survey from Principles to Practices Paper • 2410.11795 • Published 25 days ago • 16
LEOPARD: A Vision Language Model For Text-Rich Multi-Image Tasks Paper • 2410.01744 • Published Oct 2 • 25
The All-Seeing Project V2: Towards General Relation Comprehension of the Open World Paper • 2402.19474 • Published Feb 29 • 2
Imagine yourself: Tuning-Free Personalized Image Generation Paper • 2409.13346 • Published Sep 20 • 67
InfiMM-WebMath-40B: Advancing Multimodal Pre-Training for Enhanced Mathematical Reasoning Paper • 2409.12568 • Published Sep 19 • 47
MMSearch: Benchmarking the Potential of Large Models as Multi-modal Search Engines Paper • 2409.12959 • Published Sep 19 • 36
Preference Tuning with Human Feedback on Language, Speech, and Vision Tasks: A Survey Paper • 2409.11564 • Published Sep 17 • 19
Seed-Music: A Unified Framework for High Quality and Controlled Music Generation Paper • 2409.09214 • Published Sep 13 • 46
Towards a Unified View of Preference Learning for Large Language Models: A Survey Paper • 2409.02795 • Published Sep 4 • 72
Arctic-SnowCoder: Demystifying High-Quality Data in Code Pretraining Paper • 2409.02326 • Published Sep 3 • 18
Mini-Omni: Language Models Can Hear, Talk While Thinking in Streaming Paper • 2408.16725 • Published Aug 29 • 52
VisionTS: Visual Masked Autoencoders Are Free-Lunch Zero-Shot Time Series Forecasters Paper • 2408.17253 • Published Aug 30 • 35