- RLHF Workflow: From Reward Modeling to Online RLHF
  Paper • 2405.07863 • Published • 67
- Chameleon: Mixed-Modal Early-Fusion Foundation Models
  Paper • 2405.09818 • Published • 126
- Meteor: Mamba-based Traversal of Rationale for Large Language and Vision Models
  Paper • 2405.15574 • Published • 53
- An Introduction to Vision-Language Modeling
  Paper • 2405.17247 • Published • 85
Collections including paper arxiv:2407.02687

- Controlling Space and Time with Diffusion Models
  Paper • 2407.07860 • Published • 16
- DisCo-Diff: Enhancing Continuous Diffusion Models with Discrete Latents
  Paper • 2407.03300 • Published • 11
- Diffusion Forcing: Next-token Prediction Meets Full-Sequence Diffusion
  Paper • 2407.01392 • Published • 39
- No Training, No Problem: Rethinking Classifier-Free Guidance for Diffusion Models
  Paper • 2407.02687 • Published • 22

- A False Sense of Safety: Unsafe Information Leakage in 'Safe' AI Responses
  Paper • 2407.02551 • Published • 7
- DisCo-Diff: Enhancing Continuous Diffusion Models with Discrete Latents
  Paper • 2407.03300 • Published • 11
- TokenPacker: Efficient Visual Projector for Multimodal LLM
  Paper • 2407.02392 • Published • 21
- No Training, No Problem: Rethinking Classifier-Free Guidance for Diffusion Models
  Paper • 2407.02687 • Published • 22

- Classifier-Free Diffusion Guidance
  Paper • 2207.12598 • Published • 2
- Adding Conditional Control to Text-to-Image Diffusion Models
  Paper • 2302.05543 • Published • 40
- Applying Guidance in a Limited Interval Improves Sample and Distribution Quality in Diffusion Models
  Paper • 2404.07724 • Published • 12
- Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation
  Paper • 2406.06525 • Published • 64

- Guiding a Diffusion Model with a Bad Version of Itself
  Paper • 2406.02507 • Published • 15
- Step-aware Preference Optimization: Aligning Preference with Denoising Performance at Each Step
  Paper • 2406.04314 • Published • 26
- An Image is Worth 32 Tokens for Reconstruction and Generation
  Paper • 2406.07550 • Published • 55
- Commonsense-T2I Challenge: Can Text-to-Image Generation Models Understand Commonsense?
  Paper • 2406.07546 • Published • 8

- MobileCLIP: Fast Image-Text Models through Multi-Modal Reinforced Training
  Paper • 2311.17049 • Published
- DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
  Paper • 2405.04434 • Published • 13
- A Study of Autoregressive Decoders for Multi-Tasking in Computer Vision
  Paper • 2303.17376 • Published
- Sigmoid Loss for Language Image Pre-Training
  Paper • 2303.15343 • Published • 4

- CatLIP: CLIP-level Visual Recognition Accuracy with 2.7x Faster Pre-training on Web-scale Image-Text Data
  Paper • 2404.15653 • Published • 26
- MoDE: CLIP Data Experts via Clustering
  Paper • 2404.16030 • Published • 12
- MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning
  Paper • 2405.12130 • Published • 45
- Reducing Transformer Key-Value Cache Size with Cross-Layer Attention
  Paper • 2405.12981 • Published • 28

- AniClipart: Clipart Animation with Text-to-Video Priors
  Paper • 2404.12347 • Published • 12
- MoA: Mixture-of-Attention for Subject-Context Disentanglement in Personalized Image Generation
  Paper • 2404.11565 • Published • 14
- Dynamic Typography: Bringing Words to Life
  Paper • 2404.11614 • Published • 43
- No Training, No Problem: Rethinking Classifier-Free Guidance for Diffusion Models
  Paper • 2407.02687 • Published • 22

- FaceChain-SuDe: Building Derived Class to Inherit Category Attributes for One-shot Subject-Driven Generation
  Paper • 2403.06775 • Published • 3
- An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
  Paper • 2010.11929 • Published • 6
- Data Incubation -- Synthesizing Missing Data for Handwriting Recognition
  Paper • 2110.07040 • Published • 2
- A Mixture of Expert Approach for Low-Cost Customization of Deep Neural Networks
  Paper • 1811.00056 • Published • 2