Rui Zhao

ruizhaocv

https://ruizhaocv.github.io/

AI & ML interests

Multimodal and GenAI

ruizhaocv's activity

upvoted a paper 5 days ago

ReCapture: Generative Video Camera Controls for User-Provided Videos using Masked Video Fine-Tuning

Paper • 2411.05003 • Published 5 days ago • 63

upvoted 2 papers 17 days ago

Retrieval Head Mechanistically Explains Long-Context Factuality

Paper • 2404.15574 • Published Apr 24 • 2

DeCoRe: Decoding by Contrasting Retrieval Heads to Mitigate Hallucinations

Paper • 2410.18860 • Published 19 days ago • 8

upvoted 4 papers 18 days ago

Skywork-Reward: Bag of Tricks for Reward Modeling in LLMs

Paper • 2410.18451 • Published 20 days ago • 13

Why Does the Effective Context Length of LLMs Fall Short?

Paper • 2410.18745 • Published 19 days ago • 16

Distill Visual Chart Reasoning Ability from LLMs to MLLMs

Paper • 2410.18798 • Published 19 days ago • 19

Unbounded: A Generative Infinite Game of Character Life Simulation

Paper • 2410.18975 • Published 19 days ago • 34

upvoted a paper 26 days ago

DreamVideo-2: Zero-Shot Subject-Driven Video Customization with Precise Motion Control

Paper • 2410.13830 • Published 26 days ago • 23

upvoted a paper 30 days ago

EvolveDirector: Approaching Advanced Text-to-Image Generation with Large Vision-Language Models

Paper • 2410.07133 • Published Oct 9 • 18

upvoted 3 papers about 1 month ago

Pixtral 12B

Paper • 2410.07073 • Published Oct 9 • 59

Towards World Simulator: Crafting Physical Commonsense-Based Benchmark for Video Generation

Paper • 2410.05363 • Published Oct 7 • 44

Personalized Visual Instruction Tuning

Paper • 2410.07113 • Published Oct 9 • 69

upvoted 2 papers 3 months ago

Diffusion Models Are Real-Time Game Engines

Paper • 2408.14837 • Published Aug 27 • 121

Training-free Long Video Generation with Chain of Diffusion Model Experts

Paper • 2408.13423 • Published Aug 24 • 20

upvoted 2 papers 8 months ago

DragAnything: Motion Control for Anything using Entity Representation

Paper • 2403.07420 • Published Mar 12 • 13

VidProM: A Million-scale Real Prompt-Gallery Dataset for Text-to-Video Diffusion Models

Paper • 2403.06098 • Published Mar 10 • 15

upvoted 4 papers 9 months ago

Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers

Paper • 2402.19479 • Published Feb 29 • 32

DiffuseKronA: A Parameter Efficient Fine-tuning Method for Personalized Diffusion Model

Paper • 2402.17412 • Published Feb 27 • 21

Seeing and Hearing: Open-domain Visual-Audio Generation with Diffusion Latent Aligners

Paper • 2402.17723 • Published Feb 27 • 16

Video as the New Language for Real-World Decision Making

Paper • 2402.17139 • Published Feb 27 • 18