Han-Bit Kang

hbkang

AI & ML interests

ML

Organizations

None yet

hbkang's activity

upvoted a paper 6 days ago

Learning Video Representations without Natural Videos

Paper • 2410.24213 • Published 9 days ago • 14

upvoted a paper 19 days ago

DAWN: Dynamic Frame Avatar with Non-autoregressive Diffusion Framework for Talking Head Video Generation

Paper • 2410.13726 • Published 23 days ago • 10

upvoted a paper 25 days ago

Meissonic: Revitalizing Masked Generative Transformers for Efficient High-Resolution Text-to-Image Synthesis

Paper • 2410.08261 • Published about 1 month ago • 48

upvoted a paper about 1 month ago

FAN: Fourier Analysis Networks

Paper • 2410.02675 • Published Oct 3 • 24

upvoted 3 papers 2 months ago

Open-MAGVIT2: An Open-Source Project Toward Democratizing Auto-regressive Visual Generation

Paper • 2409.04410 • Published Sep 6 • 23

The VoxCeleb Speaker Recognition Challenge: A Retrospective

Paper • 2408.14886 • Published Aug 27 • 8

CSGO: Content-Style Composition in Text-to-Image Generation

Paper • 2408.16766 • Published Aug 29 • 17

upvoted a paper 5 months ago

Grokfast: Accelerated Grokking by Amplifying Slow Gradients

Paper • 2405.20233 • Published May 30 • 5

upvoted a paper 7 months ago

COCONut: Modernizing COCO Segmentation

Paper • 2404.08639 • Published Apr 12 • 27

upvoted 4 papers 9 months ago

EMO: Emote Portrait Alive - Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions

Paper • 2402.17485 • Published Feb 27 • 188

LGM: Large Multi-View Gaussian Model for High-Resolution 3D Content Creation

Paper • 2402.05054 • Published Feb 7 • 25

YOLO-World: Real-Time Open-Vocabulary Object Detection

Paper • 2401.17270 • Published Jan 30 • 32

Media2Face: Co-speech Facial Animation Generation With Multi-Modality Guidance

Paper • 2401.15687 • Published Jan 28 • 21

upvoted 7 papers 10 months ago

Scalable High-Resolution Pixel-Space Image Synthesis with Hourglass Diffusion Transformers

Paper • 2401.11605 • Published Jan 21 • 21

StreamDiffusion: A Pipeline-level Solution for Real-time Interactive Generation

Paper • 2312.12491 • Published Dec 19, 2023 • 69

Synthesizing Moving People with 3D Control

Paper • 2401.10889 • Published Jan 19 • 12

Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data

Paper • 2401.10891 • Published Jan 19 • 58

VMamba: Visual State Space Model

Paper • 2401.10166 • Published Jan 18 • 37

Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model

Paper • 2401.09417 • Published Jan 17 • 58

Masked Audio Generation using a Single Non-Autoregressive Transformer

Paper • 2401.04577 • Published Jan 9 • 41