Collections
Collections including paper arxiv:2504.04842
- EMO: Emote Portrait Alive - Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions
  Paper • 2402.17485 • Published • 194
- VividTalk: One-Shot Audio-Driven Talking Head Generation Based on 3D Hybrid Prior
  Paper • 2312.01841 • Published • 1
- MagicAnimate: Temporally Consistent Human Image Animation using Diffusion Model
  Paper • 2311.16498 • Published • 1
- GaussianAvatar: Towards Realistic Human Avatar Modeling from a Single Video via Animatable 3D Gaussians
  Paper • 2312.02134 • Published • 2

- CoLLM: A Large Language Model for Composed Image Retrieval
  Paper • 2503.19910 • Published • 11
- LOCATEdit: Graph Laplacian Optimized Cross Attention for Localized Text-Guided Image Editing
  Paper • 2503.21541 • Published • 1
- HumanDreamer-X: Photorealistic Single-image Human Avatars Reconstruction via Gaussian Restoration
  Paper • 2504.03536 • Published • 11
- FantasyTalking: Realistic Talking Portrait Generation via Coherent Motion Synthesis
  Paper • 2504.04842 • Published • 30

- VividFace: A Diffusion-Based Hybrid Framework for High-Fidelity Video Face Swapping
  Paper • 2412.11279 • Published • 12
- MagicFace: High-Fidelity Facial Expression Editing with Action-Unit Control
  Paper • 2501.02260 • Published • 5
- GaussianAvatar-Editor: Photorealistic Animatable Gaussian Head Avatar Editor
  Paper • 2501.09978 • Published • 6
- FantasyID: Face Knowledge Enhanced ID-Preserving Video Generation
  Paper • 2502.13995 • Published • 9

- One Shot, One Talk: Whole-body Talking Avatar from a Single Image
  Paper • 2412.01106 • Published • 20
- MEMO: Memory-Guided Diffusion for Expressive Talking Video Generation
  Paper • 2412.04448 • Published • 10
- IDOL: Instant Photorealistic 3D Human Creation from a Single Image
  Paper • 2412.14963 • Published • 6
- OmniHuman-1: Rethinking the Scaling-Up of One-Stage Conditioned Human Animation Models
  Paper • 2502.01061 • Published • 212

- A Picture is Worth More Than 77 Text Tokens: Evaluating CLIP-Style Models on Dense Captions
  Paper • 2312.08578 • Published • 20
- ZeroQuant(4+2): Redefining LLMs Quantization with a New FP6-Centric Strategy for Diverse Generative Tasks
  Paper • 2312.08583 • Published • 12
- Vision-Language Models as a Source of Rewards
  Paper • 2312.09187 • Published • 14
- StemGen: A music generation model that listens
  Paper • 2312.08723 • Published • 49