Models
Datasets
Spaces
Posts
Docs
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2408.06072

CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer

Paper • 2408.06072 • Published Aug 12 • 35
AtomoVideo: High Fidelity Image-to-Video Generation

Paper • 2403.01800 • Published Mar 4 • 20
DimensionX: Create Any 3D and 4D Scenes from a Single Image with Controllable Video Diffusion

Paper • 2411.04928 • Published 10 days ago • 43

Papers I want to read

Papers in my to-read list

about 3 hours ago

RLHF Workflow: From Reward Modeling to Online RLHF

Paper • 2405.07863 • Published May 13 • 67
Chameleon: Mixed-Modal Early-Fusion Foundation Models

Paper • 2405.09818 • Published May 16 • 126
Meteor: Mamba-based Traversal of Rationale for Large Language and Vision Models

Paper • 2405.15574 • Published May 24 • 53
An Introduction to Vision-Language Modeling

Paper • 2405.17247 • Published May 27 • 85

MOFA-Video: Controllable Image Animation via Generative Motion Field Adaptions in Frozen Image-to-Video Diffusion Model

Paper • 2405.20222 • Published May 30 • 10
ZeroSmooth: Training-free Diffuser Adaptation for High Frame Rate Video Generation

Paper • 2406.00908 • Published Jun 3 • 12
CamCo: Camera-Controllable 3D-Consistent Image-to-Video Generation

Paper • 2406.02509 • Published Jun 4 • 9
I4VGen: Image as Stepping Stone for Text-to-Video Generation

Paper • 2406.02230 • Published Jun 4 • 16

less than a minute ago

MobileCLIP: Fast Image-Text Models through Multi-Modal Reinforced Training

Paper • 2311.17049 • Published Nov 28, 2023
DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model

Paper • 2405.04434 • Published May 7 • 13
A Study of Autoregressive Decoders for Multi-Tasking in Computer Vision

Paper • 2303.17376 • Published Mar 30, 2023
Sigmoid Loss for Language Image Pre-Training

Paper • 2303.15343 • Published Mar 27, 2023 • 4

Snap Video: Scaled Spatiotemporal Transformers for Text-to-Video Synthesis

Paper • 2402.14797 • Published Feb 22 • 19
Subobject-level Image Tokenization

Paper • 2402.14327 • Published Feb 22 • 17
MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases

Paper • 2402.14905 • Published Feb 22 • 126
GPTVQ: The Blessing of Dimensionality for LLM Quantization

Paper • 2402.15319 • Published Feb 23 • 19

WorldDreamer: Towards General World Models for Video Generation via Predicting Masked Tokens

Paper • 2401.09985 • Published Jan 18 • 15
CustomVideo: Customizing Text-to-Video Generation with Multiple Subjects

Paper • 2401.09962 • Published Jan 18 • 8
Inflation with Diffusion: Efficient Temporal Adaptation for Text-to-Video Super-Resolution

Paper • 2401.10404 • Published Jan 18 • 10
ActAnywhere: Subject-Aware Video Background Generation

Paper • 2401.10822 • Published Jan 19 • 13

Video Generation

VideoCrafter1: Open Diffusion Models for High-Quality Video Generation

Paper • 2310.19512 • Published Oct 30, 2023 • 15
MagicDance: Realistic Human Dance Video Generation with Motions & Facial Expressions Transfer

Paper • 2311.12052 • Published Nov 18, 2023 • 32
Fast View Synthesis of Casual Videos

Paper • 2312.02135 • Published Dec 4, 2023 • 8
DreamVideo: Composing Your Dream Videos with Customized Subject and Motion

Paper • 2312.04433 • Published Dec 7, 2023 • 9

Reuse and Diffuse: Iterative Denoising for Text-to-Video Generation

Paper • 2309.03549 • Published Sep 7, 2023 • 5
CCEdit: Creative and Controllable Video Editing via Diffusion Models

Paper • 2309.16496 • Published Sep 28, 2023 • 9
EvalCrafter: Benchmarking and Evaluating Large Video Generation Models

Paper • 2310.11440 • Published Oct 17, 2023 • 15
LAMP: Learn A Motion Pattern for Few-Shot-Based Video Generation

Paper • 2310.10769 • Published Oct 16, 2023 • 8

Company

© Hugging Face

TOS Privacy About Jobs

Website

Models Datasets Spaces Pricing Docs