Inui
Norm
AI & ML interests
Video Diffusion; Large Language Model; Object Detection; OCR
Recent Activity
Upvoted a paper 7 days ago: Drag-and-Drop LLMs: Zero-Shot Prompt-to-Weights
Upvoted a paper 14 days ago: MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention
Organizations
TI2V Research
- CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer
  Paper • 2408.06072 • Published • 40
- AtomoVideo: High Fidelity Image-to-Video Generation
  Paper • 2403.01800 • Published • 24
- DimensionX: Create Any 3D and 4D Scenes from a Single Image with Controllable Video Diffusion
  Paper • 2411.04928 • Published • 58
- AnimateAnything: Consistent and Controllable Animation for Video Generation
  Paper • 2411.10836 • Published • 25
Multimodal Language Model
What matters besides the data recipe when training a multimodal language model?
Language Model
- STaR: Bootstrapping Reasoning With Reasoning
  Paper • 2203.14465 • Published • 8
- Scaling Laws for Neural Language Models
  Paper • 2001.08361 • Published • 7
- Byte Latent Transformer: Patches Scale Better Than Tokens
  Paper • 2412.09871 • Published • 105
- DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
  Paper • 2501.12948 • Published • 404
Open Datasets
Thank you for sharing your datasets. I’ve fed them to my model, and they have benefited it.
Video2Video
Image / Video Gen
Image Generation Using Diffusion-Based Methods: Tips and Techniques for Stable Diffusion
- Understanding Diffusion Models: A Unified Perspective
  Paper • 2208.11970 • Published
- Tutorial on Diffusion Models for Imaging and Vision
  Paper • 2403.18103 • Published • 2
- Denoising Diffusion Probabilistic Models
  Paper • 2006.11239 • Published • 3
- Denoising Diffusion Implicit Models
  Paper • 2010.02502 • Published • 3
Fundamental Research
- Scaling Law with Learning Rate Annealing
  Paper • 2408.11029 • Published • 4
- Token Turing Machines
  Paper • 2211.09119 • Published • 1
- VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training
  Paper • 2203.12602 • Published
- Getting ViT in Shape: Scaling Laws for Compute-Optimal Model Design
  Paper • 2305.13035 • Published
Computer Vision
Do we still need a dedicated network for specific computer vision tasks today?
VAE
- WF-VAE: Enhancing Video VAE by Wavelet-Driven Energy Flow for Latent Video Diffusion Model
  Paper • 2411.17459 • Published • 11
- MAGVIT: Masked Generative Video Transformer
  Paper • 2212.05199 • Published
- Language Model Beats Diffusion -- Tokenizer is Key to Visual Generation
  Paper • 2310.05737 • Published • 4
- Finite Scalar Quantization: VQ-VAE Made Simple
  Paper • 2309.15505 • Published • 22