Video-MTR: Reinforced Multi-Turn Reasoning for Long Video Understanding Paper • 2508.20478 • Published 11 days ago • 16
Long-Context Autoregressive Video Modeling with Next-Frame Prediction Paper • 2503.19325 • Published Mar 25 • 73
DART: Denoising Autoregressive Transformer for Scalable Text-to-Image Generation Paper • 2410.08159 • Published Oct 10, 2024 • 26
Fluid: Scaling Autoregressive Text-to-image Generative Models with Continuous Tokens Paper • 2410.13863 • Published Oct 17, 2024 • 39
PixArt-Alpha Collection This collection organize all the PixArt-Alpha related models, datasets and so on. • 9 items • Updated May 4, 2024 • 5
SEED-X: Multimodal Models with Unified Multi-granularity Comprehension and Generation Paper • 2404.14396 • Published Apr 22, 2024 • 19
SEED-Story: Multimodal Long Story Generation with Large Language Model Paper • 2407.08683 • Published Jul 11, 2024 • 26
Defect Spectrum: A Granular Look of Large-Scale Defect Datasets with Rich Semantics Paper • 2310.17316 • Published Oct 26, 2023 • 1
Not All Steps are Created Equal: Selective Diffusion Distillation for Image Manipulation Paper • 2307.08448 • Published Jul 17, 2023