Show-o: One Single Transformer to Unify Multimodal Understanding and Generation Paper • 2408.12528 • Published Aug 22 • 50
LVLM-Intrepret: An Interpretability Tool for Large Vision-Language Models Paper • 2404.03118 • Published Apr 3 • 23
Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction Paper • 2404.02905 • Published Apr 3 • 64
CameraCtrl: Enabling Camera Control for Text-to-Video Generation Paper • 2404.02101 • Published Apr 2 • 22
StreamDiffusion: A Pipeline-level Solution for Real-time Interactive Generation Paper • 2312.12491 • Published Dec 19, 2023 • 69
Mamba: Linear-Time Sequence Modeling with Selective State Spaces Paper • 2312.00752 • Published Dec 1, 2023 • 138
LCM-LoRA: A Universal Stable-Diffusion Acceleration Module Paper • 2311.05556 • Published Nov 9, 2023 • 81
ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs Paper • 2307.16789 • Published Jul 31, 2023 • 98
PanGu-Coder2: Boosting Large Language Models for Code with Ranking Feedback Paper • 2307.14936 • Published Jul 27, 2023 • 40
Patch n' Pack: NaViT, a Vision Transformer for any Aspect Ratio and Resolution Paper • 2307.06304 • Published Jul 12, 2023 • 27