Interpretable non-linear dimensionality reduction using gaussian weighted linear transformation Paper • 2504.17601 • Published 3 days ago
TimeChat-Online: 80% Visual Tokens are Naturally Redundant in Streaming Videos Paper • 2504.17343 • Published 3 days ago
3DV-TON: Textured 3D-Guided Consistent Video Try-on via Diffusion Models Paper • 2504.17414 • Published 3 days ago
Boosting Generative Image Modeling via Joint Image-Feature Synthesis Paper • 2504.16064 • Published 5 days ago
Distilling semantically aware orders for autoregressive image generation Paper • 2504.17069 • Published 4 days ago
DyMU: Dynamic Merging and Virtual Unmerging for Efficient VLMs Paper • 2504.17040 • Published 4 days ago
Token-Shuffle: Towards High-Resolution Image Generation with Autoregressive Models Paper • 2504.17789 • Published 3 days ago
RefVNLI: Towards Scalable Evaluation of Subject-driven Text-to-image Generation Paper • 2504.17502 • Published 3 days ago
Step1X-Edit: A Practical Framework for General Image Editing Paper • 2504.17761 • Published 3 days ago
DreamID: High-Fidelity and Fast diffusion-based Face Swapping via Triplet ID Group Learning Paper • 2504.14509 • Published 7 days ago
LiveCC: Learning Video LLM with Streaming Speech Transcription at Scale Paper • 2504.16030 • Published 5 days ago
RealisDance-DiT: Simple yet Strong Baseline towards Controllable Character Animation in the Wild Paper • 2504.14977 • Published 6 days ago
MR. Video: "MapReduce" is the Principle for Long Video Understanding Paper • 2504.16082 • Published 5 days ago
Vidi: Large Multimodal Models for Video Understanding and Editing Paper • 2504.15681 • Published 5 days ago
Personalized Text-to-Image Generation with Auto-Regressive Models Paper • 2504.13162 • Published 10 days ago
Describe Anything: Detailed Localized Image and Video Captioning Paper • 2504.16072 • Published 5 days ago
SeerAttention: Learning Intrinsic Sparse Attention in Your LLMs Paper • 2410.13276 • Published Oct 17, 2024
Context-Aware Token Selection and Packing for Enhanced Vision Transformer Paper • 2410.23608 • Published Oct 31, 2024
Ltri-LLM: Streaming Long Context Inference for LLMs with Training-Free Dynamic Triangular Attention Pattern Paper • 2412.04757 • Published Dec 6, 2024