UI-R1: Enhancing Action Prediction of GUI Agents by Reinforcement Learning Paper • 2503.21620 • Published 1 day ago • 44
view article Article A Deepdive into Aya Vision: Advancing the Frontier of Multilingual Multimodality 25 days ago • 70
UQFF Collection UQFF models. Examples for each in the model card! • 15 items • Updated Oct 16, 2024 • 14
PhotoDoodle: Learning Artistic Image Editing from Few-Shot Pairwise Data Paper • 2502.14397 • Published Feb 20 • 40
How Much Knowledge Can You Pack into a LoRA Adapter without Harming LLM? Paper • 2502.14502 • Published Feb 20 • 88
X-Dancer: Expressive Music to Human Dance Video Generation Paper • 2502.17414 • Published Feb 24 • 11
K-LoRA: Unlocking Training-Free Fusion of Any Subject and Style LoRAs Paper • 2502.18461 • Published Feb 25 • 15
KV-Edit: Training-Free Image Editing for Precise Background Preservation Paper • 2502.17363 • Published Feb 24 • 35
OmniAlign-V: Towards Enhanced Alignment of MLLMs with Human Preference Paper • 2502.18411 • Published Feb 25 • 71
OmniHuman-1: Rethinking the Scaling-Up of One-Stage Conditioned Human Animation Models Paper • 2502.01061 • Published Feb 3 • 200
MatAnyone: Stable Video Matting with Consistent Memory Propagation Paper • 2501.14677 • Published Jan 24 • 34
LISA++: An Improved Baseline for Reasoning Segmentation with Large Language Model Paper • 2312.17240 • Published Dec 28, 2023 • 1