ClavaDDPM: Multi-relational Data Synthesis with Cluster-guided Diffusion Models Paper • 2405.17724 • Published May 28, 2024
Paper2Poster: Towards Multimodal Poster Automation from Scientific Papers Paper • 2505.21497 • Published 12 days ago • 94
Balance Act: Mitigating Hubness in Cross-Modal Retrieval with Query and Gallery Banks Paper • 2310.11612 • Published Oct 17, 2023
InvGC: Robust Cross-Modal Retrieval by Inverse Graph Convolution Paper • 2310.13276 • Published Oct 20, 2023
MAP-Neo: Highly Capable and Transparent Bilingual Large Language Model Series Paper • 2405.19327 • Published May 29, 2024 • 50
UI-Vision: A Desktop-centric GUI Benchmark for Visual Perception and Interaction Paper • 2503.15661 • Published Mar 19 • 2
GraphOmni: A Comprehensive and Extendable Benchmark Framework for Large Language Models on Graph-theoretic Tasks Paper • 2504.12764 • Published Apr 17 • 41
GraphOmni: A Comprehensive and Extendable Benchmark Framework for Large Language Models on Graph-theoretic Tasks Paper • 2504.12764 • Published Apr 17 • 41
Paper2Poster: Towards Multimodal Poster Automation from Scientific Papers Paper • 2505.21497 • Published 12 days ago • 94
Paper2Poster: Towards Multimodal Poster Automation from Scientific Papers Paper • 2505.21497 • Published 12 days ago • 94
Think or Not? Selective Reasoning via Reinforcement Learning for Vision-Language Models Paper • 2505.16854 • Published 17 days ago • 11
VLog: Video-Language Models by Generative Retrieval of Narration Vocabulary Paper • 2503.09402 • Published Mar 12 • 7
ROICtrl: Boosting Instance Control for Visual Generation Paper • 2411.17949 • Published Nov 27, 2024 • 88
ShowUI: One Vision-Language-Action Model for GUI Visual Agent Paper • 2411.17465 • Published Nov 26, 2024 • 88
VideoLLM-online: Online Video Large Language Model for Streaming Video Paper • 2406.11816 • Published Jun 17, 2024 • 25