BigDocs: An Open and Permissively-Licensed Dataset for Training Multimodal Models on Document and Code Tasks Paper • 2412.04626 • Published Dec 5, 2024 • 14
AlignVLM: Bridging Vision and Language Latent Spaces for Multimodal Understanding Paper • 2502.01341 • Published Feb 3, 2025 • 39
Rethinking Spectral Augmentation for Contrast-based Graph Self-Supervised Learning Paper • 2405.19600 • Published May 30, 2024
Communication-Efficient Decentralized Online Continuous DR-Submodular Maximization Paper • 2208.08681 • Published Aug 18, 2022
Roughness Index for Loss Landscapes of Neural Network Models of Partial Differential Equations Paper • 2103.11069 • Published Mar 20, 2021
DREAM: Improving Video-Text Retrieval Through Relevance-Based Augmentation Using Large Foundation Models Paper • 2404.05083 • Published Apr 7, 2024
Scope: Selective Cross-modal Orchestration of Visual Perception Experts Paper • 2510.12974 • Published 21 days ago
Balance Act: Mitigating Hubness in Cross-Modal Retrieval with Query and Gallery Banks Paper • 2310.11612 • Published Oct 17, 2023
InvGC: Robust Cross-Modal Retrieval by Inverse Graph Convolution Paper • 2310.13276 • Published Oct 20, 2023
UI-Vision: A Desktop-centric GUI Benchmark for Visual Perception and Interaction Paper • 2503.15661 • Published Mar 19, 2025 • 2
GraphOmni: A Comprehensive and Extendable Benchmark Framework for Large Language Models on Graph-theoretic Tasks Paper • 2504.12764 • Published Apr 17, 2025 • 41
Paper2Poster: Towards Multimodal Poster Automation from Scientific Papers Paper • 2505.21497 • Published May 27, 2025 • 108