ChartQAPro: A More Diverse and Challenging Benchmark for Chart Question Answering Paper • 2504.05506 • Published Apr 7 • 22
SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features Paper • 2502.14786 • Published Feb 20 • 144
AlignVLM: Bridging Vision and Language Latent Spaces for Multimodal Understanding Paper • 2502.01341 • Published Feb 3 • 39
BigDocs: An Open and Permissively-Licensed Dataset for Training Multimodal Models on Document and Code Tasks Paper • 2412.04626 • Published Dec 5, 2024 • 14
ChartGemma: Visual Instruction-tuning for Chart Reasoning in the Wild Paper • 2407.04172 • Published Jul 4, 2024 • 27