Why do LLaVA Vision-Language Models Reply to Images in English? Paper • 2407.02333 • Published Jul 2, 2024
M5 -- A Diverse Benchmark to Assess the Performance of Large Multimodal Models Across Multilingual and Multicultural Vision-Language Tasks Paper • 2407.03791 • Published Jul 4, 2024 • 1
Multilingual and Explainable Text Detoxification with Parallel Corpora Paper • 2412.11691 • Published Dec 16, 2024
Centurio: On Drivers of Multilingual Ability of Large Vision-Language Model Paper • 2501.05122 • Published 9 days ago • 18
Centurio Collection Artifacts of the paper "Centurio: On Drivers of Multilingual Ability of Large Vision-Language Model" • 5 items • Updated 8 days ago • 4
Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling Paper • 2412.05271 • Published Dec 6, 2024 • 128
Qwen2-VL Collection Vision-language model series based on Qwen2 • 16 items • Updated Dec 6, 2024 • 190
LLaVA-UHD v2: an MLLM Integrating High-Resolution Feature Pyramid via Hierarchical Window Transformer Paper • 2412.13871 • Published about 1 month ago • 18
Progressive Multimodal Reasoning via Active Retrieval Paper • 2412.14835 • Published 30 days ago • 73