Article Train 400x faster Static Embedding Models with Sentence Transformers By tomaarsen • Jan 15 • 195
Article 🪆 Introduction to Matryoshka Embedding Models By tomaarsen and 2 others • Feb 23, 2024 • 142
Article The Transformers Library: standardizing model definitions By lysandre and 3 others • May 15 • 115
Emerging Properties in Unified Multimodal Pretraining Paper • 2505.14683 • Published May 20 • 130
SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features Paper • 2502.14786 • Published Feb 20 • 146
MVL-SIB: A Massively Multilingual Vision-Language Benchmark for Cross-Modal Topical Matching Paper • 2502.12852 • Published Feb 18 • 3
GIMMICK -- Globally Inclusive Multimodal Multitask Cultural Knowledge Benchmarking Paper • 2502.13766 • Published Feb 19 • 3
How Much Do LLMs Hallucinate across Languages? On Multilingual Estimation of LLM Hallucination in the Wild Paper • 2502.12769 • Published Feb 18 • 3
Qwen2.5-VL Collection Vision-language model series based on Qwen2.5 • 11 items • Updated Apr 28 • 498
Centurio Collection Artifacts of the paper "Centurio: On Drivers of Multilingual Ability of Large Vision-Language Model" • 6 items • Updated Feb 4 • 4
Centurio: On Drivers of Multilingual Ability of Large Vision-Language Model Paper • 2501.05122 • Published Jan 9 • 20
Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling Paper • 2412.05271 • Published Dec 6, 2024 • 160
Qwen2-VL Collection Vision-language model series based on Qwen2 • 16 items • Updated Apr 28 • 220
LLaVA-UHD v2: an MLLM Integrating High-Resolution Feature Pyramid via Hierarchical Window Transformer Paper • 2412.13871 • Published Dec 18, 2024 • 18
Progressive Multimodal Reasoning via Active Retrieval Paper • 2412.14835 • Published Dec 19, 2024 • 74
Aria: An Open Multimodal Native Mixture-of-Experts Model Paper • 2410.05993 • Published Oct 8, 2024 • 112
LLaVA-Onevision Collection LLaVa_Onevision models for single-image, multi-image, and video scenarios • 9 items • Updated Sep 18, 2024 • 15