LLaVA-OneVision-1.5: Fully Open Framework for Democratized Multimodal Training Paper • 2509.23661 • Published 12 days ago • 39
MinerU2.5: A Decoupled Vision-Language Model for Efficient High-Resolution Document Parsing Paper • 2509.22186 • Published 14 days ago • 101
EmbeddingGemma: Powerful and Lightweight Text Representations Paper • 2509.20354 • Published 16 days ago • 37
lmms-lab/LLaVA-OneVision-1.5-8B-Instruct Image-Text-to-Text • 9B • Updated 5 days ago • 2.89k • 42
LLaVA-OneVision-1.5 Collection https://github.com/EvolvingLMMs-Lab/LLaVA-OneVision-1.5 • 6 items • Updated 1 day ago • 15
MANZANO: A Simple and Scalable Unified Multimodal Model with a Hybrid Vision Tokenizer Paper • 2509.16197 • Published 21 days ago • 51
Gradient-Attention Guided Dual-Masking Synergetic Framework for Robust Text-based Person Retrieval Paper • 2509.09118 • Published 29 days ago • 8