LLaVE - a zhibinlan Collection

LLaVE

Updated Mar 10

LLaVE is a series of large language and vision embedding models trained on a variety of multimodal embedding datasets.
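As a rough illustration of how embeddings from models like these are typically used, the sketch below ranks candidate vectors against a query by cosine similarity. It does not call any LLaVE API: the vectors are made-up toy values, and the helper names `cosine_similarity` and `rank_candidates` are hypothetical, chosen only for this example.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_candidates(query, candidates):
    """Return (index, score) pairs sorted from most to least similar."""
    scores = [(i, cosine_similarity(query, c)) for i, c in enumerate(candidates)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

# Toy vectors standing in for model outputs (assumed values, not real embeddings).
query_embedding = [1.0, 0.0, 1.0]
candidate_embeddings = [
    [0.0, 1.0, 0.0],  # orthogonal to the query
    [2.0, 0.0, 2.0],  # same direction as the query
    [1.0, 1.0, 0.0],  # partially aligned with the query
]

ranking = rank_candidates(query_embedding, candidate_embeddings)
best_index = ranking[0][0]
```

In a real retrieval setup the query and candidate vectors would come from the embedding model itself, for example an image-plus-text query compared against text or image candidates.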

zhibinlan/LLaVE-0.5B

Image-Text-to-Text • 0.9B • Updated Mar 14 • 38.8k • 7

zhibinlan/LLaVE-2B

Image-Text-to-Text • 2B • Updated Mar 14 • 26.6k • 45

zhibinlan/LLaVE-7B

Image-Text-to-Text • 8B • Updated Mar 14 • 262 • 5

LLaVE: Large Language and Vision Embedding Models with Hardness-Weighted Contrastive Learning

Paper • 2503.04812 • Published Mar 4 • 15