arXiv:2509.23661

LLaVA-OneVision-1.5: Fully Open Framework for Democratized Multimodal Training

Published on Sep 28 · Submitted by xiangan on Sep 29
Authors: Bo Li, et al.

Abstract

AI-generated summary: LLaVA-OneVision-1.5 is a family of large multimodal models that achieves state-of-the-art performance with reduced costs through efficient training and high-quality datasets.

We present LLaVA-OneVision-1.5, a novel family of Large Multimodal Models (LMMs) that achieves state-of-the-art performance with significantly reduced computational and financial costs. Unlike existing works, LLaVA-OneVision-1.5 provides an open, efficient, and reproducible framework for building high-quality vision-language models entirely from scratch. The LLaVA-OneVision-1.5 release comprises three primary components: (1) Large-Scale Curated Datasets: we construct an 85M concept-balanced pretraining dataset, LLaVA-OneVision-1.5-Mid-Training, and a meticulously curated 26M instruction dataset, LLaVA-OneVision-1.5-Instruct, collectively encompassing 64B compressed multimodal tokens. (2) Efficient Training Framework: we develop a complete end-to-end efficient training framework that leverages an offline parallel data packing strategy to train LLaVA-OneVision-1.5 within a $16,000 budget. (3) State-of-the-Art Performance: experimental results demonstrate that LLaVA-OneVision-1.5 delivers exceptionally competitive performance across a broad range of downstream tasks. Specifically, LLaVA-OneVision-1.5-8B outperforms Qwen2.5-VL-7B on 18 of 27 benchmarks, and LLaVA-OneVision-1.5-4B surpasses Qwen2.5-VL-3B on all 27 benchmarks. We plan to release LLaVA-OneVision-1.5-RL shortly and encourage the community to watch for further updates.
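
The "offline parallel data packing" idea can be illustrated with a short sketch. The snippet below is a minimal, hypothetical example, not the authors' released code: it greedily packs variable-length multimodal samples into fixed-length training sequences ahead of time so that GPU batches carry little padding. The function name, the 8192-token context length, and the first-fit-decreasing heuristic are assumptions made purely for illustration.

```python
from typing import List

def pack_samples(sample_lengths: List[int], max_len: int = 8192) -> List[List[int]]:
    """Greedy first-fit-decreasing packing of samples (by token length) into
    fixed-capacity sequences. Returns, for each packed sequence, the indices of
    the samples it contains. Illustrative sketch, not the paper's exact algorithm."""
    # Sort sample indices by length, longest first, to reduce wasted space.
    order = sorted(range(len(sample_lengths)), key=lambda i: -sample_lengths[i])
    bins: List[List[int]] = []   # sample indices per packed sequence
    remaining: List[int] = []    # free token budget per packed sequence

    for i in order:
        length = sample_lengths[i]
        if length > max_len:
            # An over-long sample would need truncation or splitting upstream.
            continue
        # Place the sample in the first sequence with enough room, else open a new one.
        for b, free in enumerate(remaining):
            if length <= free:
                bins[b].append(i)
                remaining[b] -= length
                break
        else:
            bins.append([i])
            remaining.append(max_len - length)
    return bins

if __name__ == "__main__":
    # Hypothetical token counts for a handful of (image + text) samples.
    lengths = [3100, 2800, 1200, 900, 4500, 700, 6100]
    for seq_id, members in enumerate(pack_samples(lengths)):
        used = sum(lengths[i] for i in members)
        print(f"sequence {seq_id}: samples {members}, {used}/8192 tokens")
```

Because the packing is computed offline, the packed index can be materialized once and reused across epochs and parallel data workers; at training time, packed samples are typically kept from attending to one another with per-sample (block-diagonal) attention masks.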

Community

Outstanding release! Thank you so much


Models citing this paper: 6

Datasets citing this paper: 2

Spaces citing this paper: 2

Collections including this paper: 2