LLaVA-OneVision-1.5-4B Initialization Model Card

🚀 Overview

This repository provides the initialization (stage-0) checkpoint for training LLaVA-OneVision-1.5. It combines a pretrained vision encoder and a pretrained language model, connected by a randomly initialized adapter, so that multimodal training starts from strong unimodal weights and only the adapter begins from scratch.

🏗️ Key Components

Vision Encoder:
Uses the pretrained ViT model from DeepGlint-AI/rice-vit-large-patch14-560 to extract rich visual features.
Adapter:
A randomly initialized adapter that reduces the number of visual tokens by 4× before they reach the language model, keeping the combined image-text sequence short.
Language Model:
Incorporates the pretrained language model Qwen/Qwen3-4B-Instruct-2507 for robust text understanding and generation.
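To make the 4× compression concrete: assuming, as the encoder's name suggests, 14-pixel patches and a 560×560 input, the ViT produces 40 × 40 = 1,600 patch features, and a 4× reduction leaves 400 visual tokens for the language model. The sketch below shows one common way to obtain a 4× reduction: merge each 2×2 block of neighbouring patches and project it with a linear layer. The 2×2 merge, the single linear layer, and the hidden sizes (1024 for the ViT, 2560 for Qwen3-4B) are assumptions for illustration, not the released adapter's definition, and `merge_and_project` is a hypothetical helper.

```python
import numpy as np

# ASSUMPTIONS (not stated in this card): ViT hidden size 1024, LLM hidden
# size 2560, and a 2x2 spatial merge as the mechanism behind "4x compression".
PATCH_SIZE = 14
VIT_DIM = 1024
LLM_DIM = 2560
MERGE = 2  # each 2x2 block of patches becomes 1 token -> 4x fewer tokens


def num_patches(height: int, width: int, patch: int = PATCH_SIZE) -> int:
    """Number of ViT patch features for an image whose sides are multiples of `patch`."""
    return (height // patch) * (width // patch)


def merge_and_project(features: np.ndarray, weight: np.ndarray, merge: int = MERGE) -> np.ndarray:
    """Hypothetical adapter: concatenate each merge x merge block of patch
    features, then map it to the LLM hidden size with one linear layer."""
    h, w, c = features.shape
    assert h % merge == 0 and w % merge == 0, "patch grid must be divisible by merge"
    # (h, w, c) -> (h/m, m, w/m, m, c) -> (h/m, w/m, m, m, c)
    blocks = features.reshape(h // merge, merge, w // merge, merge, c).transpose(0, 2, 1, 3, 4)
    # One row per output token, each of length m*m*c.
    merged = blocks.reshape(-1, merge * merge * c)
    return merged @ weight


rng = np.random.default_rng(0)
grid = 560 // PATCH_SIZE  # 40 patches per side for a 560x560 image
vit_features = rng.standard_normal((grid, grid, VIT_DIM), dtype=np.float32)
# Random weights, mirroring the card's "randomly initialized adapter".
adapter_weight = rng.standard_normal((MERGE * MERGE * VIT_DIM, LLM_DIM), dtype=np.float32) * 0.02

tokens = merge_and_project(vit_features, adapter_weight)
print(num_patches(560, 560), tokens.shape)  # 1600 (400, 2560)
```

Merging spatial neighbours before projection keeps local layout inside each token while cutting the sequence length the language model has to attend over by the merge factor squared.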

📝 Usage

This initialization checkpoint is intended for downstream training and fine-tuning. For usage and training scripts, please refer to the EvolvingLMMs-Lab/LLaVA-OneVision-1.5 repository.

📚 References

Citation

If you find LLaVA-OneVision-1.5 useful in your research, please consider citing the following paper:

@misc{an2025llavaonevision15fullyopenframework,
      title={LLaVA-OneVision-1.5: Fully Open Framework for Democratized Multimodal Training}, 
      author={Xiang An and Yin Xie and Kaicheng Yang and Wenkang Zhang and Xiuwei Zhao and Zheng Cheng and Yirui Wang and Songcen Xu and Changrui Chen and Chunsheng Wu and Huajie Tan and Chunyuan Li and Jing Yang and Jie Yu and Xiyao Wang and Bin Qin and Yumeng Wang and Zizhen Yan and Ziyong Feng and Ziwei Liu and Bo Li and Jiankang Deng},
      year={2025},
      eprint={2509.23661},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2509.23661}, 
}

⚖️ License

Apache 2.0


📦 Model Details

Format: Safetensors
Model size: 4.35B params
Tensor type: F32


Model ID: lmms-lab/LLaVA-OneVision-1.5-4B-stage0
Base model: DeepGlint-AI/rice-vit-large-patch14-560
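The parameter count (4.35B) and tensor type (F32) listed on this card determine roughly how much disk and memory the weights alone occupy. The sketch below does that arithmetic; the BF16 row is an assumption about casting the weights for training, not a property of this checkpoint, and the figures exclude activations, gradients, and optimizer state.

```python
# Parameter count and tensor type as reported on this card; BF16 is an
# assumed alternative precision shown only for comparison.
NUM_PARAMS = 4.35e9
BYTES_PER_PARAM = {"F32": 4, "BF16": 2}


def checkpoint_gib(num_params: float, dtype: str) -> float:
    """Approximate raw weight size in GiB (ignores file metadata and all training state)."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


for dtype in ("F32", "BF16"):
    print(f"{dtype}: {checkpoint_gib(NUM_PARAMS, dtype):.1f} GiB")
# F32: 16.2 GiB
# BF16: 8.1 GiB
```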