---
pipeline_tag: video-text-to-text
---

This repository contains the model described in the paper *VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction*.

Code: https://github.com/VITA-MLLM/VITA