LLAVA-OV-7B_RoadSocial_Finetuned

This model accompanies the paper RoadSocial: A Diverse VideoQA Dataset and Benchmark for Road Event Understanding from Social Video Narratives.

Model Summary

LLAVA-OV-7B_RoadSocial_Finetuned is an open-source large multimodal model with strong generic road event understanding capabilities. It is built on llava-onevision-7b-ov and finetuned on the RoadSocial-260k dataset. On the RoadSocial benchmark, its performance is on par with SOTA closed-source models (GPT-4o, Gemini-1.5-pro), which demonstrates that the RoadSocial dataset can improve the road event understanding capabilities of general-purpose Video-LLMs.

For further details, please refer to the paper: https://arxiv.org/abs/2503.21459

Use

Refer to our code repository for this model's inference script.
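
Before running the inference script, the checkpoint has to be available locally. The following is a minimal sketch of one way to fetch it, assuming the huggingface_hub package is installed. The repository id is the one shown on this model page; the helper name download_checkpoint and the default directory checkpoints/roadsocial are hypothetical choices made for this example and are not part of the authors' code.

```python
from pathlib import Path

# Repository id as shown on this model page.
REPO_ID = "chiragp26/LLAVA-OV-7B_RoadSocial_Finetuned"


def download_checkpoint(local_dir: str = "checkpoints/roadsocial") -> Path:
    """Download the model files from the Hugging Face Hub.

    `local_dir` is a hypothetical default chosen for this example.
    Returns the local directory that contains the downloaded files.
    """
    # Imported lazily so this module can be imported even when
    # huggingface_hub is not installed.
    from huggingface_hub import snapshot_download

    # snapshot_download fetches every file in the repository and
    # returns the path of the local copy as a string.
    path = snapshot_download(repo_id=REPO_ID, local_dir=local_dir)
    return Path(path)
```

Calling download_checkpoint() returns the local folder, which can then be passed to the inference script wherever it expects a model path. The download is large, since the weights hold about 8B parameters in BF16.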

Citation

@misc{parikh2025roadsocialdiversevideoqadataset,
      title={RoadSocial: A Diverse VideoQA Dataset and Benchmark for Road Event Understanding from Social Video Narratives}, 
      author={Chirag Parikh and Deepti Rawat and Rakshitha R. T. and Tathagata Ghosh and Ravi Kiran Sarvadevabhatla},
      year={2025},
      eprint={2503.21459},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2503.21459}, 
}

Model size: 8.03B params (Safetensors, tensor type BF16)