LLAVA-OV-7B_RoadSocial_Finetuned

This model accompanies the paper RoadSocial: A Diverse VideoQA Dataset and Benchmark for Road Event Understanding from Social Video Narratives.

Model Summary

LLAVA-OV-7B_RoadSocial_Finetuned is an open-source large multimodal model with strong generic road event understanding capabilities. It is built on llava-onevision-7b-ov and finetuned on the RoadSocial-260k dataset. On the RoadSocial benchmark, its performance is on par with SOTA closed-source models (GPT-4o, Gemini-1.5-pro), demonstrating that the RoadSocial dataset can improve the road event understanding of general-purpose Video-LLMs.

For further details, please refer to the following resources:

🪐 Project Page: https://roadsocial.github.io
📦 Dataset: https://huggingface.co/datasets/chiragp26/RoadSocial
💻 Code: https://github.com/roadsocial/roadsocial
📰 Paper: https://arxiv.org/abs/2503.21459

Use

Refer to our code repository for this model's inference script.
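
As a rough orientation before opening the repository, the sketch below shows two steps that typically precede video inference: fetching the checkpoint from the Hub and choosing which frames to feed the model. It is not the authors' inference script. The helper names `uniform_frame_indices` and `download_checkpoint` are hypothetical, and uniform frame sampling is an assumption about preprocessing; the repository's script remains authoritative. Only `snapshot_download` from `huggingface_hub` is a real library call.

```python
from typing import List

# Repo id as it appears on this model page.
REPO_ID = "chiragp26/LLAVA-OV-7B_RoadSocial_Finetuned"


def uniform_frame_indices(total_frames: int, num_frames: int) -> List[int]:
    """Pick up to `num_frames` evenly spaced frame indices (hypothetical helper).

    Assumption: the model consumes a fixed number of uniformly sampled frames;
    check the official inference script for the actual sampling strategy.
    """
    if total_frames <= 0 or num_frames <= 0:
        return []
    if num_frames >= total_frames:
        # Short clip: use every frame.
        return list(range(total_frames))
    if num_frames == 1:
        return [0]
    # Spread indices from the first frame to the last frame inclusive.
    step = (total_frames - 1) / (num_frames - 1)
    return [round(i * step) for i in range(num_frames)]


def download_checkpoint(repo_id: str = REPO_ID) -> str:
    """Download the model files and return the local directory (hypothetical helper).

    Requires network access and the `huggingface_hub` package.
    """
    from huggingface_hub import snapshot_download  # lazy import, optional dependency

    return snapshot_download(repo_id=repo_id)
```

The frame indices can then be passed to whichever video reader the inference script uses; loading the weights and running generation should follow the code repository.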

Citation

@misc{parikh2025roadsocialdiversevideoqadataset,
      title={RoadSocial: A Diverse VideoQA Dataset and Benchmark for Road Event Understanding from Social Video Narratives}, 
      author={Chirag Parikh and Deepti Rawat and Rakshitha R. T. and Tathagata Ghosh and Ravi Kiran Sarvadevabhatla},
      year={2025},
      eprint={2503.21459},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2503.21459}, 
}
