LLAVA-OV-7B_RoadSocial_Finetuned
This model accompanies the paper RoadSocial: A Diverse VideoQA Dataset and Benchmark for Road Event Understanding from Social Video Narratives.
Model Summary
LLAVA-OV-7B_RoadSocial_Finetuned is an open-source large multimodal model with strong general road event understanding capabilities. Built on llava-onevision-7b-ov, it has been finetuned on the RoadSocial-260k dataset. On the RoadSocial benchmark, its performance is on par with state-of-the-art closed-source models (GPT-4o, Gemini-1.5-pro), demonstrating that the RoadSocial dataset can improve the road event understanding of general-purpose Video-LLMs.
For further details, please refer to the following resources:
- Project Page: https://roadsocial.github.io
- Dataset: https://huggingface.co/datasets/chiragp26/RoadSocial
- Code: https://github.com/roadsocial/roadsocial
- Paper: https://arxiv.org/abs/2503.21459
Use
Refer to our code repository for this model's inference script.
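The inference script in the code repository needs the model weights available locally. As a minimal sketch of that one step, the snippet below fetches the checkpoint with the `huggingface_hub` library. The helper name `download_checkpoint` and the default `checkpoints/roadsocial` directory are assumptions made for illustration and are not part of the official repository. How the inference script expects to receive the checkpoint path is not covered here.

```python
from pathlib import Path

# Repository id of this model on the Hugging Face Hub.
REPO_ID = "chiragp26/LLAVA-OV-7B_RoadSocial_Finetuned"


def download_checkpoint(local_dir: str = "checkpoints/roadsocial") -> Path:
    """Download all files of the model repository and return the local folder.

    `local_dir` is a hypothetical default; choose any writable directory.
    """
    # Imported lazily so that this module can be loaded without the
    # dependency installed (requires `pip install huggingface_hub`).
    from huggingface_hub import snapshot_download

    path = snapshot_download(repo_id=REPO_ID, local_dir=local_dir)
    return Path(path)
```

Calling `download_checkpoint()` once should leave the checkpoint files under the chosen directory. That path can then be supplied to the inference script as described in the code repository.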
Citation
@misc{parikh2025roadsocialdiversevideoqadataset,
title={RoadSocial: A Diverse VideoQA Dataset and Benchmark for Road Event Understanding from Social Video Narratives},
author={Chirag Parikh and Deepti Rawat and Rakshitha R. T. and Tathagata Ghosh and Ravi Kiran Sarvadevabhatla},
year={2025},
eprint={2503.21459},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2503.21459},
}
Model tree for chiragp26/LLAVA-OV-7B_RoadSocial_Finetuned
Base model
lmms-lab/llava-onevision-qwen2-7b-ov