---
license: apache-2.0
---

<p align="center">
<b><font size="6">Dispider</font></b>
</p>

<div align="center">

[💻Github Repo](https://github.com/Mark12Ding/Dispider)

[📖Paper](https://arxiv.org/abs/2501.03218)

</div>

## Quick Start

First, download the checkpoints from this repository to a local folder.

**Important**: Update the `mm_compressor` path in `config.json` so that it points to the `mm_compressor` checkpoint on your local machine. This checkpoint is stored in a sub-folder of this repository.
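
The path edit can also be scripted. The Python sketch below assumes that `mm_compressor` is a top-level key in `config.json` whose value is a filesystem path; the helper name `set_mm_compressor_path` is hypothetical, so check the layout of your downloaded `config.json` before relying on it.

```python
import json
from pathlib import Path


def set_mm_compressor_path(config_path, compressor_dir):
    """Point the `mm_compressor` entry of config.json at a local directory.

    Assumption (not confirmed by the model card): `mm_compressor` is a
    top-level key in config.json holding a filesystem path.
    """
    config_file = Path(config_path)
    config = json.loads(config_file.read_text(encoding="utf-8"))

    # Fail loudly instead of silently adding a key the model may not read.
    if "mm_compressor" not in config:
        raise KeyError(f"'mm_compressor' not found in {config_file}")

    # Store an absolute path so loading does not depend on the working directory.
    config["mm_compressor"] = str(Path(compressor_dir).expanduser().resolve())

    config_file.write_text(json.dumps(config, indent=2), encoding="utf-8")
    return config["mm_compressor"]
```

For example, `set_mm_compressor_path("path/to/checkpoints/config.json", "path/to/checkpoints/<mm_compressor sub-folder>")` rewrites the entry in place; both paths here are placeholders for your own download location.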

For detailed evaluation, please refer to the [GitHub repo](https://github.com/Mark12Ding/Dispider).

## ✒️ Citation

If you find our work helpful for your research, please consider giving a star ⭐ and citation 📝.

```bibtex
@article{qian2025dispider,
  title={Dispider: Enabling Video LLMs with Active Real-Time Interaction via Disentangled Perception, Decision, and Reaction},
  author={Qian, Rui and Ding, Shuangrui and Dong, Xiaoyi and Zhang, Pan and Zang, Yuhang and Cao, Yuhang and Lin, Dahua and Wang, Jiaqi},
  journal={arXiv preprint arXiv:2501.03218},
  year={2025}
}

@article{qian2025streaming,
  title={Streaming long video understanding with large language models},
  author={Qian, Rui and Dong, Xiaoyi and Zhang, Pan and Zang, Yuhang and Ding, Shuangrui and Lin, Dahua and Wang, Jiaqi},
  journal={Advances in Neural Information Processing Systems},
  volume={37},
  pages={119336--119360},
  year={2025}
}
```