---
license: apache-2.0
pipeline_tag: video-text-to-text
library_name: transformers
---
**<center><span style="font-size:2em;">TinyLLaVA-Video-R1</span></center>**
[![arXiv](https://img.shields.io/badge/Arxiv-2504.09641-b31b1b.svg?logo=arXiv)](https://arxiv.org/abs/2504.09641)[![Github](https://img.shields.io/badge/Github-Github-blue.svg)](https://github.com/ZhangXJ199/TinyLLaVA-Video-R1)
This model was obtained by cold-starting [TinyLLaVA-Video](https://huggingface.co/Zhang199/TinyLLaVA-Video-Qwen2.5-3B-Group-16-512) with 16 manually annotated samples from the NExT-QA dataset, and it serves as the base model for [TinyLLaVA-Video-R1](https://huggingface.co/Zhang199/TinyLLaVA-Video-R1).
The 16 manually annotated cold-start samples have been released [here](https://huggingface.co/datasets/Zhang199/TinyLLaVA-Video-R1-training-data).
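Below is a minimal loading sketch, assuming this checkpoint follows the loading pattern of other TinyLLaVA releases (custom modeling code requiring `trust_remote_code=True` and a `chat` convenience method); the repo id and the `video` argument are assumptions here, so consult the [GitHub repository](https://github.com/ZhangXJ199/TinyLLaVA-Video-R1) for the verified inference pipeline.

```python
# Minimal loading sketch. Assumptions (not verified against this checkpoint):
# the repo ships custom modeling code (hence trust_remote_code=True), and it
# exposes the `chat` helper used by other TinyLLaVA quick-start snippets.
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder repo id -- replace with this model card's actual repo id.
hf_path = "Zhang199/TinyLLaVA-Video-Coldstart-NextQA-16"
model = AutoModelForCausalLM.from_pretrained(hf_path, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(hf_path, use_fast=False)

prompt = "Describe what happens in the video."
video_file = "/path/to/video.mp4"
# The `video` keyword is an assumption borrowed from the TinyLLaVA family's
# image-based `chat(prompt=..., image=...)` examples.
output_text, _ = model.chat(prompt=prompt, video=video_file, tokenizer=tokenizer)
print(output_text)
```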