Tags: Video-Text-to-Text · Transformers · Safetensors · English · qwen2 · text-generation · text-generation-inference

video-SALMONN 2: Captioning-Enhanced Audio-Visual Large Language Models

This is the official model release of video-SALMONN 2, a captioning-enhanced audio-visual large language model.

Safetensors
Model size: 8.72B params
Tensor type: I64 · BF16

Model tree for tsinghua-ee/video-SALMONN-2

Base model: Qwen/Qwen2-7B
This model: fine-tuned from the base model (70 fine-tunes listed)

Datasets used to train tsinghua-ee/video-SALMONN-2