tsinghua-ee/video-SALMONN-2

Tags: Video-Text-to-Text · text-generation · text-generation-inference


video-SALMONN 2: Captioning-Enhanced Audio-Visual Large Language Models

Official model release of video-SALMONN 2: Captioning-Enhanced Audio-Visual Large Language Models


Format: Safetensors

Model size: 8.72B params

Tensor types: I64, BF16


Model tree for tsinghua-ee/video-SALMONN-2

Base model: Qwen/Qwen2-7B

Relationship: this model is fine-tuned from the base model (one of 70 fine-tunes of Qwen/Qwen2-7B listed in the model tree)

Datasets used to train tsinghua-ee/video-SALMONN-2