license: apache-2.0
datasets:
  - HuggingFaceFV/finevideo
  - lmms-lab/LLaVA-Video-178K
  - ShareGPT4Video/ShareGPT4Video
language:
  - en
metrics:
  - accuracy
base_model:
  - Qwen/Qwen2-7B
  - lmms-lab/llava-onevision-qwen2-7b-ov
  - openai/whisper-large-v3
pipeline_tag: video-text-to-text
library_name: transformers

video-SALMONN 2: Captioning-Enhanced Audio-Visual Large Language Models

This is the official model release of video-SALMONN 2: Captioning-Enhanced Audio-Visual Large Language Models.