---
license: apache-2.0
datasets:
- HuggingFaceFV/finevideo
- lmms-lab/LLaVA-Video-178K
- ShareGPT4Video/ShareGPT4Video
language:
- en
metrics:
- accuracy
base_model:
- Qwen/Qwen2-7B
- lmms-lab/llava-onevision-qwen2-7b-ov
- openai/whisper-large-v3
pipeline_tag: video-text-to-text
library_name: transformers
---
video-SALMONN 2: Captioning-Enhanced Audio-Visual Large Language Models
This is the official model release of video-SALMONN 2: Captioning-Enhanced Audio-Visual Large Language Models.