Qwen2.5-Omni Collection End-to-End Omni (text, audio, image, video, and natural speech interaction) model based Qwen2.5 • 7 items • Updated 4 days ago • 130
BLIP3-o: A Family of Fully Open Unified Multimodal Models-Architecture, Training and Dataset Paper • 2505.09568 • Published 11 days ago • 82
HELM Datasets Collection These are datasets used for Vietnamese LLM evaluation • 18 items • Updated Apr 24 • 1
GIT Collection GIT (Generative Image-to-text Transformer) is a model useful for vision-language tasks such as image/video captioning and question answering. • 18 items • Updated 24 days ago • 13