Introduction
We are excited to introduce HawkVL, a series of lightweight and efficient multimodal large language models (MLLMs).
Architecture:
- ViT: Qwen-ViT
- Projector: 2-layer MLP with pixel unshuffle
- LLM: Qwen2.5-1.5B
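A minimal sketch of the connector described above is shown below. It is illustrative rather than the released implementation; the ViT width (1280), LLM hidden size (1536), and unshuffle ratio (2) are assumptions.

```python
# Illustrative sketch of the projector: pixel unshuffle folds each 2x2 block of
# visual tokens into the channel dimension (4x fewer tokens), then a 2-layer MLP
# maps them into the LLM embedding space. Dimensions are assumed, not official.
import torch
import torch.nn as nn

class PixelUnshuffleMLPProjector(nn.Module):
    def __init__(self, vit_dim=1280, llm_dim=1536, ratio=2):
        super().__init__()
        self.ratio = ratio
        self.mlp = nn.Sequential(
            nn.Linear(vit_dim * ratio * ratio, llm_dim),
            nn.GELU(),
            nn.Linear(llm_dim, llm_dim),
        )

    def forward(self, x):
        # x: (batch, num_tokens, vit_dim), num_tokens = h * w on a square grid
        b, n, c = x.shape
        h = w = int(n ** 0.5)
        r = self.ratio
        x = x.view(b, h, w, c)
        # Group each r x r neighbourhood of tokens into a single token.
        x = x.view(b, h // r, r, w // r, r, c)
        x = x.permute(0, 1, 3, 2, 4, 5).reshape(b, (h // r) * (w // r), c * r * r)
        return self.mlp(x)  # (batch, num_tokens / r^2, llm_dim)

# Example: a 32x32 grid of ViT tokens (1024) is reduced to 256 LLM input tokens.
tokens = torch.randn(1, 1024, 1280)
print(PixelUnshuffleMLPProjector()(tokens).shape)  # torch.Size([1, 256, 1536])
```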
Evaluation
We evaluate HawkVL on the eight benchmarks of the OpenCompass multimodal leaderboard using VLMEvalKit:
MMBench_TEST_EN/CN_V11, MMStar, MMMU_DEV_VAL, MathVista_MINI, HallusionBench, AI2D_TEST, OCRBench, and MMVet.
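The sketch below shows one way such a run could be launched with VLMEvalKit's `run.py`; the model key `HawkVL-2B` is hypothetical and assumes the model has already been registered in VLMEvalKit's model config.

```python
# Minimal sketch (not an official script): launch VLMEvalKit's run.py on the
# eight OpenCompass benchmarks. The "HawkVL-2B" model key is an assumption and
# must be registered in VLMEvalKit before this works.
import subprocess

BENCHMARKS = [
    "MMBench_TEST_EN_V11", "MMBench_TEST_CN_V11",  # EN/CN results are averaged
    "MMStar", "MMMU_DEV_VAL", "MathVista_MINI",
    "HallusionBench", "AI2D_TEST", "OCRBench", "MMVet",
]

subprocess.run(
    ["python", "run.py", "--data", *BENCHMARKS, "--model", "HawkVL-2B"],
    check=True,
)
```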
The results are as follows:
| Benchmark | HawkVL-2B |
| --- | --- |
| MMBench-TEST-avg | 64.9 |
| MMStar | 48.2 |
| MMMU-VAL | 43.9 |
| MathVista_MINI | 44.1 |
| HallusionBench | 58.5 |
| AI2D_TEST | 67.4 |
| OCRBench | 74.9 |
| MMVet | 36.6 |
| Avg | 54.8 |
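The reported Avg is the unweighted mean of the eight scores above (assuming the OpenCompass convention of a simple average):

```python
# Sanity check: the overall score is the plain mean of the eight benchmark scores.
scores = [64.9, 48.2, 43.9, 44.1, 58.5, 67.4, 74.9, 36.6]
print(round(sum(scores) / len(scores), 1))  # 54.8
```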
License Agreement
All of our open-source models are licensed under the Apache-2.0 license.