Xiangtai committed e3dec24 (verified · parent: 9e1f368)

Upload folder using huggingface_hub

Files changed (1): README.md (+15 -19)
README.md CHANGED
@@ -1,15 +1,9 @@
 ---
-license: mit
+license: apache-2.0
 pipeline_tag: image-text-to-text
 library_name: transformers
 base_model:
-- OpenGVLab/InternVL2-1B
-- OpenGVLab/InternVL2_5-8B
-- OpenGVLab/InternVL2_5-4B
-- OpenGVLab/InternViT-300M-448px-V2_5
-- internlm/internlm2_5-7b-chat
-- Qwen/Qwen2-0.5B-Instruct
-- Qwen/Qwen2.5-3B-Instruct
+- OpenGVLab/InternVL2.5-1B
 base_model_relation: merge
 language:
 - multilingual
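
The front-matter edit above is machine-readable model-card metadata, so the new values can be checked against the Hub. A minimal sketch with `huggingface_hub`, assuming this README belongs to the `ByteDance/Sa2VA-1B` repo (per the `path` change further down):

```python
# Minimal sketch: fetch the parsed README front matter from the Hub and
# confirm the fields this commit changed. Assumes huggingface_hub is
# installed and that ByteDance/Sa2VA-1B is the repo this README lives in.
from huggingface_hub import model_info

card = model_info("ByteDance/Sa2VA-1B").card_data

print(card.license)     # expected after this commit: apache-2.0
print(card.base_model)  # expected: OpenGVLab/InternVL2.5-1B (string or one-item list)
print(card.get("base_model_relation"))  # expected: merge
```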
@@ -34,18 +28,20 @@ Sa2VA is an MLLM capable of question answering, visual prompt understanding, and
 
 We built the Sa2VA series based on Qwen2-VL and InternVL2/2.5. In the following table, we provide some Sa2VA models built on InternVL2.5. Other Sa2VA models will be open-sourced soon.
 
-| Model Name | Base MLLM | Language Part | HF Link |
-|:----------:|:---------:|:-------------:|:-------:|
-| Sa2VA-1B | [InternVL2.0-1B](https://huggingface.co/OpenGVLab/InternVL2-1B) | [Qwen2-0.5B-Instruct](https://huggingface.co/Qwen/Qwen2-0.5B-Instruct) | [🤗 link](https://huggingface.co/ByteDance/Sa2VA-1B) |
-| Sa2VA-4B | [InternVL2.5-4B](https://huggingface.co/OpenGVLab/InternVL2_5-4B) | [Qwen2.5-3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-3B-Instruct) | [🤗 link](https://huggingface.co/ByteDance/Sa2VA-4B) |
-| Sa2VA-8B | [InternVL2.5-8B](https://huggingface.co/OpenGVLab/InternVL2_5-8B) | [internlm2_5-7b-chat](https://huggingface.co/internlm/internlm2_5-7b-chat) | [🤗 link](https://huggingface.co/ByteDance/Sa2VA-8B) |
+| Model Name | Base MLLM | Language Part | HF Link |
+|:----------:|:---------:|:-------------:|:-------:|
+| Sa2VA-1B | [InternVL2.5-1B](https://huggingface.co/OpenGVLab/InternVL2_5-1B) | [Qwen2.5-0.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct) | [🤗 link](https://huggingface.co/ByteDance/Sa2VA-1B) |
+| Sa2VA-4B | [InternVL2.5-4B](https://huggingface.co/OpenGVLab/InternVL2_5-4B) | [Qwen2.5-3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-3B-Instruct) | [🤗 link](https://huggingface.co/ByteDance/Sa2VA-4B) |
+| Sa2VA-8B | [InternVL2.5-8B](https://huggingface.co/OpenGVLab/InternVL2_5-8B) | [internlm2_5-7b-chat](https://huggingface.co/internlm/internlm2_5-7b-chat) | [🤗 link](https://huggingface.co/ByteDance/Sa2VA-8B) |
+| Sa2VA-26B | [InternVL2.5-26B](https://huggingface.co/OpenGVLab/InternVL2_5-26B) | [internlm2_5-20b-chat](https://huggingface.co/internlm/internlm2_5-20b-chat) | [🤗 link](https://huggingface.co/ByteDance/Sa2VA-26B) |
 
 ## Sa2VA Performance
-| Model Name | MMBench | MME | RefCOCO | RefCOCO+ | RefCOCOg | MeVIS | DAVIS | ReVOS |
-|:----------:|:-------:|:---:|:-------:|:--------:|:--------:|:-----:|:-----:|:-----:|
-| Sa2VA-1B | 1381/405 | 68.3 | 77.4 | 69.9 | 72.3 | 50.8 | 72.3 | 47.6 |
-| Sa2VA-4B | 1536/530 | 77.3 | 78.9 | 71.7 | 74.1 | 52.1 | 73.8 | 53.2 |
-| Sa2VA-8B | 1617/511 | 81.6 | 81.6 | 76.2 | 78.7 | 57.0 | 75.2 | 57.6 |
+| Model Name | MME | MMBench | RefCOCO | RefCOCO+ | RefCOCOg | MeVIS (val_u) | DAVIS |
+|:----------:|:---:|:-------:|:-------:|:--------:|:--------:|:-------------:|:-----:|
+| Sa2VA-1B | 1504/434 | 71.9 | 79.6 | 73.6 | 77.7 | 53.4 | 69.5 |
+| Sa2VA-4B | 1691/610 | 81.8 | 82.4 | 77.6 | 79.7 | 55.9 | 73.7 |
+| Sa2VA-8B | 1690/610 | 84.4 | 82.6 | 78.0 | 80.3 | 58.9 | 75.9 |
+| Sa2VA-26B | 1698/653 | 85.8 | 82.9 | 79.3 | 81.2 | 61.8 | 78.6 |
 
 
 ## Quick Start
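
Each HF link in the table above points at a ready-to-use checkpoint; fetching one ahead of time avoids a long first `from_pretrained` call. A minimal sketch, with `ByteDance/Sa2VA-4B` as an arbitrary pick:

```python
# Minimal sketch: pre-download one of the checkpoints listed above.
# Any repo id from the table works; Sa2VA-4B is an arbitrary choice.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(repo_id="ByteDance/Sa2VA-4B")
print(local_dir)  # cached path, usable as the `path` in the Quick Start below
```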
@@ -60,7 +56,7 @@ import numpy as np
 import os
 
 # load the model and tokenizer
-path = "ByteDance/Sa2VA-4B"
+path = "ByteDance/Sa2VA-1B"
 model = AutoModel.from_pretrained(
     path,
     torch_dtype=torch.bfloat16,
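
The hunk above shows only the line the commit touches; the rest of the Quick Start load step lies outside the diff. For context, a self-contained sketch of how the snippet reads after this commit. Everything beyond the lines shown in the hunk (the closing kwargs and the tokenizer) is an assumption based on the usual pattern for repos that ship custom modeling code, not part of the diff:

```python
# Sketch of the Quick Start load step as of this commit. The kwargs after
# torch_dtype and the tokenizer line are assumptions (custom-code repos such
# as this one typically require trust_remote_code=True); only the `path`
# value is taken from the diff.
import torch
from transformers import AutoModel, AutoTokenizer

path = "ByteDance/Sa2VA-1B"
model = AutoModel.from_pretrained(
    path,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
).eval().cuda()
tokenizer = AutoTokenizer.from_pretrained(path, trust_remote_code=True)
```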
 