Commit 0ce0168 · zijian.kang
1 Parent(s): ee8d1fb

update paper link

README.md CHANGED
@@ -10,12 +10,14 @@ base_model:
 
 
 
-SAIL-VL is a state-of-the-art vision-language model (VLM) developed by the Bytedance Douyin Content Team. The goal of SAIL-VL is to develope a high-performance vision language model that facilitates deployment on mobile devices and ensures accessibility and affordability for a broad audience. Through careful tuning of data and training recipes, SAIL-VL demonstrates that even a small VLM can benefit significantly from data scaling. Our model outperforms Qwen2-VL, InternVL2 and even recent SoTA models of comparable sizes.
+SAIL-VL is a state-of-the-art vision-language model (VLM) developed by the Bytedance Douyin Content Team. The goal of SAIL-VL is to develope a high-performance vision language model that facilitates deployment on mobile devices and ensures accessibility and affordability for a broad audience. Through careful tuning of data and training recipes, SAIL-VL demonstrates that even a small VLM can benefit significantly from data scaling. Our model outperforms Qwen2-VL, InternVL2 and even recent SoTA models of comparable sizes. Stronger models are comming soon~
 
 
 In a word, SAIL-VL is a foundational VLM for vision-language applications. Welcome to explore its capabilities and feel free to contact us for any questions or opportunities.
 
 ## News
+- 2024-1-10: We released our paper on Arxiv: [Scalable Vision Language Model Training via High Quality Data Curation
+](https://arxiv.org/abs/2501.05952)
 - 2024-12-25: We ranked the 1st in [OpenCompass Multi-modal Leaderboard](https://rank.opencompass.org.cn/leaderboard-multimodal/?m=REALTIME) among models of 2B parameters.
 
 
@@ -29,7 +31,7 @@ In a word, SAIL-VL is a foundational VLM for vision-language applications. Welco
 
 ### Training Recipes Overview:
 
-Sail-VL benefits from high-quality data and carefully curated training recipes. We find the data quality, quantity and the design of curriculum training pipeline are crucial for model performance. With the proper design and data, the model's capacity scales effectively with data expansion at all stages, leading to enhanced performance.
+Sail-VL benefits from high-quality data and carefully curated training recipes. We find the data quality, quantity and the design of curriculum training pipeline are crucial for model performance. With the proper design and data, the model's capacity scales effectively with data expansion at all stages, leading to enhanced performance.
 
 
 