zijian.kang committed
Commit 0ce0168 · 1 Parent(s): ee8d1fb

update paper link

Files changed (1)
  1. README.md +4 -2
README.md CHANGED
@@ -10,12 +10,14 @@ base_model:
 
  ![lidar_map](statics/sail.png)
 
- SAIL-VL is a state-of-the-art vision-language model (VLM) developed by the Bytedance Douyin Content Team. The goal of SAIL-VL is to develop a high-performance vision-language model that facilitates deployment on mobile devices and ensures accessibility and affordability for a broad audience. Through careful tuning of data and training recipes, SAIL-VL demonstrates that even a small VLM can benefit significantly from data scaling. Our model outperforms Qwen2-VL, InternVL2 and even recent SoTA models of comparable sizes. Details and stronger models are coming soon~
+ SAIL-VL is a state-of-the-art vision-language model (VLM) developed by the Bytedance Douyin Content Team. The goal of SAIL-VL is to develop a high-performance vision-language model that facilitates deployment on mobile devices and ensures accessibility and affordability for a broad audience. Through careful tuning of data and training recipes, SAIL-VL demonstrates that even a small VLM can benefit significantly from data scaling. Our model outperforms Qwen2-VL, InternVL2 and even recent SoTA models of comparable sizes. Stronger models are coming soon~
 
 
  In a word, SAIL-VL is a foundational VLM for vision-language applications. Welcome to explore its capabilities and feel free to contact us for any questions or opportunities.
 
  ## News🚀🚀🚀
+ - 2025-1-10: 📖 We released our paper on arXiv: [Scalable Vision Language Model Training via High Quality Data Curation
+ ](https://arxiv.org/abs/2501.05952)
  - 2024-12-25: 🚀 We ranked 1st on the [OpenCompass Multi-modal Leaderboard](https://rank.opencompass.org.cn/leaderboard-multimodal/?m=REALTIME) among models with 2B parameters.
 
 
@@ -29,7 +31,7 @@ In a word, SAIL-VL is a foundational VLM for vision-language applications. Welco
 
  ### Training Recipes Overview:
 
- SAIL-VL benefits from high-quality data and carefully curated training recipes. We find that data quality, data quantity, and the design of the curriculum training pipeline are crucial for model performance. With the proper design and data, the model's capacity scales effectively with data expansion at all stages, leading to enhanced performance. More details will be released soon.
+ SAIL-VL benefits from high-quality data and carefully curated training recipes. We find that data quality, data quantity, and the design of the curriculum training pipeline are crucial for model performance. With the proper design and data, the model's capacity scales effectively with data expansion at all stages, leading to enhanced performance.
 
  ![](statics/paper_page.png)
 