BytedanceDouyinContent/SAIL-VL-2B

zijian.kang committed on Jan 16

Commit 0ce0168 · 1 Parent(s): ee8d1fb

update paper link

Files changed (1)

README.md +4 -2

README.md CHANGED

@@ -10,12 +10,14 @@ base_model:
 ![lidar_map](statics/sail.png)
-SAIL-VL is a state-of-the-art vision-language model (VLM) developed by the Bytedance Douyin Content Team. The goal of SAIL-VL is to develope a high-performance vision language model that facilitates deployment on mobile devices and ensures accessibility and affordability for a broad audience. Through careful tuning of data and training recipes, SAIL-VL demonstrates that even a small VLM can benefit significantly from data scaling. Our model outperforms Qwen2-VL, InternVL2 and even recent SoTA models of comparable sizes. Details and stronger models are comming soon~
+SAIL-VL is a state-of-the-art vision-language model (VLM) developed by the Bytedance Douyin Content Team. The goal of SAIL-VL is to develope a high-performance vision language model that facilitates deployment on mobile devices and ensures accessibility and affordability for a broad audience. Through careful tuning of data and training recipes, SAIL-VL demonstrates that even a small VLM can benefit significantly from data scaling. Our model outperforms Qwen2-VL, InternVL2 and even recent SoTA models of comparable sizes. Stronger models are comming soon~
 In a word, SAIL-VL is a foundational VLM for vision-language applications. Welcome to explore its capabilities and feel free to contact us for any questions or opportunities.
 ## News🚀🚀🚀
+- 2024-1-10: 📖 We released our paper on Arxiv: [Scalable Vision Language Model Training via High Quality Data Curation
+](https://arxiv.org/abs/2501.05952)
 - 2024-12-25: 🚀 We ranked the 1st in [OpenCompass Multi-modal Leaderboard](https://rank.opencompass.org.cn/leaderboard-multimodal/?m=REALTIME) among models of 2B parameters.
@@ -29,7 +31,7 @@ In a word, SAIL-VL is a foundational VLM for vision-language applications. Welco
 ### Training Recipes Overview:
-Sail-VL benefits from high-quality data and carefully curated training recipes. We find the data quality, quantity and the design of curriculum training pipeline are crucial for model performance. With the proper design and data, the model's capacity scales effectively with data expansion at all stages, leading to enhanced performance. More details will be released soon.
+Sail-VL benefits from high-quality data and carefully curated training recipes. We find the data quality, quantity and the design of curriculum training pipeline are crucial for model performance. With the proper design and data, the model's capacity scales effectively with data expansion at all stages, leading to enhanced performance.
 ![](statics/paper_page.png)