Visual Grounded Reasoning
Visual Foundation Models Powering Vision-Language Models
- BytedanceDouyinContent/SAILViT-Large-300M-448px
  Image Feature Extraction • 0.3B • Updated • 7 • 1
- BytedanceDouyinContent/SAILViT-Huge-600M-448px
  Image Feature Extraction • 0.7B • Updated • 10
- SAILViT: Towards Robust and Generalizable Visual Backbones for MLLMs via Gradual Feature Refinement
  Paper • 2507.01643 • Published • 1
Visual Grounded Reasoning
Scalable Vision Language Model Training via High Quality Data Curation