Unifying Visual Understanding and Generation via Text-Aligned Representations
Jiaming Han, Hao Chen†, Yang Zhao, Hanyu Wang, Qi Zhao, Ziyan Yang, Hao He, Xiangyu Yue‡, Lu Jiang‡
† Project Lead ‡ Corresponding Authors

Citation
@article{han2025tar,
title={Vision as a Dialect: Unifying Visual Understanding and Generation via Text-Aligned Representations},
author={Han, Jiaming and Chen, Hao and Zhao, Yang and Wang, Hanyu and Zhao, Qi and Yang, Ziyan and He, Hao and Yue, Xiangyu and Jiang, Lu},
journal={arXiv preprint arXiv:2506.18898},
year={2025},
}
License
This project is licensed under the Apache 2.0 License.
Model tree for csuhan/TA-Tok
Base model
google/siglip2-so400m-patch14-384