Unifying Visual Understanding and Generation via Text-Aligned Representations

Jiaming Han, Hao Chen†, Yang Zhao, Hanyu Wang, Qi Zhao, Ziyan Yang, Hao He, Xiangyu Yue‡, Lu Jiang‡

† Project Lead  ‡ Corresponding Authors

Project Page | Tar Paper on arXiv | Huggingface Model | Huggingface Space

Citation

@article{han2025tar,
  title={Vision as a Dialect: Unifying Visual Understanding and Generation via Text-Aligned Representations}, 
  author={Han, Jiaming and Chen, Hao and Zhao, Yang and Wang, Hanyu and Zhao, Qi and Yang, Ziyan and He, Hao and Yue, Xiangyu and Jiang, Lu},
  journal={arXiv preprint arXiv:2506.18898},
  year={2025},
}

License

This project is licensed under the Apache 2.0 License.
