
Unifying Visual Understanding and Generation via Text-Aligned Representations

Jiaming Han, Hao Chen†, Yang Zhao, Hanyu Wang, Qi Zhao, Ziyan Yang, Hao He, Xiangyu Yue‡, Lu Jiang‡

† Project Lead  ‡ Corresponding Authors


Citation

@article{han2025tar,
  title={Vision as a Dialect: Unifying Visual Understanding and Generation via Text-Aligned Representations}, 
  author={Han, Jiaming and Chen, Hao and Zhao, Yang and Wang, Hanyu and Zhao, Qi and Yang, Ziyan and He, Hao and Yue, Xiangyu and Jiang, Lu},
  journal={arXiv preprint arXiv:2506.18898},
  year={2025},
}

License

This project is licensed under the Apache 2.0 License.

Model size: 2.57B params (Safetensors)
Tensor type: BF16

Model tree for ByteDance-Seed/Tar-1.5B

Base model: Qwen/Qwen2.5-1.5B (this model is a fine-tune of it)
