Vision as a Dialect: Unifying Visual Understanding and Generation via Text-Aligned Representations
Paper: 2506.18898
Unified MLLM with Text-Aligned Representations